Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-11 Thread Amit Kapila
On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan and...@dunslane.net wrote:



 This generally looks good, but I have a couple of questions before I
commit it.

 First, why is the new option for the BASE_BACKUP command of the
Streaming Replication protocol "TAR"? It seems rather misleading. Shouldn't
it be something like "TABLESPACEMAP"?


The reason for keeping the new option's name as "TAR" was that the
tablespace_map file was generated only for that format type, but I agree
with you that something like "TABLESPACEMAP" suits better, so I have
changed it to "TABLESPACE_MAP".  Putting '_' in the name makes it somewhat
consistent with other option names and with the filename it generates.
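
For reference, a client would then request the map over the replication
protocol with something like this (a sketch based on the 9.5-era
BASE_BACKUP grammar; option order per the docs of that release):

    BASE_BACKUP LABEL 'nightly' PROGRESS WAL TABLESPACE_MAP;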


 Second, these lines in xlog.c seem wrong:

 else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
 str[i-1] = '\n';

 It looks to me like we should be putting ch in the string, not
arbitrarily transforming \r into \n.


You are right, I have changed it as per your suggestion.
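
For the archives, the corrected read-back loop then amounts to this (a
simplified sketch, not the exact committed code):

    /* Un-escape backslash-escaped line breaks in tablespace_map entries. */
    if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
        str[i - 1] = ch;    /* keep ch itself, overwriting the '\\' */
    else
        str[i++] = ch;
    prev_ch = ch;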


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_basebackup_to_include_symlink_v7.patch
Description: Binary data



Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Etsuro Fujita
On 2015/05/11 8:50, Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 [ EvalPlanQual-v6.patch ]
 
 I've started to study this in a little more detail, and I'm not terribly
 happy with some of the API decisions in it.

Thanks for taking the time to review the patch!

 In particular, I find the addition of void *fdw_state to ExecRowMark
 to be pretty questionable.  That does not seem like a good place to keep
 random state.  (I realize that WHERE CURRENT OF keeps some state in
 ExecRowMark, but that's a crock not something to emulate.)  ISTM that in
 most scenarios, the state that LockForeignRow/FetchForeignRow would like
 to get at is probably the FDW state associated with the ForeignScanState
 that the tuple came from.  Which this API doesn't help with particularly.
 I wonder if we should instead add a ScanState* field and expect the
 core code to set that up (ExecOpenScanRelation could do it with minor
 API changes...).

Sorry, I don't clearly understand what you mean, but that (the idea of
expecting the core code to set it up) sounds inconsistent with your comment
on the earlier version of the BeginForeignFetch API [1].

 I'm also a bit tempted to pass the TIDs to LockForeignRow and
 FetchForeignRow as Datums not ItemPointers.  We have the Datum format
 available already at the call sites, so this is free as far as the core
 code is concerned, and would only cost another line or so for the FDWs.
 This is by no means sufficient to allow FDWs to use some other type than
 tid for row identifiers; but it would be a down payment on that problem,
 and at least would avoid nailing the rowids-are-tids assumption into yet
 another global API.

That is a good idea.
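
For the record, the extra line an FDW would need under the Datum-based
API is trivial.  A sketch, with rowid_datum standing in for whatever
value the core code passes:

    /* Recover the ItemPointer from the Datum-form row identifier. */
    ItemPointer tupleid = (ItemPointer) DatumGetPointer(rowid_datum);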

 Also, as I mentioned, I'd be a whole lot happier if we had a way to test
 this...

Attached is a postgres_fdw patch that I used for the testing.  If you
try it, edit postgresGetForeignRowMarkType as necessary.  I have to
confess that I did the testing only under normal conditions with the patch.

Sorry for the delay.  I took a vacation until yesterday.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/14504.1428446...@sss.pgh.pa.us
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 88,93  typedef struct PgFdwRelationInfo
--- 88,95 
   *
   * 1) SELECT statement text to be sent to the remote server
   * 2) Integer list of attribute numbers retrieved by the SELECT
+  * 3) SELECT statement text to be sent to the remote server
+  * 4) Integer list of attribute numbers retrieved by the SELECT
   *
   * These items are indexed with the enum FdwScanPrivateIndex, so an item
   * can be fetched with list_nth().  For example, to get the SELECT statement:
***
*** 98,104  enum FdwScanPrivateIndex
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs
  };
  
  /*
--- 100,110 
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs,
! 	/* SQL statement to execute remotely (as a String node) */
! 	FdwScanPrivateSelectSql2,
! 	/* Integer list of attribute numbers retrieved by SELECT */
! 	FdwScanPrivateRetrievedAttrs2
  };
  
  /*
***
*** 186,191  typedef struct PgFdwModifyState
--- 192,223 
  } PgFdwModifyState;
  
  /*
+  * Execution state for fetching/locking foreign rows.
+  */
+ typedef struct PgFdwFetchState
+ {
+ 	Relation	rel;			/* relcache entry for the foreign table */
+ 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
+ 
+ 	/* for remote query execution */
+ 	PGconn	   *conn;			/* connection for the fetch */
+ 	char	   *p_name;			/* name of prepared statement, if created */
+ 
+ 	/* extracted fdw_private data */
+ 	char	   *query;			/* text of SELECT command */
+ 	List	   *retrieved_attrs;	/* attr numbers retrieved by SELECT */
+ 
+ 	/* info about parameters for prepared statement */
+ 	int			p_nums;			/* number of parameters to transmit */
+ 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
+ 
+ 	HeapTuple	locked_tuple;
+ 
+ 	/* working memory context */
+ 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+ } PgFdwFetchState;
+ 
+ /*
   * Workspace for analyzing a foreign table.
   */
  typedef struct PgFdwAnalyzeState
***
*** 276,281  static TupleTableSlot *postgresExecForeignDelete(EState *estate,
--- 308,320 
  static void postgresEndForeignModify(EState *estate,
  		 ResultRelInfo *resultRelInfo);
  static int	postgresIsForeignRelUpdatable(Relation rel);
+ static RowMarkType postgresGetForeignRowMarkType(LockClauseStrength strength);
+ static bool postgresLockForeignRow(EState *estate,
+    ExecRowMark *erm,
+    ItemPointer tupleid);
+ static HeapTuple 

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-11 Thread Robert Haas
On Sun, May 10, 2015 at 11:07 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-05-10 22:51:33 -0400, Robert Haas wrote:
  And there's definitely some things
  around that pretty much only still exist because changing them would
  break too much stuff.

 Such as what?

 Without even thinking about it:
 * linitial vs lfirst vs lnext. That thing still induces an impedance
   mismatch when reading code for me, and I believe a good number of
   other people.
 * Two 'string buffer' APIs with essentially only minor differences.
 * A whole bunch of libpq APIs. Admittedly that's a bit more exposed than
   lots of backend only things.
 * The whole V0 calling convention that makes it so much easier to get
   odd crashes.

 Admittedly that's all I could come up without having to think. But I do
 vaguely remember a lot of things we did not do because of bwcompat
 concerns.

I see your point, but I don't think it really detracts from mine.  The
fact that we have a few inconsistently-named list functions is not
preventing any core development project that would otherwise get
completed to instead not get completed.  Nor is any of that other
stuff, except maybe the libpq API, but that's a lot (not just a bit)
more exposed.

Also, I'd actually be in favor of looking for a way to unify the
StringInfo and PQexpBuffer stuff - and of partially deprecating the V0
calling convention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] unaddressable bytes in BRIN

2015-05-11 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 What I think this means is that during an index build
 brin_minmax_add_value() called numeric_lt() which detoasted one of its
 input values; later, brin_doinsert() inserts a tuple containing the
 value, but it tries to use more bytes than were allocated.  I haven't
 had time to actually study what is going on here, but wanted to archive
 this publicly.  (Value detoasting evidently plays a role here, but I
 don't know how.)

I went ahead and added this to the 9.5 open items list.

Thanks!

Stephen




Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-11 Thread Abhijit Menon-Sen
Hi Andres.

I've attached an updated patch for pgstatbloat, as well as a patch to
replace two uses of BuildTupleFromCStrings() elsewhere in pgstattuple.

I've made the changes you mentioned in your earlier mail, except that I
have not changed the name pending further suggestions about what would
be the best name.

Also:

At 2015-05-09 15:36:49 +0530, a...@2ndquadrant.com wrote:

 At 2015-05-09 02:20:51 +0200, and...@anarazel.de wrote:
 
  I haven't checked, but I'm not sure that it's safe/meaningful to
  call PageGetHeapFreeSpace() on a new page.
 
 OK, I'll check and fix if necessary.

You're right, PageGetHeapFreeSpace() isn't safe on a new page. I've
added a guard to that call in the attached patch, but I'm not sure
that's the right thing to do.

Should I copy the orphaned new-page handling from lazy_scan_heap? What
about for empty pages? Neither feels like a very good thing to do, but
then neither does skipping the new page silently. Should I count the
space it would have free if it were initialised, but leave the page
alone for VACUUM to deal with? Or just leave it as it is?

-- Abhijit
From 421267bba8394255feed8f9b9424d25d0be9f979 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Mon, 11 May 2015 15:54:48 +0530
Subject: Make pgstattuple use heap_form_tuple instead of
 BuildTupleFromCStrings

---
 contrib/pgstattuple/pgstatindex.c | 43 ++---
 contrib/pgstattuple/pgstattuple.c | 45 ++-
 2 files changed, 37 insertions(+), 51 deletions(-)

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index a2ea5d7..608f729 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -257,7 +257,8 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 	{
 		TupleDesc	tupleDesc;
 		int			j;
-		char	   *values[10];
+		Datum		values[10];
+		bool		nulls[10];
 		HeapTuple	tuple;
 
 		/* Build a tuple descriptor for our result type */
@@ -265,33 +266,31 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
  			elog(ERROR, "return type must be a row type");
 
 		j = 0;
-		values[j++] = psprintf("%d", indexStat.version);
-		values[j++] = psprintf("%d", indexStat.level);
-		values[j++] = psprintf(INT64_FORMAT,
-			   (indexStat.root_pages +
-indexStat.leaf_pages +
-indexStat.internal_pages +
-indexStat.deleted_pages +
-indexStat.empty_pages) * BLCKSZ);
-		values[j++] = psprintf("%u", indexStat.root_blkno);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.internal_pages);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.leaf_pages);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.empty_pages);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.deleted_pages);
+		values[j++] = Int32GetDatum(indexStat.version);
+		values[j++] = Int32GetDatum(indexStat.level);
+		values[j++] = Int64GetDatum((indexStat.root_pages +
+	 indexStat.leaf_pages +
+	 indexStat.internal_pages +
+	 indexStat.deleted_pages +
+	 indexStat.empty_pages) * BLCKSZ);
+		values[j++] = Int32GetDatum(indexStat.root_blkno);
+		values[j++] = Int32GetDatum(indexStat.internal_pages);
+		values[j++] = Int32GetDatum(indexStat.leaf_pages);
+		values[j++] = Int32GetDatum(indexStat.empty_pages);
+		values[j++] = Int32GetDatum(indexStat.deleted_pages);
 		if (indexStat.max_avail > 0)
-			values[j++] = psprintf("%.2f",
-   100.0 - (double) indexStat.free_space / (double) indexStat.max_avail * 100.0);
+			values[j++] = Float8GetDatum(100.0 - (double) indexStat.free_space / (double) indexStat.max_avail * 100.0);
 		else
-			values[j++] = pstrdup("NaN");
+			values[j++] = CStringGetDatum("NaN");
 		if (indexStat.leaf_pages > 0)
-			values[j++] = psprintf("%.2f",
-   (double) indexStat.fragments / (double) indexStat.leaf_pages * 100.0);
+			values[j++] = Float8GetDatum((double) indexStat.fragments / (double) indexStat.leaf_pages * 100.0);
 		else
-			values[j++] = pstrdup("NaN");
+			values[j++] = CStringGetDatum("NaN");
 
-		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
-	   values);
+		for (j = 0; j < 10; j++)
+			nulls[j] = false;
 
+		tuple = heap_form_tuple(tupleDesc, values, nulls);
 		result = HeapTupleGetDatum(tuple);
 	}
 
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index c3a8b1d..552f188 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -85,28 +85,20 @@ static Datum
 build_pgstattuple_type(pgstattuple_type *stat, FunctionCallInfo fcinfo)
 {
 #define NCOLUMNS	9
-#define NCHARS		32
 
 	HeapTuple	tuple;
-	char	   *values[NCOLUMNS];
-	char		values_buf[NCOLUMNS][NCHARS];
+	Datum		values[NCOLUMNS];
+	bool		nulls[NCOLUMNS];
 	int			i;
 	double		tuple_percent;
 	double		dead_tuple_percent;
 	double		free_percent;	/* free/reusable space in % */
 	TupleDesc	tupdesc;
-	AttInMetadata *attinmeta;
 
 	/* Build a tuple descriptor for our result type 

Re: [HACKERS] multixacts woes

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 12:56 AM, Noah Misch n...@leadboat.com wrote:
 On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
 On Sun, May 10, 2015 at 1:40 PM, Noah Misch n...@leadboat.com wrote:
  I don't know whether this deserves prompt remediation, but if it does, I 
  would
  look no further than the hard-coded 25% figure.  We permit users to operate
  close to XID wraparound design limits.  GUC maximums force an 
  anti-wraparound
  vacuum at no later than 93.1% of design capacity.  XID assignment warns at
  99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
  pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
  50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
  bulkiest mandatory cushion yet, an anti-wraparound vacuum when
  pg_multixact/members is just 25% full.

 That's certainly one possible approach.  I had discounted it because
 you can't really get more than a small multiple out of it, but getting
 2-3x more room might indeed be enough to help some people quite a bit.
 Just raising the threshold from 25% to say 40% would buy back a
 healthy amount.

 Right.  It's fair to assume that the new VACUUM burden would be discountable
 at a 90+% threshold, because the installations that could possibly find it
 expensive are precisely those experiencing corruption today.  These reports
 took eighteen months to appear, whereas some corruption originating in commit
 0ac5ad5 saw reports within three months.  Therefore, sites burning
 pg_multixact/members proportionally faster than both pg_multixact/offsets and
 XIDs must be unusual.  Bottom line: if we do need to reduce VACUUM burden
 caused by the commits you cited upthread, we almost certainly don't need more
 than a 4x improvement.

I looked into the approach of adding a GUC called
autovacuum_multixact_freeze_max_members to set the threshold.  I
thought to declare it this way:

{
+   {"autovacuum_multixact_freeze_max_members",
PGC_POSTMASTER, AUTOVACUUM,
+   gettext_noop("# of multixact members at which
autovacuum is forced to prevent multixact member wraparound."),
+   NULL
+   },
+   &autovacuum_multixact_freeze_max_members,
+   2000000000, 10000000, 4000000000,
+   NULL, NULL, NULL
+   },

Regrettably, I think that's not going to work, because 4000000000
overflows int.  We will evidently need to denote this GUC in some
other units, unless we want to introduce config_int64.

Given your concerns, and the need to get a fix for this out the door
quickly, what I'm inclined to do for the present is go bump the
threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
changing anything else.  Your analysis shows that this is more in line
with the existing policy for multixact IDs than what I did, and it
will reduce the threat of frequent wraparound scans.  Now, it will
also increase the chances of somebody hitting the wall before
autovacuum can bail them out.  But maybe not that much.  If we need
75% of the multixact member space to complete one cycle of
anti-wraparound vacuums, we're actually very close to the point where
the system just cannot work.  If that's one big table, we're done.
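
(For concreteness -- my arithmetic, not from the thread: the member space
holds 2^32, about 4.29 billion entries, so the current 25% threshold forces
wraparound scans at roughly 1.07 billion members, and a 50% threshold
defers them to roughly 2.15 billion.)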

Also, if somebody does have a workload where the auto-clamping doesn't
provide them with enough headroom, they can still improve things by
reducing autovacuum_multixact_freeze_max_age to a value less than the
value to which we're auto-clamping it.  If they need an effective
value of less than 10 million they are out of luck, but if that is the
case then there is a good chance that they are hosed anyway - an
anti-wraparound vacuum every 10 million multixacts sounds awfully
painful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
All,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  if (lockmode == AccessShareLock)
  aclresult = pg_class_aclcheck(reloid, GetUserId(),
ACL_SELECT);
  +   else if (lockmode == RowExclusiveLock)
  +   aclresult = pg_class_aclcheck(reloid, GetUserId(),
  +ACL_INSERT | ACL_UPDATE | ACL_DELETE | 
  ACL_TRUNCATE);
  else
  aclresult = pg_class_aclcheck(reloid, GetUserId(),
ACL_UPDATE | ACL_DELETE | 
  ACL_TRUNCATE);
 
 Perhaps it would be better to refactor with a local variable for the
 aclmask and just one instance of the pg_class_aclcheck call.  Also, I'm
 pretty sure that the documentation work needed is more extensive
 than the actual patch ;-).  Otherwise, I don't see a problem with this.

Now for a blast from the past...  This came up again on IRC recently and
reminded me that I ran into the same issue a couple years back.  Updated
patch includes the refactoring suggested and includes documentation.

Not going to be back-patched, as discussed with Robert.

Barring objections, I'll push this later today.

Thanks!

Stephen
From d2b0fbc9fd8c0783f870fef3c901f7f8891c6e90 Mon Sep 17 00:00:00 2001
From: Stephen Frost sfr...@snowman.net
Date: Mon, 11 May 2015 09:14:49 -0400
Subject: [PATCH] Allow LOCK TABLE .. ROW EXCLUSIVE MODE with INSERT

INSERT acquires RowExclusiveLock during normal operation and therefore
it makes sense to allow LOCK TABLE .. ROW EXCLUSIVE MODE to be executed
by users who have INSERT rights on a table (even if they don't have
UPDATE or DELETE).

Not back-patching this as it's a behavior change which, strictly
speaking, loosens security restrictions.

Per discussion with Tom and Robert (circa 2013).
---
 doc/src/sgml/ref/lock.sgml  |  8 +---
 src/backend/commands/lockcmds.c | 12 
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index 913afe7..b946eab 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -161,9 +161,11 @@ LOCK [ TABLE ] [ ONLY ] <replaceable class="PARAMETER">name</replaceable> [ * ]
 
    <para>
    <literal>LOCK TABLE ... IN ACCESS SHARE MODE</> requires <literal>SELECT</>
-privileges on the target table.  All other forms of <command>LOCK</>
-require table-level <literal>UPDATE</>, <literal>DELETE</>, or
-<literal>TRUNCATE</> privileges.
+privileges on the target table.  <literal>LOCK TABLE ... IN ROW EXCLUSIVE
+MODE</> requires <literal>INSERT</>, <literal>UPDATE</>, <literal>DELETE</>,
+or <literal>TRUNCATE</> privileges on the target table.  All other forms of
+<command>LOCK</> require table-level <literal>UPDATE</>, <literal>DELETE</>,
+or <literal>TRUNCATE</> privileges.
    </para>
 
    <para>
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index bdec2ff..a167082 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -169,13 +169,17 @@ static AclResult
 LockTableAclCheck(Oid reloid, LOCKMODE lockmode)
 {
 	AclResult	aclresult;
+	AclMode		aclmask;
 
 	/* Verify adequate privilege */
 	if (lockmode == AccessShareLock)
-		aclresult = pg_class_aclcheck(reloid, GetUserId(),
-	  ACL_SELECT);
+		aclmask = ACL_SELECT;
+	else if (lockmode == RowExclusiveLock)
+		aclmask = ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE;
 	else
-		aclresult = pg_class_aclcheck(reloid, GetUserId(),
-	  ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE);
+		aclmask = ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE;
+
+	aclresult = pg_class_aclcheck(reloid, GetUserId(), aclmask);
+
 	return aclresult;
 }
-- 
1.9.1





Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 On 2015/05/11 8:50, Tom Lane wrote:
 In particular, I find the addition of void *fdw_state to ExecRowMark
 to be pretty questionable.  That does not seem like a good place to keep
 random state.  (I realize that WHERE CURRENT OF keeps some state in
 ExecRowMark, but that's a crock not something to emulate.)  ISTM that in
 most scenarios, the state that LockForeignRow/FetchForeignRow would like
 to get at is probably the FDW state associated with the ForeignScanState
 that the tuple came from.  Which this API doesn't help with particularly.
 I wonder if we should instead add a ScanState* field and expect the
 core code to set that up (ExecOpenScanRelation could do it with minor
 API changes...).

 Sorry, I don't clearly understand what you mean, but that (the idea of
 expecting the core code to set it up) sounds inconsistent with your comment
 on the earlier version of the BeginForeignFetch API [1].

Well, the other way that it could work would be for the FDW's
BeginForeignScan routine to search for a relevant ExecRowMark entry
(which, per that previous discussion, it'd likely need to do anyway) and
then plug a back-link to the ForeignScanState into the ExecRowMark.
But I don't see any very good reason to make every FDW that's concerned
with this do that, rather than doing it once in the core code.  I'm also
thinking that having a link to the source scan node there might be useful
someday for regular tables as well as FDWs.
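
To make the shape of that concrete, ExecRowMark would gain something like
this (my sketch, not actual code):

    ScanState  *scanstate;  /* back-link to the node scanning this rel,
                             * filled in by ExecOpenScanRelation */

so that LockForeignRow/FetchForeignRow could reach the ForeignScanState
without any private bookkeeping in the FDW.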

 Also, as I mentioned, I'd be a whole lot happier if we had a way to test
 this...

 Attached is a postgres_fdw patch that I used for the testing.  If you
 try it, edit postgresGetForeignRowMarkType as necessary.  I have to
 confess that I did the testing only under normal conditions with the patch.

Thanks, this is helpful.  However, it pretty much proves my point about
wanting the back-link --- it seems like init_foreign_fetch_state, for
example, is uselessly repeating a lot of the setup already done for
the scan node itself.

regards, tom lane




Re: [HACKERS] tzdata and 9.4.2, etc.

2015-05-11 Thread Thom Brown
On 4 May 2015 at 22:42, David Fetter da...@fetter.org wrote:
 Folks,

 We are now three tzdata changes behind.  There are bugs in pg_dump
 which create real restore errors for people using PostGIS, one of our
 most popular extensions.

 Can we please wrap and ship 9.4.2, etc., and do it soon?

+1

-- 
Thom




Re: [HACKERS] multixacts woes

2015-05-11 Thread Noah Misch
On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
 Given your concerns, and the need to get a fix for this out the door
 quickly, what I'm inclined to do for the present is go bump the
 threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
 changing anything else.

+1

 Your analysis shows that this is more in line
 with the existing policy for multixact IDs than what I did, and it
 will reduce the threat of frequent wraparound scans.  Now, it will
 also increase the chances of somebody hitting the wall before
 autovacuum can bail them out.  But maybe not that much.  If we need
 75% of the multixact member space to complete one cycle of
 anti-wraparound vacuums, we're actually very close to the point where
 the system just cannot work.  If that's one big table, we're done.

Agreed.




Re: [HACKERS] tzdata and 9.4.2, etc.

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 10:13 AM, Thom Brown t...@linux.com wrote:
 We are now three tzdata changes behind.  There are bugs in pg_dump
 which create real restore errors for people using PostGIS, one of our
 most popular extensions.

 Can we please wrap and ship 9.4.2, etc., and do it soon?

 +1

I'm more concerned about the latest round of multixact bugs than I am
about the tzdata, but I think that's at a point now where we can go
ahead and schedule a release.  And I think we should.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Tom Lane
I wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 On 2015/05/11 8:50, Tom Lane wrote:
 I wonder if we should instead add a ScanState* field and expect the
 core code to set that up (ExecOpenScanRelation could do it with minor
 API changes...).

 Sorry, I don't clearly understand what you mean, but that (the idea of
 expecting the core code to set it up) sounds inconsistent with your comment
 on the earlier version of the BeginForeignFetch API [1].

 Well, the other way that it could work would be for the FDW's
 BeginForeignScan routine to search for a relevant ExecRowMark entry
 (which, per that previous discussion, it'd likely need to do anyway) and
 then plug a back-link to the ForeignScanState into the ExecRowMark.
 But I don't see any very good reason to make every FDW that's concerned
 with this do that, rather than doing it once in the core code.  I'm also
 thinking that having a link to the source scan node there might be useful
 someday for regular tables as well as FDWs.

And on the third hand ... in view of the custom/foreign join pushdown
stuff that just went in, it would be a mistake to assume that there
necessarily is a distinct source scan node associated with each
ExecRowMark.  The FDW can presumably find all the ExecRowMarks associated
with the rels that a join ForeignScan is scanning, but we can't assume
that ExecOpenScanRelation will be able to set up those links, and the FDW
might not want a simple link to the join scan node anyway.  So it's
probably best to leave it as an unspecified void* instead of trying to
nail down the meaning more precisely.

I still don't much like calling it fdw_state though, because I don't
think it should be documented as only for the use of FDWs.  (Custom scans
might have a use for a pointer field here too, for example.)  Maybe just
call it void *extra and document it as being for the use of whatever
plan node is sourcing the relation's tuples.
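
That is, in place of void *fdw_state the field would read something like
(sketch):

    void       *extra;      /* available to whatever plan node
                             * sources this relation's tuples */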

regards, tom lane




Re: [HACKERS] Postgres GSSAPI Encryption

2015-05-11 Thread Robbie Harwood
Stephen Frost sfr...@snowman.net writes:

 Robbie,

 * Robbie Harwood (rharw...@redhat.com) wrote:

 We'd I think also want a new kind of HBA entry (probably something along
 the lines of `hostgss` to contrast with `hostssl`), but I'm not sure
 what we'd want to do for the counterpart of `hostnossl` (`hostnogss`?
 But then do we need `hostnogssnossl`?  Is this even a useful setting to
 have?), so that will probably require broader discussion.

 Are you intending to support GSSAPI encryption *without* using the
 GSSAPI authentication mechanism?  If not, maybe we can come up with a
 way to have an option to the GSSAPI auth mechanism that behaves the way
 we want for GSSAPI encryption.  Perhaps something like:

 encryption = (none | request | require)

 If you need an option for it to still be able to fall-thru to the next
 pg_hba line, that'd be more difficult, but is that really a sensible
 use-case?

That's a good point.  I don't see GSSAPI encryption without GSSAPI
authentication as a particularly compelling use case, so a setting like
that (with default presumably to "request" for backward compatibility)
seems perfect.

As for fall-through on failure, I don't really know what's reasonable
here.  My understanding of the way it works currently suggests that it
would be *expected* to fall-through based on the way other things
behave, though it's certainly less effort on my part if it does not.

 Finally, I think there are a couple different ways the protocol could
 look.  I'd ideally like to opportunistically encrypt all
 GSSAPI-authenticated connections and fallback to non-encrypted when the
 other end doesn't support it (possibly dropping the connection if this
 causes it to not match HBA), but I'm not sure if this will work with the
 existing code.  A (less-nice) option could be to add a new
 authentication method for gss-encryption, which feels aesthetically
 misplaced.  The approach used for SSL today could probably be adopted as
 a third approach, but I don't really see a gain from doing it this way
 since encryption happens after authentication (opposite of SSL) in
 GSSAPI.

 I'd definitely like to see us opportunistically encrypt all GSSAPI
 authenticated connections also.  The main case to consider is how to
 support migrating users who are currently using GSSAPI + SSL to GSSAPI
 auth+encryption, and how to support them if they want to continue to use
 GSSAPI just for user auth and use SSL for encryption.

So if we go with the encryption option, we might not need a specific
entry for GSS hosts.  For the SSL encryption/GSS auth case, we'd want it
to work the way it does now where you specify hostssl and then gss
as the method.  Then for the GSS for auth and encryption, one would use
a normal host entry, then specify gss as the method, and then set
encryption=require.

One consequence of this would be that you can do double encryption by
specifying "hostssl", then "gss" with "encrypt = require".  I don't
think this is something more desirable than just one or the other, but
unless it's actually a problem (and I don't think it is) to have around,
I think the setup would work nicely.  We could also default "encrypt" to
"none" when "hostssl" is specified if it is a problem.
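
To illustrate the combinations being discussed (hypothetical pg_hba.conf
syntax; the encryption option is only a proposal at this point):

    # SSL for transport, GSSAPI only for authentication
    hostssl  all  all  192.0.2.0/24  gss
    # GSSAPI for both authentication and encryption
    host     all  all  192.0.2.0/24  gss  encryption=require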

Thanks!
--Robbie




Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Michael Paquier
On Mon, May 11, 2015 at 10:32 PM, Stephen Frost sfr...@snowman.net wrote:
 Now for a blast from the past...  This came up again on IRC recently and
 reminded me that I ran into the same issue a couple years back.  Updated
 patch includes the refactoring suggested and includes documentation.

 Barring objections, I'll push this later today.

Small suggestion: a test case in src/test/isolation?
-- 
Michael




Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
 On Mon, May 11, 2015 at 10:32 PM, Stephen Frost sfr...@snowman.net wrote:
  Now for a blast from the past...  This came up again on IRC recently and
  reminded me that I ran into the same issue a couple years back.  Updated
  patch includes the refactoring suggested and includes documentation.
 
  Barring objections, I'll push this later today.
 
 Small suggestion: a test case in src/test/isolation?

This is entirely a permissions-related change and src/test/isolation is
for testing concurrent behavior, not about testing permissions.

I'm not saying that we shouldn't have more tests there, but it'd not
be appropriate for this particular patch.

Thanks!

Stephen




Re: [HACKERS] initdb start server recommendation

2015-05-11 Thread Bruce Momjian
On Fri, May  1, 2015 at 10:14:13AM -0400, Bruce Momjian wrote:
 Currently initdb outputs suggested text on starting the server:
 
   Success. You can now start the database server using:
   
   /u/pgsql/bin/postgres -D /u/pgsql/data
   or
   /u/pgsql/bin/pg_ctl -D /u/pgsql/data -l logfile start
 
 I am now thinking pg_ctl should be recommended first.  At the time this
 text was written pg_ctl was new.

I have applied a patch to initdb to only recommend the pg_ctl method.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-11 Thread Abhijit Menon-Sen
At 2015-05-11 19:15:47 +0200, and...@anarazel.de wrote:

 TBH, I'd rather not touch unrelated things right now. We're pretty
 badly behind...

OK. That patch is independent; just ignore it.

 I don't really care how it's named, as long as it makes clear that
 it's not an exact measurement.

Not having heard any better suggestions, I picked "pgstatapprox" as a
compromise between length and familiarity/consistency with pgstattuple.
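
Usage is then simply (a sketch; the output columns are as defined in the
attached patch):

    SELECT * FROM pgstatapprox('pgbench_accounts');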

  Should I count the space it would have free if it were initialised,
  but leave the page alone for VACUUM to deal with?

And this is what the attached patch does.

I also cleaned up a few things that I didn't like but had left alone to
make the code look similar to pgstattuple. In particular, build_tuple()
now does nothing but build a tuple from values calculated earlier in
pgstatapprox_heap().

Thank you.

-- Abhijit

P.S. What, if anything, should be done about the complicated and likely
not very useful skip-only-min#-blocks logic in lazy_scan_heap?
From 0a99fc61d36e3a3ff4003db95e5c299a5f07a850 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Fri, 26 Dec 2014 12:37:13 +0530
Subject: Add pgstatapprox to pgstattuple

---
 contrib/pgstattuple/Makefile   |   4 +-
 contrib/pgstattuple/pgstatapprox.c | 277 +
 contrib/pgstattuple/pgstattuple--1.2--1.3.sql  |  18 ++
 .../{pgstattuple--1.2.sql => pgstattuple--1.3.sql} |  18 +-
 contrib/pgstattuple/pgstattuple.control|   2 +-
 doc/src/sgml/pgstattuple.sgml  | 135 ++
 6 files changed, 450 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pgstattuple/pgstatapprox.c
 create mode 100644 contrib/pgstattuple/pgstattuple--1.2--1.3.sql
 rename contrib/pgstattuple/{pgstattuple--1.2.sql => pgstattuple--1.3.sql} (72%)

diff --git a/contrib/pgstattuple/Makefile b/contrib/pgstattuple/Makefile
index 862585c..6083dab 100644
--- a/contrib/pgstattuple/Makefile
+++ b/contrib/pgstattuple/Makefile
@@ -1,10 +1,10 @@
 # contrib/pgstattuple/Makefile
 
 MODULE_big	= pgstattuple
-OBJS		= pgstattuple.o pgstatindex.o $(WIN32RES)
+OBJS		= pgstattuple.o pgstatindex.o pgstatapprox.o $(WIN32RES)
 
 EXTENSION = pgstattuple
-DATA = pgstattuple--1.2.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
+DATA = pgstattuple--1.3.sql pgstattuple--1.2--1.3.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
 PGFILEDESC = pgstattuple - tuple-level statistics
 
 REGRESS = pgstattuple
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
new file mode 100644
index 000..46f5bc0
--- /dev/null
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -0,0 +1,277 @@
+/*
+ * contrib/pgstattuple/pgstatapprox.c
+ * Fast bloat estimation functions
+ */
+
+#include "postgres.h"
+
+#include "access/visibilitymap.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/multixact.h"
+#include "access/htup_details.h"
+#include "catalog/namespace.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/freespace.h"
+#include "storage/procarray.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
+#include "utils/tqual.h"
+#include "commands/vacuum.h"
+
+PG_FUNCTION_INFO_V1(pgstatapprox);
+
+typedef struct output_type
+{
+	uint64		table_len;
+	uint64		scanned_percent;
+	uint64		tuple_count;
+	uint64		tuple_len;
+	double		tuple_percent;
+	uint64		dead_tuple_count;
+	uint64		dead_tuple_len;
+	double		dead_tuple_percent;
+	uint64		free_space;
+	double		free_percent;
+} output_type;
+
+#define NUM_OUTPUT_COLUMNS 10
+
+/*
+ * This function takes an already open relation and scans its pages,
+ * skipping those that have the corresponding visibility map bit set.
+ * For pages we skip, we find the free space from the free space map
+ * and approximate tuple_len on that basis. For the others, we count
+ * the exact number of dead tuples etc.
+ *
+ * This scan is loosely based on vacuumlazy.c:lazy_scan_heap(), but
+ * we do not try to avoid skipping single pages.
+ */
+
+static void
+pgstatapprox_heap(Relation rel, output_type *stat)
+{
+	BlockNumber scanned,
+nblocks,
+blkno;
+	Buffer		vmbuffer = InvalidBuffer;
+	BufferAccessStrategy bstrategy;
+	TransactionId OldestXmin;
+	uint64		misc_count = 0;
+
+	OldestXmin = GetOldestXmin(rel, true);
+	bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	nblocks = RelationGetNumberOfBlocks(rel);
+	scanned = 0;
+
	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		Buffer		buf;
+		Page		page;
+		OffsetNumber offnum,
+	maxoff;
+		Size		freespace;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * If the page has only visible tuples, then we can find out the
+		 * free space from the FSM and move on.
+		 */
+
		if (visibilitymap_test(rel, blkno, &vmbuffer))
+		{
+			freespace = GetRecordedFreeSpace(rel, blkno);
			stat->tuple_len += BLCKSZ - freespace;
			stat->free_space += freespace;
+			continue;
+		}
+
+		buf 

[HACKERS] Multixid hindsight design

2015-05-11 Thread Heikki Linnakangas
I'd like to discuss how we should've implemented the infamous 9.3 
multixid/row-locking stuff, and perhaps still should in 9.6. Hindsight 
is always 20/20 - I'll readily admit that I didn't understand the 
problems until well after the release - so this isn't meant to bash 
what's been done. Rather, let's think of the future.


The main problem with the infamous multixid changes was that it made 
pg_multixact a permanent, critical piece of data. Without it, you 
cannot decipher whether some rows have been deleted or not. The 9.3 
changes uncovered pre-existing issues with vacuuming and wraparound, but 
the fact that multixids are now critical turned those otherwise 
relatively harmless bugs into data loss.


We have pg_clog, which is a similar critical data structure. That's a 
pain too - you need VACUUM and you can't easily move tables from one 
cluster to another for example - but we've learned to live with it. But 
we certainly don't need any more such data structures.


So the lesson here is that having a permanent pg_multixact is not nice, 
and we should get rid of it. Here's how to do that:



Looking at the tuple header, the CID and CTID fields are only needed, 
when either xmin or xmax is running. Almost: in a HOT-updated tuple, 
CTID is required even after xmax has committed, but since it's a HOT 
update, the new tuple is always on the same page so you only need the 
offsetnumber part. That leaves us with 8 bytes that are always available 
for storing ephemeral information. By "ephemeral", I mean that it is 
only needed when xmin or xmax is in-progress. After that, e.g. after a 
shutdown, it's never looked at.


Let's add a new SLRU, called Tuple Ephemeral Data (TED). It is addressed 
by a 64-bit pointer, which means that it never wraps around. That 64-bit 
pointer is stored in the tuple header, in those 8 ephemeral bytes 
currently used for CID and CTID. Whenever a tuple is deleted/updated and 
locked at the same time, a TED entry is created for it, in the new SLRU, 
and the pointer to the entry is put on the tuple. In the TED entry, we 
can use as many bytes as we need to store the ephemeral data. It would 
include the CID (or possibly both CMIN and CMAX separately, now that we 
have the space), CTID, and the locking XIDs. The list of locking XIDs 
could be stored there directly, replacing multixids completely, or we 
could store a multixid there, and use the current pg_multixact system to 
decode them. Or we could store the multixact offset in the TED, 
replacing the multixact offset SLRU, but keep the multixact member SLRU 
as is.
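
A possible shape for such an entry, storing the locking XIDs directly (my 
sketch of the proposal, not working code):

    typedef struct TedEntry
    {
        CommandId       cmin;       /* no combo-CID packing needed now */
        CommandId       cmax;
        ItemPointerData ctid;       /* next row version, for updates */
        int             nlockers;   /* locking XIDs stored inline,
                                     * replacing the multixid machinery */
        TransactionId   lockers[FLEXIBLE_ARRAY_MEMBER];
    } TedEntry;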


The XMAX stored on the tuple header would always be a real transaction 
ID, not a multixid. Hence locked-only tuples don't need to be frozen 
afterwards.


The beauty of this would be that the TED entries can be zapped at 
restart, just like pg_subtrans, and pg_multixact before 9.3. It doesn't 
need to be WAL-logged, and we are free to change its on-disk layout even 
in a minor release.


Further optimizations are possible. If the TED entry fits in 8 bytes, it 
can be stored directly in the tuple header. Like today, if a tuple is 
locked but not deleted/updated, you only need to store the locker XID, 
and you can store the locking XID directly on the tuple. Or if it's 
deleted and locked, CTID is not needed, only CID and locker XID, so you 
can store those directly on the tuple. Plus some spare bits to indicate 
what is stored. And if the XMIN is older than global-xmin, you could 
also steal the XMIN field for storing TED data, making it possible to 
store 12 bytes directly in the tuple header. Plus some spare bits again 
to indicate that you've done that.



Now, given where we are, how do we get there? Upgrade is a pain, because 
even if we no longer generate any new multixids, we'll have to be able 
to decode them after pg_upgrade. Perhaps condense pg_multixact into a 
simpler pg_clog-style bitmap at pg_upgrade, to make it small and simple 
to read, but it would nevertheless be a fair amount of code just to deal 
with pg_upgraded databases.


I think this is worth doing, even after we've fixed all the acute 
multixid bugs, because this would be more robust in the long run. It 
would also remove the need to do anti-wraparound multixid vacuums, and 
the newly-added tuning knobs related to that.


- Heikki




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Heikki Linnakangas

On 05/12/2015 12:00 AM, Bruce Momjian wrote:

Multi-xacts were made durable in Postgres 9.3 (released 2013-09-09) to
allow primary-key-column-only locks.  1.7 years later, we are still
dealing with bugs related to this feature.  Obviously, something is
wrong.

There were many 9.3 minor releases containing multi-xacts fixes, and
these fixes have extended into 9.4.  After the first few bug-fix
releases, I questioned whether we needed to revert or rework the
feature, but got no positive response.  Only in the past few weeks have
we got additional people involved.


The revert or rework ship had already sailed at that point. I don't 
think we had much choice than just soldier through the bugs after the 
release.



I think we now know that our inaction didn't serve us well.  The
question is how can we identify chronic problems and get resources
involved sooner.  I feel we have been asleep at the wheel to some
extent on this.


Yeah. I think the problem was that no-one realized that this was a 
significant change to the on-disk format. It was deceptively 
backwards-compatible. When it comes to permanent on-disk structures, we 
should all be more vigilant in the review.


- Heikki




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 02:00 PM, Bruce Momjian wrote:


I think we now know that our inaction didn't serve us well.  The
question is how can we identify chronic problems and get resources
involved sooner.  I feel we have been asleep at the wheel to some
extent on this.


Here are some options

Slow down the release cycle
	The shortness of the release cycle puts a preference on adding features 
versus providing a more mature outcome.


or

Increase the release cycle
	Moving to a Ubuntu style release cycle would allow us to have a window 
to scratch the itch but with the eventual (and known) release of 
something that is LTS.


JD  



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
On Mon, May 11, 2015 at 02:11:48PM -0700, Joshua Drake wrote:
 
 On 05/11/2015 02:00 PM, Bruce Momjian wrote:
 
 I think we now know that our inaction didn't serve us well.  The
 question is how can we identify chronic problems and get resources
 involved sooner.  I feel we have been asleep at the wheel to some
 extent on this.
 
 Here are some options
 
 Slow down the release cycle
   The shortness of the release cycle puts a preference on adding
 features versus providing a more mature outcome.
 
 or
 
 Increase the release cycle
   Moving to a Ubuntu style release cycle would allow us to have a
 window to scratch the itch but with the eventual (and known) release
 of something that is LTS.

The releases themselves are not the problem, but rather the volume of
bugs and our slowness in getting additional people involved to remove
data corruption bugs more quickly and systematically.  Our reputation
for reliability has been harmed by this inactivity.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




[HACKERS] Re: [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 9:47 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-05-10 16:01:53 -0400, Tom Lane wrote:
 The cause of the problem seems to be that the UPDATE performs a HOT update
 of the new tuple, leaving in this case a dead tuple at (0,2) that is HOT
 updated by (0,3).  When unique_key_recheck() is invoked for (0,2), it
 believes, correctly, that it has to perform the recheck anyway ... but it
 tells check_exclusion_constraint that the check is being performed for
 (0,2).  So the index search inside check_exclusion_constraint finds the
 live tuple at (0,3) and thinks that is a conflict.

 Heh, it's curious that this wasn't found up until now. I also wonder if
 this might be related to the spurious violations Peter G. has been
 observing...

I don't think so. Speculative insertion relies on the assumption that
the speculatively inserted tuple isn't MVCC visible to other sessions.
I actually prototyped an implementation that avoided the historic
unprincipled deadlocks of exclusion constraints (a known limitation
since they were added), by making *UPDATE* also do a speculative
insertion, and by making even non-UPSERT INSERTs insert speculatively.

This almost worked, but when time came to back out of a speculative
insertion on an UPDATE due to a conflict from a concurrent session,
the implementation couldn't handle it - it was just a mess to try and
figure out how that was supposed to work with heap_update(), and so
that prototype was scrapped.

For the benefit of those not closely involved in the ON CONFLICT
project, I should point out that ON CONFLICT will not accept a
deferred index as an arbiter index.
-- 
Peter Geoghegan




[HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
Multi-xacts were made durable in Postgres 9.3 (released 2013-09-09) to
allow primary-key-column-only locks.  1.7 years later, we are still
dealing with bugs related to this feature.  Obviously, something is
wrong.

There were many 9.3 minor releases containing multi-xacts fixes, and
these fixes have extended into 9.4.  After the first few bug-fix
releases, I questioned whether we needed to revert or rework the
feature, but got no positive response.  Only in the past few weeks have
we got additional people involved.

I think we now know that our inaction didn't serve us well.  The
question is how can we identify chronic problems and get resources
involved sooner.  I feel we have been asleep at the wheel to some
extent on this.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
All,

* Stephen Frost (sfr...@snowman.net) wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Stephen Frost sfr...@snowman.net writes:
   if (lockmode == AccessShareLock)
   aclresult = pg_class_aclcheck(reloid, GetUserId(),
 ACL_SELECT);
   +   else if (lockmode == RowExclusiveLock)
   +   aclresult = pg_class_aclcheck(reloid, GetUserId(),
   +ACL_INSERT | ACL_UPDATE | ACL_DELETE | 
   ACL_TRUNCATE);
   else
   aclresult = pg_class_aclcheck(reloid, GetUserId(),
 ACL_UPDATE | ACL_DELETE | 
   ACL_TRUNCATE);
  
  Perhaps it would be better to refactor with a local variable for the
  aclmask and just one instance of the pg_class_aclcheck call.  Also, I'm
  pretty sure that the documentation work needed is more extensive
  than the actual patch ;-).  Otherwise, I don't see a problem with this.
 
 Now for a blast from the past...  This came up again on IRC recently and
 reminded me that I ran into the same issue a couple years back.  Updated
 patch includes the refactoring suggested and includes documentation.
 
 Not going to be back-patched, as discussed with Robert.
 
 Barring objections, I'll push this later today.

Done, finally.

Thanks!

Stephen




Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Michael Paquier
On Tue, May 12, 2015 at 4:01 AM, Stephen Frost sfr...@snowman.net wrote:
 * Michael Paquier (michael.paqu...@gmail.com) wrote:
 On Mon, May 11, 2015 at 10:32 PM, Stephen Frost sfr...@snowman.net wrote:
  Now for a blast from the past...  This came up again on IRC recently and
  reminded me that I ran into the same issue a couple years back.  Updated
  patch includes the refactoring suggested and includes documentation.
 
  Barring objections, I'll push this later today.

 Small suggestion: a test case in src/test/isolation?

 This is entirely a permissions-related change and src/test/isolation is
 for testing concurrent behavior, not about testing permissions.

 I'm not saying that we shouldn't have more tests there, but it'd not
 be appropriate for this particular patch.

Perhaps. Note that we could have tests of this type though in lock.sql:
create role foo login;
create table aa (a int);
grant insert on aa TO foo;
\c postgres foo
begin;
insert into aa values (1);
lock table aa in row exclusive mode; -- should pass

Just mentioning it for the sake of the archives, I cannot work on that for now.
Regards,
-- 
Michael




Re: [HACKERS] Disabling trust/ident authentication configure option

2015-05-11 Thread Robert Haas
On Thu, May 7, 2015 at 4:57 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, May 7, 2015 at 11:02 AM, Stephen Frost sfr...@snowman.net wrote:
  I realize it's not going to be popular, but I'd love to have 'trust'
  only allowed if a command-line option is passed to the postmaster or
  something along those lines.  It's really got no business being an
  option for a network service like PG.

 I disagree wholeheartedly.  There is such a thing as a trusted network.

 Likely a good topic of conversation to be had in Ottawa. :)  I agree
 that there are trusted networks, but the ones that I work with still
 expect network services to require authentication and authorization.
 Perhaps they're not really trusted then, from your perspective.  On
 the other hand, I suppose if you use pg_hba to limit which accounts can
 be logged into with 'trust' then you might be able to have, say, a
 read-only user/database that anyone could see.  That's a pretty narrow
 case though and I'd rather we figure out how to address it directly and
 more specifically (no-password login roles?) than the broad
 disable-all-authentication trust method.

Let's suppose that you have an application server and a DB server
running on the same node.  That turns out to be too much load, so you
move the application server to a separate machine and connect the two
machines with a crossover cable, or a VLAN that has nothing else on
it.  To me, it's quite sane to want connections on that network to
proceed without authentication or authorization.  If you've got to
open up the database more than that then, yes, you need authentication
and authorization.
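
For that topology, the pg_hba.conf on the DB server could be as narrow as
this (a sketch; addresses invented):

    # app server only, over the dedicated link -- no authentication
    host  all  all  10.0.0.2/32  trust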

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
 On Tue, May 12, 2015 at 4:01 AM, Stephen Frost sfr...@snowman.net wrote:
  * Michael Paquier (michael.paqu...@gmail.com) wrote:
  On Mon, May 11, 2015 at 10:32 PM, Stephen Frost sfr...@snowman.net wrote:
   Now for a blast from the past...  This came up again on IRC recently and
   reminded me that I ran into the same issue a couple years back.  Updated
   patch includes the refactoring suggested and includes documentation.
  
   Barring objections, I'll push this later today.
 
  Small suggestion: a test case in src/test/isolation?
 
  This is entirely a permissions-related change and src/test/isolation is
  for testing concurrent behavior, not about testing permissions.
 
  I'm not saying that we shouldn't have more tests there, but it'd not
  be appropriate for this particular patch.
 
 Perhaps. Note that we could have tests of this type though in lock.sql:
 create role foo login;
 create table aa (a int);
 grant insert on aa TO foo;
 \c postgres foo
 begin;
 insert into aa values (1);
 lock table aa in row exclusive mode; -- should pass

Yeah, it might not be bad to have tests for all the different lock types
and make sure that the permissions match up.  I'd probably put those
tests into 'permissions.sql' instead though.

 Just mentioning it for the sake of the archives, I cannot work on that for 
 now.

Ditto.  I'm trying to work through the postgres_fdw UPDATE push-down
patch now..

Thanks!

Stephen




Re: [HACKERS] deparsing utility commands

2015-05-11 Thread David Steele
On 5/8/15 3:29 PM, Alvaro Herrera wrote:
 Here is a complete version.  Barring serious problems, I intend to
 commit this on Monday.

I have reviewed and tested this patch and everything looks good to me.
It also looks like you added better coverage for schema DDL, which is a
welcome addition.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] multixacts woes

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 10:24 AM, Josh Berkus wrote:


In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
this instead of adding a new GUC?  We already have a bunch of freezing
GUCs which fewer than 1% of our user base has any idea how to set.


That is a documentation problem, not a user problem. Although I agree 
that yet another GUC for an obscure feature that should be internally 
intelligent is likely the wrong direction.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-11 Thread Andrew Dunstan


On 05/11/2015 02:02 AM, Amit Kapila wrote:
On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net wrote:




 This generally looks good, but I have a couple of questions before I 
commit it.


 First, why is the new option for the  BASE_BACKUP command of the 
Streaming Replication protocol TAR? It seems rather misleading. 
Shouldn't it be something like TABLESPACEMAP?



The reason to keep new option's name as TAR was that tablespace_map
was generated for that format type, but I agree with you that something
like TABLESPACEMAP suits better, so I have changed it to
TABLESPACE_MAP.  Putting '_' in name makes it somewhat consistent
with other names and filename it generates with this new option.


 Second, these lines in xlog.c seem wrong:

 else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
 str[i-1] = '\n';

 It looks to me like we should be putting ch in the string, not 
arbitrarily transforming \r into \n.



You are right, I have changed it as per your suggestion.





OK, I have cleaned this up a bit - I had already started so I didn't 
take your latest patch but instead applied relevant changes to my 
changeset. Here is my latest version.


In testing I notice that pg_basebackup -F t now leaves it completely up 
to the user on all platforms to create the relevant links in pg_tblspc/. 
It includes the tablespace_map file in base.tar, but that's really just 
informational. I think we need to add something to the pg_basebackup 
docs about that, at the very least (and it will also need to be a 
release note item.)
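
To illustrate (the OID and path below are invented), the file pairs each
pg_tblspc/ link name with its target, so after extracting base.tar the
user would have to recreate the links by hand, roughly:

$ cat $PGDATA/tablespace_map
16385 /mnt/ts1

$ ln -s /mnt/ts1 $PGDATA/pg_tblspc/16385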


cheers

andrew


diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index e25e0d0..def43a2 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -836,8 +836,11 @@ SELECT pg_start_backup('label');
  <function>pg_start_backup</> creates a <firstterm>backup label</> file,
  called <filename>backup_label</>, in the cluster directory with
  information about your backup, including the start time and label
- string.  The file is critical to the integrity of the backup, should
- you need to restore from it.
+ string.  The function also creates a <firstterm>tablespace map</> file,
+ called <filename>tablespace_map</>, in the cluster directory with
+ information about tablespace symbolic links in <filename>pg_tblspc/</>
+ if one or more such link is present.  Both files are critical to the
+ integrity of the backup, should you need to restore from it.
 </para>
 
 <para>
@@ -965,17 +968,20 @@ SELECT pg_stop_backup();
 
 <para>
 It's also worth noting that the <function>pg_start_backup</> function
-makes a file named <filename>backup_label</> in the database cluster
-directory, which is removed by <function>pg_stop_backup</>.
-This file will of course be archived as a part of your backup dump file.
-The backup label file includes the label string you gave to
-<function>pg_start_backup</>, as well as the time at which
-<function>pg_start_backup</> was run, and the name of the starting WAL
-file.  In case of confusion it is therefore possible to look inside a
-backup dump file and determine exactly which backup session the dump file
-came from.  However, this file is not merely for your information; its
-presence and contents are critical to the proper operation of the system's
-recovery process.
+makes files named <filename>backup_label</> and
+<filename>tablespace_map</> in the database cluster directory,
+which are removed by <function>pg_stop_backup</>.  These files will of
+course be archived as a part of your backup dump file.  The backup label
+file includes the label string you gave to <function>pg_start_backup</>,
+as well as the time at which <function>pg_start_backup</> was run, and
+the name of the starting WAL file.  In case of confusion it is therefore
+possible to look inside a backup dump file and determine exactly which
+backup session the dump file came from.  The tablespace map file includes
+the symbolic link names as they exist in the directory
+<filename>pg_tblspc/</> and the full path of each symbolic link.
+These files are not merely for your information; their presence and
+contents are critical to the proper operation of the system's recovery
+process.
</para>
 
para
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fb39731..24d43d9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16591,11 +16591,12 @@ SELECT set_config('log_statement_stats', 'off', false);
 <function>pg_start_backup</> accepts an
 arbitrary user-defined label for the backup.  (Typically this would be
 the name under which the backup dump file will be stored.)  The function
-writes a backup label file (<filename>backup_label</>) into the
-database cluster's data directory, performs a checkpoint,
-and then returns the backup's starting transaction log location as text.
-The user 

Re: [HACKERS] Fixing busted citext function declarations

2015-05-11 Thread David E. Wheeler
On May 11, 2015, at 5:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Me too.  Something fell through the cracks rather badly there :-(.
 Would you check your commit history to see if anything else got missed?

Let’s see…

In 
https://github.com/theory/citext/commit/4030b4e1ad9fd9f994a6cdca1126a903682acae4
 I copied your use of specifying the full path to pg_catalog function, which is 
still in core.

In 
https://github.com/theory/citext/commit/c24132c098a822f5a8669ed522e747e01e1c0835,
 I made some tweaks based on a change you made to some version of my patch. 
Most are minor, or just for functions needed for 8.4 and not later versions.

In 
https://github.com/theory/citext/commit/2c7e997fd60e2b708d06c128e5fd2db51c7a9f33,
 I added a cast to bpchar, which is in core.

In 
https://github.com/theory/citext/commit/cf988024d18a6ddd9a8146ab8cabfe6e0167ba26
 and 
https://github.com/theory/citext/commit/22f91a0d50003a0c1c27d1fbf0bb5c0a1e3a3cad
 I switched from VARSIZE_ANY_EXHDR() to strlen() at your suggestion. Also still 
there.


Anyway, those are all from 2008 and pretty much just copy changes you made to 
core. The return value of regexp_matches() is the only significant change since 
then. So I think we’re good.

Best,

David



Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
On Tue, May 12, 2015 at 12:29:56AM +0300, Heikki Linnakangas wrote:
 On 05/12/2015 12:00 AM, Bruce Momjian wrote:
 Multi-xacts were made durable in Postgres 9.3 (released 2013-09-09) to
 allow primary-key-column-only locks.  1.7 years later, we are still
 dealing with bugs related to this feature.  Obviously, something is
 wrong.
 
 There were many 9.3 minor releases containing multi-xacts fixes, and
 these fixes have extended into 9.4.  After the first few bug-fix
 releases, I questioned whether we needed to revert or rework the
 feature, but got no positive response.  Only in the past few weeks have
 we got additional people involved.
 
 The revert or rework ship had already sailed at that point. I

True.

 don't think we had much choice than just soldier through the bugs
 after the release.

The problem is we soldiered on without adding any resources to the
problem or doing a systematic review once it became clear one was
necessary.

 I think we now know that our inaction didn't serve us well.  The
 question is how can we identify chronic problems and get resources
 involved sooner.  I feel we have been asleep at the wheel to some
 extent on this.
 
 Yeah. I think the problem was that no-one realized that this was a
 significant change to the on-disk format. It was deceptively
 backwards-compatible. When it comes to permanent on-disk structures,
 we should all be more vigilant in the review.

Yes, and the size/age of the patch helped mask problems too.  Are these
the lessons we need to learn?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Final Patch for GROUPING SETS

2015-05-11 Thread Andres Freund
On 2015-04-30 05:35:26 +0100, Andrew Gierth wrote:
  >>>>> "Andres" == Andres Freund and...@anarazel.de writes:

  Andres This is not a real review.  I'm just scanning through the
  Andres patch, without reading the thread, to understand if I see
  Andres something worthy of controversy. While scanning I might have
  Andres a couple observations or questions.

   + * A list of grouping sets which is structurally equivalent to a 
 ROLLUP
   + * clause (e.g. (a,b,c), (a,b), (a)) can be processed in a 
 single pass over
   + * ordered data.  We do this by keeping a separate set of 
 transition values
   + * for each grouping set being concurrently processed; for each 
 input tuple
   + * we update them all, and on group boundaries we reset some 
 initial subset
   + * of the states (the list of grouping sets is ordered from most 
 specific to
   + * least specific).  One AGG_SORTED node thus handles any number 
 of grouping
   + * sets as long as they share a sort order.

  Andres Found "initial subset" not very clear, even if I probably
  Andres guessed the right meaning.

 How about:

  *  [...], and on group boundaries we reset those states
  *  (starting at the front of the list) whose grouping values have
  *  changed (the list of grouping sets is ordered from most specific to
  *  least specific).  One AGG_SORTED node thus handles any number [...]

sounds good.

   + * To handle multiple grouping sets that _don't_ share a sort 
 order, we use
   + * a different strategy.  An AGG_CHAINED node receives rows in 
 sorted order
   + * and returns them unchanged, but computes transition values 
 for its own
   + * list of grouping sets.  At group boundaries, rather than 
 returning the
   + * aggregated row (which is incompatible with the input rows), 
 it writes it
   + * to a side-channel in the form of a tuplestore.  Thus, a 
 number of
   + * AGG_CHAINED nodes are associated with a single AGG_SORTED 
 node (the
   + * chain head), which creates the side channel and, when it 
 has returned
   + * all of its own data, returns the tuples from the tuplestore 
 to its own
   + * caller.

  Andres This paragraph deserves to be expanded imo.

 OK, but what in particular needs clarifying?

I'm not sure ;). I obviously was a bit tired...

  Andres Are you intending to resolve this before an eventual commit?
...
  Andres Possibly after the 'structural' issues are resolved? Or do you
  Andres think this can safely be put of for another release?

 I think the feature is useful even without AGG_HASHED, even though that
 means it can sometimes be beaten on performance by using UNION ALL of
 many separate GROUP BYs; but I'd defer to the opinions of others on that
 point.

I agree.
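
For reference, the UNION ALL fallback mentioned above would look like this
for a hypothetical table t(a, b, c); it computes the same result, ignoring
the GROUPING() distinction for NULLs occurring in the data, but it scans
the input once per grouping set:

SELECT a,    b,    sum(c) FROM t GROUP BY a, b
UNION ALL
SELECT a,    NULL, sum(c) FROM t GROUP BY a
UNION ALL
SELECT NULL, NULL, sum(c) FROM t;

-- equivalent single-scan form:
SELECT a, b, sum(c) FROM t GROUP BY GROUPING SETS ((a, b), (a), ());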

  Andres * Arguably this converts the execution *tree* into a DAG. Tom
  Andres seems to be rather uncomfortable with that. I am wondering
  Andres whether this really is a big deal - essentially this only
  Andres happens in a relatively 'isolated' part of the tree right?
  Andres I.e. if those chained together nodes were considered one node,
  Andres there would not be any loops?  Additionally, the way
  Andres parametrized scans works already essentially violates the
  Andres tree paradigma somewhat.

 The major downsides as I see them with the current approach are:

 1. It makes plans (and hence explain output) nest very deeply if you
 have complex grouping sets (especially cubes with high dimensionality).

That doesn't concern me overly much. If we feel the need to fudge the
explain output we certainly can.

 2. A union-based approach would have a chance of including AGG_HASHED
 support without any significant code changes, just by using one HashAgg
 node per qualifying grouping set. However, this would be potentially
 significantly slower than teaching HashAgg to do multiple grouping sets,
 and memory usage would be an issue.

Your "however" imo pretty much disqualifies that as an argument.

 The obvious alternative is this:

   -> CTE x
        -> entire input subplan here
   -> Append
        -> GroupAggregate
             -> Sort
                  -> CTE Scan x
        -> GroupAggregate
             -> Sort
                  -> CTE Scan x
        -> HashAggregate
             -> CTE Scan x
  [...]

 which was basically what we expected to do originally. But all of the
 existing code to deal with CTE / CTEScan is based on the assumption that
 each CTE has a rangetable entry before planning starts, and it is
 completely non-obvious how to arrange to generate such CTEs on the fly
 while planning.  Tom offered in December to look into that aspect for
 us, and of course we've heard nothing about it since.

I find Noah's argument about this kind of structure pretty
convincing. We'd either increase the number of reads, or balloon the
amount of memory if we'd manage to find a structure that'd allow several
of the aggregates to be computed at the same 

Re: [HACKERS] Final Patch for GROUPING SETS

2015-05-11 Thread Andres Freund

I think the problem is just that for each variable, in each grouping
set - a very large number in a large cube - we're recursing through the
whole ChainAggregate tree, as each Var just points to a var one level
lower.

For small values of "very large", that is. Had a little thinko there. It's still 
the fault of recursing down all these levels, doing nontrivial work each time.

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Amit Kapila
On Tue, May 12, 2015 at 4:55 AM, Andres Freund and...@anarazel.de wrote:

 On 2015-05-11 19:04:32 -0400, Tom Lane wrote:
  I think there's nobody, or at least very few people, who are getting
  paid to find/fix bugs rather than write cool new features.  This is
  problematic.  It doesn't help when key committers are overwhelmed by
  trying to process other peoples' patches.  (And no, I'm not sure that
  appoint more committers would improve matters.  What we've got is
  too many barely-good-enough patches.  Tweaking the process to let those
  into the tree faster will not result in better quality.)

 +many

 Except perhaps that I'd expand find/fix bugs to include review and
 integrate patches. Because I think few people are paid to do that
 either.

Well said.  Another thing to add to your point is helping to support
other people's ideas by providing use cases and/or a more robust design
that can be accepted by the community.
I think one of the reasons for this is that there is no reasonable
guarantee that if a person spends a good amount of time on review, on
helping other patches through the design phase, and on fixing bugs, his
or her feature patches will be given more priority; that makes it
difficult to bargain with one's manager or company for more time to spend
on such activities.  I think if the current development process included
some form of prioritization of feature patches from people who spend more
time helping with other patches and with maintenance, it could improve
the situation.  Currently, we do have some convention in the CF process
suggesting that a person should review patches equal in number and
complexity to those he or she submits for review, but I am not sure that
this is followed strictly, or that it is sufficient.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Alvaro Herrera
Robert Haas wrote:
 On Mon, May 11, 2015 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think there's nobody, or at least very few people, who are getting
  paid to find/fix bugs rather than write cool new features.  This is
  problematic.  It doesn't help when key committers are overwhelmed by
  trying to process other peoples' patches.  (And no, I'm not sure that
  appoint more committers would improve matters.  What we've got is
  too many barely-good-enough patches.  Tweaking the process to let those
  into the tree faster will not result in better quality.)
 
 I agree, although generally I think committers are responsible for
 fixing what they commit, and I've certainly dropped everything a few
 times to do so.  And people who will someday become committers are
 generally the sorts of people who do that, too.  Perhaps we've relied
 overmuch on that in some cases - e.g. I really haven't paid much
 attention to the multixact stuff until lately, because I assumed that
 it was Alvaro's problem.  And maybe that's not right.  But I know that
 when a serious bug is found in something I committed, I expect that if
 anyone else fixes it, that's a bonus.

For the record, I share the responsibility over committed items
principle, and I adhere to it to as full an extent as possible.
Whenever possible I try to enlist the submitter's help for a fix, but if
they do not respond I consider whatever fix to be on me.  (I have
dropped everything to get fixes done, on several occasions.)

As for multixacts, since it's what brings up this thread, many of you
realize that the amount of time I have spent fixing issues post-facto is
enormous.  If I had had a glimpse of the effort that the bugfixing would
cost, I would certainly have dropped it -- spending more time on it
before commit was out of the question.  I appreciate the involvement of
others in the fixes that became necessary.

One lesson I have learned from all this is to try to limit the
additional complexity from any individual patch.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

2015-05-11 Thread digoal zhou
   PostgreSQL (>=9.4) tries to spread checkpoint buffer writes smoothly
across checkpoint_completion_target (of checkpoint_timeout or
checkpoint_segments), but when synchronous_commit=off is used there is a
little problem with the checkpoint_segments target: xlog is written very
fast early in the checkpoint (because of the full-page writes triggered
by the first touch of each page after a checkpoint), so the checkpointer
cannot sleep and the buffer writes are not smooth.
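   A rough sketch of the progress check at issue (illustrative only, not
the actual code; the two helper functions are invented for exposition):

/*
 * The checkpointer only sleeps between buffer writes while this returns
 * true.  Right after a checkpoint starts, full-page writes inflate xlog
 * consumption, so the xlog-based comparison keeps reporting "behind
 * schedule", the sleeps stop, and the writes bunch up at the start.
 */
static bool
IsCheckpointOnSchedule(double progress)    /* fraction of buffers written */
{
    progress *= CheckPointCompletionTarget;

    /* checkpoint_segments target: fraction of WAL consumed so far */
    if (progress < consumed_xlog_fraction())
        return false;

    /* checkpoint_timeout target: fraction of the timeout elapsed */
    if (progress < elapsed_time_fraction())
        return false;

    return true;
}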
   Here is a test:
# stap -DMAXSKIPPED=10 -v 1 -e '
global s_var, e_var, stat_var;

/* probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid,
int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__read__start") {
  s_var[pid(),1] = gettimeofday_us()
}

/* probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int,
int, int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__read__done") {
  e_var[pid(),1] = gettimeofday_us()
  if ( s_var[pid(),1] > 0 )
    stat_var[pid(),1] <<< e_var[pid(),1] - s_var[pid(),1]
}

/* probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid,
int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__write__start") {
  s_var[pid(),2] = gettimeofday_us()
}

/* probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int,
int, int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__write__done") {
  e_var[pid(),2] = gettimeofday_us()
  if ( s_var[pid(),2] > 0 )
    stat_var[pid(),2] <<< e_var[pid(),2] - s_var[pid(),2]
}

probe process("/opt/pgsql/bin/postgres").mark("buffer__sync__start") {
  printf("buffer__sync__start num_buffers: %d, dirty_buffers: %d\n",
    $NBuffers, $num_to_write)
}

probe process("/opt/pgsql/bin/postgres").mark("checkpoint__start") {
  printf("checkpoint start\n")
}

probe process("/opt/pgsql/bin/postgres").mark("checkpoint__done") {
  printf("checkpoint done\n")
}

probe timer.s(1) {
  foreach ([v1,v2] in stat_var+) {
    if ( @count(stat_var[v1,v2]) > 0 ) {
      printf("r1_or_w2 %d, pid: %d, min: %d, max: %d, avg: %d, sum: %d, count: %d\n",
        v2, v1, @min(stat_var[v1,v2]), @max(stat_var[v1,v2]),
        @avg(stat_var[v1,v2]), @sum(stat_var[v1,v2]), @count(stat_var[v1,v2]))
    }
  }

  printf("--end-\n")
  delete s_var
  delete e_var
  delete stat_var
}'



Use the test table and data:
create table tbl(id int primary key, info text, crt_time timestamp);
insert into tbl select generate_series(1,5000),now(),now();


Use pgbench test it.
$ vi test.sql
\setrandom id 1 5000
update tbl set crt_time=now() where id = :id ;


$ pgbench -M prepared -n -r -f ./test.sql -P 1 -c 28 -j 28 -T 1
When an on-schedule checkpoint occurs, the tps:
progress: 255.0 s, 58152.2 tps, lat 0.462 ms stddev 0.504
progress: 256.0 s, 31382.8 tps, lat 0.844 ms stddev 2.331
progress: 257.0 s, 14615.5 tps, lat 1.863 ms stddev 4.554
progress: 258.0 s, 16258.4 tps, lat 1.652 ms stddev 4.139
progress: 259.0 s, 17814.7 tps, lat 1.526 ms stddev 4.035
progress: 260.0 s, 14573.8 tps, lat 1.825 ms stddev 5.592
progress: 261.0 s, 16736.6 tps, lat 1.600 ms stddev 5.018
progress: 262.0 s, 19060.5 tps, lat 1.448 ms stddev 4.818
progress: 263.0 s, 20553.2 tps, lat 1.290 ms stddev 4.146
progress: 264.0 s, 26223.0 tps, lat 1.042 ms stddev 3.711
progress: 265.0 s, 31953.0 tps, lat 0.836 ms stddev 2.837
progress: 266.0 s, 43396.1 tps, lat 0.627 ms stddev 1.615
progress: 267.0 s, 50487.8 tps, lat 0.533 ms stddev 0.647
progress: 268.0 s, 53537.7 tps, lat 0.502 ms stddev 0.598
progress: 269.0 s, 54259.3 tps, lat 0.496 ms stddev 0.624
progress: 270.0 s, 56139.8 tps, lat 0.479 ms stddev 0.524

The parameters for the on-schedule checkpoint:
checkpoint_segments = 512
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9

stap's output:
there are 156467 dirty blocks.  From the per-second numbers below we can
see that the buffer writes are not spread smoothly across the time
target, only across the xlog target.  A smooth spread would be about
156467/(5*60*0.9) = 579.5 writes per second.


checkpoint start
buffer__sync__start num_buffers: 262144, dirty_buffers: 156467
r1_or_w2 2, pid: 19848, min: 41, max: 1471, avg: 49, sum: 425291, count:
8596
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 153, avg: 49, sum: 450597, count: 9078
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 643, avg: 51, sum: 429193, count: 8397
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 1042, avg: 55, sum: 449091, count:
8097
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 254, avg: 52, sum: 296668, count: 5617
--end-
r1_or_w2 2, pid: 19848, min: 39, max: 171, avg: 54, sum: 321027, count: 5851
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 138, avg: 60, sum: 300056, count: 4953
--end-
r1_or_w2 2, pid: 19848, min: 42, max: 

[HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-11 Thread Craig Ringer
Hi all

For some time I've wanted a way to SET SESSION AUTHORISATION or SET
ROLE in a way that cannot simply be RESET, so that a connection may be
handed to a less-trusted service or application to do some work with.

This is most useful for connection pools, where it's currently necessary to
have a per-user pool, to trust users not to do anything naughty, or to
filter the functions and commands they can run through a whitelist to stop
them trying to escalate rights to the pooler user.

In the short term I'd like to:

* Add a WITH COOKIE option to SET SESSION AUTHORIZATION, which takes an
app-generated code (much like PREPARE TRANSACTION does).

* Make DISCARD ALL, RESET SESSION AUTHORIZATION, etc, also take WITH
COOKIE. If session authorization was originally set with a cookie value,
the same cookie value must be passed or an ERROR will be raised when RESET
is attempted.

* A second SET SESSION AUTHORIZATION without a prior RESET would be
rejected with an ERROR if the first SET used a cookie value.

This can be done without protocol-level changes and with no backward
compatibility impact to existing applications. Any objections?
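
A sketch of the intended flow (the syntax is hypothetical, since this is
only a proposal):

-- pooler, before handing the connection over:
SET SESSION AUTHORIZATION app_user WITH COOKIE 'pooler-generated-value';

-- the less-trusted application cannot escape:
RESET SESSION AUTHORIZATION;    -- ERROR

-- pooler, taking the connection back:
RESET SESSION AUTHORIZATION WITH COOKIE 'pooler-generated-value';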


These commands will appear in the logs if log_statement = 'all', but the
codes are transient cookie values, not passwords. They'll be visible in
pg_stat_activity but only to the privileged user. It'll probably be
necessary to clear the last command string when executing SET SESSION
AUTHORIZATION so the new user can't snoop the cookie value from a
concurrent connection, but that should be simple enough.

As is currently the case, poolers will still have to use a superuser
connection if they want to pool across users.




In the longer term I want to add a protocol-level equivalent that lets a
session switch session authorization or role, for efficiency and log-spam
reasons.

I'm also interested in a way to allow SET SESSION AUTHORIZATION to a list
of permitted roles when run as a non-superuser, for connection pool use.
SET ROLE might do, but it's more visible to apps, whereas SET SESSION
AUTHORIZATION really makes the connection appear to become the target
user.

That's later though - first,


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch to improve a few appendStringInfo* calls

2015-05-11 Thread Peter Eisentraut
On 4/11/15 6:25 AM, David Rowley wrote:
 I've attached a small patch which just fixes a few appendStringInfo*
 calls that are not quite doing things the way they were intended. 

done
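
For the archives, illustrative of the pattern such cleanups typically
address (not necessarily the exact call sites touched here):

StringInfoData buf;

initStringInfo(&buf);

/* routes a plain string through the format parser: wasteful */
appendStringInfo(&buf, "%s", str);

/* the dedicated variants do the same job more directly */
appendStringInfoString(&buf, str);
appendStringInfoChar(&buf, ';');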





Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 04:04 PM, Tom Lane wrote:

Bruce Momjian br...@momjian.us writes:

On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:

What I am arguing is that the release cycle is at least a big part
of the problem. We are trying to get so many new features that bugs
are increasing and quality is decreasing.



Now that is an interesting observation --- are we too focused on patches
and features to realize when we need to seriously revisit an issue?


I think there's nobody, or at least very few people, who are getting
paid to find/fix bugs rather than write cool new features.  This is
problematic.  It doesn't help when key committers are overwhelmed by
trying to process other peoples' patches.  (And no, I'm not sure that
appoint more committers would improve matters.  What we've got is
too many barely-good-enough patches.  Tweaking the process to let those
into the tree faster will not result in better quality.)


Exactly correct.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] Fixing busted citext function declarations

2015-05-11 Thread Tom Lane
David E. Wheeler da...@justatheory.com writes:
 On May 5, 2015, at 9:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In
 http://www.postgresql.org/message-id/bn1pr04mb37467aa1d412223b3d4a595df...@bn1pr04mb374.namprd04.prod.outlook.com
 it's revealed that the citext extension misdeclares its versions of
 regexp_matches(): they should return SETOF text[] but they're marked
 as returning just text[].

 I wanted to make sure my backport was fixed for this, but it turns out it was 
 already fixed as of this commit:

   https://github.com/theory/citext/commit/99c925f

 Note that I credited you for the spot --- way back in October 2009! Pretty 
 confused how the same change wasn’t made to the core contrib module back 
 then.

Me too.  Something fell through the cracks rather badly there :-(.
Would you check your commit history to see if anything else got missed?

regards, tom lane




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 04:18 PM, Simon Riggs wrote:

On 11 May 2015 at 23:47, Bruce Momjian br...@momjian.us
mailto:br...@momjian.us wrote:

On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
 The releases themselves are not the problem, but rather the volume of
 bugs and our slowness in getting additional people involved to remove
 data corruption bugs more quickly and systematically.  Our reputation
 for reliability has been harmed by this inactivity.
 

 What I am arguing is that the release cycle is at least a big part
 of the problem. We are trying to get so many new features that bugs
 are increasing and quality is decreasing.

Now that is an interesting observation --- are we too focused on patches
and features to realize when we need to seriously revisit an issue?


I think we are unused to bugs. We have a much lower bug rate than any
other system.


True, we are used to having extremely high quality releases, but if you 
look at the release notes since, say, 9.2, we are seeing a much larger 
increase in bug rates.


It is true that, generally speaking, our bug rate is low in comparison to 
other databases. That said, I think we are also resting on our laurels 
here, per my previous paragraph.




I think we seriously need to review our policy of adding major new
features and have them enabled by default with no parameter to disable
them. In the early years of PostgreSQL everything had an off switch,
e.g. stats, bgwriter and even autovacuum defaulted to off for many years.


That's interesting although I am unsure of the cost of such a thing.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-11 Thread Kouhei Kaigai
Hi,

Let me back to the original concern and show possible options
we can take here. At least, the latest master is not usable to
implement custom-join logic without either of these options.

option-1)
Revert create_plan_recurse() to a non-static function for extensions.
This is the simplest solution; however, it is still a gray area which
functions should be public, and whether we treat non-static functions
as a stable API or not.
IMO, we shouldn't treat non-static functions as stable APIs, even
though they can be called by extensions and not only by code in other
source files. In fact, we routinely change the definitions of
non-static functions even when we know extensions use them. It is the
extension's role to follow such changes across major versions.


option-2)
Tom's suggestion. Add a new list field of Path nodes on CustomPath;
create_customscan_plan() then calls the static create_plan_recurse()
function to construct the child plan nodes.
The attached patch gives an image of this enhancement, though it is
not tested yet, of course. Once we adopt this approach, I'll adjust
my PG-Strom code to the new interface within two weeks and report
any problems.


option-3)
Force authors of custom-scan providers to copy and paste createplan.c.
I really don't want this option, and nobody would be happy with it.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Kaigai Kouhei(海外 浩平)
 Sent: Monday, May 11, 2015 12:48 PM
 To: 'Andres Freund'; Robert Haas
 Cc: Tom Lane; Kohei KaiGai; Thom Brown; Shigeru Hanada;
 pgsql-hackers@postgreSQL.org
 Subject: RE: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
 
  On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
   On Sun, May 10, 2015 at 8:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 This commit reverts create_plan_recurse() as static function.
Yes.  I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful.  (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)
 
  Wasn't there a submitted use case? IIRC Kaigai had referenced some
  pg-strom (?) code using it?
 
  I'm failing to see how create_plan_recurse() being exposed externally is
  related to having gotten committed without submitted use-cases.  Even
  if submitted, presumably as simple as possible code, doesn't use it,
  that's not a proof that less simple code does not need it.
 
 Yes, the PG-Strom code uses create_plan_recurse() to construct the child
 plan nodes of the GPU-accelerated custom-join logic, once it is chosen.
 There is nothing special here: it calls create_plan_recurse() on the
 underlying inner/outer paths just as the built-in join paths do.
 It would not be difficult to submit it as a working example; however,
 its total code size (excluding GPU code) is 25K lines at this moment.
 
 I'm not certain whether it is a simple example.
 
   Your unwillingness to make functions global or to stick PGDLLIMPORT
   markings on variables that people want access to is hugely
   handicapping extension authors.  Many people have complained about
   that on multiple occasions.  Frankly, I find it obstructionist and
   petty.
 
  While I don't find the tone of the characterization super helpful, I do
  tend to agree that we're *far* too conservative on that end.  I've now
  seen a significant number of extension that copied large swathes of code
  just to cope with individual functions not being available.  And even
  cases where that lead to minor forks with such details changed.
 
 I may have to join their ranks?
 
  I know that I'm fearful of asking for functions being made
  public. Because it'll invariably get into a discussion of merits that's
  completely out of proportion with the size of the change.  And if I, who
  has been on the list for a while now, am afraid in that way, you can
  be sure that others won't even dare to ask, lest argue their way
  through.
 
  I think the problem is that during development the default often is to
  create function as static if they're used only in one file. Which is
  fine. But it really doesn't work if it's a larger battle to change
  single incidences.  Besides the pain of having to wait for the next
  major release...
 
 Thanks,
 --
 NEC Business Creation Division / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com



custom-join-children.patch
Description: custom-join-children.patch



Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 02:15 PM, Bruce Momjian wrote:


On Mon, May 11, 2015 at 02:11:48PM -0700, Joshua Drake wrote:





Here are some options

Slow down the release cycle
The shortness of the release cycle puts a preference on adding
features versus providing a more mature outcome.

or

Increase the release cycle
Moving to a Ubuntu style release cycle would allow us to have a
window to scratch the itch but with the eventual (and known) release
of something that is LTS.


The releases themselves are not the problem, but rather the volume of
bugs and our slowness in getting additional people involved to remove
data corruption bugs more quickly and systematically.  Our reputation
for reliability has been harmed by this inactivity.



What I am arguing is that the release cycle is at least a big part of 
the problem. We are trying to get so many new features that bugs are 
increasing and quality is decreasing.


If we change the release cycle, it will encourage more eyeballs on the 
code we are developing, because people won't be in such a rush to get a 
feature done for a given release.


Sincerely,

Joshua D. Drake


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Tom Lane
On further consideration ...

I don't much like the division of labor between LockForeignRow and
FetchForeignRow.  In principle that would lead to not one but two
extra remote accesses per locked row in SELECT FOR UPDATE, at least
in the case that an EvalPlanQual recheck is required.  (I see that
in your prototype patch for postgres_fdw you attempt to avoid that
by saving a retrieved tuple in LockForeignRow and then returning it
in FetchForeignRow, but that seems extremely ugly and bug-prone,
since there is nothing in the API spec insisting that those calls match
up one-to-one.)  The fact that locking and fetching a tuple are separate
operations in the heapam API is a historical artifact that probably
doesn't make sense to duplicate in the FDW API.

Another problem is that we fail to detect whether an EvalPlanQual recheck
is required during a SELECT FOR UPDATE on a remote table, which we
certainly ought to do if the objective is to make postgres_fdw semantics
match the local ones.

What I'm inclined to do is merge LockForeignRow and FetchForeignRow
into one operation, which would have the semantics of returning a
palloc'd tuple, or NULL on a SKIP LOCKED failure, and it would be expected
to acquire a lock according to erm->markType (in particular, no lock, just
a re-fetch for ROW_MARK_REFERENCE).  In addition it needs some way of
reporting that the returned row is an updated version rather than the
original.  Probably just an extra output boolean would do for that.
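
Such a merged callback might look roughly like this (the name and
parameter list are illustrative only, sketching the semantics just
described):

/*
 * Lock (per erm->markType; no lock for ROW_MARK_REFERENCE) and re-fetch
 * the row identified by rowid.  Returns a palloc'd copy of the row, or
 * NULL on a SKIP LOCKED failure; *updated is set to true if the returned
 * row is a newer version than the one rowid originally referenced.
 */
typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
                                                 ExecRowMark *erm,
                                                 Datum rowid,
                                                 bool *updated);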

This'd require some refactoring in nodeLockRows, because we'd want to
be able to hang onto the returned tuple without necessarily provoking
an EvalPlanQual cycle, but it doesn't look too bad.

regards, tom lane




Re: [HACKERS] Multixid hindsight design

2015-05-11 Thread Tom Lane
Heikki Linnakangas hlinn...@iki.fi writes:
 So the lesson here is that having a permanent pg_multixact is not nice, 
 and we should get rid of it. Here's how to do that:

That would be cool, but ...

 Looking at the tuple header, the CID and CTID fields are only needed, 
 when either xmin or xmax is running. Almost: in a HOT-updated tuple, 
 CTID is required even after xmax has committed, but since it's a HOT 
 update, the new tuple is always on the same page so you only need the 
 offsetnumber part.

I think this is totally wrong.  HOT update or not, you need the forward
link represented by ctid not just until xmin/xmax commit, but until
they're in the past according to all active snapshots.  That's so that
other transactions can chain forward to the current version of a tuple
which they found according to their snapshots.
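
A quick illustration of that forward link (session-level sketch):

CREATE TABLE t (a int);
INSERT INTO t VALUES (1);   -- tuple stored at ctid (0,1)
UPDATE t SET a = 2;         -- new version at (0,2); the old tuple's ctid
                            -- field now points forward to (0,2), and a
                            -- concurrent scan that found (0,1) under its
                            -- snapshot chains through that link to reach
                            -- the current version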

It might be you can salvage the idea anyway, since it's still true that
the forward links wouldn't be needed after a crash and restart.  But the
argument as stated is wrong.

(There's also the question of forensic requirements, although I'm aware
that it's barely worth bringing that up since nobody else here seems to
put much weight on that.)

regards, tom lane




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
 What I am arguing is that the release cycle is at least a big part
 of the problem. We are trying to get so many new features that bugs
 are increasing and quality is decreasing.

 Now that is an interesting observation --- are we too focused on patches
 and features to realize when we need to seriously revisit an issue?

I think there's nobody, or at least very few people, who are getting
paid to find/fix bugs rather than write cool new features.  This is
problematic.  It doesn't help when key committers are overwhelmed by
trying to process other peoples' patches.  (And no, I'm not sure that
appoint more committers would improve matters.  What we've got is
too many barely-good-enough patches.  Tweaking the process to let those
into the tree faster will not result in better quality.)

regards, tom lane




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Simon Riggs
On 11 May 2015 at 23:47, Bruce Momjian br...@momjian.us wrote:

 On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
  The releases themselves are not the problem, but rather the volume of
  bugs and our slowness in getting additional people involved to remove
  data corruption bugs more quickly and systematically.  Our reputation
  for reliability has been harmed by this inactivity.
  
 
  What I am arguing is that the release cycle is at least a big part
  of the problem. We are trying to get so many new features that bugs
  are increasing and quality is decreasing.

 Now that is an interesting observation --- are we too focused on patches
 and features to realize when we need to seriously revisit an issue?


I think we are unused to bugs. We have a much lower bug rate than any other
system.

I think we seriously need to review our policy of adding major new features
and have them enabled by default with no parameter to disable them. In the
early years of PostgreSQL everything had an off switch, e.g. stats,
bgwriter and even autovacuum defaulted to off for many years.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] deparsing utility commands

2015-05-11 Thread Alvaro Herrera
David Steele wrote:

 I have reviewed and tested this patch and everything looks good to me.
 It also looks like you added better coverage for schema DDL, which is a
 welcome addition.

Thanks -- I have pushed this now.

My hope is to get this test module extended quite a bit, not only to
cover existing commands, but also so that it causes future changes to
cause failure unless command collection is considered.  (In a previous
version of this patch, there was a test mode that ran everything in the
serial_schedule of regular regression tests.  That was IMV a good way to
ensure that new commands were also tested to run under command
collection.  That hasn't been enabled in the new test module, and I
think it's necessary.)

If anyone wants to contribute to the test module so that more is
covered, that would be much appreciated.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
 The releases themselves are not the problem, but rather the volume of
 bugs and our slowness in getting additional people involved to remove
 data corruption bugs more quickly and systematically.  Our reputation
 for reliability has been harmed by this inactivity.
 
 
 What I am arguing is that the release cycle is at least a big part
 of the problem. We are trying to get so many new features that bugs
 are increasing and quality is decreasing.

Now that is an interesting observation --- are we too focused on patches
and features to realize when we need to seriously revisit an issue?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Fixing busted citext function declarations

2015-05-11 Thread David E. Wheeler
Tom,

On May 5, 2015, at 9:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 In
 http://www.postgresql.org/message-id/bn1pr04mb37467aa1d412223b3d4a595df...@bn1pr04mb374.namprd04.prod.outlook.com
 it's revealed that the citext extension misdeclares its versions of
 regexp_matches(): they should return SETOF text[] but they're marked
 as returning just text[].

I wanted to make sure my backport was fixed for this, but it turns out it was 
already fixed as of this commit:

  https://github.com/theory/citext/commit/99c925f

Note that I credited you for the spot --- way back in October 2009! Pretty 
confused how the same change wasn’t made to the core contrib module back then.

Best,

David





Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Andres Freund
On 2015-05-11 19:04:32 -0400, Tom Lane wrote:
 I think there's nobody, or at least very few people, who are getting
 paid to find/fix bugs rather than write cool new features.  This is
 problematic.  It doesn't help when key committers are overwhelmed by
 trying to process other peoples' patches.  (And no, I'm not sure that
 appoint more committers would improve matters.  What we've got is
 too many barely-good-enough patches.  Tweaking the process to let those
 into the tree faster will not result in better quality.)

+many

Except perhaps that I'd expand find/fix bugs to include review and
integrate patches. Because I think few people are paid to do that
either.  I now partially am (which obviously isn't sufficient). There's
no way it's possible to e.g. work on integrating something like upsert
in a reasonable timeframe otherwise.

The lack of paid time to integrate stuff properly also leads to part of
the quality problem, besides delaying stuff.

Andres




[HACKERS] Leftovers after dcae5fac (improve speed of make check-world) in git tree with check-world

2015-05-11 Thread Michael Paquier
Hi all,

Since dcae5fac, we use a central path ($GIT_ROOT/tmp_install) to store
the installation deliverables during tests. Each module uses a
.gitignore with more or less those entries:
/log/
/results/
/tmp_check/

First, note that /log/ is not used anymore (it has been replaced by
/tmp_check/log/), so we could just remove it. Also, since we deliberately
do not ignore regression.diffs and regression.out, so that it is easy to
check what has failed, what about removing /tmp_check/ from those
.gitignore files as well? It now contains pgdata/ and log/, so it seems
useful to me to have git status show it, to help track failures too.
Thoughts? Attached is a patch updating all those files.
-- 
Michael
diff --git a/contrib/btree_gin/.gitignore b/contrib/btree_gin/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/btree_gin/.gitignore
+++ b/contrib/btree_gin/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/btree_gist/.gitignore b/contrib/btree_gist/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/btree_gist/.gitignore
+++ b/contrib/btree_gist/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/citext/.gitignore b/contrib/citext/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/citext/.gitignore
+++ b/contrib/citext/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/cube/.gitignore b/contrib/cube/.gitignore
index cb4c989..a6484a0 100644
--- a/contrib/cube/.gitignore
+++ b/contrib/cube/.gitignore
@@ -1,6 +1,4 @@
 /cubeparse.c
 /cubescan.c
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/dblink/.gitignore b/contrib/dblink/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/dblink/.gitignore
+++ b/contrib/dblink/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/dict_int/.gitignore b/contrib/dict_int/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/dict_int/.gitignore
+++ b/contrib/dict_int/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/dict_xsyn/.gitignore b/contrib/dict_xsyn/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/dict_xsyn/.gitignore
+++ b/contrib/dict_xsyn/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/earthdistance/.gitignore b/contrib/earthdistance/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/earthdistance/.gitignore
+++ b/contrib/earthdistance/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/file_fdw/.gitignore b/contrib/file_fdw/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/file_fdw/.gitignore
+++ b/contrib/file_fdw/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/hstore/.gitignore b/contrib/hstore/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/hstore/.gitignore
+++ b/contrib/hstore/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/hstore_plperl/.gitignore b/contrib/hstore_plperl/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/hstore_plperl/.gitignore
+++ b/contrib/hstore_plperl/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/hstore_plpython/.gitignore b/contrib/hstore_plpython/.gitignore
index ce6fab9..351dd41 100644
--- a/contrib/hstore_plpython/.gitignore
+++ b/contrib/hstore_plpython/.gitignore
@@ -1,6 +1,4 @@
 # Generated subdirectories
 /expected/python3/
-/log/
 /results/
 /sql/python3/
-/tmp_check/
diff --git a/contrib/intarray/.gitignore b/contrib/intarray/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/intarray/.gitignore
+++ b/contrib/intarray/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/ltree/.gitignore b/contrib/ltree/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/ltree/.gitignore
+++ b/contrib/ltree/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/ltree_plpython/.gitignore b/contrib/ltree_plpython/.gitignore
index ce6fab9..351dd41 100644
--- a/contrib/ltree_plpython/.gitignore
+++ b/contrib/ltree_plpython/.gitignore
@@ -1,6 +1,4 @@
 # Generated subdirectories
 /expected/python3/
-/log/
 /results/
 /sql/python3/
-/tmp_check/
diff --git a/contrib/pg_trgm/.gitignore b/contrib/pg_trgm/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/pg_trgm/.gitignore
+++ b/contrib/pg_trgm/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/pgcrypto/.gitignore b/contrib/pgcrypto/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/pgcrypto/.gitignore
+++ b/contrib/pgcrypto/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git 

Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think there's nobody, or at least very few people, who are getting
 paid to find/fix bugs rather than write cool new features.  This is
 problematic.  It doesn't help when key committers are overwhelmed by
 trying to process other peoples' patches.  (And no, I'm not sure that
 appoint more committers would improve matters.  What we've got is
 too many barely-good-enough patches.  Tweaking the process to let those
 into the tree faster will not result in better quality.)

I agree, although generally I think committers are responsible for
fixing what they commit, and I've certainly dropped everything a few
times to do so.  And people who will someday become committers are
generally the sorts of people who do that, too.  Perhaps we've relied
overmuch on that in some cases - e.g. I really haven't paid much
attention to the multixact stuff until lately, because I assumed that
it was Alvaro's problem.  And maybe that's not right.  But I know that
when a serious bug is found in something I committed, I expect that if
anyone else fixes it, that's a bonus.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Andres Freund
HI,

Don't have time to go through this in depth.

On 2015-05-11 18:53:22 -0700, Peter Geoghegan wrote:
 Note that the patch proposes to de-support CREATE RULE with an
 alternative action involving ON CONFLICT DO UPDATE. Such a rule seems
 particularly questionable to me, and I'm pretty sure it was understood
 that only ON CONFLICT DO NOTHING should be supported as an action for
 a rule (recall that INSERT statements with ON CONFLICT can, in
 general, never target relations with rules). At least, I believe
 Heikki said that.

 Deparsing with an inference clause is now correctly supported.  However,
 user-defined rules with ON CONFLICT DO UPDATE are now formally
 disallowed/unsupported.  It seemed there would be significant complexity
 involved in making this work correctly with the EXCLUDED.*
 pseudo-relation, which was not deemed worthwhile.  Such a user-defined
 rule seems very questionable anyway.

Do I understand correctly that you cut this out because there
essentially was a ruleutils bug with EXCLUDED? If so, I don't find
that acceptable. I'm pretty strictly convinced that, independent of rule
support, we shouldn't add things to insert queries that get_query_def
can't deparse.

Greetings,

Andres Freund




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Andres Freund
On 2015-05-11 19:33:02 -0700, Peter Geoghegan wrote:
 On Mon, May 11, 2015 at 7:22 PM, Peter Geoghegan p...@heroku.com wrote:
  You just get an error within the
  optimizer following rewriting of a query involving the application of
  a rule that specifies an ON CONFLICT DO UPDATE alternative/DO INSTEAD
  action. I found it would say: variable not found in subplan target
  lists.
 
 BTW, that error was only raised when the EXCLUDED.* pseudo-alias was
 actually used, which is why our previous testing missed it.

You should try to understand why it's failing. Just prohibiting the
rules, without understanding what's actually going on, could very well
hide a real bug.

Greetings,

Andres Freund




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 7:34 PM, Andres Freund and...@anarazel.de wrote:
 You should try to understand why it's failing. Just prohibiting the
 rules, without understanding what's actually going on, could very well
 hide a real bug.

It's not as if I have no idea. ReplaceVarsFromTargetList() is probably
quite confused by all this, because the passed nomatch_varno argument
is often rt_index -- but what about EXCLUDED.*? adjustJoinTreeList()
does not know anything about EXCLUDED.* either. I see little practical
reason to make the rewriter do any better.

When I debugged the problem of the optimizer raising that target
lists error with a rule with an action with EXCLUDED.* (within
setrefs.c's fix_join_expr_mutator()), it looked like an off-by-one
issue here:

/* If it's for acceptable_rel, adjust and return it */
if (var->varno == context->acceptable_rel)
{
    var = copyVar(var);
    var->varno += context->rtoffset;
    if (var->varnoold > 0)
        var->varnoold += context->rtoffset;
    return (Node *) var;
}

-- 
Peter Geoghegan




[HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
Attached patch fixes some issues with ON CONFLICT DO UPDATE/DO
NOTHING. There is a commit message which explains the changes at a
high level. The only real bug fix is around deparsing by ruleutils.c.

Note that the patch proposes to de-support CREATE RULE with an
alternative action involving ON CONFLICT DO UPDATE. Such a rule seems
particularly questionable to me, and I'm pretty sure it was understood
that only ON CONFLICT DO NOTHING should be supported as an action for
a rule (recall that INSERT statements with ON CONFLICT can, in
general, never target relations with rules). At least, I believe
Heikki said that.
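
For illustration, the now-disallowed shape is a rule like this (sketch,
hypothetical table and rule names):

create table t1 (a int);
create table t2 (a int primary key);
create rule r1 as on insert to t1 do instead
  insert into t2 values (new.a)
  on conflict (a) do update set a = excluded.a;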

Thoughts?
-- 
Peter Geoghegan
From 349a0a1fee6d86de8c5cc4120474ddc48aeb43e0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Mon, 11 May 2015 15:37:54 -0700
Subject: [PATCH] Fixes to a variety of minor ON CONFLICT issues

Deparsing with an inference clause is now correctly supported.  However,
user-defined rules with ON CONFLICT DO UPDATE are now formally
disallowed/unsupported.  It seemed there would be significant complexity
involved in making this work correctly with the EXCLUDED.*
pseudo-relation, which was not deemed worthwhile.  Such a user-defined
rule seems very questionable anyway.

In passing, refactor the InferenceElem representation of opclass, to
defer opfamily lookup for Relation index matching until index inference
proper (i.e., within the optimizer).  This is done for the convenience
of the new ruleutils.c code, but independently makes sense.

Finally, fix a few typos, and rename a variable -- candidates seemed
like a misnomer for the return value of infer_arbiter_indexes().
---
 contrib/pg_stat_statements/pg_stat_statements.c |  3 +-
 doc/src/sgml/ref/create_rule.sgml   |  5 ++
 src/backend/executor/nodeModifyTable.c  |  2 +-
 src/backend/nodes/copyfuncs.c   |  3 +-
 src/backend/nodes/equalfuncs.c  |  3 +-
 src/backend/nodes/outfuncs.c|  3 +-
 src/backend/nodes/readfuncs.c   |  3 +-
 src/backend/optimizer/plan/setrefs.c|  3 +-
 src/backend/optimizer/util/plancat.c| 34 +++
 src/backend/parser/parse_clause.c   | 16 ++
 src/backend/parser/parse_utilcmd.c  |  5 ++
 src/backend/utils/adt/ruleutils.c   | 75 -
 src/include/nodes/primnodes.h   |  3 +-
 src/test/regress/expected/insert_conflict.out   |  2 +-
 src/test/regress/expected/rules.out | 67 --
 src/test/regress/sql/rules.sql  | 32 +++
 16 files changed, 177 insertions(+), 82 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6abe3f0..f97cc2c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2637,8 +2637,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 InferenceElem *ie = (InferenceElem *) node;
 
APP_JUMB(ie->infercollid);
-APP_JUMB(ie->inferopfamily);
-APP_JUMB(ie->inferopcinputtype);
+APP_JUMB(ie->inferopclass);
JumbleExpr(jstate, ie->expr);
 			}
 			break;
diff --git a/doc/src/sgml/ref/create_rule.sgml b/doc/src/sgml/ref/create_rule.sgml
index 53fdf56..46a8a7c 100644
--- a/doc/src/sgml/ref/create_rule.sgml
+++ b/doc/src/sgml/ref/create_rule.sgml
@@ -281,6 +281,11 @@ UPDATE mytable SET name = 'foo' WHERE id = 42;
match the condition literalid = 42/literal.  This is an
implementation restriction that might be fixed in future releases.
   /para
+  para
+   Presently, a rule alternative action cannot contain literalON
+   CONFLICT DO UPDATE/literal.  This is an implementation
+   restriction that might be fixed in future releases.
+  /para
  /refsect1
 
  refsect1
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 34435c7..55a1cc7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -414,7 +414,7 @@ ExecInsert(ModifyTableState *mtstate,
    estate, true, specConflict,
    arbiterIndexes);
 
-			/* adjust the tuple's state accordingly */
+			/* adjust the tuple's state */
 			if (!specConflict)
 heap_finish_speculative(resultRelationDesc, tuple);
 			else
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 76b63af..957c2bc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1803,8 +1803,7 @@ _copyInferenceElem(const InferenceElem *from)
 
 	COPY_NODE_FIELD(expr);
 	COPY_SCALAR_FIELD(infercollid);
-	COPY_SCALAR_FIELD(inferopfamily);
-	COPY_SCALAR_FIELD(inferopcinputtype);
+	COPY_SCALAR_FIELD(inferopclass);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index e032142..f26626e 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c

Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 6:53 PM, Peter Geoghegan p...@heroku.com wrote:
 Attached patch fixes some issues with ON CONFLICT DO UPDATE/DO
 NOTHING. There is a commit message which explains the changes at a
 high level.

I just realized that there is a silly bug here. Attached is a fix that
applies on top of the original.


-- 
Peter Geoghegan
From 1b32558d188762eb5c7214ea5ae042897e7d004f Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Mon, 11 May 2015 19:02:12 -0700
Subject: [PATCH 2/2] Add closing bracket

---
 src/backend/utils/adt/ruleutils.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 355acc9..aa21693 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -7768,6 +7768,9 @@ get_rule_expr(Node *node, deparse_context *context,
get_rule_expr((Node *) iexpr->expr,
 			  context, showimplicit);
 
+if (need_parens)
+	appendStringInfoChar(buf, ')');
+
context->varprefix = varprefix;
 
if (iexpr->infercollid)
@@ -7782,8 +7785,6 @@ get_rule_expr(Node *node, deparse_context *context,
 
 	get_opclass_name(inferopclass, inferopcinputtype, buf);
 }
-if (need_parens)
-	appendStringInfoChar(buf, ')');
 			}
 			break;
 
-- 
1.9.1




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 7:11 PM, Andres Freund and...@anarazel.de wrote:
 Do I understand correctly that you cut this out because there
 essentially was a ruleutils bug with EXCLUDED? If so, I don't find
 that acceptable. I'm pretty strictly convinced that independent of rule
 support, we shouldn't add things to insert queries that get_query_def
 can't deparse.

No, that wasn't the reason -- deparsing itself works fine. That code
remains within ruleutils.c because I agree with this principle of
yours.

I tested the deparsing logic before making the added case fail as
unsupported. If you remove the new ereport() call that relates to
non-support of ON CONFLICT DO UPDATE as a rule action, then the
CREATE RULE still succeeds, and you can deparse the query just fine
(by querying pg_rules, typically). You just get an error within the
optimizer following rewriting of a query involving the application of
a rule that specifies an ON CONFLICT DO UPDATE alternative/DO INSTEAD
action. I found it would say: variable not found in subplan target
lists.
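
To see it for yourself (sketch, hypothetical names; this assumes the
new ereport() check is removed so that the CREATE RULE is accepted):

create table t1 (a int);
create table t2 (a int primary key, cnt int default 1);
create rule r1 as on insert to t1 do instead
  insert into t2 values (new.a)
  on conflict (a) do update set cnt = excluded.cnt + 1;
select definition from pg_rules where rulename = 'r1';  -- deparses fine
insert into t1 values (1);  -- ERROR:  variable not found in subplan target lists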

I can't say that I explored the question very thoroughly, but it seems
bothersome to have more than one relation involved in a Query during
rewriting.

-- 
Peter Geoghegan




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 7:22 PM, Peter Geoghegan p...@heroku.com wrote:
 You just get an error within the
 optimizer following rewriting of a query involving the application of
 a rule that specifies an ON CONFLICT DO UPDATE alternative/DO INSTEAD
 action. I found it would say: variable not found in subplan target
 lists.

BTW, that error was only raised when the EXCLUDED.* pseudo-alias was
actually used, which is why our previous testing missed it.

-- 
Peter Geoghegan




Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Michael Paquier
On Tue, May 12, 2015 at 4:53 AM, Stephen Frost wrote:
 Yeah, it might not be bad to have tests for all the different lock types
 and make sure that the permissions match up.  I'd probably put those
 tests into 'permissions.sql' instead though.

You mean privileges.sql, right? I just wrote a patch with that. I'll
create a new thread with the patch.
-- 
Michael




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Etsuro Fujita
On 2015/05/12 7:42, Tom Lane wrote:
 On further consideration ...

Thanks for the consideration!

 I don't much like the division of labor between LockForeignRow and
 FetchForeignRow.  In principle that would lead to not one but two
 extra remote accesses per locked row in SELECT FOR UPDATE, at least
 in the case that an EvalPlanQual recheck is required.  (I see that
 in your prototype patch for postgres_fdw you attempt to avoid that
 by saving a retrieved tuple in LockForeignRow and then returning it
 in FetchForeignRow, but that seems extremely ugly and bug-prone,
 since there is nothing in the API spec insisting that those calls match
 up one-to-one.)  The fact that locking and fetching a tuple are separate
 operations in the heapam API is a historical artifact that probably
 doesn't make sense to duplicate in the FDW API.

I understand your concern about the postgres_fdw patch.  However, I
think we should divide this into the two routines as the core patch
does, because that would allow an FDW author to implement these
routines so as to improve efficiency for scenarios that seldom need
fetching, by not retrieving and saving locked tuples in LockForeignRow.

 Another problem is that we fail to detect whether an EvalPlanQual recheck
 is required during a SELECT FOR UPDATE on a remote table, which we
 certainly ought to do if the objective is to make postgres_fdw semantics
 match the local ones.

I think that is interesting in theory, but I'm not sure that is worth
much in practice.

Best regards,
Etsuro Fujita




[HACKERS] Improving regression tests to check LOCK TABLE and table permissions

2015-05-11 Thread Michael Paquier
Hi all,

As mentioned in this thread, it would be good to have regression tests
covering the interactions between LOCK TABLE and permissions:
http://www.postgresql.org/message-id/20150511195335.ge30...@tamriel.snowman.net
Attached is a patch achieving that.
Note that it checks the modes ACCESS SHARE, ROW EXCLUSIVE
and ACCESS EXCLUSIVE so as to cover all the code paths of
LockTableAclCheck@lockcmds.c.

I'll add an entry in the next CF to keep track of it.
Regards,
-- 
Michael
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index 64a9330..c0cd9fa 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -1569,3 +1569,86 @@ DROP USER regressuser4;
 DROP USER regressuser5;
 DROP USER regressuser6;
 ERROR:  role regressuser6 does not exist
+-- permissions with LOCK TABLE
+CREATE USER locktable_user;
+CREATE TABLE lock_table (a int);
+-- LOCK TABLE and SELECT permission
+GRANT SELECT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+\c
+REVOKE SELECT ON lock_table FROM locktable_user;
+-- LOCK TABLE and INSERT permission
+GRANT INSERT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+\c
+REVOKE INSERT ON lock_table FROM locktable_user;
+-- LOCK TABLE and UPDATE permission
+GRANT UPDATE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should pass
+COMMIT;
+\c
+REVOKE UPDATE ON lock_table FROM locktable_user;
+-- LOCK TABLE and DELETE permission
+GRANT DELETE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should pass
+COMMIT;
+\c
+REVOKE DELETE ON lock_table FROM locktable_user;
+-- LOCK TABLE and TRUNCATE permission
+GRANT TRUNCATE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should pass
+COMMIT;
+\c
+REVOKE TRUNCATE ON lock_table FROM locktable_user;
+-- clean up
+DROP TABLE lock_table;
+DROP USER locktable_user;
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 22b54a2..c1837c4 100644
--- a/src/test/regress/sql/privileges.sql
+++ b/src/test/regress/sql/privileges.sql
@@ -975,3 +975,87 @@ DROP USER regressuser3;
 DROP USER regressuser4;
 DROP USER regressuser5;
 DROP USER regressuser6;
+
+
+-- permissions with LOCK TABLE
+CREATE USER locktable_user;
+CREATE TABLE lock_table (a int);
+
+-- LOCK TABLE and SELECT permission
+GRANT SELECT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should fail
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ROLLBACK;
+\c
+REVOKE SELECT ON lock_table FROM locktable_user;
+
+-- LOCK TABLE and INSERT permission
+GRANT INSERT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ROLLBACK;
+\c
+REVOKE INSERT ON lock_table FROM locktable_user;
+
+-- LOCK TABLE and UPDATE permission
+GRANT UPDATE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-11 Thread Amit Kapila
On Tue, May 12, 2015 at 5:50 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 05/11/2015 02:02 AM, Amit Kapila wrote:

 On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan and...@dunslane.net wrote:
 
 
 
  This generally looks good, but I have a couple of questions before I
commit it.
 
  First, why is the new option for the BASE_BACKUP command of the
Streaming Replication protocol TAR? It seems rather misleading. Shouldn't
it be something like TABLESPACEMAP?
 

 The reason to keep the new option's name as TAR was that tablespace_map
 was generated for that format type, but I agree with you that something
 like TABLESPACEMAP suits better, so I have changed it to
 TABLESPACE_MAP.  Putting '_' in the name makes it somewhat consistent
 with other names and with the filename generated by this new option.


  Second, these lines in xlog.c seem wrong:
 
  else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
  str[i-1] = '\n';
 
  It looks to me like we should be putting ch in the string, not
arbitrarily transforming \r into \n.
 

 You are right, I have changed it as per your suggestion.




 OK, I have cleaned this up a bit - I had already started so I didn't take
your latest patch but instead applied relevant changes to my changeset.
Here is my latest version.

 In testing I notice that now pg_basebackup -F t leaves it completely up
to the user on all platforms to create the relevant links in pg_tblspc/. It
includes the tablespace_map file in base.tar, but that's really just
informational.


Sorry, but I am not able to follow your point.  Users don't need to create
the relevant links; they will get created during the first startup (during
recovery) from the backup.  I have tested this, and it works both on Windows
and Linux.

Refer below code of patch in xlog.c

+
+ /* read the tablespace_map file if present and create symlinks. */
+ if (read_tablespace_map(tablespaces))
+ {
..
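
For reference, the tablespace_map file itself is just one line per
tablespace -- the tablespace OID followed by the symlink target.
Illustrative contents (made-up OIDs and paths):

16384 /mnt/disk1/ts1
16385 /mnt/disk2/ts2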

 I think we need to add something to the pg_basebackup docs about that, at
the very least (and it will also need to be a release note item.)


Currently, the below lines in the patch suggest that this file is required
for recovery.
Do you expect more information to be added?

+These files are not merely for your information; their presence and
+contents are critical to the proper operation of the system's recovery
+process.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_upgrade: quote directory names in delete_old_cluster script

2015-05-11 Thread Bruce Momjian
On Wed, Apr 29, 2015 at 10:52:45PM -0400, Bruce Momjian wrote:
 On Mon, Feb 16, 2015 at 05:03:45PM -0500, Bruce Momjian wrote:
   All of our makefiles use single quotes, so effectively the only
   character we don't support in directory paths is the single quote itself.
  
  This seems to say that Windows batch files don't have any special
  meaning for single quotes:
  
  
  http://stackoverflow.com/questions/24173825/what-does-single-quote-do-in-windows-batch-files
  
  http://stackoverflow.com/questions/10737283/single-quotes-and-double-quotes-how-to-have-the-same-behaviour-in-unix-and-wind
  
  Again, is it worth doing something platform-specific?  I can certainly
  define a platform-specific macro for " or ' and use that.  Good idea?
 
 I have developed the attached patch to use platform-specific quoting of
 path names.

Applied.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] multixacts woes

2015-05-11 Thread Josh Berkus
On 05/11/2015 09:54 AM, Robert Haas wrote:
 OK, I have made this change.  Barring further trouble reports, this
 completes the multixact work I plan to do for the next release.  Here
 is what is outstanding:
 
 1. We might want to introduce a GUC to control the point at which
 member offset utilization begins clamping
 autovacuum_multixact_freeze_max_age.  It doesn't seem wise to do
 anything about this before pushing a minor release out.  It's not
 entirely trivial, and it may be helpful to learn more about how the
 changes already made work out in practice before proceeding.  Also, we
 might not back-patch this anyway.

-1 on back-patching a new GUC.  People don't know what to do with the
existing multixact GUCs, and without an age(multixact) function
built-in, any adjustments a user tries to make are likely to do more
harm than good.

As for adding a new GUC in 9.5: can't we take a stab at auto-tuning
this instead?  We already have a bunch of freezing
GUCs which fewer than 1% of our user base has any idea how to set.

 2. The recent changes adjust things - for good reason - so that the
 safe threshold for multixact member creation is advanced only at
 checkpoint time.  This means it's theoretically possible to have a
 situation where autovacuum has done all it can, but because no
 checkpoint has happened yet, the user can't create any more
 multixacts.  Thanks to some good work by Thomas, autovacuum will
 realize this and avoid spinning uselessly over every table in the
 system, which is good, but you're still stuck with errors until the
 next checkpoint.  Essentially, we're hoping that autovacuum will clean
 things up far enough in advance of hitting the threshold where we have
 to throw an error that a checkpoint will intervene before the error
 starts happening.  It's possible we could improve this further, but I
 think it would be unwise to mess with it right now.  It may be that
 there is no real-world problem here.

Given that our longest possible checkpoint timeout is an hour, is it
even hypothetically possible that we would hit a limit in that time?
How many mxact members are we talking about?
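
Back-of-the-envelope, assuming the member space is 2^32 offsets:
burning through even a quarter of it (~1.07 billion members) inside a
one-hour checkpoint interval would take roughly 300,000 member
creations per second, sustained.  Possible on a big machine with
pathological FK contention, maybe, but it seems like an extreme case.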

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] multixacts woes

2015-05-11 Thread Alvaro Herrera
Josh Berkus wrote:

 In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
 this instead of adding a new GUC?  We already have a bunch of freezing
 GUCs which fewer than 1% of our user base has any idea how to set.

If you have development resources to pour into 9.5, I think it would be
better spent changing multixact usage tracking so that oldestOffset is
included in pg_control; also make pg_multixact truncation be WAL-logged.
With those changes, the need for a lot of pretty complicated code would
go away.  The fact that truncation is done by both vacuum and checkpoint
causes a lot of the mess we were in (and which Robert and Thomas got us
out of --- thanks guys!).  Such a change is the first step towards
auto-tuning, I think.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multixacts woes

2015-05-11 Thread Alvaro Herrera
Robert Haas wrote:

 OK, I have made this change.  Barring further trouble reports, this
 completes the multixact work I plan to do for the next release.

Many thanks for all the effort here -- much appreciated.

 2. The recent changes adjust things - for good reason - so that the
 safe threshold for multixact member creation is advanced only at
 checkpoint time.  This means it's theoretically possible to have a
 situation where autovacuum has done all it can, but because no
 checkpoint has happened yet, the user can't create any more
 multixacts.  Thanks to some good work by Thomas, autovacuum will
 realize this and avoid spinning uselessly over every table in the
 system, which is good, but you're still stuck with errors until the
 next checkpoint.  Essentially, we're hoping that autovacuum will clean
 things up far enough in advance of hitting the threshold where we have
 to throw an error that a checkpoint will intervene before the error
 starts happening.  It's possible we could improve this further, but I
 think it would be unwise to mess with it right now.  It may be that
 there is no real-world problem here.

See my response to Josh.  I think much of the current rube-goldbergian
design is due to the fact that pg_control cannot be changed in back
branches.  Going forward, I think a better plan is to include more info
in pg_control, WAL-log more operations, remove checkpoint from the loop
and have everything happen at vacuum time.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Use outerPlanState() consistently in executor code

2015-05-11 Thread Qingqing Zhou
On Mon, May 4, 2015 at 1:23 PM, Robert Haas robertmh...@gmail.com wrote:
 I fixed several whitespace errors, reverted the permissions changes
 you included

Sorry about the permission changes - didn't notice that bit.

Thanks,
Qingqing




[HACKERS] Re: [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table

2015-05-11 Thread Andres Freund
Hi,

On 2015-05-10 16:01:53 -0400, Tom Lane wrote:
 The cause of the problem seems to be that the UPDATE performs a HOT update
 of the new tuple, leaving in this case a dead tuple at (0,2) that is HOT
 updated by (0,3).  When unique_key_recheck() is invoked for (0,2), it
 believes, correctly, that it has to perform the recheck anyway ... but it
 tells check_exclusion_constraint that the check is being performed for
 (0,2).  So the index search inside check_exclusion_constraint finds the
 live tuple at (0,3) and thinks that is a conflict.

Heh, it's curious that this wasn't found up until now. I also wonder if
this might be related to the spurious violations Peter G. has been
observing...
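
For reference, a minimal reproduction along the lines Tom describes
would be something like this (a sketch -- deferred exclusion
constraint, INSERT then HOT UPDATE of a non-indexed column in one
transaction):

create table t (x int, y int,
  exclude using btree (x with =) deferrable initially deferred);
begin;
insert into t values (1, 0);
update t set y = 1;  -- HOT update, leaves a dead tuple behind
commit;  -- recheck of the dead tuple spuriously reports a violation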

 The attached patch seems to fix the problem without breaking any existing
 regression tests, but I wonder if anyone can see a hole in it.

Looks good to me.

Greetings,

Andres Freund




Re: [HACKERS] multixacts woes

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 10:11 AM, Noah Misch n...@leadboat.com wrote:
 On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
 Given your concerns, and the need to get a fix for this out the door
 quickly, what I'm inclined to do for the present is go bump the
 threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
 changing anything else.

 +1

 Your analysis shows that this is more in line
 with the existing policy for multixact IDs than what I did, and it
 will reduce the threat of frequent wraparound scans.  Now, it will
 also increase the chances of somebody hitting the wall before
 autovacuum can bail them out.  But maybe not that much.  If we need
 75% of the multixact member space to complete one cycle of
 anti-wraparound vacuums, we're actually very close to the point where
 the system just cannot work.  If that's one big table, we're done.

 Agreed.

OK, I have made this change.  Barring further trouble reports, this
completes the multixact work I plan to do for the next release.  Here
is what is outstanding:

1. We might want to introduce a GUC to control the point at which
member offset utilization begins clamping
autovacuum_multixact_freeze_max_age.  It doesn't seem wise to do
anything about this before pushing a minor release out.  It's not
entirely trivial, and it may be helpful to learn more about how the
changes already made work out in practice before proceeding.  Also, we
might not back-patch this anyway.

2. The recent changes adjust things - for good reason - so that the
safe threshold for multixact member creation is advanced only at
checkpoint time.  This means it's theoretically possible to have a
situation where autovacuum has done all it can, but because no
checkpoint has happened yet, the user can't create any more
multixacts.  Thanks to some good work by Thomas, autovacuum will
realize this and avoid spinning uselessly over every table in the
system, which is good, but you're still stuck with errors until the
next checkpoint.  Essentially, we're hoping that autovacuum will clean
things up far enough in advance of hitting the threshold where we have
to throw an error that a checkpoint will intervene before the error
starts happening.  It's possible we could improve this further, but I
think it would be unwise to mess with it right now.  It may be that
there is no real-world problem here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-11 Thread Andres Freund
Hi,

On 2015-05-11 16:57:08 +0530, Abhijit Menon-Sen wrote:
 I've attached an updated patch for pgstatbloat, as well as a patch to
 replace two uses of BuildTupleFromCStrings() elsewhere in pgstattuple.

TBH, I'd rather not touch unrelated things right now. We're pretty badly
behind...

 I've made the changes you mentioned in your earlier mail, except that I
 have not changed the name pending further suggestions about what would
 be the best name.

I don't really care how it's named, as long as it makes clear that it's
not an exact measurement.

   I haven't checked, but I'm not sure that it's safe/meaningful to
   call PageGetHeapFreeSpace() on a new page.
  
  OK, I'll check and fix if necessary.
 
 You're right, PageGetHeapFreeSpace() isn't safe on a new page. I've
 added a guard to that call in the attached patch, but I'm not sure
 that's the right thing to do.

 Should I copy the orphaned new-page handling from lazy_scan_heap? What
 about for empty pages? Neither feels like a very good thing to do, but
 then neither does skipping the new page silently. Should I count the
 space it would have free if it were initialised, but leave the page
 alone for VACUUM to deal with? Or just leave it as it is?

I think there's absolutely no need for pgstattuple to do anything
here. It's not even desirable.

 + LockBuffer(buf, BUFFER_LOCK_SHARE);
 +
 + page = BufferGetPage(buf);
 +
 + if (!PageIsNew(page))
 + stat->free_space += PageGetHeapFreeSpace(page);
 +
 + if (PageIsNew(page) || PageIsEmpty(page))
 + {
 + UnlockReleaseBuffer(buf);
 + continue;
 + }

Shouldn't new pages be counted as being fully free or at least bloat?
Just disregarding them seems wrong.

Greetings,

Andres Freund




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-11 Thread Heikki Linnakangas

On 05/08/2015 04:21 PM, Heikki Linnakangas wrote:

On 04/22/2015 10:07 AM, Michael Paquier wrote:

On Wed, Apr 22, 2015 at 3:38 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

I feel that the best approach is to archive the last, partial segment, but
with the .partial suffix. I don't see any plausible real-world setup where
the current behavior would be better. I don't really see much need to
archive the partial segment at all, but there's also no harm in doing it, as
long as it's clearly marked with the .partial suffix.


Well, as long as it is clearly archived at promotion, even with a
suffix, I guess that I am fine... This will need some tweaking on
restore_command for existing applications, but as long as it is
clearly documented I am fine. Shouldn't this be a different patch
though?


Ok, I came up with the attached, which adds the .partial suffix to the
partial WAL segment that's archived after promotion. I couldn't find any
natural place to talk about it in the docs, though. I think after the
docs changes from the main patch are applied, it would be natural to
mention this in the Continuous archiving in standby, so I think I'll
add that later.

Barring objections, I'll push this later tonight.


Applied that part.


Now that we got this last-partial-segment problem out of the way, I'm
going to try fixing the problem you (Michael) pointed out about relying
on pgstat file. Meanwhile, I'd love to get more feedback on the rest of
the patch, and the documentation.


And here is a new version of the patch. I kept the approach of using 
pgstat, but it now only polls pgstat every 10 seconds, and doesn't block 
to wait for updated stats.


- Heikki

From 08ca3cc7b9824503b793e149247ea9c6d3a7f323 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Thu, 16 Apr 2015 14:40:24 +0300
Subject: [PATCH v3 1/1] Make WAL archival behave more sensibly in standby
 mode.

This adds two new archive_modes, 'shared' and 'always', to indicate whether
the WAL archive is shared between the primary and standby, or not. In
shared mode, the standby tracks which files have been archived by the
primary. The standby refrains from recycling files that the primary has not
yet archived, and at failover, the standby archives all those files too
from the old timeline. In 'always' mode, the standby's WAL archive is taken
to be separate from the primary's, and the standby independently archives
all files it receives from the primary.

This adds a new archival status message to the protocol. WAL sender sends
one automatically, when the last archived WAL file, as reported in pgstat,
changes. (Or rather, some time after it changes. We're not in a hurry, the
standby doesn't need an up-to-the-second status)

Fujii Masao and me.
---
 doc/src/sgml/config.sgml  |  12 +-
 doc/src/sgml/high-availability.sgml   |  48 +++
 doc/src/sgml/protocol.sgml|  31 +
 src/backend/access/transam/xlog.c |  29 +++-
 src/backend/postmaster/pgstat.c   |  44 ++
 src/backend/postmaster/postmaster.c   |  37 +++--
 src/backend/replication/walreceiver.c | 172 +++-
 src/backend/replication/walsender.c   | 186 ++
 src/backend/utils/misc/guc.c  |  21 +--
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/access/xlog.h |  14 +-
 src/include/pgstat.h  |   2 +
 12 files changed, 513 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0d8624a..ac845e0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2521,7 +2521,7 @@ include_dir 'conf.d'
 
 variablelist
  varlistentry id=guc-archive-mode xreflabel=archive_mode
-  termvarnamearchive_mode/varname (typeboolean/type)
+  termvarnamearchive_mode/varname (typeenum/type)
   indexterm
primaryvarnamearchive_mode/ configuration parameter/primary
   /indexterm
@@ -2530,7 +2530,15 @@ include_dir 'conf.d'
para
 When varnamearchive_mode/ is enabled, completed WAL segments
 are sent to archive storage by setting
-xref linkend=guc-archive-command.
+xref linkend=guc-archive-command. In addition to literaloff/,
+to disable, there are three modes: literalon/, literalshared/,
+and literalalways/. During normal operation, there is no
+difference between the three modes, but in archive recovery or
+standby mode, it indicates whether the WAL archive is shared between
+the primary and the standby server or not. See
+xref linkend=continuous-archiving-in-standby for details.
+   /para  
+   para
 varnamearchive_mode/ and varnamearchive_command/ are
 separate variables so that varnamearchive_command/ can be
 changed without leaving archiving mode.
diff --git