Re: [HACKERS] Minor improvement to func.sgml
On Mon, Jun 1, 2015 at 10:44 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Here is a doc patch to add materialized views and foreign tables to database objects that pg_table_is_visible() can be used with. Good catch, as usual. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 12:10 PM, Peter Geoghegan p...@heroku.com wrote: jsonb_delete() should certainly be able to traverse objects, but it's much less clear that it should be able to *traverse* arrays (affecting arrays is a different story, though). That's why I proposed not supporting traversing arrays with it or with jsonb_set(). This would also remove the questionable second shadow type system within the text[] rhs operand, which seems like a good thing. Here is a further example of why I find this new shadow type system for rhs text[] operands to be pretty questionable:

postgres=# select jsonb_set('[1, 2, 3, 4, 5,6,7,8,9,10,11,12]', '{5e10}'::text[], '"Input unsanitized"') ;
                         jsonb_set
-----------------------------------------------------------
 [1, 2, 3, 4, 5, "Input unsanitized", 7, 8, 9, 10, 11, 12]
(1 row)

BTW, there is a bug here -- strtol() needs additional defenses [1] (before casting to int):

postgres=# select jsonb_set('[1, 2, 3, 4, 5,6,7,8,9,10,11,12,13,14,15,16,17,18]', '{9223372036854775806}'::text[], '"Input unsanitized"', false) ;
                                    jsonb_set
----------------------------------------------------------------------------------
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, "Input unsanitized", 18]
(1 row)

[1] https://www.securecoding.cert.org/confluence/display/cplusplus/INT06-CPP.+Use+strtol()+or+a+related+function+to+convert+a+string+token+to+an+integer -- Peter Geoghegan
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Yesterday, JPUG held an unconference event at Tokyo, and Hanada-san gave a talk about the join-pushdown feature of postgres_fdw. At this talk, someone proposed an interesting idea to make join pushdown more aggressive/effective. Let me share it with pgsql-hackers. He said, we may have a workload that joins a large foreign-scan and a small local-scan, regardless of the plan type. For example:

joinrel (expected nrows = 5)
 + outerrel ForeignScan (expected nrows = 100)
 + innerrel LocalScan (expected nrows = 5)

In this case, we may be able to run the entire joinrel on the remote side and then fetch just 5 rows, if the FDW driver constructs a VALUES() clause according to the contents of LocalScan and then makes an entire join query together with the one kept in ForeignScan. If the above ForeignScan has the following remote query:

SELECT a, b, c FROM t0 WHERE d < 100

we may be able to construct the query below to run the remote join with the local (small) relation:

SELECT a, b, c, x, y FROM (SELECT a, b, c FROM t0 WHERE d < 100) AS ft JOIN (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), (4,'ddd'), (5,'eee')) AS lt (x, y) ON ft.a = lt.x

The VALUES clause can be mechanically constructed according to the result set of LocalScan, and it is not difficult to make such a remote query on top of the existing ForeignScan. As a result, it will dramatically reduce the amount of network traffic and CPU cycles needed to form/deform tuples. I don't intend to implement this idea urgently (of course, join pushdown for the both-ForeignScan case has higher priority); however, it makes sense to keep the future direction in mind. Also, as an aside, even though Hanada-san mentioned that ForeignScan does not need an infrastructure to initialize child path nodes, this idea may require ForeignScan to have a local child path. Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? 
Perhaps both cases are worth thinking about at some point. I think there are both merits and demerits for each; neither of them is an always-better strategy.

* Push out local table as VALUES(...) clause
  Good: No restriction to functions/operators in the local scan or underlying plan node.
  Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer.

* Remote join between foreign table and replicated table
  Good: Data already exists on the remote side; no need to kick out the contents of the local relation (and no need to consume CPU cycles to make the VALUES() clause).
  Bad: Functions/operators are restricted, as existing postgres_fdw is doing. Only immutable and built-in ones are available to run on the remote side.

BTW, do we need either of the tables to be a foreign table at all, if the entire database is (synchronously) replicated? Also, a loopback server may be a candidate even if not replicated (although it may be an entrance to deadlock heaven). Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/04/2015 03:10 PM, Peter Geoghegan wrote: On Thu, Jun 4, 2015 at 6:43 AM, Andrew Dunstan and...@dunslane.net wrote: I've noticed some more issues with the jsonb documentation, and the new jsonb stuff generally. I didn't set out to give Andrew feedback on the semantics weeks after feature freeze, but unfortunately this feels like another discussion that we need to have now rather than later. Yes, I wish you had raised these issues months ago when this was published. That's the way the process is supposed to work. I also wish that I had managed to do that. As you know, I was working overtime to get UPSERT into 9.5 during that period. Finding time to review things is always difficult, and I wish I could do more. That's happened to me in the past. My view has generally been that in that case I have missed my chance, and I need to live with what others have done. That seems to me preferable to tearing up any pretense we might have to be following a defined development process. I should point out that I have already gone out of my way to accommodate concerns you expressed extremely late about this set of features, and I have lately indicated another area where we can adjust it to meet your objections. Re-litigating this wholesale seems quite a different kettle of fish, however. Just in case it's not clear: I am not at all happy. cheers andrew
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 5:31 PM, Andrew Dunstan and...@dunslane.net wrote: Just in case it's not clear: I am not at all happy. I've offered to help you with several of the issues I raised; I had intended to offer more help. The issues I raised seem pretty substantive to me. I'm trying to make sure that we don't end up with something bad that we need to live with indefinitely. I have offered you something not far off an "everybody wins" proposal (i.e. no real loss of functionality), and that was my first proposal. I don't know what more I could do for you. I *am* trying to help. -- Peter Geoghegan
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Thu, Jun 4, 2015 at 8:43 PM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 12:44 AM, Amit Kapila wrote: Given that the function raises an error on failure, I think it will otherwise be OK as is. Please find an updated patch attached with this mail. No attachment. cheers I have sent it in the next mail, but anyway sending it again in case you have missed it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com remove_only_symlinks_during_recovery_v2.patch
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Fri, Jun 5, 2015 at 9:57 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 11:35 PM, Amit Kapila wrote: Theoretically, I don't see much problem with changing the checks the way you have done in the patch, but it becomes different than what we have in destroy_tablespace_directories() and it is slightly changing the way the check was originally done in create_tablespace_directories(); basically, the original check will try unlink if lstat returns a non-zero return code. If you want to proceed with the changed checks as in v3, then maybe we can modify the comments on top of the function remove_tablespace_symlink() which indicate that it works like destroy_tablespace_directories(). The difference is that here we're getting the list from a base backup and it seems to me the risk of having a file we don't really want to unlink is significantly greater. Okay, I think I can understand why you want to be cautious about having a different check for this path, but in that case there is a chance that recovery might fail when it tries to create a symlink for that file. Shouldn't we then try to call this new function only when we are trying to restore from a tablespace_map file, and also is there a need for the ifdef S_ISLINK in remove_tablespace_link? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? Perhaps both cases are worth thinking about at some point. I think there are both merits and demerits for each; neither of them is an always-better strategy.

* Push out local table as VALUES(...) clause
  Good: No restriction to functions/operators in the local scan or underlying plan node.
  Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer.

* Remote join between foreign table and replicated table
  Good: Data already exists on the remote side; no need to kick out the contents of the local relation (and no need to consume CPU cycles to make the VALUES() clause).
  Bad: Functions/operators are restricted, as existing postgres_fdw is doing. Only immutable and built-in ones are available to run on the remote side.

Sure. BTW, do we need either of the tables to be a foreign table at all, if the entire database is (synchronously) replicated? Also, a loopback server may be a candidate even if not replicated (although it may be an entrance to deadlock heaven). I suppose it's possible that this sort of thing could work out to a win, but I think it's much less likely to work out than pushing down a foreign/local join using either the VALUES trick or a replicated copy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On 06/04/2015 09:23 AM, Amit Kapila wrote: Okay, as we both seem to agree that it can be mostly used in the tablespace symlinks context, I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c. The S_ISLINK check is used for non-Windows code, so I'm not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLINK check. Where is it used? I can't see it called at all in tablespace.c or xlog.c. Perhaps I'm being overcautious, but here's more or less what I had in mind. cheers andrew

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 666fa37..8d75d0c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
+#include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
@@ -6094,7 +6095,6 @@ StartupXLOG(void)
 		if (read_tablespace_map(&tablespaces))
 		{
 			ListCell   *lc;
-			struct stat st;
 
 			foreach(lc, tablespaces)
 			{
@@ -6105,27 +6105,9 @@ StartupXLOG(void)
 
 				/*
 				 * Remove the existing symlink if any and Create the symlink
-				 * under PGDATA. We need to use rmtree instead of rmdir as
-				 * the link location might contain directories or files
-				 * corresponding to the actual path. Some tar utilities do
-				 * things that way while extracting symlinks.
+				 * under PGDATA.
 				 */
-				if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
-				{
-					if (!rmtree(linkloc, true))
-						ereport(ERROR,
-								(errcode_for_file_access(),
-								 errmsg("could not remove directory \"%s\": %m",
-										linkloc)));
-				}
-				else
-				{
-					if (unlink(linkloc) < 0 && errno != ENOENT)
-						ereport(ERROR,
-								(errcode_for_file_access(),
-								 errmsg("could not remove symbolic link \"%s\": %m",
-										linkloc)));
-				}
+				remove_tablespace_symlink(linkloc);
 
 				if (symlink(ti->path, linkloc) < 0)
 					ereport(ERROR,
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 4ec1aff..e8acf27 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -627,31 +627,9 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 
 	/*
 	 * In recovery, remove old symlink, in case it points to the wrong place.
-	 *
-	 * On Windows, junction points act like directories so we must be able to
-	 * apply rmdir; in general it seems best to make this code work like the
-	 * symlink removal code in destroy_tablespace_directories, except that
-	 * failure to remove is always an ERROR.
 	 */
 	if (InRecovery)
-	{
-		if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
-		{
-			if (rmdir(linkloc) < 0)
-				ereport(ERROR,
-						(errcode_for_file_access(),
-						 errmsg("could not remove directory \"%s\": %m",
-								linkloc)));
-		}
-		else
-		{
-			if (unlink(linkloc) < 0 && errno != ENOENT)
-				ereport(ERROR,
-						(errcode_for_file_access(),
-						 errmsg("could not remove symbolic link \"%s\": %m",
-								linkloc)));
-		}
-	}
+		remove_tablespace_symlink(linkloc);
 
 	/*
 	 * Create the symlink under PGDATA
@@ -848,6 +826,61 @@ directory_is_empty(const char *path)
 	return true;
 }
 
+/*
+ * remove_tablespace_symlink
+ *
+ * This function removes symlinks in pg_tblspc. On Windows, junction points
+ * act like directories so we must be able to apply rmdir. This function
+ * works like the symlink removal code in destroy_tablespace_directories,
+ * except that failure to remove is always an ERROR.
+ */
+void
+remove_tablespace_symlink(const char *linkloc)
+{
+	struct stat st;
+
+	if (lstat(linkloc, &st) != 0)
+	{
+		if (errno == ENOENT)
+			return;
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not stat \"%s\": %m",
+							linkloc)));
+	}
+
+	if (S_ISDIR(st.st_mode))
+	{
+		/*
+		 * This will fail if the directory isn't empty, but not
+		 * if it's a junction point.
+		 */
+		if (rmdir(linkloc) < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not remove directory \"%s\": %m",
+							linkloc)));
+	}
+#ifdef S_ISLNK
+	else if (S_ISLNK(st.st_mode))
+	{
+		if (unlink(linkloc) < 0 && errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not remove symbolic link \"%s\": %m",
+							linkloc)));
+	}
+#endif
+	else
+	{
+		/* Refuse to remove anything that's not a directory or symlink */
+		ereport(ERROR,
+				(ERRCODE_SYSTEM_ERROR,
+				 errmsg("Not a directory or symbolic link: \"%s\"",
+						linkloc)));
+	}
+}
 
 /*
  * Rename a tablespace
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index 86b0477..6b928a5 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -56,6
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? Perhaps both cases are worth thinking about at some point. I think there are both merits and demerits for each; neither of them is an always-better strategy.

* Push out local table as VALUES(...) clause
  Good: No restriction to functions/operators in the local scan or underlying plan node.
  Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer.

* Remote join between foreign table and replicated table
  Good: Data already exists on the remote side; no need to kick out the contents of the local relation (and no need to consume CPU cycles to make the VALUES() clause).
  Bad: Functions/operators are restricted, as existing postgres_fdw is doing. Only immutable and built-in ones are available to run on the remote side.

Sure. BTW, do we need either of the tables to be a foreign table at all, if the entire database is (synchronously) replicated? Also, a loopback server may be a candidate even if not replicated (although it may be an entrance to deadlock heaven). I suppose it's possible that this sort of thing could work out to a win, but I think it's much less likely to work out than pushing down a foreign/local join using either the VALUES trick or a replicated copy. Hmm, it might be too aggressive an approach. If we tried to implement it, postgres_fdw would need to add so many junk paths (more expensive than the usual local ones) to consider remote joins between replicated local tables. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Fri, Jun 5, 2015 at 7:29 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 09:23 AM, Amit Kapila wrote: Okay, as we both seem to agree that it can be mostly used in the tablespace symlinks context, I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c. The S_ISLINK check is used for non-Windows code, so I'm not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLINK check. Where is it used? I can't see it called at all in tablespace.c or xlog.c. The following files use the S_ISLINK check: basebackup.c, fd.c, initdb.c, copy_fetch.c, pg_rewind/filemap.c, and all these places use it with #ifndef WIN32. Perhaps I'm being overcautious, but here's more or less what I had in mind. What is making you feel nervous? If it is that we should not use an unlink call without checking S_ISLINK, then we are already doing the same at many other places (rewriteheap.c, slru.c, timeline.c, xlog.c). It is already defined for Windows as pgunlink. Theoretically, I don't see much problem with changing the checks the way you have done in the patch, but it becomes different than what we have in destroy_tablespace_directories() and it is slightly changing the way the check was originally done in create_tablespace_directories(); basically, the original check will try unlink if lstat returns a non-zero return code. If you want to proceed with the changed checks as in v3, then maybe we can modify the comments on top of the function remove_tablespace_symlink() which indicate that it works like destroy_tablespace_directories(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [CORE] Restore-reliability mode
On Fri, Jun 5, 2015 at 8:53 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 4 June 2015 at 22:43, Stephen Frost sfr...@snowman.net wrote: Josh, * Josh Berkus (j...@agliodbs.com) wrote: I would argue that if we delay 9.5 in order to do a 100% manual review of code, without adding any new automated tests or other non-manual tools for improving stability, then it's a waste of time; we might as well just release the beta, and our users will find more issues than we will. I am concerned that if we declare a cleanup period, especially in the middle of the summer, all that will happen is that the project will go to sleep for an extra three months. This is the exact same concern that I have. A delay just to have a delay is not useful. I completely agree that we need more automated testing, etc, though getting all of that set up and running could be done at any time too; there's no reason to wait, nor do I believe delaying 9.5 would make such automated testing appear. In terms of specific testing improvements, things I think we need to have covered and runnable on the buildfarm are:

* pg_dump and pg_restore testing (because it's scary we don't do this)

We do test it in some way with pg_upgrade, using a set of objects that are not removed by the regression test suite. Extension dumps are not covered yet, though.

* WAL archiving based warm standby testing with promotion
* Two node streaming replication with promotion, both with a slot and with archive fallback
* Three node cascading streaming replication with middle node promotion then tail end node promotion
* Logical decoding streaming testing, comparing to expected decoded output
* hard-kill the postmaster, start up from crashed datadir
* pg_basebackup + start up from backup
* pg_start_backup, rsync, pg_stop_backup, start up in hot standby
* Tests of crash recovery during various DDL operations

Well, steps in this direction are the point of this patch, the replication test suite: https://commitfest.postgresql.org/5/197/ And this one, addition of Windows support for TAP tests: https://commitfest.postgresql.org/5/207/

* DDL deparse test coverage for all operations

What do you have in mind except what is already in objectaddress.sql and src/test/modules/test_ddl_deparse/?

* disk exhaustion tests both for pg_xlog and for the main datadir, showing we can recover OK when disk is filled then space is freed

This may be tricky. How would you emulate that?

Is pg_tap a reasonable starting point for this sort of testing?

IMO, using the TAP machinery would be a good base for that. What is lacking is a basic set of Perl routines that one can easily use to set up test scenarios.

How would a test that would've caught the multixact issues look?

I have not followed closely those discussions, not sure about that. Regards, -- Michael
[HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 11:47 AM, Thomas Munro thomas.mu...@enterprisedb.com wrote: On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote: Here's a new version with some more fixes and improvements: - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset when the oldest offset became known if the now-known value happened to be zero. Fixed. - SetOffsetVacuumLimit now logs useful information at the DEBUG1 level, so that you can see that it's doing what it's supposed to. - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the offsetStopLimit even if it can't truncate anything. This seems useless, but it's not, because it may be that the last checkpoint advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real value, and now we can actually set offsetStopLimit properly. - TruncateMultiXact no longer calls find_multixact_start when there are no remaining multixacts. This is actually a completely separate bug that goes all the way back to 9.3.0 and can potentially cause TruncateMultiXact to remove every file in pg_multixact/offsets. Restarting the cluster becomes impossible because TrimMultiXact barfs. - TruncateMultiXact now logs a message if the oldest multixact does not precede the earliest one on disk and is not equal to the next multixact and yet does not exist. The value of the log message is that it discovered the bug mentioned in the previous line, so I think it's earning its keep. With this version, I'm able to see that when you start up 9.3.latest+this patch with a cluster that has a bogus value of 1 in relminmxid, datminmxid, and the control file, autovacuum vacuums everything in sight, all the values get set back to the right thing, and the next checkpoint enables the member-wraparound guards. This works with both autovacuum=on and autovacuum=off; the emergency mechanism kicks in as intended. 
We'll want to warn people with big databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may want to pre-vacuum those tables before upgrading to avoid a vacuum storm. But generally I'm pretty happy with this: forcing those values to get fixed so that we can guard against member-space wraparound seems like the right thing to do. So, to summarize, this patch does the following: - Fixes the failure-to-start problems introduced in 9.4.2 in complicated pg_upgrade scenarios. - Prevents the new calls to find_multixact_start we added in 9.4.2 from happening during recovery, where they can only create failure scenarios. The call in TruncateMultiXact that has been there all along is not eliminated, but now handles failure more gracefully. - Fixes possible incorrect removal of every single pg_multixact/offsets file when no multixacts exist; one file should be kept. - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. With this patch, when I run the script checkpoint-segment-boundary.sh from http://www.postgresql.org/message-id/CAEepm=1_KbHGbmPVmkUGE5qTP+B4efoCJYS0unGo-Mc5NV=u...@mail.gmail.com I see the following during shutdown checkpoint: LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound That message comes from SimpleLruTruncate. Suggested patch attached. -- Thomas Munro http://www.enterprisedb.com fence-post.patch
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On 06/04/2015 11:35 PM, Amit Kapila wrote: On Fri, Jun 5, 2015 at 7:29 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 09:23 AM, Amit Kapila wrote: Okay, as we both seem to agree that it can be mostly used in the tablespace symlinks context, I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c. The S_ISLINK check is used for non-Windows code, so I'm not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLINK check. Where is it used? I can't see it called at all in tablespace.c or xlog.c. The following files use the S_ISLINK check: basebackup.c, fd.c, initdb.c, copy_fetch.c, pg_rewind/filemap.c, and all these places use it with #ifndef WIN32. Perhaps I'm being overcautious, but here's more or less what I had in mind. What is making you feel nervous? If it is that we should not use an unlink call without checking S_ISLINK, then we are already doing the same at many other places (rewriteheap.c, slru.c, timeline.c, xlog.c). It is already defined for Windows as pgunlink. Theoretically, I don't see much problem with changing the checks the way you have done in the patch, but it becomes different than what we have in destroy_tablespace_directories() and it is slightly changing the way the check was originally done in create_tablespace_directories(); basically, the original check will try unlink if lstat returns a non-zero return code. If you want to proceed with the changed checks as in v3, then maybe we can modify the comments on top of the function remove_tablespace_symlink() which indicate that it works like destroy_tablespace_directories(). The difference is that here we're getting the list from a base backup and it seems to me the risk of having a file we don't really want to unlink is significantly greater. 
cheers andrew
[HACKERS] Incorrect order of database-locking operations in InitPostgres()
I've been chasing the intermittent "cache lookup failed for access method 403" failure at session startup that's been seen lately in the buildfarm, for instance here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-06-04%2019%3A22%3A46 (Axolotl has shown this 3 times in the last 90 days, not sure if any others have seen it.) I hypothesized that this was triggered by the VACUUM FULL pg_am in the concurrently running vacuum.sql regression test, so I started running the regression tests in parallel with a shell script doing

while sleep 0.1; do psql -c 'vacuum full pg_am' regression; done

and sure enough, I can reproduce it once in awhile. I've not tracked it down yet, but I have gotten pretty annoyed by an unrelated problem: every so often the DROP DATABASE regression at the start of the regression test sequence will hang up for 5 seconds and then fail, complaining there's another session already in the DB. Meanwhile, the current psql -c call also hangs up for 5 seconds. What is happening is a sort of deadlock between InitPostgres, which does this:

    /* Now we can mark our PGPROC entry with the database ID */
    /* (We assume this is an atomic store so no lock is needed) */
    MyProc->databaseId = MyDatabaseId;

and then this:

    if (!bootstrap)
        LockSharedObject(DatabaseRelationId, MyDatabaseId, 0,
                         RowExclusiveLock);

and dropdb/CountOtherDBBackends, which first gets the database lock and then looks to see if any other sessions are advertising the target database OID in their PGPROC. If such sessions exist and don't go away within 5 seconds then CountOtherDBBackends fails. So, if an incoming connection sets MyProc->databaseId between the time that dropdb acquires the database lock and the time that CountOtherDBBackends looks, we have an effective deadlock because that incoming session will then block on the database lock. 
AFAICS, this is easy to fix by swapping the order of the above-mentioned operations, ie get the lock before setting MyProc->databaseId, as per attached patch. Processes examining the PGPROC array must acquire the database lock before looking for sessions in the target DB, so they'll still see any conflicting session. On the other hand, the incoming session already has to recheck that the target DB is still there once it's got the lock, so there's no risk of problems on that side either. Unless somebody can see a problem with this, I propose to apply and back-patch this change. The behavior is infrequent, but it's pretty nasty when it does occur, since all incoming connections for the target database are hung up for 5 seconds. The case of DROP DATABASE might not be a big deal, but this would also for example cause unwanted failures in CREATE DATABASE if there were short-term connections to the template database. regards, tom lane

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index aa67f75..a5d88c1 100644
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*** InitPostgres(const char *in_dbname, Oid
*** 848,865 ****
  		strcpy(out_dbname, dbname);
  	}
  
- 	/* Now we can mark our PGPROC entry with the database ID */
- 	/* (We assume this is an atomic store so no lock is needed) */
- 	MyProc->databaseId = MyDatabaseId;
- 
- 	/*
- 	 * We established a catalog snapshot while reading pg_authid and/or
- 	 * pg_database; but until we have set up MyDatabaseId, we won't react to
- 	 * incoming sinval messages for unshared catalogs, so we won't realize it
- 	 * if the snapshot has been invalidated. Assume it's no good anymore.
- 	 */
- 	InvalidateCatalogSnapshot();
- 
  	/*
  	 * Now, take a writer's lock on the database we are trying to connect to.
  	 * If there is a concurrently running DROP DATABASE on that database, this
--- 848,853 ----
*** InitPostgres(const char *in_dbname, Oid
*** 867,875 ****
  	 * pg_database).
  	 *
  	 * Note that the lock is not held long, only until the end of this startup
! 	 * transaction. This is OK since we are already advertising our use of
! 	 * the database in the PGPROC array; anyone trying a DROP DATABASE after
! 	 * this point will see us there.
  	 *
  	 * Note: use of RowExclusiveLock here is reasonable because we envision
  	 * our session as being a concurrent writer of the database. If we had a
--- 855,868 ----
  	 * pg_database).
  	 *
  	 * Note that the lock is not held long, only until the end of this startup
! 	 * transaction. This is OK since we will advertise our use of the
! 	 * database in the PGPROC array before dropping the lock (in fact, that's
! 	 * the next thing to do). Anyone trying a DROP DATABASE after this point
! 	 * will see us in PGPROC once they have the lock.
! 	 *
! 	 * Ordering is important here because we don't want to advertise ourselves
! 	 * as being in this database until we have the lock; otherwise we create
! 	 * what amounts to a
Re: [HACKERS] Memory leak with XLogFileCopy since de768844 (WAL file with .partial)
On Thu, Jun 4, 2015 at 10:40 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jun 1, 2015 at 4:24 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, May 28, 2015 at 9:09 PM, Michael Paquier michael.paqu...@gmail.com wrote: Since commit de768844, XLogFileCopy of xlog.c returns to the caller a pstrdup'd string that can be used afterwards for other things. XLogFileCopy is used in only one place, and it happens that the result string is never freed at all, leaking memory. That seems to be almost harmless because the startup process will exit just after calling XLogFileCopy(). No? Yes, that's harmless. My point here is correctness; prevention does not hurt, particularly if this code path gets used more in the future. Also, the current error message in case of failure is very C-like: elog(ERROR, "InstallXLogFileSegment should not have failed"); I thought that we are not supposed to use function names in the error messages. Wouldn't something like this be better suited: "could not copy partial log file" or "could not copy partial log file into temporary segment file"? Or we can extend InstallXLogFileSegment so that we can give the log level to it. When InstallXLogFileSegment is called after XLogFileCopy, we can give ERROR as the log level and cause an exception to occur if link() or rename() fails in InstallXLogFileSegment. That's neat. Done this way in the attached. 
-- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 666fa37..5ee68c1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -807,7 +807,7 @@ static bool XLogCheckpointNeeded(XLogSegNo new_segno); static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible); static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, bool find_free, XLogSegNo max_segno, - bool use_lock); + bool use_lock, int elevel); static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli, int source, bool notexistOk); static int XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source); @@ -3012,7 +3012,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) max_segno = logsegno + CheckPointSegments; if (!InstallXLogFileSegment(installed_segno, tmppath, *use_existent, max_segno, -use_lock)) +use_lock, LOG)) { /* * No need for any more future segments, or InstallXLogFileSegment() @@ -3041,18 +3041,15 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) /* * Copy a WAL segment file in pg_xlog directory. * - * dstfname destination filename * srcfname source filename * upto how much of the source file to copy? (the rest is filled with *zeros) * - * If dstfname is not given, the file is created with a temporary filename, - * which is returned. Both filenames are relative to the pg_xlog directory. - * - * NB: Any existing file with the same name will be overwritten! + * The file is created with a temporary filename, which is returned. Both + * filenames are relative to the pg_xlog directory. */ static char * -XLogFileCopy(char *dstfname, char *srcfname, int upto) +XLogFileCopy(char *srcfname, int upto) { char srcpath[MAXPGPATH]; char tmppath[MAXPGPATH]; @@ -3150,25 +3147,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto) CloseTransientFile(srcfd); - /* - * Now move the segment into place with its final name. 
(Or just return - * the path to the file we created, if the caller wants to handle the rest - * on its own.) - */ - if (dstfname) - { - char dstpath[MAXPGPATH]; - - snprintf(dstpath, MAXPGPATH, XLOGDIR "/%s", dstfname); - if (rename(tmppath, dstpath) < 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not rename file \"%s\" to \"%s\": %m", - tmppath, dstpath))); - return NULL; - } - else - return pstrdup(tmppath); + /* return name to caller, to let it handle the rest */ + return pstrdup(tmppath); } /* @@ -3195,6 +3175,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto) * place. This should be TRUE except during bootstrap log creation. The * caller must *not* hold the lock at call. * + * elevel: log level used by this routine. + * * Returns TRUE if the file was installed successfully. FALSE indicates that * max_segno limit was exceeded, or an error occurred while renaming the * file into place. @@ -3202,7 +3184,7 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, bool find_free, XLogSegNo max_segno, - bool use_lock) + bool use_lock, int elevel) { char path[MAXPGPATH]; struct stat stat_buf; @@ -3247,7 +3229,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, { if (use_lock) LWLockRelease(ControlFileLock); - ereport(LOG, + ereport(elevel, (errcode_for_file_access(), errmsg("could not link file \"%s\" to \"%s\" (initialization
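The pattern in the patch above — letting the caller pass the severity instead of hard-coding LOG — is easy to illustrate. A hedged Python analogue using the logging module (toy names; the real change is in C, in InstallXLogFileSegment):

```python
import logging

logging.basicConfig(level=logging.DEBUG)

def install_segment(name, installed, *, level=logging.WARNING):
    """Toy analogue of InstallXLogFileSegment's new elevel argument: the same
    helper reports a failure at whatever severity its caller chooses."""
    if name in installed:               # pretend the link()/rename() step failed
        logging.log(level, "could not install segment %s", name)
        if level >= logging.ERROR:      # PostgreSQL's ERROR throws; mimic that
            raise RuntimeError("could not install segment %s" % name)
        return False
    installed.add(name)
    return True

segs = {"000000010000000000000001"}
print(install_segment("000000010000000000000002", segs))    # True
print(install_segment("000000010000000000000001", segs))    # False, logged at WARNING
try:
    # a caller running after the copy step wants a hard failure, so it passes ERROR
    install_segment("000000010000000000000001", segs, level=logging.ERROR)
except RuntimeError as exc:
    print("raised:", exc)
```

The design point is that most callers tolerate an install failure (they just retry or clean up), while the post-copy caller cannot, so severity belongs to the call site rather than the helper.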
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 5:29 PM, Robert Haas robertmh...@gmail.com wrote: - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. Err, enables member wraparound *protection* at the next checkpoint, not the wraparound itself. It's worth noting that every startup will now include one of the following two messages: LOG: MultiXact member wraparound protections are now enabled Or: LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk ...where %u is probably 1. If you get the second one, you'll get the first one later after vacuum has done its thing and a checkpoint has happened. This is, obviously, some log chatter for people who don't have a problem and never have had one, but I think it's worth emitting the first message at startup even when there's no problem, so that people don't have to make inferences from the absence of a message. We can tell people very simply that (1) if they see the first message, everything is fine; (2) if they see the second message, autovacuum is going to clean things up and they will eventually see the first message; and (3) if they see neither message, they haven't upgraded to a fixed version yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Yesterday, JPUG held an unconference event at Tokyo, and Hanada-san had a talk about the join-pushdown feature of postgres_fdw. At this talk, someone proposed an interesting idea to make join pushdown more aggressive/effective. Let me share it with pgsql-hackers. He said, we may have a workload that joins a large foreign-scan and a small local-scan, regardless of the plan type. For example:

    joinrel (expected nrows = 5)
     + outerrel ForeignScan (expected nrows = 100)
     + innerrel LocalScan (expected nrows = 5)

In this case, we may be able to run the entire joinrel on the remote side and then fetch just 5 rows, if the fdw-driver constructs a VALUES() clause according to the contents of LocalScan and then makes an entire join query combined with the one kept in the ForeignScan. If the above ForeignScan has the following remote query,

    SELECT a, b, c FROM t0 WHERE d < 100

we may be able to construct the query below to run the remote join with the local (small) relation:

    SELECT a, b, c, x, y FROM (SELECT a, b, c FROM t0 WHERE d < 100) AS ft
      JOIN (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), (4,'ddd'), (5,'eee')) AS lt (x, y)
        ON ft.a = lt.x

The VALUES clause can be mechanically constructed according to the result set of LocalScan, and it is not difficult to make such a remote query on top of the existing ForeignScan. As a result, it will reduce the amount of network traffic and the CPU cycles needed to form/deform tuples dramatically. I don't intend to implement this idea urgently (of course, join pushdown for the case where both relations are ForeignScans has higher priority), however, it makes sense to keep the future direction in mind. Also, as an aside, even though Hanada-san mentioned that ForeignScan does not need an infrastructure to initialize child path nodes, this idea may require ForeignScan to have a local child path. Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? 
Perhaps both cases are worth thinking about at some point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
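The query construction Kaigai describes is mechanical. A hedged sketch of turning a local result set into a VALUES join — a hypothetical helper with deliberately naive literal quoting, not postgres_fdw code:

```python
def remote_join_sql(foreign_sql, local_rows, local_cols, join_cond, out_cols):
    """Build the pushed-down join: inline the small local scan as a VALUES list
    so the whole join runs on the remote server."""
    def lit(v):
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"     # naive string quoting only
        return str(v)
    values = ", ".join("(" + ", ".join(lit(c) for c in row) + ")"
                       for row in local_rows)
    return ("SELECT %s FROM (%s) AS ft JOIN (VALUES %s) AS lt (%s) ON %s"
            % (", ".join(out_cols), foreign_sql, values,
               ", ".join(local_cols), join_cond))

sql = remote_join_sql(
    "SELECT a, b, c FROM t0 WHERE d < 100",                        # kept in the ForeignScan
    [(1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'ddd'), (5, 'eee')],  # LocalScan result rows
    ["x", "y"], "ft.a = lt.x", ["a", "b", "c", "x", "y"])
print(sql)
```

A real implementation would of course have to deparse literals through the server's type output and quoting machinery; the point here is only that the remote query is a pure string transformation of things the FDW already has.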
Re: [HACKERS] [CORE] Restore-reliability mode
On 4 June 2015 at 22:43, Stephen Frost sfr...@snowman.net wrote: Josh, * Josh Berkus (j...@agliodbs.com) wrote: I would argue that if we delay 9.5 in order to do a 100% manual review of code, without adding any new automated tests or other non-manual tools for improving stability, then it's a waste of time; we might as well just release the beta, and our users will find more issues than we will. I am concerned that if we declare a cleanup period, especially in the middle of the summer, all that will happen is that the project will go to sleep for an extra three months. This is the exact same concern that I have. A delay just to have a delay is not useful. I completely agree that we need more automated testing, etc, though getting all of that set up and running could be done at any time too- there's no reason to wait, nor do I believe delaying 9.5 would make such automated testing appear. In terms of specific testing improvements, things I think we need to have covered and runnable on the buildfarm are: * pg_dump and pg_restore testing (because it's scary we don't do this) * WAL archiving based warm standby testing with promotion * Two node streaming replication with promotion, both with a slot and with archive fallback * Three node cascading streaming replication with middle node promotion then tail end node promotion * Logical decoding streaming testing, comparing to expected decoded output * DDL deparse test coverage for all operations * pg_basebackup + start up from backup * hard-kill the postmaster, start up from crashed datadir * pg_start_backup, rsync, pg_stop_backup, start up in hot standby * disk exhaustion tests both for pg_xlog and for the main datadir, showing we can recover OK when disk is filled then space is freed * Tests of crash recovery during various DDL operations Obviously some of these overlap, so one test can cover more than one item. 
Implementing these requires stepping outside the comfortable zone of pg_regress and the isolationtester and having something that can manage multiple data directories. It's also hard to be sure you're testing the same thing each time - for example, when using streaming replication with archive fallback, it might be tricky to ensure that your replica falls behind and falls back to WAL archive each time. There's always SIGSTOP I guess. While these are multi-node tests, at least in PostgreSQL we can just run on different ports, so there's no need to muck about with containers or VMs. I already run some of these tests using Ansible for BDR, but I don't imagine that'd be acceptable in core. It's Python, and it's not especially well suited to use as a regression testing framework, it's just what I had to hand and already needed for other automation tasks. Is pg_tap a reasonable starting point for this sort of testing? Am I missing obvious and important tests? How would a test that would've caught the multixact issues look? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
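As a rough illustration of "something that can manage multiple data directories", here is a hedged Python sketch of a node abstraction: each node owns a data directory and a port, so several clusters coexist on one host. Paths and node names are made up, and the sketch only *builds* the command lines (the binaries themselves — initdb, pg_ctl, pg_basebackup — are the standard ones); actually running them would need subprocess calls and a PostgreSQL installation.

```python
import shlex

class PostgresNode:
    """Minimal multi-node harness sketch: one data directory and port per node."""
    def __init__(self, name, datadir, port):
        self.name, self.datadir, self.port = name, datadir, port

    def init_cmd(self):
        return ["initdb", "-D", self.datadir]

    def start_cmd(self):
        return ["pg_ctl", "-D", self.datadir, "-o", "-p %d" % self.port,
                "-w", "start"]

    def basebackup_cmd(self, standby):
        # clone this node into a standby, writing recovery settings (-R)
        return ["pg_basebackup", "-D", standby.datadir, "-p", str(self.port), "-R"]

    def promote_cmd(self):
        return ["pg_ctl", "-D", self.datadir, "promote"]

primary = PostgresNode("primary", "/tmp/pg_primary", 5433)
standby = PostgresNode("standby", "/tmp/pg_standby", 5434)
for cmd in (primary.init_cmd(), primary.start_cmd(),
            primary.basebackup_cmd(standby), standby.start_cmd(),
            standby.promote_cmd()):
    print(shlex.join(cmd))
```

Several of the scenarios in Stephen's list (streaming replication with promotion, basebackup-and-start) reduce to sequencing exactly these five commands across two or three such nodes.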
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/04/2015 04:13 PM, David E. Wheeler wrote: On Jun 4, 2015, at 12:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I'm just skimming here, but if a jsonb_path type is being proposed, Is this not the purpose of JSQuery? https://code.google.com/p/gwtquery/wiki/JsQuery No, it doesn't seem to have anything at all to do with it. What I suggested would be an implementation of json_pointer - see http://tools.ietf.org/html/rfc6901 cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
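For reference, RFC 6901 evaluation is small enough to sketch directly. A hedged Python implementation of JSON Pointer resolution — illustrative of what a json_pointer-style SQL type would address, not a proposal for its implementation:

```python
def json_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return doc                     # the empty pointer names the whole document
    if not pointer.startswith("/"):
        raise ValueError("non-empty JSON Pointer must start with '/'")
    for token in pointer[1:].split("/"):
        # unescape in this order per the RFC: ~1 -> / first, then ~0 -> ~
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(doc, list):
            doc = doc[int(token)]      # array index (the '-' append form not handled)
        else:
            doc = doc[token]
    return doc

doc = {"foo": ["bar", "baz"], "a/b": 1, "m~n": 8}
print(json_pointer(doc, "/foo/0"))     # bar
print(json_pointer(doc, "/a~1b"))      # 1
print(json_pointer(doc, "/m~0n"))      # 8
```

Note that, unlike the text[] paths discussed upthread, the escaping rules make a pointer unambiguous even when keys contain "/" — one argument for a dedicated type over a bare array of text.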
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 12:57 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas robertmh...@gmail.com wrote: Thanks for the review. Here's a new version. I've fixed the things Alvaro and Noah noted, and some compiler warnings about set-but-unused variables. I also tested it, and it doesn't quite work as hoped. If started on a cluster where oldestMultiXid is incorrectly set to 1, it starts up and indicates that the member wraparound guards are disabled. But even after everything is fixed, they don't get enabled until after the next full restart. I think that's because TruncateMultiXact() bails out too early, without calling DetermineSafeOldestOffset. My attempt at a quick fix for that problem didn't work out, so I'm posting this version for now to facilitate further review and testing. Here's a new version with some more fixes and improvements: - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset when the oldest offset became known if the now-known value happened to be zero. Fixed. - SetOffsetVacuumLimit now logs useful information at the DEBUG1 level, so that you can see that it's doing what it's supposed to. - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the offsetStopLimit even if it can't truncate anything. This seems useless, but it's not, because it may be that the last checkpoint advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real value, and now we can actually set offsetStopLimit properly. - TruncateMultiXact no longer calls find_multixact_start when there are no remaining multixacts. This is actually a completely separate bug that goes all the way back to 9.3.0 and can potentially cause TruncateMultiXact to remove every file in pg_multixact/offsets. Restarting the cluster becomes impossible because TrimMultiXact barfs. 
- TruncateMultiXact now logs a message if the oldest multixact does not precede the earliest one on disk and is not equal to the next multixact and yet does not exist. The value of the log message is that it discovered the bug mentioned in the previous line, so I think it's earning its keep. With this version, I'm able to see that when you start up a 9.3.latest+this patch with a cluster that has a bogus value of 1 in relminmxid, datminmxid, and the control file, autovacuum vacuums everything in sight, all the values get set back to the right thing, and the next checkpoint enables the member-wraparound guards. This works with both autovacuum=on and autovacuum=off; the emergency mechanism kicks in as intended. We'll want to warn people with big databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may want to pre-vacuum those tables before upgrading to avoid a vacuum storm. But generally I'm pretty happy with this: forcing those values to get fixed so that we can guard against member-space wraparound seems like the right thing to do. So, to summarize, this patch does the following: - Fixes the failure-to-start problems introduced in 9.4.2 in complicated pg_upgrade scenarios. - Prevents the new calls to find_multixact_start we added in 9.4.2 from happening during recovery, where they can only create failure scenarios. The call in TruncateMultiXact that has been there all along is not eliminated, but now handles failure more gracefully. - Fixes possible incorrect removal of every single pg_multixact/offsets file when no multixacts exist; one file should be kept. - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. 
Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company commit 87aa15fe5257060e0c971e135dd9f460fdc00bd0 Author: Robert Haas rhaas@postgresql.org Date: Thu Jun 4 11:58:49 2015 -0400 bar diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 9568ff1..7c457a6 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -198,13 +198,24 @@ typedef struct MultiXactStateData /* next-to-be-assigned offset */ MultiXactOffset nextOffset; + /* Have we completed multixact startup? */ + bool finishedStartup; + /* - * Oldest multixact that is still on disk. Anything older than this - * should not be consulted. These values are updated by vacuum. + * Oldest multixact that is still potentially referenced by a relation. + * Anything older than this should not be consulted. These values are + * updated by vacuum. */ MultiXactId oldestMultiXactId; Oid oldestMultiXactDB; + + /* + * Oldest multixact offset that is potentially referenced by a + * multixact referenced by a relation. We don't always know this value, + * so there's a flag here to indicate whether or not we currently do. + */ MultiXactOffset oldestOffset; + bool oldestOffsetKnown;
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote: Here's a new version with some more fixes and improvements: - SetOffsetVacuumLimit was failing to set MultiXactState-oldestOffset when the oldest offset became known if the now-known value happened to be zero. Fixed. - SetOffsetVacuumLimit now logs useful information at the DEBUG1 level, so that you can see that it's doing what it's supposed to. - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the offsetStopLimit even if it can't truncate anything. This seems useless, but it's not, because it may be that the last checkpoint advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real value, and now we can actually set offsetStopLimit properly. - TruncateMultiXact no longer calls find_multixact_start when there are no remaining multixacts. This is actually a completely separate bug that goes all the way back to 9.3.0 and can potentially cause TruncateMultiXact to remove every file in pg_multixact/offsets. Restarting the cluster becomes impossible because TrimMultiXact barfs. - TruncateMultiXact now logs a message if the oldest multixact does not precede the earliest one on disk and is not equal to the next multixact and yet does not exist. The value of the log message is that it discovered the bug mentioned in the previous line, so I think it's earning its keep. With this version, I'm able to see that when you start up a 9.3.latest+this patch with a cluster that has a bogus value of 1 in relminmxid, datminmxid, and the control file, autovacuum vacuums everything in sight, all the values get set back to the right thing, and the next checkpoint enables the member-wraparound guards. This works with both autovacuum=on and autovacuum=off; the emergency mechanism kicks in as intended. We'll want to warn people with big databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may want to pre-vacuum those tables before upgrading to avoid a vacuum storm. 
But generally I'm pretty happy with this: forcing those values to get fixed so that we can guard against member-space wraparound seems like the right thing to do. So, to summarize, this patch does the following: - Fixes the failure-to-start problems introduced in 9.4.2 in complicated pg_upgrade scenarios. - Prevents the new calls to find_multixact_start we added in 9.4.2 from happening during recovery, where they can only create failure scenarios. The call in TruncateMultiXact that has been there all along is not eliminated, but now handles failure more gracefully. - Fixes possible incorrect removal of every single pg_multixact/offsets file when no multixacts exist; one file should be kept. - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. With this patch, when I run the script checkpoint-segment-boundary.sh from http://www.postgresql.org/message-id/CAEepm=1_KbHGbmPVmkUGE5qTP+B4efoCJYS0unGo-Mc5NV=u...@mail.gmail.com I see the following during shutdown checkpoint: LOG: could not truncate directory pg_multixact/offsets: apparent wraparound That message comes from SimpleLruTruncate. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix documentation bug in how to calculate the quasi-unique pg_log session_id
On Tue, Jun 2, 2015 at 11:22 AM, Joel Jacobson j...@trustly.com wrote: Fix documentation bug in how to calculate the quasi-unique pg_log session_id Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) We create the brintest using a scan of tenk1 LIMIT 100, without specifying the order. So whether we find rows that match each test query is pure chance. Also, what I'd just noticed is that all of the cases that are failing are ones where the expected number of matching rows is exactly 1. I am wondering if the test is sometimes just missing random rows, and we're not seeing any reported problem unless that makes it go down to no rows. (But I do not know how that could simultaneously affect the seqscan case ...) Yeah, we compare the ctid sets of the results, and we assume that a seqscan would get that correctly. I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. That sounds reasonable. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
Hi, On 2015-06-04 12:57:42 -0400, Robert Haas wrote: + /* + * Do we need an emergency autovacuum? If we're not sure, assume yes. + */ + return !oldestOffsetKnown || + (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD); I think without teaching autovac about those rules, this might just lead to lots of autovac processes starting without knowing they should do something? They know about autovacuum_multixact_freeze_age, but they know neither about !oldestOffsetKnown nor, afaics, about pending offset wraparounds? -static MultiXactOffset -find_multixact_start(MultiXactId multi) +static bool +find_multixact_start(MultiXactId multi, MultiXactOffset *result) { MultiXactOffset offset; int pageno; @@ -2630,6 +2741,9 @@ find_multixact_start(MultiXactId multi) pageno = MultiXactIdToOffsetPage(multi); entryno = MultiXactIdToOffsetEntry(multi); + if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno)) + return false; + /* lock is acquired by SimpleLruReadPage_ReadOnly */ slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi); offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno]; @@ -2642,25 +2756,31 @@ find_multixact_start(MultiXactId multi) I think it'd be a good idea to also return false in case of an InvalidMultiXactId - that'll be returned if the page has been zeroed. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
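The emergency-vacuum test quoted in this thread relies on unsigned 32-bit arithmetic: subtracting two MultiXactOffset counters gives the in-use distance even after the counter wraps past 2^32. A hedged Python sketch of that check — the threshold value here is illustrative, not the real constant:

```python
MASK = 0xFFFFFFFF                  # MultiXactOffset is a 32-bit unsigned counter
SAFE_THRESHOLD = 2 ** 31           # illustrative stand-in for MULTIXACT_MEMBER_SAFE_THRESHOLD

def members_in_danger(next_offset, oldest_offset, oldest_known=True):
    """Modular 32-bit subtraction measures members in use across a wrap."""
    if not oldest_known:
        return True                # not sure? assume yes, as the quoted comment says
    used = (next_offset - oldest_offset) & MASK
    return used > SAFE_THRESHOLD

# 356 members in use across the wrap boundary: no emergency
print(members_in_danger(100, 0xFFFFFF00))            # False
# more than half the offset space in use: emergency vacuum
print(members_in_danger(SAFE_THRESHOLD + 10, 0))     # True
print(members_in_danger(0, 0, oldest_known=False))   # True
```

In C this works without masking because unsigned subtraction wraps by definition; the Python mask only reproduces that behavior on arbitrary-precision integers.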
Re: [HACKERS] brin regression test intermittent failures
I wrote: I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: I wrote: I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? No, I left those there knowing that there are no plans involving brin -- in a way, they provide some future proofing if some of those operators are made indexable later. I couldn't think of a way to test that the plans are actually using the brin index or not, but if we can do that in some way, that would be good. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/04/2015 11:33 AM, Jim Nasby wrote: On 6/4/15 8:43 AM, Andrew Dunstan wrote: You are conflating two different things here, quite pointlessly. The RH operand of ?| is not a path, whereas the RH operand of this - variant is. The fact that they are both text arrays doesn't mean that they should mean the same thing. And this is really the whole problem with the rest of your analysis. Has the idea of a specific json_path datatype been discussed? I feel it would add a lot of clarity to the operators. It would also make it easy to have an array of paths, something that's difficult to do today because a path can be an arbitrary length and arrays don't support that. I actually thought of doing something like that earlier today, although I was thinking of making it an array under the hood - I'm not sure how much call there is for an array of paths. We could probably finesse that. I agree that there is some sense in having such a type, especially if we later want to implement json(b)_patch, see http://tools.ietf.org/html/rfc6902. And if we do we should call the type json_pointer to be consistent with http://tools.ietf.org/html/rfc6901. However, this is certainly not 9.5 material. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 10:34 AM, Robert Haas wrote: On Thu, Jun 4, 2015 at 11:49 AM, Joshua D. Drake j...@commandprompt.com wrote: Except, that is kind of the point. Why are we adding to it? If you don't know the answer to that question already, then you probably shouldn't be proposing to get rid of the thing. I know the answer some people are saying. That doesn't mean it is the correct answer, that I agree with it or that it is a good answer. I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. I have mentioned before isn't really what this discussion is about. Stephen Frost and I even went through the entire list of modules of what should and should not be included. Which, IMV, is 100% reasonable. Nobody is arguing with that. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On Thu, Jun 4, 2015 at 1:57 PM, Joshua D. Drake j...@commandprompt.com wrote: I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. I have mentioned before isn't really what this discussion is about. Stephen Frost and I even went through the entire list of modules of what should and should not be included. Which, IMV, is 100% reasonable. Nobody is arguing with that. Well then I am confused. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xid wrap / optimize frozen tables?
On Wed, Jun 3, 2015 at 2:49 PM, Jeff Janes jeff.ja...@gmail.com wrote: I would not be surprised if it were the reading, not the writing, which caused the performance problem. Of course I screwed up that last sentence. I meant the opposite, it would not surprise me if it were the writing that caused the problem, despite there being 5 times less of it. Jeff
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 1:27 PM, Andres Freund and...@anarazel.de wrote: On 2015-06-04 12:57:42 -0400, Robert Haas wrote: + /* + * Do we need an emergency autovacuum? If we're not sure, assume yes. + */ + return !oldestOffsetKnown || + (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD); I think without teaching autovac about those rules, this might just lead to lots of autovac processes starting without knowing they should do something? They know about autovacuum_multixact_freeze_age, but they know neither about !oldestOffsetKnown nor, afaics, about pending offset wraparounds? You're right, but that's why the latest patch has changes in MultiXactMemberFreezeThreshold. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
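The offset checks being discussed rest on wraparound-safe comparison of unsigned 32-bit counters. A minimal Python model of that kind of circular arithmetic (the function names are illustrative, not the actual multixact.c implementations):

```python
MASK = 0xFFFFFFFF  # model a MultiXactOffset as an unsigned 32-bit counter


def precedes(a, b):
    # Circular comparison for wraparound-prone counters: a precedes b
    # iff the signed 32-bit difference (a - b) is negative.
    return ((a - b) & MASK) >= 0x80000000


def would_wrap(boundary, start, distance):
    # Would allocating `distance` offsets starting at `start` cross
    # `boundary`?  True when start is before the boundary but the
    # finish point has moved at or past it.
    finish = (start + distance) & MASK
    return precedes(start, boundary) and precedes(boundary, finish)
```

Note the comparison works even when `start + distance` wraps past 2^32, which is the whole point of doing it modulo 2^32 instead of with ordinary `<`.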
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text > 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? No, I left those there knowing that there are no plans involving brin -- in a way, they provide some future proofing if some of those operators are made indexable later. On closer investigation, I think the ones involving charcol are a flat out bug in the test, namely failure to quote "char". Observe:

regression=# explain select ctid from brintest where charcol = 'A'::char;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on brintest  (cost=0.00..101.88 rows=1 width=6)
   Filter: ((charcol)::text = 'A'::text)
(2 rows)

regression=# explain select ctid from brintest where charcol = 'A'::"char";
                              QUERY PLAN
-----------------------------------------------------------------------
 Bitmap Heap Scan on brintest  (cost=48.02..58.50 rows=3 width=6)
   Recheck Cond: (charcol = 'A'::"char")
   ->  Bitmap Index Scan on brinidx  (cost=0.00..48.02 rows=3 width=0)
         Index Cond: (charcol = 'A'::"char")
(4 rows)

Presumably we'd like to test the latter case not the former. The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer. I couldn't think of a way to test that the plans are actually using the brin index or not, but if we can do that in some way, that would be good. Yeah, we can do that --- the way I found out there's a problem is to modify the test script to check the output of EXPLAIN.
So at this point it looks like (1) chipmunk's issue might be explained by lack of forced ORDER BY; (2) the test script could be improved to test more carefully, and it has got an issue with char vs "char"; (3) there might be a planner bug. regards, tom lane
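The EXPLAIN-checking approach Tom describes can be sketched as a tiny helper. This is a hypothetical illustration (the function name and the captured plan strings are assumptions, not the actual regression harness), assuming the plan text has been captured as a string:

```python
def plan_uses_brin(plan_text, index_name="brinidx"):
    # A plan that actually used the BRIN index contains a
    # "Bitmap Index Scan" node naming that index; a plain
    # Seq Scan fallback does not.
    return f"Bitmap Index Scan on {index_name}" in plan_text


# Sample captured plans, mirroring the two EXPLAIN outputs above.
seqscan_plan = """Seq Scan on brintest  (cost=0.00..101.88 rows=1 width=6)
  Filter: ((charcol)::text = 'A'::text)"""

brin_plan = """Bitmap Heap Scan on brintest  (cost=48.02..58.50 rows=3 width=6)
  Recheck Cond: (charcol = 'A'::"char")
  ->  Bitmap Index Scan on brinidx  (cost=0.00..48.02 rows=3 width=0)
        Index Cond: (charcol = 'A'::"char")"""
```

A test script could warn for any brinopers entry whose plan fails this check, instead of only noticing when zero rows come back.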
Re: [HACKERS] RFC: Remove contrib entirely
On Thu, Jun 4, 2015 at 11:49 AM, Joshua D. Drake j...@commandprompt.com wrote: Except, that is kind of the point. Why are we adding to it? If you don't know the answer to that question already, then you probably shouldn't be proposing to get rid of the thing. I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. Which, IMV, is 100% reasonable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor issue with BRIN regression tests
Peter Geoghegan p...@heroku.com writes: Attached patch adjusts BRIN regression tests to make a non-obvious dependency on tuple order explicit. Currently, an index-only scan plan is used by the query that I've adjusted. I'd rather be sure that that continues. Applied with a correction: the ordering that was being used was really ORDER BY thousand, tenthous because that's the order of the relevant index. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
Alvaro Herrera wrote: Robert Haas wrote: So here's a patch taking a different approach. I tried to apply this to 9.3 but it's messy because of pgindent. Would anyone have a problem with me backpatching a pgindent run of multixact.c? Done. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 11:01 AM, Robert Haas wrote: On Thu, Jun 4, 2015 at 1:57 PM, Joshua D. Drake j...@commandprompt.com wrote: I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. That, as I have mentioned before, isn't really what this discussion is about. Stephen Frost and I even went through the entire list of modules of what should and should not be included. Which, IMV, is 100% reasonable. Nobody is arguing with that. Well then I am confused.

My argument was (after some preliminary discussion):

1. Review contrib
2. All modules that are core worthy install by default
3. Push all other contrib out into the wild

Possibly:

1. Have a contrib project that sat outside of core
2. Push all contrib or some contrib to pgxn (or some other place)

Because:

1. Decrease in code maintenance for core
2. Removal of the idea that contrib is a holding queue for not quite up to snuff code
3. Most extensions don't need to follow the same development pattern that core does
4. Eliminate the EGO of saying I have a contrib module in core

Derived from:

1. 15 years of the same argument (current source: pg_audit)
2. Helping reduce overall resource need to manage contrib
3. And now (Nasby's excellent argument)

I am not trying to throw a wrench in things. I am trying to help streamline them. A lot of the arguments presented just don't hold water (eating your own dogfood) because we aren't stopping anyone from doing that testing. Whether or not the source (whatever it may be) is in -contrib doesn't prevent that testing. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [HACKERS] RFC: Remove contrib entirely
On 2015-06-04 11:33:47 -0700, Joshua D. Drake wrote: My argument was (after some preliminary discussion): 1. Review contrib 2. All modules that are core worthy install by default 3. Push all other contrib out into the wild Possibly: 1. Have a contrib project that sat outside of core 2. Push all contrib or some contrib to pgxn (or some other place) Because:

1. Decrease in code maintenance for core

The modules that aren't going to be in core hardly cost time, do they?

2. Removal of the idea that contrib is a holding queue for not quite up to snuff code

I don't think that idea exists widely.

3. Most extensions don't need to follow the same development pattern that core does

The point is that users want them to follow that.

4. Eliminate the EGO of saying I have a contrib module in core

Seriously?

1. 15 years of the same argument (current source: pg_audit)

I don't see getting rid of contrib helping with that. The only change will then be whether something should be in core. And there's stuff that's just very hard to develop out of core; but doesn't necessarily immediately belong into core. E.g. pg_stat_statements is relatively closely tied to things in core.

2. Helping reduce overall resource need to manage contrib

This discussion is a significant share of that.
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) We create the brintest using a scan of tenk1 LIMIT 100, without specifying the order. So whether we find rows that match each test query is pure chance. Oooh ... normally that would not matter, but I wonder if what's happening on chipmunk is that the synchronized-seqscan logic kicks in and causes us to read some other part of tenk1 than we normally would, as a consequence of some concurrent activity or other. The connection to smaller than normal shared_buffers would be that it would change our idea of what's a large enough table to justify using synchronized seqscans. Peter's patch failed to hit the place where this matters, btw. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 2015-06-04 20:41:46 +0200, Andres Freund wrote: 1. 15 years of the same argument (current source: pg_audit) I don't see getting rid of contrib helping with that. The only change will then be whether something should be in core. And there's stuff that's just very hard to develop out of core; but doesn't necessarily immediately belong into core. E.g. pg_stat_statements is relatively closely tied to things in core. An even better example is pg_upgrade. It'd not have been possible to bring it somewhere close to maturity without having it in contrib/. It requires core cooperation, and we weren't immediately ready to declare it as part of core.
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 1:02 PM, Peter Geoghegan p...@heroku.com wrote: I would like these new-to-9.5 deletion operators to work at the top level only, like operator jsonb ? text and operator jsonb ?| text, sharing their idea of a key, __including that string array elements are keys__. We haven't got a containment-style nested delete operator for 9.5, but we can hope for it in the future. In the meantime, you get much of the benefit of that with jsonb_delete(), which *can* support nested deletion. It does so by buying into the operator jsonb ? text idea of a key (including that string array elements are keys), although with a twist: the paths text[] right operand operates at multiple nesting levels (not supporting traversing arrays, as Andrew implemented it, but OTOH adding support for deleting String array elements based on the string alone, useful for tag arrays). If in 9.6 we have something like a jsonb @- jsonb operator for containment-style deletion, and a 9.5-era operator jsonb - text and operator jsonb - text[] pair for existence-style deletion (matching operator jsonb ? text, operating only on the top level), that will be pretty good. The fact that jsonb_delete() will have somewhat bridged the gap nesting-deletion-wise for 9.5 (without being usable through an operator) won't really matter then. I want to keep the twist I described out of any jsonb operators that are shipped, and only use it within functions. To be clear: these two paragraphs are a proposal about how I'd like to change things for 9.5 to make the jsonb operators more consistent than the way things are in the master branch, while still offering nested deletion through a function. -- Peter Geoghegan
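The top-level-only deletion semantics Peter proposes can be modeled in a few lines. A sketch (the function name is hypothetical, and Python dicts/lists stand in for jsonb objects/arrays):

```python
def jsonb_delete_key(doc, key):
    """Model of top-level-only deletion: for an object, drop the key;
    for an array, drop string elements equal to the key (mirroring how
    `jsonb ? text` treats string array elements as keys).  Nested
    values are never traversed."""
    if isinstance(doc, dict):
        return {k: v for k, v in doc.items() if k != key}
    if isinstance(doc, list):
        # Only string elements can match a text key; other element
        # types (numbers, sub-objects) are left alone.
        return [v for v in doc if not (isinstance(v, str) and v == key)]
    return doc
```

Under this model, deleting "a" from {"a": 1, "b": {"a": 2}} leaves the nested "a" intact, which is exactly the consistency with the top-level-only `?` operator that the proposal argues for.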
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Jun 4, 2015, at 12:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I'm just skimming here, but if a jsonb_path type is being proposed, Is this not the purpose of JSQuery? https://code.google.com/p/gwtquery/wiki/JsQuery David
Re: [HACKERS] RFC: Remove contrib entirely
On Jun 4, 2015, at 11:53 AM, Neil Tiffin ne...@neiltiffin.com wrote: I have looked at PGXN and would never install anything from it. Why? Because it is impossible to tell, without inside knowledge or a lot of work, what is actively maintained and tested, and what is an abandoned proof-of-concept or idea. Well, you can see the last release dates for a basic idea of that sort of thing. Also the release status (stable, unstable, testing). There is no indication of what versions of pg any of PGXN modules are tested on, or even if there are tests that can be run to prove the module works correctly with a particular version of pg. Yeah, I’ve been meaning to integrate http://pgxn-tester.org/ results for all modules, which would help with that. In the meantime you can hit that site itself. Awesome work by Tomas Vondra. There are many modules that have not been updated for several years. What is their status? If they break is there still someone around to fix them or even cares about them? If not, then why waste my time. These are challenges to open-source software in general, and not specific to PGXN. So adding to Jim’s comment above, anything that vets or approves PGXN modules is, in my opinion, essentially required to make PGXN useful for anything other than a scratchpad. Most of the distributions on PGXN feature links to their source code repositories. A big help would be to pull in the date of the last git commit in the module overview and ask the authors to edit the readme to add what major version of pg the author last tested or ran on. That’s difficult to maintain; I used to do it for pgTAP, was too much work. pgxn-tester.org is a much better idea. Best, David
Re: [HACKERS] Streaming replication for psycopg2
On Tue, Jun 2, 2015 at 2:23 PM, Shulgin, Oleksandr oleksandr.shul...@zalando.de wrote: Hello, I've submitted a patch to psycopg2 to support streaming replication protocol (COPY_BOTH): https://github.com/psycopg/psycopg2/pull/322 It would be great if more people had a chance to take a look and provide feedback about it. In particular, please see example usage at this github comment[1]. Some bikeshedding is really needed here. :-)

It has been suggested to me that instead of putting the sync/stop methods into the replication message class, like this:

class ReplicationMessage(str):
    # wal_end
    # data_start
    # send_time

    # def commit(self): ...
    # def stop(self): ...

it would make more sense to put them into the cursor, like this:

class ReplicationCursor(...):
    def sync_server(self, msg): ...
    def stop_replication(self): ...

The client code will then be able to do this:

class LogicalWriter(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def write(self, msg):  # receives instance of ReplicationMessage
        if should_stop_replication():
            self.cursor.stop_replication()
            return

        self.actually_store_the_message(msg)

        if stored_reliably() and want_to_report_now():
            self.cursor.sync_server(msg)

        # return value not examined by caller

That seems like a more sane interface to me. -- Alex
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 09:00 AM, Andrew Dunstan wrote: No. You keep getting this wrong. The fact that we have extensions doesn't mean that we want to throw out everything that is an extension. It's perfectly reasonable for us to maintain some ourselves, not least as part of eating our own dog food. Yes. I agree with that. Which is why I suggested .Org -contrib project. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 6/4/15 8:43 AM, Andrew Dunstan wrote: You are conflating two different things here, quite pointlessly. The RH operand of ?| is not a path, whereas the RH operand of this - variant is. The fact that they are both text arrays doesn't mean that they should mean the same thing. And this is really the whole problem with the rest of your analysis. Has the idea of a specific json_path datatype been discussed? I feel it would add a lot of clarity to the operators. It would also make it easy to have an array of paths, something that's difficult to do today because a path can be an arbitrary length and arrays don't support that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas robertmh...@gmail.com wrote: Thanks for the review. Here's a new version. I've fixed the things Alvaro and Noah noted, and some compiler warnings about set but unused variables. I also tested it, and it doesn't quite work as hoped. If started on a cluster where oldestMultiXid is incorrectly set to 1, it starts up and indicates that the member wraparound guards are disabled. But even after everything is fixed, they don't get enabled until after the next full restart. I think that's because TruncateMultiXact() bails out too early, without calling DetermineSafeOldestOffset. My attempt at a quick fix for that problem didn't work out, so I'm posting this version for now to facilitate further review and testing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

commit eb39cf10e4ff853ed4b9d3fab599cf42911e6f70
Author: Robert Haas rhaas@postgresql.org
Date:   Thu Jun 4 11:58:49 2015 -0400

    bar

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 699497c..209d3e6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -196,13 +196,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk. Anything older than this
-	 * should not be consulted. These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted. These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation. We don't always know this value,
+	 * so there's a flag here to indicate whether or not we currently do.
+	 */
 	MultiXactOffset oldestOffset;
+	bool		oldestOffsetKnown;
 
 	/*
 	 * This is what the previous checkpoint stored as the truncate position.
@@ -219,6 +230,7 @@ typedef struct MultiXactStateData
 
 	/* support for members anti-wraparound measures */
 	MultiXactOffset offsetStopLimit;
+	bool		offsetStopLimitKnown;
 
 	/*
 	 * Per-backend data starts here. We have two arrays stored in the area
@@ -348,10 +360,11 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
 					 MultiXactOffset offset2);
 static void ExtendMultiXactOffset(MultiXactId multi);
 static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
+static void DetermineSafeOldestOffset(MultiXactOffset oldestMXact);
 static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
 						 MultiXactOffset start, uint32 distance);
-static MultiXactOffset find_multixact_start(MultiXactId multi);
+static bool SetOffsetVacuumLimit(bool finish_setup);
+static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
 static void WriteMZeroPageXlogRec(int pageno, uint8 info);
@@ -960,7 +973,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 * against catastrophic data loss due to multixact wraparound. The basic
 	 * rules are:
 	 *
-	 * If we're past multiVacLimit or the safe threshold for member storage space,
+	 * If we're past multiVacLimit or the safe threshold for member storage
+	 * space, or we don't know what the safe threshold for member storage is,
 	 * start trying to force autovacuum cycles.
 	 * If we're past multiWarnLimit, start issuing warnings.
 	 * If we're past multiStopLimit, refuse to create new MultiXactIds.
@@ -969,6 +983,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 	if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
+		!MultiXactState->oldestOffsetKnown ||
 		(MultiXactState->nextOffset - MultiXactState->oldestOffset
 		 > MULTIXACT_MEMBER_SAFE_THRESHOLD))
 	{
@@ -1083,7 +1098,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 #define OFFSET_WARN_SEGMENTS	20
-	if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
+	if (MultiXactState->offsetStopLimitKnown &&
+		MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
 								 nmembers))
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
@@ -1095,7 +,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 					MultiXactState->offsetStopLimit - nextOffset - 1),
 				 errhint("Execute a database-wide VACUUM in database with OID %u with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.",
 						 MultiXactState->oldestMultiXactDB)));
-	else if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
+	else if (MultiXactState->offsetStopLimitKnown &&
+			 MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
Re: [HACKERS] RFC: Remove contrib entirely
On Thu, Jun 4, 2015 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote: The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. The problem is that it's very hard to agree on which stuff ought to be standard and which stuff ought to be misc. There's probably some stuff that almost everyone would agree is pretty useful (like hstore and postgres_fdw) but figuring out which stuff isn't useful is a lot harder. Almost anything you say "that's junk" about, someone else will say "no, that thing is great, I use it all the time." For example, I just offered contrib/isn as a sort of archetypal example of stuff that's pretty marginal crap and Josh immediately came back and said, hey, I use that! I don't see any principled way of getting past that difficulty. Just because a module isn't regularly used by someone who reads -hackers daily doesn't mean it's not worth keeping. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact so it does not explain whatever is happening on chipmunk. Ah, thanks for diagnosing that. The chipmunk failure is strange -- notice it only references the = operators, except for type box for which it's ~= that fails. The test includes a lot of operators ... Actually not --- if you browse through the last half dozen failures on chipmunk you will notice that (1) the set of operators complained of varies a bit from one failure to the next; (2) more often than not, this is one of the failures: WARNING: no results for (boxcol,@,box,((1,2),(300,400))) Certainly the majority of the complaints are about equality operators, but not quite all of them. Also, we have quite a number of ARM boxes: apart from chipmunk we have gull, hamster, mereswine, dangomushi, axolotl, grison. (hamster and chipmunk report hostname -m as armv6l, the others armv7l). All of them are running Linux, either Fedora or Debian. Most are using gcc, compilation flags look pretty standard. I have no idea what might be different about chipmunk compared to any other ARM buildfarm critter ... Heikki, any thoughts on that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 08:55 AM, Jim Nasby wrote: Personally, I'd rather we publish a list of formally vetted and approved versions of PGXN modules. There are many benefits to that, and the downside of not having that stuff as part of make check would be overcome by the explicit testing we would need to have for approved modules. I tried to come up with more words but: Exactly. The benefit to this idea so far exceeds the benefit to having contrib. Sincerely, JD -- The most kicking donkey PostgreSQL Infrastructure company in existence. The oldest, the most experienced, the consulting company to the stars. Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 - 24x7 - 365 - Proactive and Managed Professional Services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: Actually not --- if you browse through the last half dozen failures on chipmunk you will notice that (1) the set of operators complained of varies a bit from one failure to the next; (2) more often than not, this is one of the failures: WARNING: no results for (boxcol,@,box,((1,2),(300,400))) Certainly the majority of the complaints are about equality operators, but not quite all of them. Hm. Well, what this message says is that we ran that query using both BRIN and seqscan, and that in both cases no row was returned. Note that if the BRIN and seqscan cases had returned different sets of rows, the error message would have been different. So this might be related to the way the test table is created, rather than to a bug in BRIN. Peter G. recently pointed out that this seems to be relying on an index-only scan on table tenk1 and suggested an ORDER BY. Maybe that assumption is being violated on chipmunk, and so the table is populated differently from what the test actually expects. I just noticed that chipmunk has shared_buffers=10MB on its buildfarm config. I don't see that in any of the other ARM animals. Maybe that can change the plan choice. I will test locally with reduced shared_buffers and see if I can reproduce the results. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera wrote: Hm. Well, what this message says is that we ran that query using both BRIN and seqscan, and that in both cases no row was returned. Note that if the BRIN and seqscan cases had returned different sets of rows, the error message would have been different. So this might be related to the way the test table is created, rather than to a bug in BRIN. Peter G. recently pointed out that this seems to be relying on an index-only scan on table tenk1 and suggested an ORDER BY. Maybe that assumption is being violated on chipmunk and so the table populated is different than what the table actually expects. Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] RFC: Remove contrib entirely
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 4, 2015 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote: The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. As an ex-packager, I agree that would be a good thing. In particular we should try to avoid people packaging stuff that's meant either for server testing or as sample-source-code-not-useful-in-itself. We've made some progress on the former via src/test/modules but I wonder if we're all the way there; test_decoding surely shouldn't be in contrib on this measure should it? BTW, pg_upgrade is also a special case from a packager's viewpoint, since it needs to be bundled with back-version executables. The problem is that it's very hard to agree on which stuff ought to be standard and which stuff ought to be misc. There's probably some stuff that almost everyone would agree is pretty useful (like hstore and postgres_fdw) but figuring out which stuff isn't useful is a lot harder. Almost anything you say - that's junk - someone else will say - no, that thing is great, I use it all the time. For example, I just offered contrib/isn as a sort of archetypal example of stuff that's pretty marginal crap and Josh immediately came back and said, hey, I use that! I don't see any principled way of getting past that difficulty. Just because a module isn't regularly used by someone who reads -hackers daily doesn't mean it's not worth keeping. Yeah. One possible way of measuring this would be to go through the commit logs and count commits in contrib/ that either added a new feature or fixed a field-reported bug (ie, did not arise simply from core-code-driven housekeeping). Either would be solid evidence that somebody out there is using it. 
Gathering the evidence would be pretty tedious though :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
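Tom's commit-counting idea could be roughed out mechanically. A sketch (the commit data here would plausibly come from something like `git log --name-only -- contrib/`; the housekeeping-message prefixes are illustrative assumptions, and telling real features/bugfixes from housekeeping would still need manual review):

```python
from collections import Counter


def count_module_commits(commits):
    """Count commits per contrib module.

    `commits` is an iterable of (touched_paths, message) pairs, e.g.
    parsed from `git log --name-only`.  Commits whose messages look
    like pure housekeeping are skipped, per Tom's criterion that only
    feature additions and field-reported bug fixes count as evidence
    of real-world use.
    """
    housekeeping_prefixes = ("pgindent", "Stamp", "Translation")  # illustrative
    counts = Counter()
    for paths, message in commits:
        if message.startswith(housekeeping_prefixes):
            continue
        modules = {p.split("/")[1] for p in paths if p.startswith("contrib/")}
        for module in modules:
            counts[module] += 1
    return counts


# Hypothetical sample data:
sample = [
    (["contrib/hstore/hstore.c"], "Add hstore feature"),
    (["contrib/isn/isn.c"], "Fix field-reported isn bug"),
    (["contrib/hstore/hstore.c", "contrib/isn/isn.c"], "pgindent run"),
]
```

The per-module tallies would then give a rough activity ranking to argue about, rather than deciding module by module from memory.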
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) Also, what I'd just noticed is that all of the cases that are failing are ones where the expected number of matching rows is exactly 1. I am wondering if the test is sometimes just missing random rows, and we're not seeing any reported problem unless that makes it go down to no rows. (But I do not know how that could simultaneously affect the seqscan case ...) I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
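The extension Tom suggests (recording the expected match count per brinopers entry and complaining on any mismatch, not just on zero rows) could look roughly like this; a hypothetical helper, with Python standing in for the test script's logic:

```python
def check_oper(expected_count, seq_rows, brin_rows):
    """Sketch of the tightened check: compare the seqscan and BRIN
    results against each other AND against an explicit expected row
    count, so that silently missing rows are caught even when both
    scans agree."""
    problems = []
    if set(seq_rows) != set(brin_rows):
        problems.append("seqscan and brin scan disagree")
    if len(brin_rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(brin_rows)}")
    return problems
```

With the current zero-only check, a query expected to match exactly one row that matches none is the only detectable failure; this version would also flag the case Tom worries about, where random rows go missing but at least one survives.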
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 07:34 AM, Robert Haas wrote: I don't think it's very practical to talk about getting rid of contrib when we reliably add to it in every release: Except, that is kind of the point. Why are we adding to it? Contrib (AFAICS) is all things that don't need to be in -core. That is the whole point of having extensions, is it not? Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 6/4/15 10:30 AM, Robert Haas wrote: On Thu, Jun 4, 2015 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote: The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. The problem is that it's very hard to agree on which stuff ought to be standard and which stuff ought to be misc. What I took away upthread was that the idea here was to distinguish things that were intended as a POC (like worker_spi, auth_delay and test_decoding) from everything else. I think the real problem here that we're skirting around is this idea of 'blessed extensions', because that's really the only user benefit contrib brings: the idea that this stuff is formally blessed by the community. If that's really what we're after then we should just be explicit about that. Then we can decide if the best way to approach that is keeping it in the main repo (as opposed to, say, publishing a list of explicit PGXN package versions and their checksums). Personally, I'd rather we publish a list of formally vetted and approved versions of PGXN modules. There are many benefits to that, and the downside of not having that stuff as part of make check would be overcome by the explicit testing we would need to have for approved modules. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 11:49 AM, Joshua D. Drake wrote: On 06/04/2015 07:34 AM, Robert Haas wrote: I don't think it's very practical to talk about getting rid of contrib when we reliably add to it in every release: Except, that is kind of the point. Why are we adding to it? Contrib (AFAICS) is all things that don't need to be in -core. That is the whole point of having extensions, is it not? No. You keep getting this wrong. The fact that we have extensions doesn't mean that we want to throw out everything that is an extension. It's perfectly reasonable for us to maintain some ourselves, not least as part of eating our own dog food. And I say that as someone who has created and maintains quite a few third party extensions. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Dependency between bgw_notify_pid and bgw_flags
Hi, Documentation here http://www.postgresql.org/docs/devel/static/bgworker.html does not indicate any relation between the fields bgw_notify_pid and bgw_flags of the BackgroundWorker structure. But one has to set BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature. In BackgroundWorkerStateChange:

	/*
	 * Copy the PID to be notified about state changes, but only if the
	 * postmaster knows about a backend with that PID. It isn't an error
	 * if the postmaster doesn't know about the PID, because the backend
	 * that requested the worker could have died (or been killed) just
	 * after doing so. Nonetheless, at least until we get some experience
	 * with how this plays out in the wild, log a message at a relatively
	 * high debug level.
	 */
	rw->rw_worker.bgw_notify_pid = slot->worker.bgw_notify_pid;
	if (!PostmasterMarkPIDForWorkerNotify(rw->rw_worker.bgw_notify_pid))
	{
		elog(DEBUG1, "worker notification PID %lu is not valid",
			 (long) rw->rw_worker.bgw_notify_pid);
		rw->rw_worker.bgw_notify_pid = 0;
	}

bgw_notify_pid gets wiped out (and that too silently) if PostmasterMarkPIDForWorkerNotify() returns false. PostmasterMarkPIDForWorkerNotify() only looks at the BackendList, which does not contain all the background workers created by Register*BackgroundWorker() calls. Only a background worker which has set BGWORKER_BACKEND_DATABASE_CONNECTION gets added to the BackendList in maybe_start_bgworker():

	if (rw->rw_worker.bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
	{
		if (!assign_backendlist_entry(rw))
			return;
	}
	else
		rw->rw_child_slot = MyPMChildSlot = AssignPostmasterChildSlot();

	do_start_bgworker(rw);		/* sets rw->rw_pid */

	if (rw->rw_backend)
	{
		dlist_push_head(&BackendList, &rw->rw_backend->elem);

Should we change the documentation to say one needs to set BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature?
OR we should fix the code not to wipe out bgw_notify_pid in the code above (maybe change PostmasterMarkPIDForWorkerNotify() to scan the list of other background workers as well)? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] RFC: Remove contrib entirely
On Fri, May 29, 2015 at 10:30 AM, Stephen Frost sfr...@snowman.net wrote: That work will be much less if we simply split what's in contrib now into extension and contrib directories, as it's still all one source repo to the packagers. If we punt things out (unless they're being formally deprecated/removed) to another repo somewhere, then the packagers are going to have to deal with new source repos and related complexity. I'm not saying that's a horrible thing and it might make sense in some cases, but generally it's a lot easier to go from one source package to a bunch of binary ones than from lots of tiny source packages to lots of tiny packages. The versioning aspect of this does come into play though, as having everything with one relatively slow versioning cycle (on the order of months) is actually keeping the load on the packagers down. Lots of little releases, all at different times, from lots of different source packages, would increase the workload. These are all good points. I'm not sure where I ultimately come down on the question about in-repo vs. out-of-repo. My gut feeling is that if the community is willing to continue maintaining contrib modules, then that's ultimately a good thing and many of them are relatively low-maintenance anyway. Having a high barrier to entry for new modules looks a bit odd, given the definition of contrib, but would be more understandable with a proper 'extensions' directory. Of course, we'd have to collectively agree that we feel comfortable with a lower barrier for contrib than what is being done now, if the distinction is going to have any meaning. I don't agree. It may make sense to keep some stuff in contrib that is not great code or not really that useful just for historical reasons. But if we accept every extension with the quality of, say, contrib/isn into the core distribution, our core distribution is going to be huge and bloated with crap.
I'd like to point out that we've already done some significant cleanup of contrib - some things got moved to src/test/modules, and other stuff to src/bin. So what's in that directory is already a lot more homogenous than it was a few years ago. Pretty much everything in there is a loadable module, and if it does something SQL-visible, it's packaged as an extension. The only exceptions are oid2name, pg_standby, start-scripts, and vacuumlo. For a directory with 60 things in it, that's pretty good. I don't think it's very practical to talk about getting rid of contrib when we reliably add to it in every release: 9.1 added auth_delay, dummy_seclabel, file_fdw, pg_test_fsync, and sepgsql 9.2 added pg_test_timing and tcn 9.3 added pg_xlogdump, postgres_fdw, and worker_spi 9.4 added pg_prewarm, test_decoding, and test_shm_mq 9.5 added hstore_plperl, hstore_plpython, ltree_plpython, tsm_system_rows, and tsm_system_time I haven't really gotten familiar with the 9.5 stuff, but most of that earlier stuff was, uh, pretty good stuff. We wouldn't have been better off rejecting it, and we wouldn't have been better off putting it into the main tree. Any individual person who looks at contrib will no doubt see a fairly high percentage of stuff they don't care about. Given that part of the charter of contrib is to hold things that we don't want to put in the core product for one reason or another, that is to be expected. But every single one of those 18 contrib modules we've added in the last 5 years was something that someone cared deeply about, many of them get real use, and we're just sticking our head in the sand if we think that's not going to keep happening. For what it's worth, I also don't particularly support renaming contrib. I don't really see that it buys us enough to justify the hassle it will cause. 
One thing that may be worth doing yet is separating the code that is just intended as a POC (like worker_spi, auth_delay and test_decoding) from the stuff that you are really intended to run in production (like tcn and hstore). But that distinction is fuzzier than you might think, because while auth_delay was intended as a POC, I've subsequently heard rumors of it being used in production with satisfactory results. It's very easy to get the idea that you know what PostgreSQL users use but usually that tends to mean what I use and the community is broad enough that those things are Not The Same. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On 06/04/2015 12:44 AM, Amit Kapila wrote: Given that the function raises an error on failure, I think it will otherwise be OK as is. Please find an updated patch attached with this mail. No attachment. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 10:34 AM, Robert Haas wrote: For what it's worth, I also don't particularly support renaming contrib. I don't really see that it buys us enough to justify the hassle it will cause. One thing that may be worth doing yet is separating the code that is just intended as a POC (like worker_spi, auth_delay and test_decoding) from the stuff that you are really intended to run in production (like tcn and hstore). But that distinction is fuzzier than you might think, because while auth_delay was intended as a POC, I've subsequently heard rumors of it being used in production with satisfactory results. It's very easy to get the idea that you know what PostgreSQL users use but usually that tends to mean what I use and the community is broad enough that those things are Not The Same. The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. I'm also not in favor of booting packages wholesale out of core, for all the good reasons you and others have advanced. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [CORE] Restore-reliability mode
Josh, * Josh Berkus (j...@agliodbs.com) wrote: I would argue that if we delay 9.5 in order to do a 100% manual review of code, without adding any new automated tests or other non-manual tools for improving stability, then it's a waste of time; we might as well just release the beta, and our users will find more issues than we will. I am concerned that if we declare a cleanup period, especially in the middle of the summer, all that will happen is that the project will go to sleep for an extra three months. This is the exact same concern that I have. A delay just to have a delay is not useful. I completely agree that we need more automated testing, etc, though getting all of that set up and running could be done at any time too- there's no reason to wait, nor do I believe delaying 9.5 would make such automated testing appear. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] xid wrap / optimize frozen tables?
Just FYI: We have worked around these issues by running regular (scripted and thus controlled) vacuums on all tables but the active ones and adding L2 ZFS caching (l2arc). I hope to get back to this again soon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Wed, Jun 03, 2015 at 04:53:46PM -0400, Robert Haas wrote: So here's a patch taking a different approach. In this approach, if the multixact whose members we want to look up doesn't exist, we don't use a later one (that might or might not be valid). Instead, we attempt to cope with the unknown. That means: 1. In TruncateMultiXact(), we don't truncate. I like that change a lot. It's much easier to seek forgiveness for wasting <= 28 GiB of disk than for deleting visibility information wrongly. 2. If setting the offset stop limit (the point where we refuse to create new multixact space), we don't arm the stop point. This means that if you're in this situation, you run without member wraparound protection until it's corrected. A message gets logged once per checkpoint telling you that you have this problem, and another message gets logged when things get straightened out and the guards are enabled. 3. If setting the vacuum force point, we assume that it's appropriate to immediately force vacuum. Those seem reasonable, too. I've only tested this very lightly - this is just to see what you and Noah and others think of the approach. As compared with the previous approach, it has the advantage of making minimal assumptions about the sanity of what's on disk. It has the disadvantage that, for some people, the member-wraparound guard won't be enabled at startup -- but note that those people can't start 9.3.7/9.4.2 *at all*, so currently they are either running without member wraparound protection anyway (if they haven't upgraded to those releases) or they're down entirely. That disadvantage is negligible, considering. Another disadvantage is that we'll be triggering what may be quite a bit of autovacuum activity for some people, which could be painful. On the plus side, they'll hopefully end up with sane relminmxid and datminmxid guards afterwards. That sounds good so long as each table requires just one successful emergency autovacuum.
I'm not seeing code to ensure that the launched autovacuum will indeed perform a full-table scan and update relminmxid; is it there? For sites that can't tolerate an autovacuum storm, what alternative can we provide? Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM table" of every table, done before applying the minor update, sufficient?

 static void
-DetermineSafeOldestOffset(MultiXactId oldestMXact)
+DetermineSafeOldestOffset(MultiXactOffset oldestMXact)

Leftover change from an earlier iteration? The values passed continue to be MultiXactId values.

 	/* move back to start of the corresponding segment */
-	oldestOffset -= oldestOffset %
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	offsetStopLimit = oldestOffset - (oldestOffset %
+		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
+	/* always leave one segment before the wraparound point */
+	offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+
+	/* if nothing has changed, we're done */
+	if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
+		return;

 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
-	/* always leave one segment before the wraparound point */
-	MultiXactState->offsetStopLimit = oldestOffset -
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	MultiXactState->offsetStopLimit = oldestOffset;

That last line needs s/oldestOffset/offsetStopLimit/, I presume. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [CORE] [HACKERS] postpone next week's release
On 06/04/2015 12:17 PM, Andres Freund wrote: On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote: So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. I personally don't really see the multixact changes being that bad on the overall design. It pretty much just extended an earlier design. Now that wasn't great, but I don't think too many people had realized that at that point. The biggest problem was underestimating the complexity. Yeah, many of the issues were pre-existing, and would've been good to fix anyway. The multixact issues remind me of the another similar thing we did: the visibility map. It too was non-critical when it was first introduced, but later we started using it for index-only-scans, and it suddenly became important that it's up-to-date and crash-safe. We did uncover some bugs in that area when index-only-scans were introduced, similar to the multixact bugs, only not as bad because it didn't lead to data loss. I don't have any point to make with that comparison, but it was similar in many ways. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Construction of Plan-node by CSP (RE: Custom/Foreign-Join-APIs)
-----Original Message----- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane Sent: Thursday, May 28, 2015 5:35 AM To: Robert Haas Cc: Kaigai Kouhei(海外 浩平); Thom Brown; Kohei KaiGai; Shigeru Hanada; pgsql-hackers@postgreSQL.org Subject: Re: [HACKERS] Construction of Plan-node by CSP (RE: Custom/Foreign-Join-APIs) Robert Haas robertmh...@gmail.com writes: Tom, do you want to review this patch and figure out how to solve the underlying problem? If not, I will take care of it. But I will be unhappy if I put time and effort into this and then you insist on changing everything afterwards, again. [ sorry for slow response, been busy ] I will take a look. Tom, how about your availability? From my side, I adjusted my extension (PG-Strom) to fit the infrastructure you proposed, then confirmed it is workable even if a custom-scan that replaced a relations join takes more than two Path nodes in the custom_children list of CustomPath, without exporting create_plan_recurse(). Below is an example of a custom-scan (GpuJoin) that involves a four-relations join. Its code base is the latest master + custom-join-children.v2.patch; unchanged from the last post.

postgres=# explain analyze select avg(x) from t0 natural join t1 natural join t2 natural join t3 group by cat;
                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=298513.77..298514.10 rows=26 width=12) (actual time=5622.028..5622.033 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuJoin)  (cost=4702.00..249169.85 rows=9868784 width=12) (actual time=540.718..2075.566 rows=1000 loops=1)
         Bulkload: On (density: 100.00%)
         Depth 1: Logic: GpuHashJoin, HashKeys: (cid), JoinQual: (cid = cid), nrows_ratio: 0.98936439
         Depth 2: Logic: GpuHashJoin, HashKeys: (bid), JoinQual: (bid = bid), nrows_ratio: 0.99748135
         Depth 3: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = aid), nrows_ratio: 1.
         ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 rows=1060 width=24) (actual time=8.555..903.864 rows=1000 loops=1)
         ->  Seq Scan on t3  (cost=0.00..734.00 rows=4 width=4) (actual time=0.019..4.370 rows=4 loops=1)
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=4 width=4) (actual time=0.004..4.182 rows=4 loops=1)
         ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) (actual time=0.005..4.275 rows=4 loops=1)
 Planning time: 0.918 ms
 Execution time: 6178.264 ms
(13 rows)

Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com custom-join-children.v2.patch Description: custom-join-children.v2.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing -i / --ignore-version in pg_dump help
On Wed, Jun 3, 2015 at 1:53 AM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, May 22, 2015 at 11:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Fri, May 22, 2015 at 8:59 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: There are some reason to -i, --ignore-version option doesn't appear in pg_dump help? Because it's obsolete option, I think. Yeah, see commit c22ed3d523782c43836c163c16fa5a7bb3912826. Considering we shipped that in 8.4, maybe it's time to remove all trace of those switches. We'd still have to wait a couple releases before it'd be safe to use -i for something else, but it'd be a start. +1 Barring any objection, I will apply the attached patch and remove that obsolete option.. Applied. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On Jun 4, 2015, at 10:55 AM, Jim Nasby jim.na...@bluetreble.com wrote: Personally, I'd rather we publish a list of formally vetted and approved versions of PGXN modules. There are many benefits to that, and the downside of not having that stuff as part of make check would be overcome by the explicit testing we would need to have for approved modules. I have looked at PGXN and would never install anything from it. Why? Because it is impossible to tell, without inside knowledge or a lot of work, what is actively maintained and tested, and what is an abandoned proof-of-concept or idea. There is no indication of what versions of pg any of PGXN modules are tested on, or even if there are tests that can be run to prove the module works correctly with a particular version of pg. There are many modules that have not been updated for several years. What is their status? If they break is there still someone around to fix them or even cares about them? If not, then why waste my time. So adding to Jim’s comment above, anything that vets or approves PGXN modules is, in my opinion, essentially required to make PGXN useful for anything other than a scratchpad. A big help would be to pull in the date of the last git commit in the module overview and ask the authors to edit the readme to add what major version of pg the author last tested or ran on. When I install from contrib, at least I have some minimal assurance that the code is meant to work with the corresponding version of pg. Neil -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 6:43 AM, Andrew Dunstan and...@dunslane.net wrote: I've noticed some more issues with the jsonb documentation, and the new jsonb stuff generally. I didn't set out to give Andrew feedback on the semantics weeks after feature freeze, but unfortunately this feels like another discussion that we need to have now rather than later. Yes, I wish you had raised these issues months ago when this was published. That's the way the process is supposed to work. I also wish that I managed to do that. As you know, I was working overtime to get UPSERT into 9.5 during that period. Finding time to review things is always difficult, and I wish I could do more. operator jsonb - integer === I think it was a bad idea to allow array-style removal of object key/value pairs. ISTM that it implies a level of stability in the ordering that doesn't make sense. Besides, is it really all that useful? But I agree that it's not a great contribution to science, especially since the index will be applied to the list of elements in the somewhat counter-intuitive storage order we use, and we could just raise an error if we try to apply integer delete to an object instead of an array. Cool. Do you want to write a patch, or should I? Also, what about negative array subscripting (making the 9.4-era operator jsonb - integer operator support that for consistency with the new operator jsonb - integer operator)? Should I write the patch? Will you commit it if I do? operator jsonb - text[] (and *nested* deletion more generally) === Summary: I think that this operator has many problems, and should be scrapped (although only as an operator). IMV nested deletion should only be handled by functions, and the way that nested deletion works in general should be slightly adjusted. The new operator jsonb - text[] operator is confusingly inconsistent with: A) operator jsonb text What exactly is this? I have no idea what you're talking about. It's a typo -- I meant operator jsonb - text.
The fact that operator jsonb - text and operator jsonb - text[] diverge in the way they do seems confusing. The fact that hstore uses it that way doesn't really concern me. Since hstore isn't nested it doesn't make a whole lot of sense for it to mean anything else there. It seems pretty obvious to me that it makes just as much sense as in hstore. In hstore, you might want to delete multiple key/value pairs at once, for exactly the same reason as you might want to with jsonb. Certainly, you'll also want to support nested deletion with jsonb, but that's beside the point. But json(b) is nested, and jsonb - path seems quite a reasonable treatment, something you're much more likely to want to do than removeing top level elements in bulk. Probably true. I think that this interface for nested deletion is complicated enough and inconsistent enough that I'd rather not have an operator at all, just a function (so somewhat like jsonb_set() -- jsonb_delete()). That is my main point on operator jsonb - text[]; I think the interface is complicated and inconsistent with everything else for no good reason. Regarding nested deletion behavior more generally, consider this example of how this can work out badly: postgres=# select jsonb_delete(jsonb_set('[a]', '{5}', 'b'), '{5}') ; jsonb_delete -- [a, b] (1 row) Here, we're adding and then deleting an array element at offset 5 (the string b). But the element is never deleted by the outer jsonb_delete(), because we can't rely on the element actually being stored at offset 5. Seems a bit fragile. The behaviour of jsonb_set is pretty explicitly documented. If we wanted to do something else then we'd have to disable the special meaning given to negative indices, but that would mean in turn we wouldn't be able to prepend to an array. jsonb_delete() should certainly be able to traverse objects, but it's much less clear that it should be able to *traverse* arrays (affecting arrays is a different story, though). 
That's why I proposed not supporting traversing arrays with it or with jsonb_set(). This would also remove the questionable second shadow type system within the text[] rhs operand too, which seems like a good thing. I think that traversing arrays in nested documents is a rare requirement, because the ordering within arrays is unstable. If you already know the ordinal number of the thing you want to nuke, then you probably have already locked the row, and you might as well manipulate the JSON using Javascript or Python at that stage. Making jsonb_delete() buy into the operator jsonb ? text idea of a key (a thing that it must delete) would also allow jsonb_delete() to reliably delete particular strings in arrays, which actually does make a lot of sense (think of arrays of tags). But FWIW it's the inconsistency that bothers me most. More importantly, consider the
Re: [HACKERS] brin regression test intermittent failures
I wrote: The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer. I leapt to the wrong conclusion on that one. The reason for failure to match to an index column had nothing to do with the extra cast, and everything to do with the fact that there was no such index column. I think we're probably good now, though it'd be wise to keep an eye on chipmunk for awhile to verify that it doesn't see the problem anymore. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: I wrote: The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer. I leapt to the wrong conclusion on that one. The reason for failure to match to an index column had nothing to do with the extra cast, and everything to do with the fact that there was no such index column. Oops! Thanks for reviewing this. I think we're probably good now, though it'd be wise to keep an eye on chipmunk for awhile to verify that it doesn't see the problem anymore. Will do. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
I'm just skimming here, but if a jsonb_path type is being proposed, perhaps it would be better not to have operators that take text or text[] as second argument. We can provide that functionality with just functions. For example, it will be confusing to have jsonb 'some json value' - '{foo,bar}' operate too differently from jsonb 'some json value' - json_path '{foo,bar}' And it will be a nasty regression to have 9.5 allow jsonb 'some json value' - '{foo,bar}' and then have 9.6 error out with ambiguous operator when the json_path thing is added. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] two-arg current_setting() with fallback
Hi, Attached patch which fixes my review comments. Since code changes were good, just fixed reported cosmetic changes. David, can you please cross check? Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company diff --git a/contrib/tsearch2/tsearch2.c b/contrib/tsearch2/tsearch2.c index 143dabb..4354c5b 100644 --- a/contrib/tsearch2/tsearch2.c +++ b/contrib/tsearch2/tsearch2.c @@ -363,7 +363,7 @@ tsa_tsearch2(PG_FUNCTION_ARGS) tgargs[i + 1] = trigger-tgargs[i]; tgargs[1] = pstrdup(GetConfigOptionByName(default_text_search_config, - NULL)); + NULL, false)); tgargs_old = trigger-tgargs; trigger-tgargs = tgargs; trigger-tgnargs++; diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index c6e3540..7f6c4ad 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16435,7 +16435,7 @@ SELECT collation for ('foo' COLLATE de_DE); indexterm primarycurrent_setting/primary /indexterm -literalfunctioncurrent_setting(parametersetting_name/parameter)/function/literal +literalfunctioncurrent_setting(parametersetting_name/parameter [, parametermissing_ok/parameter ])/function/literal /entry entrytypetext/type/entry entryget current value of setting/entry @@ -16474,7 +16474,9 @@ SELECT collation for ('foo' COLLATE de_DE); The function functioncurrent_setting/function yields the current value of the setting parametersetting_name/parameter. It corresponds to the acronymSQL/acronym command -commandSHOW/command. An example: +commandSHOW/command. If parametersetting/parameter does not exist, +throws an error unless parametermissing_ok/parameter is +literaltrue/literal. 
     An example:
<programlisting>
SELECT current_setting('datestyle');

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b3c9f14..a43925d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -7137,7 +7137,7 @@ ExtractSetVariableArgs(VariableSetStmt *stmt)
 		case VAR_SET_VALUE:
 			return flatten_set_variable_args(stmt->name, stmt->args);
 		case VAR_SET_CURRENT:
-			return GetConfigOptionByName(stmt->name, NULL);
+			return GetConfigOptionByName(stmt->name, NULL, false);
 		default:
 			return NULL;
 	}
@@ -7206,7 +7206,7 @@ set_config_by_name(PG_FUNCTION_ARGS)
 							 true, 0, false);

 	/* get the new current value */
-	new_value = GetConfigOptionByName(name, NULL);
+	new_value = GetConfigOptionByName(name, NULL, false);

 	/* Convert return string to text */
 	PG_RETURN_TEXT_P(cstring_to_text(new_value));
@@ -7633,7 +7633,7 @@ GetPGVariableResultDesc(const char *name)
 		const char *varname;

 		/* Get the canonical spelling of name */
-		(void) GetConfigOptionByName(name, &varname);
+		(void) GetConfigOptionByName(name, &varname, false);

 		/* need a tuple descriptor representing a single TEXT column */
 		tupdesc = CreateTemplateTupleDesc(1, false);
@@ -7656,7 +7656,7 @@ ShowGUCConfigOption(const char *name, DestReceiver *dest)
 	char	   *value;

 	/* Get the value and canonical spelling of name */
-	value = GetConfigOptionByName(name, &varname);
+	value = GetConfigOptionByName(name, &varname, false);

 	/* need a tuple descriptor representing a single TEXT column */
 	tupdesc = CreateTemplateTupleDesc(1, false);
@@ -7740,19 +7740,26 @@ ShowAllGUCConfig(DestReceiver *dest)
 }

 /*
- * Return GUC variable value by name; optionally return canonical
- * form of name.  Return value is palloc'd.
+ * Return GUC variable value by name; optionally return canonical form of
+ * name.  If the GUC is unset, then throw an error unless missing_ok is true,
+ * in which case return NULL.  Return value is palloc'd.
  */
 char *
-GetConfigOptionByName(const char *name, const char **varname)
+GetConfigOptionByName(const char *name, const char **varname, bool missing_ok)
 {
 	struct config_generic *record;

 	record = find_option(name, false, ERROR);
 	if (record == NULL)
+	{
+		if (missing_ok)
+			return NULL;
+
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("unrecognized configuration parameter \"%s\"", name)));
+	}
+
 	if ((record->flags & GUC_SUPERUSER_ONLY) && !superuser())
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
@@ -8046,12 +8053,42 @@ show_config_by_name(PG_FUNCTION_ARGS)
 	varname = TextDatumGetCString(PG_GETARG_DATUM(0));

 	/* Get the value */
-	varval = GetConfigOptionByName(varname, NULL);
+	varval = GetConfigOptionByName(varname, NULL, false);
+
+	/* Convert to text */
+	PG_RETURN_TEXT_P(cstring_to_text(varval));
+}
+
+/*
+ * show_config_by_name_missing_ok - equiv to SHOW X command but implemented as
+ * a function.  If X does not exist, suppress the error and just return NULL
+ * if missing_ok is TRUE.
+ */
+Datum
+show_config_by_name_missing_ok(PG_FUNCTION_ARGS)
+{
+	char	   *varname;
+	bool		missing_ok;
+	char	   *varval;
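For orientation, here is a sketch of how the two-argument form behaves, as the feature ultimately shipped. The setting name 'myapp.level' is only a placeholder for an unknown custom GUC, not something from the patch above:

```sql
-- One-argument form: an unknown setting raises an error.
SELECT current_setting('myapp.level');
-- ERROR:  unrecognized configuration parameter "myapp.level"

-- Two-argument form with missing_ok = true: returns NULL instead of erroring.
SELECT current_setting('myapp.level', true);

-- A setting that exists behaves the same either way.
SELECT current_setting('datestyle', true);
```

The point of the second argument is exactly what the comment in the patched GetConfigOptionByName() says: suppress the "unrecognized configuration parameter" error and hand back NULL.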
Re: [CORE] [HACKERS] postpone next week's release
On 05/30/2015 11:47 PM, Andres Freund wrote: I don't think it's primarily a problem of lack of review; although that is a large problem. I think the biggest systematic problem is that the compound complexity of postgres has increased dramatically over the years. Features have added complexity little by little, each increment not looking that bad on its own. But very little has been done to manage complexity. Since 8.0 the code size has roughly doubled, but little has been done to manage the increased complexity. Few new abstractions have been introduced and the structure of the code is largely the same. As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it was ~500 LOC, in master it's ~1500. The interactions in 8.0 were complex; they have gotten much more complex since. It fulfills lots of different roles, all in one function (roughly in the order things happen, but simplified):
* Read the control file/determine whether we crashed
* recovery.conf handling
* backup label handling
* tablespace map handling (huh, I missed that this was added directly to StartupXLOG. What a bad idea)
* Determine whether we're doing archive recovery, read the relevant checkpoint if so
* relcache init file removal
* timeline switch handling
* Loading the checkpoint we're starting from
* Initialization of a lot of subsystems
* crash recovery/replay
  * Including pgstat, unlogged table, exported snapshot handling
  * iff hot standby, some more subsystems are initialized here
  * hot standby state handling
  * replay process initialization
* crash replay itself, including
  * progress tracking
  * recovery pause handling
  * nextxid tracking
  * timeline increase handling
  * hot standby state handling
* unlogged relations handling
* archive recovery handling
* creation/initialization of the end of recovery checkpoint
* timeline increment if failover
* subsystem initialization iff !hot_standby
* end of recovery actions
Yes, that's one routine.
And, to make things even funnier, half of that routine isn't exercised by our tests. You can argue that this is an outlier, but I don't think so. Heapam, the planner, etc. have similar cases. And I think this, to some degree, explains a lot of the multixact problems. While there were a few simple bugs, most of them were interactions between the various subsystems that are rather intricate. I think this explanation is wrong. I agree that there are many places that would be good to refactor - like StartupXLOG() - but the multixact code was not too bad in that regard. IIRC the patch included some refactoring; it added some new helper functions in heapam.c, for example. You can argue that it didn't do enough of it, but that was not the big issue. The big issue was at the architecture level. Basically, we liked vacuuming of XIDs and clog so much that we decided that it'd be nice if you had to vacuum multixids too, in order to not lose data. Many of the bugs and issues were not new - we had multixids before - but we upped the ante and turned minor locking bugs into data loss. And that had nothing to do with the code structure - we'd have similar issues if we had rewritten everything in Java, with the same design. So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] two-arg current_setting() with fallback
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

I have reviewed the patch. Here are my review comments:
1. Patch does not apply due to some recent changes in pg_proc.h.
2. -GetConfigOptionByName(const char *name, const char **varname) +GetConfigOptionByNameMissingOK(const char *name, const char **varname, bool missing_ok) Will it be better if we keep the name as is and change the callers to pass false for the missing_ok parameter? It looks weird to have an extra #define just to avoid that. I see only a countable number of callers and thus see NO issues changing those.
3. The Oid used for the new function is already used. Check unused_oids.sh.
4. Changes in builtins.h are accidental. Need to remove that.
However, the code changes look good and implement the desired feature. The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [CORE] [HACKERS] postpone next week's release
On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote: I think this explanation is wrong. I agree that there are many places that would be good to refactor - like StartupXLOG() - but the multixact code was not too bad in that regard. IIRC the patch included some refactoring, it added some new helper functions in heapam.c, for example. You can argue that it didn't do enough of it, but that was not the big issue. Yea, but the bugs were more around the interactions to other parts of the system. Like e.g. crash recovery, which now is about bug 7 or so. And those are the ones that are hard to understand. The big issue was at the architecture level. Basically, we liked vacuuming of XIDs and clog so much that we decided that it'd be nice if you had to vacuum multixids too, in order to not lose data. Many of the bugs and issues were not new - we had multixids before - but we upped the ante and turned minor locking bugs into data loss. And that had nothing to do with the code structure - we'd have similar issues if we had rewritten everything java, with the same design. I think we're probably just using slightly different terms here - for me one very good way of fixing some structurally bad things *is* improving the design. If you look at the bugs around multixacts: The first few were around ctid-chaining, hard to find and fix because there's about 8-10 places implementing it with slight differences. The next bunch were around vacuuming, some of them oversights, a good bunch of them more fundamental. Crash recovery wasn't thought about (lack of testing/review), and more generally the new code tripped over bad old decisions (hey, wraparound is ok!). Then there were a bunch of stupid bugs in crash-recovery (testing mainly), and larger scale bugs (hey, let's access stuff during recovery). Then there's the whole row level locking code - which is by now among the hardest to understand code in postgres - and voila it contained a bunch of oversights that were hard to spot. 
So yes, I think nicer code to work with would have prevented us from making a significant portion of these. It might have also made us realize earlier how significant the increase in complexity was. So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. I personally don't really see the multixact changes being that bad on the overall design. It pretty much just extended an earlier design. Now that wasn't great, but I don't think too many people had realized that at that point. The biggest problem was underestimating the complexity. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [CORE] [HACKERS] postpone next week's release
On 30 May 2015 at 05:08, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, May 29, 2015 at 6:33 PM, Andres Freund and...@anarazel.de wrote: Why? A large portion of the input required to go from beta towards a release is from actual users. To see when things break, what confuses them and such. I have two concerns: 1. I'm concerned that once we release beta, any idea about reverting a feature or fixing something that is broken will get harder, because people will say well, we can't do that after we've released a beta. I confess to particularly wanting a solution to the item listed as custom-join has no way to construct Plan nodes of child Path nodes, the history of which I'll avoid recapitulating until I'm sure I can do it while maintaining my blood pressure at safe levels. 2. Also, if we're going to make significant multixact-related changes to 9.5 to try to improve reliability, as you proposed on the other thread, then it would be nice to do that before beta, so that it gets tested. Of course, someone is bound to point out that we could make those changes in time for beta2, and people could test that. But in practice I think that'll just mean that stuff is only out there for let's say 2 months before we put it in a major release, which ain't much. I think your position is completely nuts. The GROUPING SETS code is desperately in need of testing. The custom-plan code is desperately in need of fixing and testing. The multixact code is desperately in need of testing. The open-items list has several other problems besides those. All of those problems are independent. If we insist on tackling them serially rather than in parallel, 9.5 might not come out till 2017. I agree that we are not in a position to promise features won't change. So let's call it an alpha not a beta --- but for heaven's sake let's try to move forward on all these issues, not just some of them. I think releasing 9.5 in some form NOW will aid its software quality. 
We've never linked the beta release date to the final release date, so if the quality proves to be as poor as some people think, then the list of bugs will show that and we release later. AFAIK the beta period is exactly the time when we are allowed to pull features from the release. I welcome the idea that we test it; if it's stable and it works, we release it. If it doesn't, we pull it. Not releasing our software yet making a list of our fears doesn't work towards a solution. Our fears will make us shout at each other too, so I for one would rather skip that part and do some practical actions. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 12:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I'm just skimming here, but if a jsonb_path type is being proposed, perhaps it would be better not to have operators that take text or text[] as second argument. We can provide that functionality with just functions. For example, it will be confusing to have jsonb 'some json value' - '{foo,bar}' operate too differently from jsonb 'some json value' - json_path '{foo,bar}' And it will be a nasty regression to have 9.5 allow jsonb 'some json value' - '{foo,bar}' and then have 9.6 error out with ambiguous operator when the json_path thing is added. Fair point, but FWIW I don't think it'll end up being a new type like json_path -- it'll just be jsonb, as with containment. I can see there being an operator that performs deletion in a very similar way to how "operator jsonb @> jsonb" performs containment (recall that jsonb containment is a very JSON-ish flavor of containment). I would like these new-to-9.5 deletion operators to work at the top level only, like "operator jsonb ? text" and "operator jsonb ?| text[]", sharing their idea of a key, __including that string array elements are keys__. We haven't got a containment-style nested delete operator for 9.5, but we can hope for it in the future. In the meantime, you get much of the benefit of that with jsonb_delete(), which *can* support nested deletion. It does so by buying into the "operator jsonb ? text" idea of a key (including that string array elements are keys), although with a twist: the paths text[] right operand operates at multiple nesting levels (not supporting traversing arrays, as Andrew implemented it, but OTOH adding support for deleting String array elements based on the string alone, useful for "tag" arrays).
If in 9.6 we have something like an operator jsonb @- jsonb operator for containment style deletion, and a 9.5 era operator jsonb - text and operator jsonb - text[] pair of operators for existence style deletion (matching operator jsonb ? text, operating only on the top level), that will be pretty good. The fact that jsonb_delete() will have somewhat bridged the gap nesting-deletion-wise for 9.5 (without being usable through an operator) won't really matter then. I want to keep the twist I described out of any jsonb operators that are shipped, and only use it within functions. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
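The top-level-only deletion semantics discussed in this thread can be made concrete with the operators as they ultimately shipped in 9.5; note that path-based deletion ended up spelled #- rather than - text[]. A sketch for orientation, not taken from the thread itself:

```sql
-- Top-level key deletion, matching the "existence" idea of jsonb ? text:
SELECT '{"a": 1, "b": 2}'::jsonb - 'a';
-- {"b": 2}

-- Top-level array element deletion by index:
SELECT '["x", "y", "z"]'::jsonb - 1;
-- ["x", "z"]

-- Nested deletion takes a path (order of elements matters), not a set of keys:
SELECT '{"a": {"b": 1, "c": 2}}'::jsonb #- '{a,b}';
-- {"a": {"c": 2}}
```

Splitting the spelling this way sidesteps the ambiguity Peter describes: the `-` operators share the top-level "key" notion of `?`, while `#-` is unambiguously a path.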
Re: [HACKERS] Publish autovacuum informations
2015-01-05 17:44 GMT+01:00 Guillaume Lelarge guilla...@lelarge.info: 2015-01-05 17:40 GMT+01:00 Robert Haas robertmh...@gmail.com: On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be all right with putting the data structure declarations in a file named something like autovacuum_private.h, especially if it carried an annotation that if you depend on this, don't be surprised if we break your code in future. Works for me. I am not in general surprised when we do things that break my code, or anyway, the code that I'm responsible for maintaining. But I think it makes sense to segregate this into a separate header file so that we are clear that it is only exposed for the benefit of extension authors, not so that other things in the core system can touch it. I'm fine with that too. I'll try to find some time to work on that. So I took a look at this this week. I discovered, with the help of a coworker, that I can already use the AutoVacuumShmem pointer and read the struct. Unfortunately, it doesn't give me as much details as I would have liked. The list of databases and tables aren't in shared memory. They are local to the process that uses them. Putting them in shared memory (if at all possible) would imply a much bigger patch than I was willing to write right now. Thanks anyway for the help. -- Guillaume. http://blog.guillaume.lelarge.info http://www.dalibo.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Thu, Jun 4, 2015 at 10:14 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jun 4, 2015 at 1:52 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/02/2015 11:55 PM, Amit Kapila wrote: On Tue, Jun 2, 2015 at 10:26 PM, Andrew Dunstan and...@dunslane.net wrote: Well, it seems to me the new function is being altogether way too trusting about the nature of what it's being asked to remove. In the first place, the S_ISDIR/rmdir branch should only be for Windows, and secondly in the other branch we should be checking that S_ISLNK is true. It would actually be nice if we could test for a junction point on Windows, but that seems to be a bit difficult. I think during recovery for tablespace related operations, it is quite possible to have a directory instead of a symlink in some special cases (see function TablespaceCreateDbspace() and comments in destroy_tablespace_directories() { ..Try to remove the symlink..}). Also this new function is being called from create_tablespace_directories() which uses the code as written in the new function, so it doesn't make much sense to split it into Windows- and non-Windows-specific code. Looking at it again, this might not be as bad as I thought, but I do think we should probably call the function something other than rmsymlink(). That seems too generic, since it also tries to remove directories - albeit that this will fail if the directory isn't empty. And I still think we should add a test for S_ISLNK in the second branch. As it stands the function could try to unlink anything that's not a directory. That might be safe-ish in the context it's used in for the tablespace code, but it's far from safe enough for a function that's in src/common. Okay, as we both seem to agree that it can be mostly used in tablespace symlinks context, so I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c.
The S_ISLNK check is used for non-Windows code, so I am not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLNK check. Given that the function raises an error on failure, I think it will otherwise be OK as is. Please find an updated patch attached with this mail. Sorry, forgot to attach the patch in the previous mail, now attaching. Thanks Bruce for reminding me offlist. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com remove_only_symlinks_during_recovery_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory leak with XLogFileCopy since de768844 (WAL file with .partial)
On Mon, Jun 1, 2015 at 4:24 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, May 28, 2015 at 9:09 PM, Michael Paquier michael.paqu...@gmail.com wrote: Since commit de768844, XLogFileCopy of xlog.c returns to the caller a pstrdup'd string that can be used afterwards for other things. XLogFileCopy is used in only one place, and it happens that the result string is never freed at all, leaking memory. That seems to be almost harmless because the startup process will exit just after calling XLogFileCopy(). No? Having a second look at that, XLogFileCopy() is called only in one place, and dstfname is never used, hence I think that it would make sense to return unconditionally a temporary file name to the caller. +1 Also the current error message in case of failure is very C-like: elog(ERROR, "InstallXLogFileSegment should not have failed"); I thought that we try not to use function names in the error messages. Wouldn't something like "could not copy partial log file" or "could not copy partial log file into temporary segment file" be more appropriate? Or we can extend InstallXLogFileSegment so that we can give the log level to it. When InstallXLogFileSegment is called after XLogFileCopy, we can give ERROR as the log level and cause an exception to occur if link() or rename() fails in InstallXLogFileSegment. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 2:42 AM, Noah Misch n...@leadboat.com wrote: I like that change a lot. It's much easier to seek forgiveness for wasting <= 28 GiB of disk than for deleting visibility information wrongly. I'm glad you like it. I concur. 2. If setting the offset stop limit (the point where we refuse to create new multixact space), we don't arm the stop point. This means that if you're in this situation, you run without member wraparound protection until it's corrected. A message gets logged once per checkpoint telling you that you have this problem, and another message gets logged when things get straightened out and the guards are enabled. 3. If setting the vacuum force point, we assume that it's appropriate to immediately force vacuum. Those seem reasonable, too. Cool. I've only tested this very lightly - this is just to see what you and Noah and others think of the approach. As compared with the previous approach, it has the advantage of making minimal assumptions about the sanity of what's on disk. It has the disadvantage that, for some people, the member-wraparound guard won't be enabled at startup -- but note that those people can't start 9.3.7/9.4.2 *at all*, so currently they are either running without member wraparound protection anyway (if they haven't upgraded to those releases) or they're down entirely. That disadvantage is negligible, considering. All right. Another disadvantage is that we'll be triggering what may be quite a bit of autovacuum activity for some people, which could be painful. On the plus side, they'll hopefully end up with sane relminmxid and datminmxid guards afterwards. That sounds good so long as each table requires just one successful emergency autovacuum. I'm not seeing code to ensure that the launched autovacuum will indeed perform a full-table scan and update relminmxid; is it there? No. Oops. For sites that can't tolerate an autovacuum storm, what alternative can we provide?
Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM" of every table, done before applying the minor update, sufficient? I don't know. In practical terms, they probably need to ensure that if pg_multixact/offsets/ does not exist, no relations have relminmxid = 1 and no remaining databases have datminmxid = 1. Exactly what it will take to get there is possibly dependent on which minor release you are running; on current minor releases, I am hopeful that what you propose is sufficient.

 static void
-DetermineSafeOldestOffset(MultiXactId oldestMXact)
+DetermineSafeOldestOffset(MultiXactOffset oldestMXact)

Leftover change from an earlier iteration? The values passed continue to be MultiXactId values. Oopsie.

 	/* move back to start of the corresponding segment */
-	oldestOffset -= oldestOffset %
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	offsetStopLimit = oldestOffset - (oldestOffset %
+		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
+	/* always leave one segment before the wraparound point */
+	offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+
+	/* if nothing has changed, we're done */
+	if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
+		return;

 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
-	/* always leave one segment before the wraparound point */
-	MultiXactState->offsetStopLimit = oldestOffset -
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	MultiXactState->offsetStopLimit = oldestOffset;

That last line needs s/oldestOffset/offsetStopLimit/, I presume. Another oops. Thanks for the review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
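The segment-boundary arithmetic in the hunk above can be checked with plain SQL. The constant here is only a stand-in for MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT, whose real value is a compile-time constant in multixact.c, and the offset is an arbitrary example:

```sql
-- Illustrative arithmetic only; members_per_seg and oldest_offset are
-- placeholder values, not the actual server constants.
WITH params(oldest_offset, members_per_seg) AS (
    VALUES (123456789::bigint, 52352::bigint)
)
SELECT oldest_offset
       - (oldest_offset % members_per_seg)   -- round down to a segment boundary
       - members_per_seg                     -- leave one spare segment
       AS offset_stop_limit
FROM params;
```

The two subtractions mirror the patch: first round the oldest offset down to the start of its SLRU segment, then back off one full segment so the stop limit never lands in the segment still holding live member data.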
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/03/2015 10:02 PM, Peter Geoghegan wrote: I've noticed some more issues with the jsonb documentation, and the new jsonb stuff generally. I didn't set out to give Andrew feedback on the semantics weeks after feature freeze, but unfortunately this feels like another discussion that we need to have now rather than later. Yes, I wish you had raised these issues months ago when this was published. That's the way the process is supposed to work.

operator jsonb - integer
===
Summary: I think that this operator has a problem, but a problem that can easily be fixed. I think it was a bad idea to allow array-style removal of object key/value pairs. ISTM that it implies a level of stability in the ordering that doesn't make sense. Besides, is it really all that useful?

The origin of this is nested hstore. Looking at my last version of that patch, I see:

SELECT 'a=>1, b=>2, c=>3'::hstore - 3;
           ?column?
------------------------------
 "a"=>"1", "b"=>"2", "c"=>"3"
(1 row)

But I agree that it's not a great contribution to science, especially since the index will be applied to the list of elements in the somewhat counter-intuitive storage order we use, and we could just raise an error if we try to apply integer delete to an object instead of an array.

operator jsonb - text[] (and *nested* deletion more generally)
===
Summary: I think that this operator has many problems, and should be scrapped (although only as an operator). IMV nested deletion should only be handled by functions, and the way that nested deletion works in general should be slightly adjusted. The new operator jsonb - text[] operator is confusingly inconsistent with: A) operator jsonb text What exactly is this? I have no idea what you're talking about. and: B) the established operator hstore - text[] operator, since that operator deletes all key/value pairs that have keys that match any of the right operand text array values.
In contrast, this new operator is passed as its right operand an array of text elements that constitute a path (so the order in the rhs text[] operand matters). If the text element in the rhs text[] operand happens to be what would pass for a Postgres integer literal, it can be used to traverse lhs array values through subscripting at that nesting level. The fact that hstore uses it that way doesn't really concern me. Since hstore isn't nested, it doesn't make a whole lot of sense for it to mean anything else there. But json(b) is nested, and jsonb - path seems quite a reasonable treatment, something you're much more likely to want to do than removing top level elements in bulk. Regarding nested deletion behavior more generally, consider this example of how this can work out badly:

postgres=# select jsonb_delete(jsonb_set('["a"]', '{5}', '"b"'), '{5}');
 jsonb_delete
--------------
 ["a", "b"]
(1 row)

Here, we're adding and then deleting an array element at offset 5 (the string "b"). But the element is never deleted by the outer jsonb_delete(), because we can't rely on the element actually being stored at offset 5. Seems a bit fragile. The behaviour of jsonb_set is pretty explicitly documented. If we wanted to do something else then we'd have to disable the special meaning given to negative indices, but that would mean in turn we wouldn't be able to prepend to an array. More importantly, consider the inconsistency with operator jsonb text (point A above):

postgres=# select '["a"]'::jsonb ?| '{a}'::text[]; -- historic/9.4 behavior
 ?column?
----------
 t
(1 row)

postgres=# select '["a"]'::jsonb - '{a}'::text[]; -- new to 9.5 operator, does not delete!
 ?column?
----------
 ["a"]
(1 row)

You are conflating two different things here, quite pointlessly. The RH operand of ?| is not a path, whereas the RH operand of this - variant is. The fact that they are both text arrays doesn't mean that they should mean the same thing. And this is really the whole problem with the rest of your analysis.
cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
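To make the jsonb_set array-index semantics debated in this thread concrete, here is a sketch of the behavior as the function shipped in 9.5 (these examples are illustrative, not taken from the emails above):

```sql
-- An index past the end appends, so a path of '{5}' need not leave the
-- element stored at offset 5 (which is what tripped up the
-- jsonb_delete(jsonb_set(...)) example in the discussion):
SELECT jsonb_set('["a"]', '{5}', '"b"');
-- ["a", "b"]

-- Negative indexes count from the end, which is the "special meaning"
-- that enables prepending but makes offsets unstable as a deletion key:
SELECT jsonb_set('["a", "b", "c"]', '{-1}', '"z"');
-- ["a", "b", "z"]
```

This is the documented trade-off Andrew refers to: dropping the negative-index rule would make paths behave more like stable offsets, but would remove the ability to prepend to an array via jsonb_set.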