Re: [HACKERS] Parallel Seq Scan

2015-07-01 Thread Kouhei Kaigai
  a. Infrastructure for parallel execution, like some of the stuff in
  execparallel.c, heapam.c, tqueue.c, etc., and all other generic
  (non-node-specific) code.
 
 Did you consider passing tuples through the tqueue by reference rather
 than copying? The page should be pinned by the worker process, but
 perhaps that's a bad assumption to make?

Is the upcoming PartialAggregate/FinalAggregate work a solution to the problem?
More or less, the Funnel node, running on a single core, has to process a
massive amount of tuples that are fetched in parallel.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Jeff Davis [mailto:pg...@j-davis.com]
 Sent: Wednesday, July 01, 2015 4:51 PM
 To: Amit Kapila
 Cc: Robert Haas; Haribabu Kommi; Andres Freund; Kaigai Kouhei(海外 浩平); Amit
 Langote; Amit Langote; Fabrízio Mello; Thom Brown; Stephen Frost; 
 pgsql-hackers
 Subject: Re: [HACKERS] Parallel Seq Scan
 
 On Wed, 2015-07-01 at 11:07 +0530, Amit Kapila wrote:
 
  What are you asking to change the name for?
 
 There are still some places, at least in the comments, that call it a
 parallel sequential scan.
 
 
  a. Infrastructure for parallel execution, like some of the stuff in
  execparallel.c, heapam.c, tqueue.c, etc., and all other generic
  (non-node-specific) code.
 
 Did you consider passing tuples through the tqueue by reference rather
 than copying? The page should be pinned by the worker process, but
 perhaps that's a bad assumption to make?
 
 Regards,
   Jeff Davis
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL-related tools and .partial WAL file

2015-07-01 Thread Simon Riggs
On 1 July 2015 at 07:52, Michael Paquier michael.paqu...@gmail.com wrote:

 On Wed, Jul 1, 2015 at 12:39 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
  WAL-related tools, i.e., pg_archivecleanup, pg_resetxlog and
  pg_xlogdump, don't seem to properly handle the .partial WAL file.
  I think that we should fix at least pg_archivecleanup; otherwise,
  in a system using pg_archivecleanup to clean up old archived
  files, the archived .partial WAL file will never be removed.
  Thoughts?

 +1 to handle it properly in pg_archivecleanup. If people use the
 archive cleanup command in recovery.conf and they remove WAL files
 before the fork point of promotion, they should not see the partial
 file either.
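
(Concretely, the cleanup loop would presumably only need to accept .partial
segments as removal candidates too -- a sketch under that assumption, not the
eventual commit; "walfile" stands in for the directory entry being examined:

	/*
	 * Sketch only: treat a ".partial" segment like a completed segment when
	 * comparing against the cut-off point.  IsXLogFileName() and
	 * IsPartialXLogFileName() are the macros added by commit 179cdd0.
	 */
	if ((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) &&
		strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)
	{
		/* older than the cut-off segment: safe to remove from the archive */
		snprintf(WALFilePath, MAXPGPATH, "%s/%s", archiveLocation, walfile);
		unlink(WALFilePath);
	}
)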


  To make pg_archivecleanup detect the .partial WAL file, I'd like to
  use IsPartialXLogFileName() macro that commit 179cdd0 added.
  Fortunately Michael proposed the patch which makes the macros
  in xlog_internal.h usable in WAL-related tools, and it's better
  to apply the fix based on his patch. So my plan is to apply his
  patch first and then apply the fix to pg_archivecleanup.
 
 http://www.postgresql.org/message-id/CAB7nPqSiKU+8HjVe_J05btonC5E7kvmRaLpGS6EaEQe=dy3...@mail.gmail.com

 As we already use XLogFromFileName extensively in a couple of frontend
 utilities, I suspect that the buildfarm is not going to blow up, but it
 is certainly better to be certain after a full buildfarm cycle has
 run on it...

  Regarding pg_resetxlog and pg_xlogdump, do we need to change
  them as well so that they can handle the .partial file properly?
  pg_resetxlog cannot clean up the .partial file even if it should be
  removed. But the remaining .partial file is harmless and will be
  recycled later by a checkpoint, so extra treatment of the .partial
  file in pg_resetxlog may not be necessary. IOW, maybe we can
  document that as a limitation of pg_resetxlog.

 I would rather vote for having pg_resetxlog remove the .partial
 segment as well. That's a two-liner in KillExistingArchiveStatus and
 KillExistingXLOG at a quick glance. It looks to me that, as pg_resetxlog
 is a reset utility, it should do its job.

  Also regarding pg_xlogdump, we can just document it, for example:
  please remove the .partial suffix from the WAL file name if
  you want to dump the file with pg_xlogdump.

 For pg_xlogdump, I am on board with only documenting the limitation
 (renaming the file yourself), as it is after all only a debug tool.
 Handling this would also make fuzzy_open_file more complicated by having
 to detect twice as many potential name formats... That's a bit crazy for a
 very narrow use-case.


+1 to all

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] hola-importante

2015-07-01 Thread Milton Labanda
I hope this reaches you in time. I made a trip to Manchester, England,
and my bag was stolen with my international passport and credit cards
inside. The embassy is willing to help me by letting me take a flight
without my passport; I just have to pay for the ticket and cover the
hotel bills. Unfortunately, I cannot access my funds without the credit
cards. I have already contacted my bank, but they need more time to
process things and get me a new one. In this untimely situation I have
thought of asking you for a quick loan of funds that I can return to you
as soon as I get back. I really need to be on the next flight.
I need about 10,500 bolívars to cover my expenses. Western Union is the
best way to send money to me. I can send you the details on how to get
the funds to me.

I anxiously await your reply.

Regards,

-- 
/\/\;/-
Milton  Labanda  [miltonlab]
Distro: Linux Mint 17.1
Blog:  http://1000tonlab.wordpress.com
Work:  milaband...@internacional.edu.ec milotn...@jabber.org
Marriage could pray like this: 'Give us this day our daily love'
Pope Francis


Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-07-01 Thread Andres Freund
On 2015-07-01 10:51:49 -0400, Tom Lane wrote:
 The problem is that there are multiple risks to manage here.  If I were to
 back-patch that patch, it would actively break any third-party extensions
 that might be using the formerly-considered-valid technique of passing a
 NULL array pointer to these lookup functions.  We don't like breaking
 things in minor releases; that discourages people from updating to new
 minor releases.

Yep, let's just fix it in master (and potentially 9.5, I don't care).

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Andres Freund
Hi,

During the 9.5 cycle, and earlier, the topic of increasing our minimum
bar for compilers came up a bunch of times. Specifically whether we
still should continue to use C90 as a baseline.

I think the time has come to rely at least on some newer features.

At the very least I think we should start to rely on 'static inline's
working. There is not, and hasn't been for a while, any buildfarm animal
that does not support it, and we go to some ugly lengths to avoid
relying on inline functions in headers.  It's a feature that has been
available in most compilers since long before C99.

My feeling is that we shouldn't go the full way to C99 because there are
still common compilers without complete coverage. But individual
features are fine.

The features, in order of perceived importance, that might
be worth thinking about are:
* static inline
* variadic macros
* designated initializers (e.g. somestruct foo = { .bar = 3 } )
* // style comments (I don't care, but it comes up often enough ...)
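
For readers who want a concrete picture, here is a minimal, self-contained
illustration of all four features (names invented for the example):

	#include <stdio.h>

	/* static inline: safe to define in a header, no multiple-definition
	 * errors at link time */
	static inline int
	max_int(int a, int b)
	{
		return (a > b) ? a : b;
	}

	/* variadic macro (C99): forwards any number of arguments */
	#define debug_log(...) fprintf(stderr, __VA_ARGS__)

	/* designated initializers (C99): name the fields you set, the rest
	 * are zero-initialized */
	struct somestruct
	{
		int		bar;
		int		baz;
	};

	static struct somestruct foo = {.bar = 3};

	// and a C99-style comment, for completeness

	int
	main(void)
	{
		debug_log("max = %d\n", max_int(foo.bar, foo.baz));
		return 0;
	}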

Others might have different items. I think we should *not* decide on all
of them at once. We should pick items that are supported everywhere and
uncontroversial and do those first.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Peter Geoghegan
On Wed, Jul 1, 2015 at 9:14 AM, Andres Freund and...@anarazel.de wrote:
 At the very least I think we should start to rely on 'static inline's
 working. There is not, and hasn't been for a while, any buildfarm animal
 that does not support it, and we go to some ugly lengths to avoid
 relying on inline functions in headers.  It's a feature that has been
 available in most compilers since long before C99.

 My feeling is that we shouldn't go the full way to C99 because there are
 still common compilers without complete coverage. But individual
 features are fine.

I am in full agreement.

 The features, in order of perceived importance, that might
 be worth thinking about are:
 * static inline
 * variadic macros
 * designated initializers (e.g. somestruct foo = { .bar = 3 } )
 * // style comments (I don't care, but it comes up often enough ...)

I don't want to add // style comments, FWIW.

What is the state of support like for variadic macros and designated
initializers? Unlike static inline, I am not aware that they were
widely implemented before C99 was formalized.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2015-07-01 Thread Peter Geoghegan
On Wed, Jul 1, 2015 at 10:45 AM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Jun 30, 2015 at 11:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 I agree it would be cleaner to have a separate CHECK_UNIQUE_XXX code for
 speculative insertions. You've defined CHECK_UNIQUE_SPECULATIVE as like
 CHECK_UNIQUE_PARTIAL, but you don't have to insert the index tuple if
 there's a conflict. I think it'd be better to define it as like
 CHECK_UNIQUE_YES, but return FALSE instead of throwing an error on
 conflict. The difference is that the aminsert would not be allowed to
 return FALSE when there is no conflict.

 Suppose we do it that way. Then what's the difference between
 CHECK_UNIQUE_SPECULATIVE and CHECK_UNIQUE_PARTIAL? You've just
 effectively required the CHECK_UNIQUE_YES case to not physically
 insert a tuple before throwing an error, which does not seem
 essential to the existing definition of CHECK_UNIQUE_YES -- you've
 redefined CHECK_UNIQUE_YES in a way that nbtree happens to meet at the
 moment. If we had an amcanunique AM that worked a bit like exclusion
 constraints, this new obligation for CHECK_UNIQUE_YES might make it
 impossible for that to work.

Another more obvious and important thing: CHECK_UNIQUE_YES waits for
conflicts to be resolved before returning to its caller. If you don't
get an error, you're done. CHECK_UNIQUE_PARTIAL never waits, and if we
add a CHECK_UNIQUE_SPECULATIVE, it ought to not wait either.

Sure, if a speculative inserter detects a conflict, it still has to
wait. But not in the aminsert call, and not until it cleans up its
physical insertion (by super-deleting). Clearly a
CHECK_UNIQUE_SPECULATIVE would have to be much closer to
CHECK_UNIQUE_PARTIAL than to CHECK_UNIQUE_YES.
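
For reference, the existing modes are an enum in access/genam.h; roughly like
the sketch below, with the proposed mode added at the end (the SPECULATIVE
entry and its comment reflect this thread's proposal, not committed code):

	typedef enum IndexUniqueCheck
	{
		UNIQUE_CHECK_NO,			/* no constraint checking */
		UNIQUE_CHECK_YES,			/* error (after waiting) if not unique */
		UNIQUE_CHECK_PARTIAL,		/* insert anyway, report possible conflict */
		UNIQUE_CHECK_EXISTING,		/* check if existing tuple is unique */
		UNIQUE_CHECK_SPECULATIVE	/* proposed: like PARTIAL, but the caller
									 * waits outside aminsert and super-deletes
									 * its insertion on conflict */
	} IndexUniqueCheck;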

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Josh Berkus
All:

Replying to multiple people below.

On 07/01/2015 07:15 AM, Fujii Masao wrote:
 On Tue, Jun 30, 2015 at 2:40 AM, Josh Berkus j...@agliodbs.com wrote:
 You're confusing two separate things.  The primary manageability problem
 has nothing to do with altering the parameter.  The main problem is: if
 there is more than one synch candidate, how do we determine *after the
 master dies* which candidate replica was in synch at the time of
 failure?  Currently there is no way to do that.  This proposal plans to,
 effectively, add more synch candidate configurations without addressing
 that core design failure *at all*.  That's why I say that this patch
 decreases overall reliability of the system instead of increasing it.
 
 I agree this is a problem even today, but it's basically independent from
 the proposed feature *itself*. So I think that it's better to discuss and
 work on the problem separately. If so, we might be able to provide a
 good way to find the new master even if the proposed feature finally fails
 to be adopted.

I agree that they're separate features.  My argument is that the quorum
synch feature isn't materially useful if we don't create some feature to
identify which server(s) were in synch at the time the master died.

The main reason I'm arguing on this thread is that discussion of this
feature went straight into GUC syntax, without ever discussing:

* what use cases are we serving?
* what features do those use cases need?

I'm saying that we need to have that discussion first before we go into
syntax.  We gave up on quorum commit in 9.1 partly because nobody was
convinced that it was actually useful; that case still needs to be
established, and if we can determine *under what circumstances* it's
useful, then we can know if the proposed feature we have is what we want
or not.

Myself, I have two use cases for changes to sync rep:

1. the ability to specify a group of three replicas in the same data
center, and have commit succeed if it succeeds on two of them.  The
purpose of this is to avoid data loss even if we lose the master and one
replica.

2. the ability to specify that synch needs to succeed on two replicas in
two different data centers.  The idea here is to be able to ensure
consistency between all data centers.

Speaking of which: how does the proposed patch roll back the commit on
one replica if it fails to get quorum?

On 07/01/2015 07:55 AM, Peter Eisentraut wrote:
 I respect that some people might like this, but I don't really see this
 as an improvement.  It's much easier for an administration person or
 program to type out a list of standbys in a text file than having to go
 through these interfaces that are non-idempotent, verbose, and only
 available when the database server is up.  The nice thing about a plain
 and simple system is that you can build a complicated system on top of
 it, if desired.

I'm disagreeing that the proposed system is plain and simple.  What we
have now is simple; anything we try to add on top of it is going to be
much less so.  Frankly, given the proposed feature, I'm not sure that a
plain and simple implementation is *possible*; it's not a simple problem.

On 07/01/2015 07:58 AM, Sawada Masahiko wrote:
 We can have servers with the same application_name today; it's like a group.
 So there are two problems regarding fail-over:
 1. How can we know which group (set) we should use? (group means
 application_name here)
 2. And how can we decide which server of that group we should
 promote to the next master server?

Well, one possibility is to have each replica keep a flag which
indicates whether it thinks it's in sync or not.  This flag would be
updated every time the replica sends a sync-ack to the master. There are a
couple of issues with that though:

Synch Flag: the flag would need to be WAL-logged or written to disk
somehow on the replica, in case of the situation where the whole data
center shuts down, comes back up, and the master fails on restart.  In
order for the replica to WAL-log this, we'd need to add special .sync
files to pg_xlog, like we currently have .history. Such a file could be
getting updated thousands of times per second, which is potentially an
issue.  We could reduce writes by either synching to disk periodically,
or having the master write the sync state to a catalog, and replicate
it, but ...

Race Condition: there's a bit of a race condition during adverse
shutdown situations which could result in uncertainty, especially in
general data center failures and network failures which might not hit
all servers at the same time. If the master is wal-logging sync state,
this race condition is much worse, because it's pretty much certain that
one message updating sync state would be lost in the event of a master
crash.  Likewise, if we don't log every synch state change, we've
widened the opportunity for a race condition.

 #1 is one of the big problems, I think.
 I haven't come up with a correct solution yet, but we would 

Re: [HACKERS] Reducing ClogControlLock contention

2015-07-01 Thread Andres Freund
On 2015-07-01 11:19:40 +0100, Simon Riggs wrote:
 What tricks are being used??
 
 Please explain why taking 2 locks is bad here, yet works fine elsewhere.

I didn't say anything about 'bad'. It's more complicated than one
lock. Suddenly you have to care about lock ordering and such. The
algorithms for ensuring correctness get more complicated.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Fujii Masao
On Tue, Jun 30, 2015 at 2:40 AM, Josh Berkus j...@agliodbs.com wrote:
 On 06/29/2015 01:01 AM, Michael Paquier wrote:
 On Mon, Jun 29, 2015 at 4:20 AM, Josh Berkus j...@agliodbs.com wrote:

 Right.  Well, another reason we should be using a system catalog and not
 a single GUC ...

The problem with using a system catalog to configure synchronous replication
is that even a configuration change needs to wait for its WAL record (i.e.,
the one caused by the change of the system catalog) to be replicated. Imagine
the case where you have one synchronous standby but it goes down. To keep the
system up, you'd like to switch the replication mode to asynchronous by
changing the corresponding system catalog. But that change may need to wait
until the synchronous standby starts up again and its WAL record is
successfully replicated. This means that you may need to wait forever...

One approach to address this problem is to introduce something like an
unlogged system catalog. I'm not sure if that causes another big problem,
though...

 You're confusing two separate things.  The primary manageability problem
 has nothing to do with altering the parameter.  The main problem is: if
 there is more than one synch candidate, how do we determine *after the
 master dies* which candidate replica was in synch at the time of
 failure?  Currently there is no way to do that.  This proposal plans to,
 effectively, add more synch candidate configurations without addressing
 that core design failure *at all*.  That's why I say that this patch
 decreases overall reliability of the system instead of increasing it.

I agree this is a problem even today, but it's basically independent from
the proposed feature *itself*. So I think that it's better to discuss and
work on the problem separately. If so, we might be able to provide a
good way to find the new master even if the proposed feature finally fails
to be adopted.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of xlog_internal.h's macros

2015-07-01 Thread Fujii Masao
On Wed, Jul 1, 2015 at 8:58 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jul 1, 2015 at 8:18 PM, Fujii Masao wrote:
 On Wed, Jul 1, 2015 at 8:16 PM, Fujii Masao wrote:
 I updated the patch as follows. Patch attached.

 +#define XLogFileNameExtended(fname, tli, log, seg)

 Move this macro to xlog_internal.h because it's used both in
 pg_standby and pg_archivecleanup. There seems no need to
 define it independently.

 OK for me.

 -#define MAXFNAMELEN		64
 +#define MAXFNAMELEN	64

 Revert this unnecessary change.

 Yes, thanks.


 +/* Length of XLog file name */
 +#define XLOG_DATA_FNAME_LEN 24

 Shorten the name of this macro to XLOG_FNAME_LEN,
 for better code readability.

 Thanks. You have more talent for naming than I do.

 Comments?

 Just reading it again, I think that XLogFileNameById should use
 MAXFNAMELEN, and that XLogFileName should directly call
 XLogFileNameById, as both do the same operation, like in the
 attached.

We can refactor the code that way, but it looks a bit messy,
at least to me. The original coding is simpler and more readable,
so I'd like to adopt the original one here.

 It seems also safer to use MAXFNAMELEN instead of MAXPGPATH
 for exclusiveCleanupFileName in pg_standby.c and pg_archivecleanup.c.

Yep.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing ClogControlLock contention

2015-07-01 Thread Andres Freund
On 2015-07-01 09:08:11 +0100, Simon Riggs wrote:
 On 1 July 2015 at 09:00, Amit Kapila amit.kapil...@gmail.com wrote:
 
  On Tue, Jun 30, 2015 at 12:32 PM, Simon Riggs si...@2ndquadrant.com
  a.  the semantics of the new LWLock (CommitLock) introduced
  by the patch seem to be different in the sense that it is just taken in
  Exclusive mode (and no Shared mode is required) as per your proposal. We
  could use existing LWLock APIs, but on the other hand we could even
  invent a new LWLock API for this kind of locking.
 
 
 LWLock API code is already too complex, so -1 for more changes there

I don't think that's a valid argument. It's better to have the
complexity in one place (lwlock) than have rather similar complexity in
several other places. The clog control lock is far from the only place
that would benefit from tricks along these lines.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of xlog_internal.h's macros

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 8:18 PM, Fujii Masao wrote:
 On Wed, Jul 1, 2015 at 8:16 PM, Fujii Masao wrote:
 I updated the patch as follows. Patch attached.

 +#define XLogFileNameExtended(fname, tli, log, seg)

 Move this macro to xlog_internal.h because it's used both in
 pg_standby and pg_archivecleanup. There seems no need to
 define it independently.

OK for me.

 -#define MAXFNAMELEN		64
 +#define MAXFNAMELEN	64

 Revert this unnecessary change.

Yes, thanks.


 +/* Length of XLog file name */
 +#define XLOG_DATA_FNAME_LEN 24

 Shorten the name of this macro to XLOG_FNAME_LEN,
 for better code readability.

Thanks. You have more talent for naming than I do.

 Comments?

Just reading it again, I think that XLogFileNameById should use
MAXFNAMELEN, and that XLogFileName should directly call
XLogFileNameById, as both do the same operation, like in the
attached. It seems also safer to use MAXFNAMELEN instead of MAXPGPATH
for exclusiveCleanupFileName in pg_standby.c and pg_archivecleanup.c.
-- 
Michael
diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
index 2f9f2b4..861caea 100644
--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -32,6 +32,8 @@
 
 #include "pg_getopt.h"
 
+#include "access/xlog_internal.h"
+
 const char *progname;
 
 /* Options and defaults */
@@ -57,7 +59,7 @@ char	   *restartWALFileName; /* the file from which we can restart restore */
 char	   *priorWALFileName;	/* the file we need to get from archive */
 char		WALFilePath[MAXPGPATH];		/* the file path including archive */
 char		restoreCommand[MAXPGPATH];	/* run this to restore */
-char		exclusiveCleanupFileName[MAXPGPATH];		/* the file we need to
+char		exclusiveCleanupFileName[MAXFNAMELEN];		/* the file we need to
 		 * get from archive */
 
 /*
@@ -113,11 +115,6 @@ struct stat stat_buf;
  *	folded in to later versions of this program.
  */
 
-#define XLOG_DATA_FNAME_LEN		24
-/* Reworked from access/xlog_internal.h */
-#define XLogFileName(fname, tli, log, seg)	\
-	snprintf(fname, XLOG_DATA_FNAME_LEN + 1, "%08X%08X%08X", tli, log, seg)
-
 /*
  *	Initialize allows customized commands into the warm standby program.
  *
@@ -182,10 +179,7 @@ CustomizableNextWALFileReady()
 		 * If it's a backup file, return immediately. If it's a regular file
 		 * return only if it's the right size already.
 		 */
-		if (strlen(nextWALFileName) > 24 &&
-			strspn(nextWALFileName, "0123456789ABCDEF") == 24 &&
-		strcmp(nextWALFileName + strlen(nextWALFileName) - strlen(".backup"),
-			   ".backup") == 0)
+		if (IsBackupHistoryFileName(nextWALFileName))
 		{
 			nextWALFileType = XLOG_BACKUP_LABEL;
 			return true;
@@ -261,8 +255,7 @@ CustomizableCleanupPriorWALFiles(void)
  * are not removed in the order they were originally written,
  * in case this worries you.
  */
-				if (strlen(xlde->d_name) == XLOG_DATA_FNAME_LEN &&
-					strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
+				if (IsXLogFileName(xlde->d_name) &&
 					strcmp(xlde->d_name + 8, exclusiveCleanupFileName + 8) < 0)
 {
 #ifdef WIN32
@@ -366,7 +359,7 @@ SetWALFileNameForCleanup(void)
 		}
 	}
 
-	XLogFileName(exclusiveCleanupFileName, tli, log, seg);
+	XLogFileNameById(exclusiveCleanupFileName, tli, log, seg);
 
 	return cleanup;
 }
@@ -740,10 +733,7 @@ main(int argc, char **argv)
 	 * Check for initial history file: always the first file to be requested
 	 * It's OK if the file isn't there - all other files need to wait
 	 */
-	if (strlen(nextWALFileName) > 8 &&
-		strspn(nextWALFileName, "0123456789ABCDEF") == 8 &&
-		strcmp(nextWALFileName + strlen(nextWALFileName) - strlen(".history"),
-			   ".history") == 0)
+	if (IsTLHistoryFileName(nextWALFileName))
 	{
 		nextWALFileType = XLOG_HISTORY;
 		if (RestoreWALFileForRecovery())
diff --git a/src/bin/pg_archivecleanup/pg_archivecleanup.c b/src/bin/pg_archivecleanup/pg_archivecleanup.c
index ba6e242..579a9bb 100644
--- a/src/bin/pg_archivecleanup/pg_archivecleanup.c
+++ b/src/bin/pg_archivecleanup/pg_archivecleanup.c
@@ -21,6 +21,8 @@
 
 #include "pg_getopt.h"
 
+#include "access/xlog_internal.h"
+
 const char *progname;
 
 /* Options and defaults */
@@ -31,7 +33,7 @@ char	   *additional_ext = NULL;		/* Extension to remove from filenames */
 char	   *archiveLocation;	/* where to find the archive? */
 char	   *restartWALFileName; /* the file from which we can restart restore */
 char		WALFilePath[MAXPGPATH];		/* the file path including archive */
-char		exclusiveCleanupFileName[MAXPGPATH];		/* the oldest file we
+char		exclusiveCleanupFileName[MAXFNAMELEN];		/* the oldest file we
 		 * want to remain in
 		 * archive */
 
@@ -51,12 +53,6 @@ char		exclusiveCleanupFileName[MAXPGPATH];		/* the oldest file we
  *	folded in to later versions of this program.
  */
 
-#define XLOG_DATA_FNAME_LEN		24
-/* Reworked from access/xlog_internal.h */
-#define XLogFileName(fname, tli, log, seg)	\
-	snprintf(fname, XLOG_DATA_FNAME_LEN + 1, 

Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-07-01 Thread Fujii Masao
On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 06/29/2015 09:44 AM, Michael Paquier wrote:

 On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:

 But we'll still need to handle the pg_xlog symlink case somehow. Perhaps
 it
 would be enough to special-case pg_xlog for now.


 Well, sure, pg_rewind does not copy the soft links either way. Now it
 would be nice to have an option to be able to recreate the soft link
 of at least pg_xlog even if it can be scripted as well after a run.


 Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
 any non-trivial scenarios, just copying all the files from pg_xlog isn't
 enough anyway, and you need to set up a recovery.conf after running
 pg_rewind that contains a restore_command or primary_conninfo, to fetch the
 WAL. So you can argue that by not copying pg_xlog automatically, we're
 actually doing a favour to the DBA, by forcing him to set up the
 recovery.conf file correctly. Because if you just test simple scenarios
 where not much time has passed between the failover and running pg_rewind,
 it might be enough to just copy all the WAL currently in pg_xlog, but it
 would not be enough if more time had passed and not all the required WAL is
 present in pg_xlog anymore.  And by not copying the WAL, we can avoid some
 copying, as restore_command or streaming replication will only copy what's
 needed, while pg_rewind would copy all the WAL it can find in the target's
 data directory.

 pg_basebackup also doesn't include any WAL, unless you pass the --xlog
 option. It would be nice to also add an optional --xlog option to pg_rewind,
 but with pg_rewind it's possible that all the required WAL isn't present in
 the pg_xlog directory anymore, so you wouldn't always achieve the same
 effect of making the backup self-contained.

 So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
 in both the source and the target.

If pg_xlog is simply ignored, some old WAL files may remain in the target
server. Don't these old files cause the subsequent startup of the target
server as the new standby to fail? That is, the case where a WAL file with
the same name but different content exists in both the target and the source.
If that's harmful, pg_rewind should also remove the files in the target
server's pg_xlog.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing ClogControlLock contention

2015-07-01 Thread Amit Kapila
On Wed, Jul 1, 2015 at 1:38 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 1 July 2015 at 09:00, Amit Kapila amit.kapil...@gmail.com wrote:


 I think it will be better to partition it or use it in some other way to
 avoid two concurrent writers blocking on it; however, if you want to first
 see the test results with this, then that is also okay.


 Many updates would be on the same page, so partitioning it would need to be
at least 4-way to be worth doing. Maybe we could stripe into 512-byte pages.


Sure, it makes sense to try that way, once you have that ready, I can
try this out along with ProcArrayLock patch to see the impact.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Expanding the use of xlog_internal.h's macros

2015-07-01 Thread Fujii Masao
On Wed, Jun 10, 2015 at 2:41 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi all,

 While looking at the code of pg_archivecleanup.c, I noticed that there
 is some code present to detect if a given string has the format of a
 WAL segment file name or of a backup file.
 The recent commit 179cdd09 has introduced in xlog_internal.h a set of
 macros to facilitate checks of pg_xlog's name format:
 IsPartialXLogFileName(), IsXLogFileName() and IsTLHistoryFileName().

 And by looking at the code, there are some utilities where we could
 make use of that, like pg_resetxlog, pg_archivecleanup and pg_standby.

 Attached is a small refactoring patch doing so for HEAD.

Thanks for the patch!

I updated the patch as follows. Patch attached.

+#define XLogFileNameExtended(fname, tli, log, seg)

Move this macro to xlog_internal.h because it's used both in
pg_standby and pg_archivecleanup. There seems no need to
define it independently.

-#define MAXFNAMELEN		64
+#define MAXFNAMELEN	64

Revert this unnecessary change.

+/* Length of XLog file name */
+#define XLOG_DATA_FNAME_LEN 24

Shorten the name of this macro to XLOG_FNAME_LEN,
for better code readability.
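
(For context, the checks added by 179cdd09 look roughly like the following,
reproduced from memory, so treat it as a sketch; the length constant here
already uses the shortened name suggested above:

	#define XLOG_FNAME_LEN	24

	#define IsXLogFileName(fname) \
		(strlen(fname) == XLOG_FNAME_LEN && \
		 strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN)

	#define IsPartialXLogFileName(fname) \
		(strlen(fname) == XLOG_FNAME_LEN + strlen(".partial") && \
		 strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
		 strcmp((fname) + XLOG_FNAME_LEN, ".partial") == 0)

	#define IsTLHistoryFileName(fname) \
		(strlen(fname) == 8 + strlen(".history") && \
		 strspn(fname, "0123456789ABCDEF") == 8 && \
		 strcmp((fname) + 8, ".history") == 0)
)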

Comments?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing ClogControlLock contention

2015-07-01 Thread Simon Riggs
On 1 July 2015 at 11:14, Andres Freund and...@anarazel.de wrote:

 On 2015-07-01 09:08:11 +0100, Simon Riggs wrote:
  On 1 July 2015 at 09:00, Amit Kapila amit.kapil...@gmail.com wrote:
 
   On Tue, Jun 30, 2015 at 12:32 PM, Simon Riggs si...@2ndquadrant.com
   a.  the semantics of the new LWLock (CommitLock) introduced
   by the patch seem to be different in the sense that it is just taken in
   Exclusive mode (and no Shared mode is required) as per your proposal. We
   could use existing LWLock APIs, but on the other hand we could even
   invent a new LWLock API for this kind of locking.
  
 
  LWLock API code is already too complex, so -1 for more changes there

 I don't think that's a valid argument. It's better to have the
 complexity in one place (lwlock) than have rather similar complexity in
 several other places. The clog control lock is far from the only place
 that would benefit from tricks along these lines.


What tricks are being used??

Please explain why taking 2 locks is bad here, yet works fine elsewhere.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Parallel Seq Scan

2015-07-01 Thread Amit Kapila
On Wed, Jul 1, 2015 at 1:21 PM, Jeff Davis pg...@j-davis.com wrote:

 On Wed, 2015-07-01 at 11:07 +0530, Amit Kapila wrote:

  What are you asking to change the name for?

 There are still some places, at least in the comments, that call it a
 parallel sequential scan.


In the initial version of the patch, there was only one node (parallel
seqscan), and the occurrences you are seeing are leftovers; I will change
them in the next patch.


  a. Infrastructure for parallel execution, like some of the stuff in
  execparallel.c, heapam.c, tqueue.c, etc., and all other generic
  (non-node-specific) code.

 Did you consider passing tuples through the tqueue by reference rather
 than copying? The page should be pinned by the worker process, but
 perhaps that's a bad assumption to make?


Yes, IIRC there was some discussion about it, and I haven't used that
approach for the reason you mentioned.  It doesn't seem sane to hold the pin
on a page for a long time (we would need to retain the pin till the master
backend processes that tuple).
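
(Roughly, the copy-based hand-off sidesteps the pin problem because the
queued message is self-contained; a simplified sketch, not the actual
tqueue.c code, and it ignores metadata such as t_self that real code must
carry as well:

	#include "postgres.h"
	#include "access/htup.h"
	#include "storage/shm_mq.h"

	/*
	 * Copy the tuple's data into the shared queue so the worker can release
	 * its buffer pin immediately.  Passing a pointer instead would force the
	 * worker to keep the page pinned until the master consumed the tuple.
	 */
	static bool
	enqueue_tuple_copy(shm_mq_handle *mqh, HeapTuple tup)
	{
		return shm_mq_send(mqh, tup->t_len, tup->t_data, false) == SHM_MQ_SUCCESS;
	}
)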



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reducing ClogControlLock contention

2015-07-01 Thread Simon Riggs
On 1 July 2015 at 11:11, Amit Kapila amit.kapil...@gmail.com wrote:

 On Wed, Jul 1, 2015 at 1:38 PM, Simon Riggs si...@2ndquadrant.com wrote:
 
  On 1 July 2015 at 09:00, Amit Kapila amit.kapil...@gmail.com wrote:
 
 
  I think it will be better to partition it or use it in some other way
 to avoid
   two concurrent writers blocking on it; however, if you want to first see the
  test results with this, then that is also okay.
 
 
  Many updates would be on the same page, so partitioning it would need to be
  at least 4-way to be worth doing. Maybe we could stripe into 512-byte pages.
 

 Sure, it makes sense to try that way, once you have that ready, I can
 try this out along with ProcArrayLock patch to see the impact.


Seems sensible to measure what the new point of contention is with both
before doing anything further.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Unneeded NULL-pointer check in FreeSpaceMapTruncateRel

2015-07-01 Thread Michael Paquier
On Tue, Jun 30, 2015 at 9:57 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-06-30 21:53:07 +0900, Michael Paquier wrote:
 In the category of nitpicky-code-style-issues, FreeSpaceMapTruncateRel
 is doing a NULL-pointer check for something that has been dereferenced
 on all the code paths leading to this check.
 (Yes, that's not interesting for common humans, Coverity sees things
 based on correctness).

 Can you, in the future, batch these together into one thread, perhaps
 with one message below an introductory one for each patch? Unless
 they'll get applied immediately it'll be hard to follow the different
 threads.

I'll try.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup and replication slots

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 10:33 AM, Peter Eisentraut wrote:
 (If you're looking at the patch and wondering why there is no code to
 actually do anything with the replication slot, that's because the code
 that does the WAL streaming is already aware of replication slots
 because of the pg_receivexlog functionality that it also serves.  So all
 that needs to be done is set replication_slot.)

This is cool to see this patch taking shape.

+my ($stdout, $stderr);
+run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-' ],
+	'<', \$sql, '>', \$stdout, '2>', \$stderr or die;
+chomp $stdout;
+return $stdout;

Could it be possible to chomp and return $stderr as well here? It
seems useful to me to perform sanity checks after calling psql.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] drop/truncate table sucks for large values of shared buffers

2015-07-01 Thread Simon Riggs
On 1 July 2015 at 15:39, Amit Kapila amit.kapil...@gmail.com wrote:


  Okay. I think we can maintain the list in a similar way as we do for
  UNLINK_RELATION_REQUEST in RememberFsyncRequest(), but
  why wait till 64 tables?


I meant once per checkpoint cycle OR every N tables, whichever is sooner. N
would need to vary according to the size of NBuffers.


  We already scan the whole buffer list in each
  checkpoint cycle, so during that scan we can consult this dropped-relation
  list and avoid syncing such buffer contents.  Also, for ENOENT error
  handling for FileWrite, we can use this list to identify relations for
  which we need to ignore the error.  I think we are already doing something
  similar in mdsync to avoid the problem of dropped tables, so it seems okay
  to have it in mdwrite as well.

  The crucial thing to think about in this idea is avoiding reassignment of
  a relfilenode (due to wrapped OIDs) before we have ensured that none of
  the buffers contains a tag for that relfilenode.  Currently we avoid this
  for the fsync case by retaining the first segment of the relation (which
  prevents reassignment of the relfilenode) till the checkpoint ends. I
  think if we just postpone the unlink till we have validated it in
  shared_buffers, then we can avoid this problem in the new scheme, and it
  should be a delay of at most one checkpoint cycle for unlinking such a
  file, assuming we consult the dropped-relation list in each checkpoint
  cycle during the buffer scan.


Yes

So you are keeping more data around for longer, right? I think we would
need some way to trigger a scan when the amount of deferred dropped data
files hits a certain size.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-01 Thread Sawada Masahiko
On Thu, May 28, 2015 at 11:34 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/23/15 11:06 AM, Petr Jelinek wrote:

 On 23/04/15 17:45, Bruce Momjian wrote:

 On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
 Agreed, no extra file, and the same write volume as currently.  It would
 also match pg_clog, which uses two bits per transaction --- maybe we can
 reuse some of that code.


 Yeah, this approach seems promising. We probably can't reuse code from
 clog because the usage pattern is different (key for clog is xid, while
 for visibility/freeze map ctid is used). But visibility map storage
 layer is pretty simple so it should be easy to extend it for this use.


 Actually, there may be some bit manipulation functions we could reuse;
 things like efficiently counting how many things in a byte are set. 
 Probably
 doesn't make sense to fully refactor it, but at least CLOG is a good source
 for cut/paste/whack.


  I agree with adding a bit to the VM that indicates the corresponding page
  is all-frozen, just like CLOG.
  I'll make that change in the second version of the patch.


 The second patch is attached.

 In the second patch, I added a bit to the visibility map that indicates
 all tuples in a page are completely frozen.
 The visibility map became a bitmap with two bits per heap page:
 all-visible and all-frozen.
 The logic around vacuum and heap insert/update/delete is almost the same
 as in the previous version.
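
(In other words, something along these lines; the flag values are
illustrative, not necessarily what the patch uses:

	/* Sketch: the visibility map goes from one to two bits per heap page. */
	#define BITS_PER_HEAPBLOCK			2

	#define VISIBILITYMAP_ALL_VISIBLE	0x01	/* all tuples visible to all */
	#define VISIBILITYMAP_ALL_FROZEN	0x02	/* all tuples completely frozen */
)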

 This patch lacks some things: documentation, comments in the source code,
 etc., so it's still a WIP patch,
 but I think that it's enough to discuss.


 The previous patch no longer applies cleanly to HEAD.
 The attached v2 patch is the latest version.

 Please review it.

Attached is a new rebased version of the patch.
Please give me comments!

Regards,

--
Sawada Masahiko


000_add_frozen_bit_into_visibilitymap_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Peter Eisentraut
On 7/1/15 10:15 AM, Fujii Masao wrote:
  One approach to address this problem is to introduce something like an
  unlogged system catalog. I'm not sure if that causes another big problem, though...

Yeah, like the data disappearing after a crash. ;-)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Sawada Masahiko
On Tue, Jun 30, 2015 at 2:40 AM, Josh Berkus j...@agliodbs.com wrote:
 On 06/29/2015 01:01 AM, Michael Paquier wrote:

 You're confusing two separate things.  The primary manageability problem
 has nothing to do with altering the parameter.  The main problem is: if
 there is more than one synch candidate, how do we determine *after the
 master dies* which candidate replica was in synch at the time of
 failure?  Currently there is no way to do that.  This proposal plans to,
 effectively, add more synch candidate configurations without addressing
 that core design failure *at all*.  That's why I say that this patch
 decreases overall reliability of the system instead of increasing it.

 When I set up synch rep today, I never use more than two candidate synch
 servers because of that very problem.  And even with two I have to check
 replay point because I have no way to tell which replica was in-sync at
 the time of failure.  Even in the current limited feature, this
 significantly reduces the utility of synch rep.  In your proposal, where
 I could have multiple synch rep groups in multiple geos, how on Earth
 could I figure out what to do when the master datacenter dies?

We can have servers with the same application_name today; it's like a group.
So there are two problems regarding fail-over:
1. How can we know which group (set) we should use? (group means
application_name here)
2. And how can we decide which server of that group we should
promote to the next master server?

#1 is one of the big problems, I think.
I haven't come up with a correct solution yet, but we would need to know
which server (group) is the best for promotion
without the old master server running.
For example, by improving the pg_stat_replication view, or by a mediation
process that always checks the progress of each standby.

#2: I guess the best solution is that the DBA can promote any server of a
group. That is, the DBA can always promote a server without considering the
state of the other servers of that group.
It's not difficult if we always use the lowest LSN of a group as the group LSN.


 BTW, ALTER SYSTEM is a strong reason to use JSON for the synch rep GUC
 (assuming it's one parameter) instead of some custom syntax.  If it's
 JSON, we can validate it in psql, whereas if it's some custom syntax we
 have to wait for the db to reload and fail to figure out that we forgot
 a comma.  Using JSON would also permit us to use jsonb_set and
 jsonb_delete to incrementally change the configuration.

Sounds convenient and flexible. I agree with this JSON-format
parameter only if we don't combine both quorum and prioritization,
because of backward compatibility.
I tend toward a JSON-format value in a new, separate GUC parameter.
Anyway, if we use JSON, I'm imagining parameter values like below.
{
    "group1" : {
        "quorum" : 1,
        "standbys" : [
            {
                "a" : {
                    "quorum" : 2,
                    "standbys" : [
                        "c", "d"
                    ]
                }
            },
            "b"
        ]
    }
}


 Question: what happens *today* if we have two different synch rep
 strings in two different *.conf files?  I wouldn't assume that anyone
 has tested this ...

We use the last defined parameter even if sync rep strings appear in several files, right?

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-01 Thread Simon Riggs
On 30 April 2015 at 12:07, Sawada Masahiko sawada.m...@gmail.com wrote:


  This patch lacks some things: documentation, comments in the source code,
  etc., so it's still a WIP patch,
  but I think that it's enough to discuss.


Code comments exist to indicate the intention of sections of code. They are
essential for reviewers, not a cosmetic thing to be added later. To gain
wide agreement we need wide understanding. (I recommend a development
approach where you write the comments first, then add code later.)

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Peter Eisentraut
On 6/26/15 1:46 AM, Michael Paquier wrote:
  - k(elt1,elt2,eltN) means that we need k elements in the set
  to return true (aka commit confirmation).
  - k[elt1,elt2,eltN] means that we need the first k elements in the
  set to return true.

I think the difference between (...) and [...] is not intuitive.  To me,
{...} would be more intuitive to indicate order does not matter.

 When k is not defined for a group, k = 1.

How about putting it at the end?  Like

[foo,bar,baz](2)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] drop/truncate table sucks for large values of shared buffers

2015-07-01 Thread Amit Kapila
On Tue, Jun 30, 2015 at 12:10 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 30 June 2015 at 07:34, Amit Kapila amit.kapil...@gmail.com wrote:

 On Tue, Jun 30, 2015 at 11:00 AM, Simon Riggs si...@2ndquadrant.com
wrote:
 
  On 30 June 2015 at 05:02, Amit Kapila amit.kapil...@gmail.com wrote:
 
  On Mon, Jun 29, 2015 at 7:18 PM, Simon Riggs si...@2ndquadrant.com
wrote:
  
   On 28 June 2015 at 17:17, Tom Lane t...@sss.pgh.pa.us wrote:
  
   If lseek fails badly then SeqScans would give *silent* data loss,
which in my view is worse. Just added pages aren't the only thing we might
miss if lseek is badly wrong.
  
 
   So for the purpose of this patch, do we need to assume that
   lseek can give us the wrong size of a file, and should we add preventive
   checks and other handling for the same?
   I am okay with changing it that way, if we are going to have that as an
   assumption in our code wherever we are using it or will use it in future;
   otherwise we will end up with some preventive checks which are actually
   not required.
 
 
  They're preventative checks. You always hope it is wasted effort.
 

 I am not sure if preventative checks (without a real need) are okay if
 they are not cheap, which could happen in this case.  I think validating
 the buffer tag would require a rel- or syscache lookup.


 True, so don't do that.

 Keep a list of dropped relations and have the checkpoint process scan the
buffer pool every 64 tables, kinda like AbsorbFsync


Okay. I think we can maintain the list in a similar way as we do for
UNLINK_RELATION_REQUEST in RememberFsyncRequest(), but
why wait till 64 tables?  We already scan the whole buffer list in each
checkpoint cycle, so during that scan we can consult this dropped-relation
list and avoid syncing such buffer contents.  Also, for ENOENT error
handling for FileWrite, we can use this list to identify relations for which
we need to ignore the error.  I think we are already doing something similar
in mdsync to avoid the problem of dropped tables, so it seems okay to
have it in mdwrite as well.

The crucial thing to think about in this idea is avoiding reassignment of
a relfilenode (due to wrapped OIDs) before we have ensured that none of
the buffers contains a tag for that relfilenode.  Currently we avoid this for
the fsync case by retaining the first segment of the relation (which
prevents reassignment of the relfilenode) till the checkpoint ends. I think
if we just postpone the unlink till we have validated it in shared_buffers,
then we can avoid this problem in the new scheme, and it should be a delay
of at most one checkpoint cycle for unlinking such a file, assuming we
consult the dropped-relation list in each checkpoint cycle during the
buffer scan.

Does that make sense?
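
(A rough sketch of the bookkeeping described above; all names are invented
here, this is not the patch:

	#include "postgres.h"
	#include "nodes/pg_list.h"
	#include "storage/buf_internals.h"
	#include "storage/relfilenode.h"

	/* Dropped relfilenodes remembered instead of scanning shared_buffers
	 * at DROP/TRUNCATE time. */
	static List *pendingDroppedRels = NIL;

	static void
	RememberDroppedRel(RelFileNode rnode)
	{
		RelFileNode *entry = palloc(sizeof(RelFileNode));

		*entry = rnode;
		pendingDroppedRels = lappend(pendingDroppedRels, entry);
	}

	/*
	 * Consulted by the checkpointer during the buffer-pool scan it performs
	 * anyway: skip syncing (and ignore ENOENT on write) for buffers that
	 * belong to a dropped relation.
	 */
	static bool
	BufferBelongsToDroppedRel(const BufferTag *tag)
	{
		ListCell   *lc;

		foreach(lc, pendingDroppedRels)
		{
			RelFileNode *rnode = (RelFileNode *) lfirst(lc);

			if (RelFileNodeEquals(*rnode, tag->rnode))
				return true;
		}
		return false;
	}
)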


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Peter Eisentraut
On 6/26/15 1:12 PM, Josh Berkus wrote:
 If we're going to do quorum, multi-set synchrep, then we need to have a
 real management interface.  Like, we really ought to have a system
 catalog and some built in functions to manage this instead, e.g.
 
 pg_add_synch_set(set_name NAME, quorum INT, set_members VARIADIC)
 
 pg_add_synch_set('bolivia', 1, 'bsrv-2', 'bsrv-3', 'bsrv-5')
 
 pg_modify_sync_set(quorum INT, set_members VARIADIC)
 
 pg_drop_synch_set(set_name NAME)

I respect that some people might like this, but I don't really see this
as an improvement.  It's much easier for an administration person or
program to type out a list of standbys in a text file than having to go
through these interfaces that are non-idempotent, verbose, and only
available when the database server is up.  The nice thing about a plain
and simple system is that you can build a complicated system on top of
it, if desired.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Macro nesting hell

2015-07-01 Thread Alvaro Herrera
Tom Lane wrote:
 Last night my ancient HP compiler spit up on HEAD:
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pademelon&dt=2015-07-01%2001%3A30%3A18
 complaining thus:
 cpp: brin_pageops.c, line 626: error 4018: Macro param too large after 
 substitution - use -H option.
 I was able to revive pademelon by adding a new compiler flag as suggested,
 but after looking at what the preprocessor is emitting, I can't say that
 I blame it for being unhappy.  This simple-looking line
 
 Assert(BRIN_IS_REGULAR_PAGE(BufferGetPage(oldbuf)));
 
 is expanding to this:

Wow, that's kind of amazing.  I think this particular case boils down to
just PageGetSpecialPointer (bufpage.h) and BufferGetBlock (bufmgr.h).

 I'm thinking we really ought to mount a campaign to replace some of these
 macros with inlined-if-possible functions.

My guess is that changing a very small number of them will do a large
enough portion of the job.
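
(For instance, bufpage.h's PageGetSpecialPointer could become an inline
function along these lines; a sketch based on the macro's current behavior,
not an actual patch:

	static inline char *
	PageGetSpecialPointer(Page page)
	{
		Assert(PageIsValid(page));
		return (char *) page + ((PageHeader) page)->pd_special;
	}
)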

-- 
Álvaro Herrera             http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup and replication slots

2015-07-01 Thread Peter Eisentraut
On 7/1/15 8:37 AM, Michael Paquier wrote:
 On Wed, Jul 1, 2015 at 10:33 AM, Peter Eisentraut wrote:
 (If you're looking at the patch and wondering why there is no code to
 actually do anything with the replication slot, that's because the code
 that does the WAL streaming is already aware of replication slots
 because of the pg_receivexlog functionality that it also serves.  So all
 that needs to be done is set replication_slot.)
 
 This is cool to see this patch taking shape.
 
  +my ($stdout, $stderr);
  +run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-' ],
  +	'<', \$sql, '>', \$stdout, '2>', \$stderr or die;
 +chomp $stdout;
 +return $stdout;
 
 Could it be possible to chomp and return $stderr as well here? It
 seems useful to me to perform sanity checks after calling psql.

Sure, makes sense.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-07-01 Thread Tom Lane
Piotr Stefaniak postg...@piotr-stefaniak.me writes:
 On 06/27/2015 11:47 PM, Tom Lane wrote:
 Given the utter lack of any evidence that this actually causes any
 problems in the field, I don't feel a need to back-patch this change.

  I'm under the impression that you don't care about avoiding
  undefined behavior as much as you care about solving real problems
 caused by it, whenever they show up in a report from one platform or 
 another, or worse - when it's too late and somebody has reported an 
 actual program misbehavior. The problem with that kind of thinking is 
 that bugs caused by aggressive compiler optimizations taking advantage 
 of invoking UB are a moving target (since compilers come and go, and the 
 existing ones evolve) while the list of things not to do is constant and 
 mostly clearly defined by the standard.

The problem is that there are multiple risks to manage here.  If I were to
back-patch that patch, it would actively break any third-party extensions
that might be using the formerly-considered-valid technique of passing a
NULL array pointer to these lookup functions.  We don't like breaking
things in minor releases; that discourages people from updating to new
minor releases.

As against that, we have exactly no reports of any field problems, and a
look at the two parse_func.c functions affected shows no reason to think
that there will ever be any; neither of them do anything much with their
argtypes argument except pass it to memcmp and other functions.  So even
if the compiler did assume that argtypes couldn't be NULL, there would not
be much it could do with the assumption.

So my judgement is that the risks of back-patching this outweigh any
likely benefit.  When and if some toolchain manages to actually break
things here, I could be proven wrong --- but I doubt that will happen
before 9.4 and earlier are out of support.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Macro nesting hell

2015-07-01 Thread Tom Lane
Last night my ancient HP compiler spit up on HEAD:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pademelon&dt=2015-07-01%2001%3A30%3A18
complaining thus:
cpp: brin_pageops.c, line 626: error 4018: Macro param too large after 
substitution - use -H option.
I was able to revive pademelon by adding a new compiler flag as suggested,
but after looking at what the preprocessor is emitting, I can't say that
I blame it for being unhappy.  This simple-looking line

Assert(BRIN_IS_REGULAR_PAGE(BufferGetPage(oldbuf)));

is expanding to this:

do { if (!((BrinSpecialSpace *) ( ((void) ((bool) (! (!(((const
void*)(((Page)( ((void) ((bool) (! (!(( ((void) ((bool) (! (!((oldbuf) <=
NBuffers && (oldbuf) >= -NLocBuffer)) ||
(ExceptionalCondition("!((oldbuf) <= NBuffers && (oldbuf) >= -NLocBuffer)",
("FailedAssertion"), "brin_pageops.c", 626), 0))), (oldbuf) != 0 ))) ||
(ExceptionalCondition( ...

[The expansion runs on like this for roughly a hundred further lines.
Each nested macro -- BufferGetPage, BufferIsValid, PageGetSpecialPointer,
and the Asserts buried inside them -- re-expands its entire argument, and
the string literals for ExceptionalCondition() are escaped one level
deeper at every nesting, which is what finally pushed the macro parameter
past the HP preprocessor's size limit.]

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Peter Eisentraut
On 6/26/15 2:53 PM, Josh Berkus wrote:
 I would also suggest that if I lose this battle and
 we decide to go with a single stringy GUC, that we at least use JSON
 instead of defining our own, proprietary, syntax?

Does JSON have a natural syntax for a set without order?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-01 Thread Fujii Masao
On Thu, Jul 2, 2015 at 12:13 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby jim.na...@bluetreble.com 
 wrote:
 On 4/23/15 11:06 AM, Petr Jelinek wrote:

 On 23/04/15 17:45, Bruce Momjian wrote:

 On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
 Agreed, no extra file, and the same write volume as currently.  It would
 also match pg_clog, which uses two bits per transaction --- maybe we can
 reuse some of that code.


 Yeah, this approach seems promising. We probably can't reuse code from
 clog because the usage pattern is different (key for clog is xid, while
 for visibility/freeze map ctid is used). But visibility map storage
 layer is pretty simple so it should be easy to extend it for this use.


 Actually, there may be some bit manipulation functions we could reuse;
 things like efficiently counting how many things in a byte are set. 
 Probably
 doesn't make sense to fully refactor it, but at least CLOG is a good 
 source
 for cut/paste/whack.


 I agree with adding a bit to the VM that indicates that the corresponding
 page is all-frozen, just like CLOG.
 I'll make that change in the second version of the patch.


 The second patch is attached.

 In the second patch, I added to the visibility map a bit that indicates
 that all tuples in a page are completely frozen.
 The visibility map became a bitmap with two bits per heap page:
 all-visible and all-frozen.
 The logic around vacuum and insert/update/delete of heap tuples is
 almost the same as in the previous version.
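 For illustration, addressing a two-bits-per-heap-page map could look
 like the sketch below (the macro and function names are invented here,
 not taken from the patch):

 #include <stdint.h>

 #define VM_BITS_PER_HEAPBLOCK	2	/* all-visible + all-frozen */
 #define VM_ALL_VISIBLE		0x01
 #define VM_ALL_FROZEN		0x02

 /* Locate the map byte and bit offset covering a given heap block. */
 static inline void
 vm_locate(uint32_t heapBlk, uint32_t *mapByte, uint32_t *mapBit)
 {
 	uint32_t heapBlocksPerByte = 8 / VM_BITS_PER_HEAPBLOCK;	/* 4 */

 	*mapByte = heapBlk / heapBlocksPerByte;
 	*mapBit = (heapBlk % heapBlocksPerByte) * VM_BITS_PER_HEAPBLOCK;
 }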

 This patch still lacks some things (documentation, comments in the
 source code, etc.), so it's a WIP patch yet,
 but I think that it's enough to discuss the approach.


 The previous patch no longer applies cleanly to HEAD.
 The attached v2 patch is the latest version.

 Please review it.

 Attached is the new rebased version of the patch.
 Please give me comments!

Now we should review your design and approach rather than the code,
but since I got an assertion failure while trying the patch, I'm reporting it.

initdb -D test -k caused the following assertion failure.

vacuuming database template1 ... TRAP:
FailedAssertion("!((((PageHeader) (heapPage))->pd_flags & 0x0004))",
File: "visibilitymap.c", Line: 328)
sh: line 1: 83785 Abort trap: 6
/dav/000_add_frozen_bit_into_visibilitymap_v3/bin/postgres --single
-F -O -c search_path=pg_catalog -c exit_on_error=true template1 
/dev/null
child process exited with exit code 134
initdb: removing data directory test

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.6 commitfest schedule

2015-07-01 Thread Alvaro Herrera
Fellow hackers,

As discussed at PGCon's developer meeting, this is the schedule for the
upcoming commitfests:

CF1: July 1 to July 31 2015
CF2: September 1 to September 30 2015
CF3: November 1 to November 30 2015
CF4: January 2 to January 31 2016
CF5: March 1 to March 31 2016
Feature Freeze: April 15 
   https://wiki.postgresql.org/wiki/PgCon_2015_Developer_Meeting#9.6_Schedule

This means today is the big day for CF1!

Also, I'm pleased to announce that Heikki Linnakangas has agreed to be
our beloved commitfest manager for CF1; he will be starting on it
sometime in the next couple dozen hours or so.  Beware, as by now he has
gotten extremely good at stunningly strong blows of the commitfest mace.

The reviews must flow!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-07-01 Thread Gavin Flower

On 01/07/15 17:37, Amit Kapila wrote:
On 01/07/15 17:37, Amit Kapila wrote:
On Tue, Jun 30, 2015 at 4:00 AM, Jeff Davis pg...@j-davis.com wrote:


 [Jumping in without catching up on entire thread.


[...]

.

 2. Where is the speedup coming from? How much of it is CPU and IO
 overlapping (i.e. not leaving disk or CPU idle while the other is
 working), and how much from the CPU parallelism? I know this is
 difficult to answer rigorously, but it would be nice to have some
 breakdown even if for a specific machine.


Yes, you are right, and we have done quite some testing (on the hardware
available) with this patch (with different approaches) to see how much
difference it makes for IO and CPU. With respect to IO we have found
that it doesn't help much [1], though it helps when the data is cached,
and there are really good benefits in terms of CPU [2].


[...]

I assume your answer refers to a table on one spindle of spinning rust.


QUESTIONS:

1. what about I/O using an SSD?

2. what if the table is in a RAID array (of various types), would
   having the table spread over multiple spindles help?



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-07-01 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2015-07-01 10:51:49 -0400, Tom Lane wrote:
 The problem is that there are multiple risks to manage here.  If I were to
 back-patch that patch, it would actively break any third-party extensions
 that might be using the formerly-considered-valid technique of passing a
 NULL array pointer to these lookup functions.  We don't like breaking
 things in minor releases; that discourages people from updating to new
 minor releases.

 Yep, let's just fix it in master (and potentially 9.5, I don't care).

It's in the 9.5 branch already.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 At the very least I think we should start to rely on 'static inline's
 working. There is not, and hasn't been for a while, any buildfarm animal
 that does not support it

pademelon doesn't.

Also, I think there are some other non-gcc animals that nominally allow
static inline but will generate warnings when such functions are
unreferenced in a particular compile (that's what the quiet inline
configure test is about).  That would be hugely annoying for development,
though maybe we don't care too much if it's only a build target.

I'm not against requiring static inline; it would be a huge improvement
really.  But we should not fool ourselves that this comes at zero
compatibility cost.

 The list of features, in the order of perceived importance, that might
 be worthwhile thinking about are:
 * static inline
 * variadic macros
 * designated initializers (e.g. somestruct foo = { .bar = 3 } )
 * // style comments (I don't care, but it comes up often enough ...)

Of these I think only the first is really worth breaking portability
for.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.6 commitfest schedule

2015-07-01 Thread Robert Haas
On Wed, Jul 1, 2015 at 3:19 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Also, I'm pleased to announce that Heikki Linnakangas has agreed to be
 our beloved commitfest manager for CF1; he will be starting on it
 sometime in the next couple dozen hours or so.  Beware, as by now he has
 gotten extremely good at stunningly strong blows of the commitfest mace.

That's a feature, not a bug.

Roll your Fortitude save!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Andres Freund
On 2015-07-01 16:33:24 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  At the very least I think we should start to rely on 'static inline's
  working. There is not, and hasn't been for a while, any buildfarm animal
  that does not support it
 
 pademelon doesn't.

Oh. I'd gone through all the animals somewhere in the middle of the 9.5
cycle. pademelon was either dormant or not yet reincarnated back then.

I'll go through all again then. Interesting that anole's compiler
supports inlines.

 Also, I think there are some other non-gcc animals that nominally allow
 static inline but will generate warnings when such functions are
 unreferenced in a particular compile (that's what the quiet inline
 configure test is about).  That would be hugely annoying for development,
 though maybe we don't care too much if it's only a build target.

I know that 4c8aa8b5aea1e032f569222d4b6c1019e84622dc fixed that test
on a number of older platforms. Apparently even 10+ years ago some
compiler authors added the smarts to recognize whether static inlines
were declared in a header (don't warn) or in a .c file (warn).

 I'm not against requiring static inline; it would be a huge improvement
 really.  But we should not fool ourselves that this comes at zero
 compatibility cost.

It's not zero, but I think it's quite small.

  The list of features, in the order of perceived importance, that might
  be worthwhile thinking about are:
  * static inline
  * variadic macros
  * designated initializers (e.g. somestruct foo = { .bar = 3 } )
  * // style comments (I don't care, but it comes up often enough ...)
 
 Of these I think only the first is really worth breaking portability
 for.

If variadic macros, or even designated initializers, were supported by
the same set of animals in the buildfarm I'd happily use it, but they're
unfortunately not...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Oskari Saarenmaa
01.07.2015, 23:33, Tom Lane kirjoitti:
 Andres Freund and...@anarazel.de writes:
 At the very least I think we should start to rely on 'static inline's
 working. There is not, and hasn't been for a while, any buildfarm animal
 that does not support it
 
 pademelon doesn't.

HP-UX 10.20 was released in 1996, last shipped in July 2002, and has been
unsupported since July 2003.  Assuming the new features aren't going to
be used in release branches, PG 9.5 is probably going to support that
platform longer than there's hardware to run it on.

 But we should not fool ourselves that this comes at zero
 compatibility cost.

But compatibility with obsolete platforms doesn't come with zero cost
either, and the fact that no one noticed until now that you couldn't
even configure PG 9.0 .. 9.3 on recent Solaris 10 releases (which I
believe was the most popular proprietary Unix) suggests that not a
whole lot of people care about those platforms anymore.

/ Oskari


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 On 07/01/2015 01:33 PM, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
 At the very least I think we should start to rely on 'static inline's
 working. There is not, and hasn't been for a while, any buildfarm animal
 that does not support it

 pademelon doesn't.

 Other reasoning aside, pademelon is running an HPUX version that is 10 
 years old. I don't think we should care.

Try 16.  The reason I run it is not because anyone cares about actually
running Postgres on such a machine; it's just so that we will know when
we are breaking compatibility with ancient C compilers.  I brought it up
merely to refute Andres' claim that we do not have buildfarm coverage
of the case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Wed, Jul 1, 2015 at 9:14 AM, Andres Freund and...@anarazel.de wrote:
 The list of features, in the order of perceived importance, that might
 be worthwhile thinking about are:
 * static inline
 * variadic macros
 * designated initializers (e.g. somestruct foo = { .bar = 3 } )
 * // style comments (I don't care, but it comes up often enough ...)

 I don't want to add // style comments, FWIW.

I concur with that one.  I think the portability risk is nil (even
pademelon's compiler takes // without complaint, which surprised me
when I realized it).  Rather, I think the argument for continuing to
disallow // has to do with maintaining code style consistency.  A mishmash
of // and /* comments looks like heck.  (Note, BTW, that pgindent will
forcibly convert // to /* in some though seemingly not all cases.)

As far as static inline goes, it occurs to me after more thought that
there really is zero risk of build failures if we start relying on that,
because we already have the #define inline as empty trick in configure.
What would happen, on a compiler like pademelon's, is that you'd get a
whole lot of warnings about unreferenced static functions.  And maybe some
bloat in the executable, if the compiler is not smart enough to drop those
functions from its output.  I think both of these effects are probably
acceptable considering what a small minority of platforms would have any
issue.
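
To make the fallback concrete, here is roughly what that trick amounts
to (the configure symbol below is an invented name, for illustration
only):

/* Hypothetical pg_config.h substitution on a compiler without inline: */
#ifndef HAVE_USABLE_INLINE
#define inline					/* expands to nothing */
#endif

/*
 * With inline defined away, this becomes a plain static function, so it
 * still compiles everywhere; the costs are unreferenced-function
 * warnings and possibly a copy of the function in every object file.
 */
static inline int
pg_max(int a, int b)
{
	return (a > b) ? a : b;
}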

A potentially larger issue is that I think we have places where frontend
code is #include'ing backend headers that contain macros we might wish
to convert to inline functions, and that will not work if the macros
contain references to backend functions or global variables.  But we could
solve that by refactoring headers where necessary.

 What is the state of support like for variadic macros and designated
 initializers? Unlike static inline, I am not aware that they are
 something that was widely implemented before C99 was formalized.

I agree that the value-for-portability-risk tradeoff doesn't look
great for these features.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Joshua D. Drake


On 07/01/2015 01:33 PM, Tom Lane wrote:

Andres Freund and...@anarazel.de writes:

At the very least I think we should start to rely on 'static inline's
working. There is not, and hasn't been for a while, any buildfarm animal
that does not support it


pademelon doesn't.


Other reasoning aside, pademelon is running an HPUX version that is 10 
years old. I don't think we should care.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2015-07-01 Thread Peter Geoghegan
On Tue, Jun 30, 2015 at 11:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 I agree it would be cleaner to have a separate CHECK_UNIQUE_XXX code for
 speculative insertions. You've defined CHECK_UNIQUE_SPECULATIVE as like
 CHECK_UNIQUE_PARTIAL, but you don't have to insert the index tuple if
 there's a conflict. I think it'd be better to define it as like
 CHECK_UNIQUE_YES, but return FALSE instead of throwing an error on
 conflict. The difference is that the aminsert would not be allowed to
 return FALSE when there is no conflict.

Suppose we do it that way. Then what's the difference between
CHECK_UNIQUE_SPECULATIVE and CHECK_UNIQUE_PARTIAL? You've just
effectively required the CHECK_UNIQUE_YES case to not physically
insert a physical tuple before throwing an error, which does not seem
essential to the existing definition of CHECK_UNIQUE_YES -- you've
redefined CHECK_UNIQUE_YES in a way that nbtree happens to meet at the
moment. If we had an amcanunique AM that worked a bit like exclusion
constraints, this new obligation for CHECK_UNIQUE_YES might make it
impossible for that to work.
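
For readers following along, the modes being debated are roughly these
(the first four follow nbtree's existing IndexUniqueCheck; the last is
only the proposal under discussion, not committed code):

typedef enum
{
	UNIQUE_CHECK_NO,			/* no constraint checking at all */
	UNIQUE_CHECK_YES,			/* error on conflict */
	UNIQUE_CHECK_PARTIAL,		/* insert anyway; report possible conflict */
	UNIQUE_CHECK_EXISTING,		/* recheck an already-inserted tuple */
	UNIQUE_CHECK_SPECULATIVE	/* proposed: like PARTIAL, but the caller
								 * may skip the physical insert on conflict */
} IndexUniqueCheck;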

I'm not necessarily in disagreement. I just didn't do it that way for
that reason.

 Actually, even without this patch, a dummy implementation of aminsert that
 always returns FALSE in CHECK_UNIQUE_PARTIAL mode would be legal per the
 docs, but would loop forever. So that would be nice to fix or document away,
 even though it's not a problem for B-tree currently.

UNIQUE_CHECK_PARTIAL calls to an aminsert function are allowed to
report false positives (of not being unique, by returning FALSE as you
say), where there may not have been a conflict, but it's not clear
(e.g. because the conflicting xact has yet to commit/abort). You still
need to insert, though, so that the recheck at end-of-xact works for
deferred constraints. At that point, it's really not too different to
ordinary unique enforcement, except that the other guy is guaranteed
to notice you, too.

You can construct a theoretical case where lock starvation occurs with
unique constraint enforcement. I think it helps with nbtree here that
someone will reliably *not* see a conflict when concurrently
inserting, because unique index value locking exists (by locking the
first leaf page a value could be on with a buffer lock). But even if
that wasn't the case, the insert + recheck thing would be safe, just
as with exclusion constraints...provided you insert to begin with,
that is.

 In passing, I have made ExecInsertIndexTuples() give up when a
 speculative insertion conflict is detected. Again, this is not about
 bloat prevention; it's about making the code easier to understand by
 not doing something that is unnecessary, and in avoiding that also
 avoiding the implication that it is necessary. There are already
 enough complicated interactions that *are* necessary (e.g. livelock
 avoidance for exclusion constraints). Let us make our intent clearer.

 Hmm, not sure I like those changes. ExecInsertIndexTuples's behaviour now
 depends on whether specConflict is passed, but that seems weird as
 specConflict is an output parameter.

Your statement is ambiguous and confusing to me. Are you objecting to
the idea of returning from ExecInsertIndexTuples() early in general,
or to one narrow aspect of how that was proposed -- the fact that it
occurs due to specConflict != NULL rather than noDupErr? Obviously
if it's just the latter, then that's a small adjustment. But AFAICT
your remark about specConflict could be one detail of a broader complaint
about an idea that you dislike generally.

 The patch also updates the AM interface documentation (the part
 covering unique indexes). It seems quite natural to me to document the
 theory of operation for speculative insertion there.


 Yeah, although I find the explanation pretty long-winded and difficult to
 understand ;-).

I don't know what you mean. You say things like this to me fairly
regularly, but I'm always left wondering what the take-away should be.
Please be more specific.

Long-winded generally means that more words than necessary were used.
I think that the documentation proposed is actually very information
dense, if anything. As always, I aimed to keep it consistent with the
surrounding documentation.

I understand that the material might be a little hard to follow, but
it concerns how someone can externally implement a Postgres index
access method that is amcanunique. That's a pretty esoteric subject
-- so far, we've had exactly zero takers (even without the
amcanunique part). It's damn hard to make these ideas accessible.

I feel that I have an obligation to share information about (say) how
the speculative insertion mechanism works -- the theory of operation
seems very useful. Like any one of us, I might not be around to
provide input on something like that in the future, so it makes sense
to be thorough and unambiguous. There are very few people (3?) who
have a good sense of how it works currently, and there are some 

Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2015-07-01 Thread Peter Geoghegan
On Wed, Jul 1, 2015 at 10:45 AM, Peter Geoghegan p...@heroku.com wrote:
 You can construct a theoretical case where lock starvation occurs with
 unique constraint enforcement. I think it helps with nbtree here that
 someone will reliably *not* see a conflict when concurrently
 inserting, because unique index value locking exists (by locking the
 first leaf page a value could be on with a buffer lock). But even if
 that wasn't the case, the insert + recheck thing would be safe, just
 as with exclusion constraints...provided you insert to begin with,
 that is.

Of course, the fact that UNIQUE_CHECK_PARTIAL aminsert callers
(including deferred constraint inserters and, currently, speculative
inserters) can rely on this is very useful to UPSERT. It guarantees
progress of one session without messy exclusion constraint style fixes
(for deadlock and livelock avoidance). You cannot talk about
speculative insertion without talking about unprincipled deadlocks
(and maybe livelocks).

If I had to bet where we might find some bugs in the executor parts of
UPSERT, my first guess would be the exclusion constraint edge-case
handling (livelocking stuff). Those are probably relatively benign
bugs, but recent bugs in exclusion constraints (in released branches)
show that they can hide for a long time.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL-related tools and .partial WAL file

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 12:39 PM, Fujii Masao masao.fu...@gmail.com wrote:
 WAL-related tools, i.e., pg_archivecleanup, pg_resetxlog and
 pg_xlogdump, don't seem to properly handle the .partial WAL file.
 I think that we should fix at least pg_archivecleanup; otherwise,
 in a system using pg_archivecleanup to clean up old archived
 files, the archived .partial WAL file will never be removed.
 Thoughts?

+1 to handle it properly in pg_archivecleanup. If people use the
archive cleanup command in recovery.conf and they remove WAL files
before the fork point of promotion they should not see the partial
file as well.

 To make pg_archivecleanup detect the .partial WAL file, I'd like to
 use the IsPartialXLogFileName() macro that commit 179cdd0 added.
 Fortunately Michael proposed a patch which makes the macros
 in xlog_internal.h usable in WAL-related tools, and it's better
 to apply the fix based on his patch. So my plan is to apply his
 patch first and then apply the fix to pg_archivecleanup.
 http://www.postgresql.org/message-id/CAB7nPqSiKU+8HjVe_J05btonC5E7kvmRaLpGS6EaEQe=dy3...@mail.gmail.com

As we already use XLogFromFileName extensively in a couple of frontend
utilities, I suspect that the buildfarm is not going to blow up, but it
is certainly better to be sure, with a full buildfarm cycle
running on it...

 Regarding pg_resetxlog and pg_xlogdump, do we need to change
 them as well so that they can handle the .partial file properly?
 pg_resetxlog cannot clean up the .partial file even if it should be
 removed. But the remaining .partial file is harmless and will be
 recycled later by checkpoint, so extra treatment of the .partial
 file in pg_resetxlog may not be necessary. IOW, maybe we can
 document that that's a limitation of pg_resetxlog.

I would rather vote for having pg_resetxlog remove the .partial
segment as well. That's a two-liner in KillExistingArchiveStatus and
KillExistingXLOG at a quick glance. It looks to me that, as pg_resetxlog
is a reset utility, it should do its job.

 Also regarding pg_xlogdump, we can just document, for example,
 please get rid of the .partial suffix from the WAL file name if
 you want to dump it with pg_xlogdump.

For pg_xlogdump, I am on board with only documenting the limitation
(renaming the file yourself), as it is after all only a debug tool.
Handling .partial would also make fuzzy_open_file more complicated by
having to detect twice as many potential file name patterns... That's a
bit crazy for a very narrow use-case.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Solaris testers wanted for strxfrm() behavior

2015-07-01 Thread Noah Misch
On Tue, Jun 30, 2015 at 09:45:08AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Sun, Jun 28, 2015 at 07:00:14PM -0400, Tom Lane wrote:
  Another idea would be to make a test during postmaster start to see
  if this bug exists, and fail if so.  I'm generally on board with the
  thought that we don't need to work on systems with such a bad bug,
  but it would be a good thing if the failure was clean and produced
  a helpful error message, rather than looking like a Postgres bug.
 
  Failing cleanly on unpatched Solaris is adequate, agreed.  A check at
  postmaster start isn't enough, because the postmaster might run in the C
  locale while individual databases or collations use problem locales.  The
  safest thing is to test after every setlocale(LC_COLLATE) and
  newlocale(LC_COLLATE).  That's once at backend start and once per backend 
  per
  collation used, more frequent than I would like.  Hmm.
 
 I was thinking more along the lines of making a single test by momentarily
 switching into a known-buggy locale.

It seems like every Solaris system I meet has a different set of installed
locales.  Any one I might pick could sometimes be unavailable when other buggy
locales are available.

On this Solaris 10 05/08 (10u5) system, the buggy and non-buggy locales have
no clear pattern.  While fr_FR.UTF-8 is fine, fr_LU.UTF-8 is buggy.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-07-01 Thread Jeff Davis
On Wed, 2015-07-01 at 11:07 +0530, Amit Kapila wrote:

 For what you are asking to change name for?

There are still some places, at least in the comments, that call it a
parallel sequential scan.


 a. Infrastructure for parallel execution, like some of the stuff in
 execparallel.c, heapam.c,tqueue.c, etc and all other generic
 (non-nodes specific) code.

Did you consider passing tuples through the tqueue by reference rather
than copying? The page should be pinned by the worker process, but
perhaps that's a bad assumption to make?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing ClogControlLock contention

2015-07-01 Thread Amit Kapila
On Tue, Jun 30, 2015 at 12:32 PM, Simon Riggs si...@2ndquadrant.com wrote:

 ClogControlLock contention is high at commit time. This appears to be due
to the fact that ClogControlLock is acquired in Exclusive mode prior to
marking commit, which then gets starved by backends running
TransactionIdGetStatus().

 Proposal for improving this is to acquire the ClogControlLock in Shared
mode, if possible.


This approach looks like a good way to avoid the contention around
ClogControlLock.  A few things occurred to me while looking at the
patch:

a.  The semantics of the new LWLock (CommitLock) introduced
by the patch seem to be different, in the sense that it is only taken in
Exclusive mode (no Shared mode is required) as per your proposal. We
could use the existing LWLock APIs, but on the other hand we could even
invent a new LWLock API for this kind of locking.

b. SimpleLruReadPage_ReadOnly() - This API is meant mainly for
read access to a page, and its description also says so, but now
we want to use it for updating the page as well. It might be better to invent
a similar new API or at the very least modify its description.

 Two concurrent writers might access the same word concurrently, so we
protect against that with a new CommitLock. We could partition that by
pageno also, if needed.


I think it will be better to partition it, or to use it in some other way to
avoid
two concurrent writers blocking on it; however if you want to first see the
test results with this, then that is also okay.

Overall the idea seems good to pursue, however I have a slight feeling
that using 2 LWLocks (CLOGControlLock in shared mode and the new
CommitLock in Exclusive mode) to set the transaction information
is somewhat odd, but I could not see any problem with it.
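
As a minimal sketch, the locking sequence being discussed would look
something like this (names assumed from the proposal, not taken from
the actual patch):

static void
TransactionIdSetStatusShared(TransactionId xid, XidStatus status)
{
	/* Readers running TransactionIdGetStatus() are no longer blocked... */
	LWLockAcquire(CLogControlLock, LW_SHARED);
	/* ...while writers that might touch the same clog word serialize here. */
	LWLockAcquire(CommitLock, LW_EXCLUSIVE);

	/* locate the clog page and byte for xid, then set its two status bits */

	LWLockRelease(CommitLock);
	LWLockRelease(CLogControlLock);
}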


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
 any non-trivial scenarios, just copying all the files from pg_xlog isn't
 enough anyway, and you need to set up a recovery.conf after running
 pg_rewind that contains a restore_command or primary_conninfo, to fetch the
 WAL. So you can argue that by not copying pg_xlog automatically, we're
 actually doing a favour to the DBA, by forcing him to set up the
 recovery.conf file correctly. Because if you just test simple scenarios
 where not much time has passed between the failover and running pg_rewind,
 it might be enough to just copy all the WAL currently in pg_xlog, but it
 would not be enough if more time had passed and not all the required WAL is
 present in pg_xlog anymore.  And by not copying the WAL, we can avoid some
 copying, as restore_command or streaming replication will only copy what's
  needed, while pg_rewind would copy all WAL it can find in the target's data
  directory.
 pg_basebackup also doesn't include any WAL, unless you pass the --xlog
 option. It would be nice to also add an optional --xlog option to pg_rewind,
 but with pg_rewind it's possible that all the required WAL isn't present in
 the pg_xlog directory anymore, so you wouldn't always achieve the same
 effect of making the backup self-contained.

Those are very convincing arguments.

 So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
 in both the source and the target.

Minor thing: s/continous/continuous. Except that this patch looks sane to me.

(btw, it seems to me that we still have a race condition with
pg_tablespace_location).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing ClogControlLock contention

2015-07-01 Thread Simon Riggs
On 1 July 2015 at 09:00, Amit Kapila amit.kapil...@gmail.com wrote:

 On Tue, Jun 30, 2015 at 12:32 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
 
  ClogControlLock contention is high at commit time. This appears to be
 due to the fact that ClogControlLock is acquired in Exclusive mode prior to
 marking commit, which then gets starved by backends running
 TransactionIdGetStatus().
 
  Proposal for improving this is to acquire the ClogControlLock in Shared
 mode, if possible.
 

 This approach looks like a good way to avoid the contention around
 ClogControlLock.  A few things occurred to me while looking at the
 patch:

 a.  The semantics of the new LWLock (CommitLock) introduced
 by the patch seem to be different, in the sense that it is only taken in
 Exclusive mode (no Shared mode is required) as per your proposal. We
 could use the existing LWLock APIs, but on the other hand we could even
 invent a new LWLock API for this kind of locking.


LWLock API code is already too complex, so -1 for more changes there


 b. SimpleLruReadPage_ReadOnly() - This API is meant mainly for
 read access to a page, and its description also says so, but now
 we want to use it for updating the page as well. It might be better to invent
 a similar new API or at the very least modify its description.


Agreed, perhaps SimpleLruReadPage_Optimistic()


  Two concurrent writers might access the same word concurrently, so we
 protect against that with a new CommitLock. We could partition that by
 pageno also, if needed.
 

 I think it will be better to partition it, or to use it in some other way to
 avoid
 two concurrent writers blocking on it; however if you want to first see the
 test results with this, then that is also okay.


Many updates would be on the same page, so partitioning it would need to be at
least 4-way to be worth doing. Maybe we could stripe into 512 byte chunks.
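
Something like the following, with invented names, is what striping could
look like; each lock would then cover a 512-byte range of a clog page:

#define CLOG_STRIPE_SIZE	512		/* clog bytes covered per lock */
#define CLOG_NUM_STRIPES	16		/* 8 KB page / 512-byte stripes */

static LWLock *CommitStripeLocks[CLOG_NUM_STRIPES];	/* assumed lock array */

/* Pick the stripe lock covering a given byte offset within a clog page. */
static inline LWLock *
CommitStripeLock(int byteno)
{
	return CommitStripeLocks[byteno / CLOG_STRIPE_SIZE];
}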


 Overall the idea seems good to pursue, however I have a slight feeling
 that using 2 LWLocks (CLOGControlLock in shared mode and the new
 CommitLock in Exclusive mode) to set the transaction information
 is somewhat odd, but I could not see any problem with it.


Perhaps calling it the CommitSerializationLock would help. There are already
locks that are held only in Exclusive mode.

Locking two separate resources at same time is common in other code.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] error message diff with Perl 5.22.0

2015-07-01 Thread Tom Lane
Alex Hunsaker bada...@gmail.com writes:
 (What about the back branches? :D)

Indeed.  dangomushi is complaining about this in the back branches now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-01 Thread Simon Riggs
On 2 July 2015 at 03:00, Rahila Syed rahilasye...@gmail.com wrote:


 Yes, I suggest just a single column on pg_stat_activity called
 pct_complete

 Reporting the remaining time can also be crucial for making decisions about
 continuing or aborting VACUUM.
 The same has been suggested in the thread below,

 http://www.postgresql.org/message-id/13072.1284826...@sss.pgh.pa.us

 trace_completion_interval = 5s (default)

 Every interval, we report the current % complete for any operation that
 supports it. We just show NULL if the current operation has not reported
 anything or never will.

 We do this for VACUUM first, then we can begin adding other operations
 as we work out how (for that operation).

 Thank you for explaining. This design seems good to me, except that adding
 more than one column (percent_complete, remaining_time)


It is attractive to have a remaining_time column, or a
predicted_completion_timestamp, but those columns are prediction
calculations rather than actual progress reports. I'm interested in seeing
a report that relates to actual progress made.

Predicted total work required is also interesting, but is a much less
trustworthy figure.

I think we'll need to get wider input about the user interface for this
feature.



 if required, to pg_stat_activity can be less intuitive for users than
 having a separate view for VACUUM.


I think it is a mistake to do something just for VACUUM.

Monitoring software will look at pg_stat_activity. I don't think we should
invent a separate view for progress statistics because it will cause users
to look in two places rather than just one. Reporting progress is fairly
cheap instrumentation, calculating a prediction of completion time might be
expensive.

Having said that, monitoring systems currently use a polling mechanism to
retrieve status data. They look at information published by the backend. We
don't currently have a mechanism to defer publication of expensive
monitoring information until requested by the monitoring system. If you
have a design for how that might work then say so, otherwise we need to
assume a simple workflow: the backend publishes whatever it chooses,
whenever it chooses and then that is made available via the monitoring
system via views.
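
To sketch the cheap-publication side with an assumed new field
(st_pct_complete is not an existing PgBackendStatus member, and MyBEEntry
and the changecount macros are used here as they appear inside pgstat.c):

static void
pgstat_report_pct_complete(double pct)
{
	volatile PgBackendStatus *beentry = MyBEEntry;

	if (beentry == NULL)
		return;

	/* Same changecount protocol that other per-backend status updates use. */
	pgstat_increment_changecount_before(beentry);
	beentry->st_pct_complete = pct;		/* assumed new field */
	pgstat_increment_changecount_after(beentry);
}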


Your current design completely misses the time taken to scan indexes, which
is significant.

There might be a justification to put this out of core, but measuring
progress of VACUUM wouldn't be it, IMHO.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] Asynchronous execution on FDW

2015-07-01 Thread Kyotaro HORIGUCHI
Hello. This is the new version of the FDW async execution feature.

The status of this feature is as follows, as of the last commitfest.

- Async execution is valuable to have.
- But do the first kick in ExecInit phase is wrong.

So the design outline of this version is as following,

- The patch set consists of three parts. The first is the
  infrastructure on the core side, the second is the code to enable
  asynchronous execution in postgres_fdw. The third part is the
  alternative set of three methods to adapt the fetch size, which
  makes asynchronous execution more effective.

- It was a problem when to give the first kick for async execution. It
  is not in the ExecInit phase, and the ExecProc phase does not fit
  either. An extra phase ExecPreProc or something would be too
  invasive. So I tried a pre-exec callback.

  Any init-node can register callbacks on its turn, then the
  registered callbacks are called just before the ExecProc phase in
  the executor. The first patch adds functions and structs to enable
  this.

- The second part is not changed from the previous version. It adds
  PgFdwConn as an extended PGconn which has some members to
  support asynchronous execution.

  The asynchronous execution is kicked only for the first
  ForeignScan node on the same foreign server. And the state
  lasts until the next scan comes. This behavior is mainly
  controlled in fetch_more_data(). The behavior limits the number
  of simultaneous executions for one foreign server to 1. This
  behavior was chosen because no reasonable method
  to limit the multiplicity of execution on a *single peer* has
  been found so far.

- The third part is three kind of trials of adaptive fetch size
  feature.

   The first one is duration-based adaptation. The patch
   increases the fetch size with every FETCH execution but tries to
   keep the duration of every FETCH below 500 ms. But it is not
   promising, because it looks very unstable, and the behavior is
   nearly unforeseeable...

   The second one is based on a byte-based FETCH feature. This
   patch adds to the FETCH command an argument to limit the number of
   bytes (octets) to send. But this might be an over-exposure of
   the internals. The size is counted based on the internal
   representation of a tuple, and the client needs to send the
   overhead of its internal tuple representation in bytes. This
   is effective but quite ugly...

   The third is the simplest and most straightforward way, that
   is, adding a foreign table option to specify the fetch_size. The
   effect of this is also in doubt, since the size of tuples for
   one foreign table will vary according to the returned-columns
   list. But it is predictable for users and is a necessary knob
   for those who want to tune it. The foreign server could also
   have the same option as the default for its foreign tables, but
   this patch has not added it.
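
   As a sketch of how the FDW might read such a per-table option
   (the function and default below are illustrative assumptions, not
   the patch's code):

   #include "postgres.h"
   #include "commands/defrem.h"
   #include "foreign/foreign.h"

   static int
   get_fetch_size(ForeignTable *table)
   {
       ListCell   *lc;

       foreach(lc, table->options)
       {
           DefElem    *def = (DefElem *) lfirst(lc);

           if (strcmp(def->defname, "fetch_size") == 0)
               return atoi(defGetString(def));
       }
       return 100;              /* assumed built-in default */
   }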


The attached patches are the following,

- 0001-Add-infrastructure-of-pre-execution-callbacks.patch
  Infrastructure of pre-execution callback

- 0002-Allow-asynchronous-remote-query-of-postgres_fdw.patch
  FDW asynchronous execution feature

- 0003a-Add-experimental-POC-adaptive-fetch-size-feature.patch
  Adaptive fetch size alternative 1: duration based control

- 0003b-POC-Experimental-fetch_by_size-feature.patch
  Adaptive fetch size alternative 2: FETCH by size

- 0003c-Add-foreign-table-option-to-set-fetch-size.patch
  Adaptive fetch size alternative 3: Foreign table option.

regards,
  
From eb621897d1410079c6458bc4d1914d1345eb77bc Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Fri, 26 Jun 2015 15:12:16 +0900
Subject: [PATCH 1/3] Add infrastructure of pre-execution callbacks.

Some exec nodes have some work before plan tree execution.
This infrastructure provides such functionality
---
 src/backend/executor/execMain.c  | 32 
 src/backend/executor/execUtils.c |  2 ++
 src/include/nodes/execnodes.h| 22 ++
 3 files changed, 56 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a1561ce..51a86b2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -764,6 +764,35 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
 		PreventCommandIfParallelMode(CreateCommandTag((Node *) plannedstmt));
 }
 
+/*
+ * Register callbacks to be called just before plan execution.
+ */
+void
+RegisterPreExecCallback(PreExecCallback callback, EState *es, Node *nd,
+		void *arg)
+{
+	PreExecCallbackItem *item;
+
+	item = (PreExecCallbackItem *)
+		MemoryContextAlloc(es->es_query_cxt, sizeof(PreExecCallbackItem));
+	item->callback = callback;
+	item->node = nd;
+	item->arg = arg;
+
+	/* push the new item onto the head of the chain */
+	item->next = es->es_preExecCallbacks;
+	es->es_preExecCallbacks = item;
+}
+
+/* Execute registered pre-exec callbacks */
+void
+RunPreExecCallbacks(EState *es)
+{
+	PreExecCallbackItem *item;
+
+	for (item = es->es_preExecCallbacks; item 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-01 Thread Amit Langote
On 2015-07-02 AM 11:41, Rahila Syed wrote:
 Hello,
 
 I thought about the possibilities of progress visualization - and one
 possibility is one or two special columns in the pg_stat_activity table - this
 info can be interesting for VACUUM started by autovacuum too.
 
 Thank you for suggestion. The design with hooks and a separate view was
 mainly to keep most of the code outside core as the feature proposed is
 specific to VACUUM command. Also, having a separate view can give more
 flexibility in terms of displaying various progress parameters.
 

Unless I am missing something, I guess you can still keep the actual code that
updates counters outside the core if you adopt the approach that Simon
suggests. Whatever the view (existing/new), any related counters would have a
valid (non-NULL) value when read off the view iff the hooks are set, perhaps
because you have an extension that sets them. I guess that means any operation
that supports progress tracking would have an extension with suitable hooks
implemented.

Of course unless I misinterpreted Simon's words.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-07-01 Thread Amit Kapila
On Thu, May 21, 2015 at 10:19 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Mar 21, 2015 at 2:30 PM, Thom Brown t...@linux.com wrote:
  Looks like one of the patches I applied is newer than the one in your
list:
 
  HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6
  parallel-mode-v9.patch
  assess-parallel-safety-v4.patch
  parallel-heap-scan.patch
  parallel_seqscan_v11.patch

 This patch hasn't been updated for a while, so here is a rebased
 version now that we're hopefully mostly done making changes to
 pg_proc.h for 9.5.  parallel-mode-v10 was mostly committed, and
 parallel-heap-scan has now been incorporated into the parallel_seqscan
 patch.  So you should now be able to test this feature by applying
 just this patch, and the new version of the parallel seqscan patch
 which I am given to understand that Amit will be posting pretty soon.


This patch again needs a rebase, but anyway I think it should be present
in CF-1, as the parallel seq scan patch depends on it.  So I am moving
it to the upcoming CF unless there is any objection to doing so.

I know that CF-1 has already started, but I just now realized that this
patch should also be present in it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-01 Thread Amit Kapila
On Sat, Jun 27, 2015 at 12:54 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Jun 15, 2015 at 2:52 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
  Attached patch provides a fix as per above discussion.

 I think we should emit some LOG messages here.  When we detect the
 file is there:

 LOG: ignoring tablespace_map file because no backup_label file exists

 If the rename fails:

 LOG: could not rename file %s to %s: %m

 If it works:

 LOG: renamed file %s to %s


Added the above log messages in the attached patch, with a small change:
in the messages, file names are displayed with quotes, as most
other uses of rename (failure) in that file quote the
filenames.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


rename_mapfile_if_backupfile_not_present_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 11:58 PM, Sawada Masahiko wrote:
 On Tue, Jun 30, 2015 at 2:40 AM, Josh Berkus wrote:

 BTW, ALTER SYSTEM is a strong reason to use JSON for the synch rep GUC
 (assuming it's one parameter) instead of some custom syntax.  If it's
 JSON, we can validate it in psql, whereas if it's some custom syntax we
 have to wait for the db to reload and fail to figure out that we forgot
 a comma.  Using JSON would also permit us to use jsonb_set and
 jsonb_delete to incrementally change the configuration.

 Sounds convenient and flexible. I agree with this JSON-format
 parameter only if we don't combine both quorum and prioritization,
 because of backward compatibility.
 I tend towards a JSON-format value in a new, separate GUC parameter.

This is going to make postgresql.conf unreadable. That does not look
very user-friendly, and a JSON object is actually longer in characters
than the formula spec proposed upthread.

 Anyway, if we use JSON, I'm imagining parameter values like below.
 [JSON]
 Question: what happens *today* if we have two different synch rep
 strings in two different *.conf files?  I wouldn't assume that anyone
 has tested this ...
 We use the last defined parameter even if sync rep strings appear in several files, right?

The last one wins, that's the rule in GUCs. Note that
postgresql.auto.conf has the top priority over the rest, and that
files included in postgresql.conf have their value considered when
they are opened by the parser.

Well, the JSON format has merit if it is stored as metadata in PGDATA, so
that it is independent of WAL, in something like pg_syncdata/, and if it
can be modified with a useful interface, which is where Josh's first
idea could prove to be useful. We just need a clear representation of
the JSON schema we would use, and the kind of functions with which we
could manipulate it, on top of a get/set that can be used to retrieve
and update the metadata as wanted.

In order to preserve backward compatibility, we could set s_s_names to some
'special_value' to switch to the old interface. We could consider
dropping it after a couple of releases, once we are sure that the new
system is stable.

Also, I think that we should rely on SIGHUP as a first step of the
implementation to update the status of sync nodes in backend
processes. As a future improvement we could perhaps get rid of that. Still it
seems safer to me to rely on a signal to update the in-memory status
as a first step, as this is what we have now.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL-related tools and .partial WAL file

2015-07-01 Thread Fujii Masao
On Wed, Jul 1, 2015 at 5:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 1 July 2015 at 07:52, Michael Paquier michael.paqu...@gmail.com wrote:

 On Wed, Jul 1, 2015 at 12:39 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
  WAL-related tools, i.e., pg_archivecleanup, pg_resetxlog and
  pg_xlogdump, don't seem to properly handle the .partial WAL file.
  I think that we should fix at least pg_archivecleanup; otherwise,
  in a system using pg_archivecleanup to clean up old archived
  files, the archived .partial WAL file will never be removed.
  Thoughts?

 +1 to handle it properly in pg_archivecleanup. If people use the
 archive cleanup command in recovery.conf and they remove WAL files
 before the fork point of promotion they should not see the partial
 file as well.


  To make pg_archivecleanup detect the .partial WAL file, I'd like to
  use the IsPartialXLogFileName() macro that commit 179cdd0 added.
  Fortunately Michael proposed a patch which makes the macros
  in xlog_internal.h usable in WAL-related tools, and it's better
  to apply the fix based on his patch. So my plan is to apply his
  patch first and then apply the fix to pg_archivecleanup.
 
  http://www.postgresql.org/message-id/CAB7nPqSiKU+8HjVe_J05btonC5E7kvmRaLpGS6EaEQe=dy3...@mail.gmail.com

 As we already use XLogFromFileName extensively in a couple of frontend
 utilities, I suspect that the buildfarm is not going to blow up, but it
 is certainly better to be sure, with a full buildfarm cycle
 running on it...

  Regarding pg_resetxlog and pg_xlogdump, do we need to change
  them as well so that they can handle the .partial file properly?
  pg_resetxlog cannot clean up the .partial file even if it should be
  removed. But the remaining .partial file is harmless and will be
  recycled later by checkpoint, so extra treatment of the .partial
  file in pg_resetxlog may not be necessary. IOW, maybe we can
  document that that's a limitation of pg_resetxlog.

 I would rather vote for having pg_resetxlog remove the .partial
 segment as well. That's a two-liner in KillExistingArchiveStatus and
 KillExistingXLOG at a quick glance. It looks to me that, as pg_resetxlog
 is a reset utility, it should do its job.

  Also regarding pg_xlogdump, we can just document, for example,
  please get rid of the .partial suffix from the WAL file name if
  you want to dump it with pg_xlogdump.

 For pg_xlogdump, I am on board with only documenting the limitation
 (renaming the file yourself), as it is after all only a debug tool.
 Handling .partial would also make fuzzy_open_file more complicated by
 having to detect twice as many potential file name patterns... That's a
 bit crazy for a very narrow use-case.


 +1 to all

I also agree with Michael. I implemented the patch accordingly. Patch attached.

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/ref/pg_xlogdump.sgml
--- b/doc/src/sgml/ref/pg_xlogdump.sgml
***
*** 215,220  PostgreSQL documentation
--- 215,226 
  Only the specified timeline is displayed (or the default, if none is
  specified). Records in other timelines are ignored.
   </para>
+ 
+   <para>
+    <application>pg_xlogdump</application> cannot read WAL files with suffix
+    <literal>.partial</literal>. If those files need to be read, the
+    <literal>.partial</literal> suffix needs to be removed from the file name.
+   </para>
  </refsect1>
  
   refsect1
*** a/src/bin/pg_archivecleanup/pg_archivecleanup.c
--- b/src/bin/pg_archivecleanup/pg_archivecleanup.c
***
*** 125,131  CleanupPriorWALFiles(void)
  			 * file. Note that this means files are not removed in the order
  			 * they were originally written, in case this worries you.
  			 */
! 			if (IsXLogFileName(walfile) &&
  				strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)
  			{
  /*
--- 125,131 
  			 * file. Note that this means files are not removed in the order
  			 * they were originally written, in case this worries you.
  			 */
! 			if ((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) &&
  				strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)
  			{
  /*
***
*** 181,187  CleanupPriorWALFiles(void)
   * SetWALFileNameForCleanup()
   *
   *	  Set the earliest WAL filename that we want to keep on the archive
!  *	  and decide whether we need_cleanup
   */
  static void
  SetWALFileNameForCleanup(void)
--- 181,187 
   * SetWALFileNameForCleanup()
   *
   *	  Set the earliest WAL filename that we want to keep on the archive
!  *	  and decide whether we need cleanup
   */
  static void
  SetWALFileNameForCleanup(void)
***
*** 192,199  SetWALFileNameForCleanup(void)
  
  	/*
  	 * If restartWALFileName is a WAL file name then just use it directly. If
! 	 * restartWALFileName is a .backup filename, make sure we use the prefix
! 	 * of the filename, otherwise we will remove wrong files since
  	 * 00010010.0020.backup is after
  	 * 00010010.
  	 */
--- 192,199 
  
  	/*
  	 * If restartWALFileName is a WAL file name then just use 

Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-01 Thread Kouhei Kaigai
Folks,

 Moved this patch to next CF 2015-02 because of lack of review(ers).

Do we still need this patch as a contrib module?
It was originally requested as an example of the custom-scan interface
last summer; however, there was no strong demand after that, so it was
bumped to the v9.6 development cycle.

If somebody still needs it, I'll rebase and adjust the patch to the
latest custom-scan interface. However, I cannot get motivated to work
on a feature nobody wants.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
 Sent: Friday, February 13, 2015 4:41 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Jim Nasby; Robert Haas; Simon Riggs; Kohei KaiGai; PgHacker
 Subject: Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] 
 Custom
 Plan API)
 
 
 
 On Tue, Jan 6, 2015 at 10:51 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
 
    Jim, thanks for reviewing the patch.
 
   The attached patch is revised one according to your suggestion,
   and also includes bug fix I could found.
 
    * Definitions of TIDOperator were moved to pg_operator.h,
      as is done for other operators.
    * Support the case of a ctid (operator) Param expression.
    * Add checks for the case where the commutator of TID is not found.
    * Qualifiers get extracted at plan stage, not path stage.
    * Adjust cost estimation logic to fit the SeqScan manner.
    * Add some new test cases to the regression test.
 
On 12/24/14, 7:54 PM, Kouhei Kaigai wrote:
 On Mon, Dec 15, 2014 at 4:55 PM, Kouhei Kaigai
 kai...@ak.jp.nec.com
 wrote:
 I'm not certain whether we should have this functionality in
 contrib from the perspective of workload that can help, but its
 major worth is for an example of custom-scan interface.
 worker_spi is now in src/test/modules. We may add it there as 
 well,
no?

 Hmm, it makes sense for me. Does it also make sense to add a
 test-case to the core regression test cases?

 The attached patch adds ctidscan module at test/module instead of
 contrib.
 Basic portion is not changed from the previous post, but file
 locations and test cases in regression test are changed.
   
First, I'm glad for this. It will be VERY valuable for anyone trying
 to
clean up the end of a majorly bloated table.
   
Here's a partial review...
   
+++ b/src/test/modules/ctidscan/ctidscan.c
   
+/* missing declaration in pg_proc.h */
+#ifndef TIDGreaterOperator
+#define TIDGreaterOperator   2800
...
If we're calling that out here, should we have a corresponding comment
 in
pg_proc.h, in case these ever get renumbered?
   
    It was a quick hack when I moved this module out of the tree.
    Yep, I should revert it when I submit this patch again.
 
+CTidQualFromExpr(Node *expr, int varno)
...
+ if (IsCTIDVar(arg1, varno))
+ other = arg2;
+ else if (IsCTIDVar(arg2, varno))
+ other = arg1;
+ else
+ return NULL;
+ if (exprType(other) != TIDOID)
+ return NULL;/* should not happen */
+ /* The other argument must be a pseudoconstant */
+ if (!is_pseudo_constant_clause(other))
+ return NULL;
   
I think this needs some additional blank lines...
   
    OK. I also noticed that the coding style around this function is
    different from other built-in plans, so I redefined the role of
    this function to just check whether the supplied RestrictInfo is
    an OpExpr that involves the TID inequality operator.
  
    The expression node shall be extracted when the Plan node is
    created from the Path node.
 
+ if (IsCTIDVar(arg1, varno))
+ other = arg2;
+ else if (IsCTIDVar(arg2, varno))
+ other = arg1;
+ else
+ return NULL;
+
+ if (exprType(other) != TIDOID)
+ return NULL;/* should not happen */
+
+ /* The other argument must be a pseudoconstant */
+ if (!is_pseudo_constant_clause(other))
+ return NULL;
   
+ * CTidEstimateCosts
+ *
+ * It estimates the cost to scan the target relation according to the
+ * given restriction clauses. Its scan logic is almost the same as what
+ * SeqScan does, because it uses 

[HACKERS] Re: Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-07-01 Thread Noah Misch
On Sat, Jun 27, 2015 at 06:13:36PM +0200, Andres Freund wrote:
 On 2015-06-27 12:10:49 -0400, Tom Lane wrote:
  Andres Freund and...@anarazel.de writes:
   On 2015-06-27 15:07:05 +0900, Michael Paquier wrote:
   +1 for removing on master and just disabling on back-branches.
  
   The problem with that approach is that it leaves people hanging in the
   dry if they've uncommented the default value, or changed it. That
   doesn't seem nice to me.
  
  I think at least 99% of the people who are using a nondefault value of
  ssl_renegotiation_limit are using zero and so would have no problem with
  this at all.  Possibly 100% of them; there's not really much use-case for
  changing from 512MB to some other nonzero value, is there?
 
 While still at 2ndQ I've seen some people increase it to nonzero values
 to cope with the connection breaks.

We'd need to be triply confident that we know better than the DBA before
removing flexibility in back branches.  +1 for just changing the default.
Suppose some security policy mandates a particular key rotation interval; the
minor release would force an awkward decision on that user.  DBAs who have
customized ssl_renegotiation_limit are more likely than most to notice the
release note and make an informed decision.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_file_settings view vs. Windows

2015-07-01 Thread Noah Misch
On Sat, Jun 27, 2015 at 07:20:43PM -0400, Tom Lane wrote:
 Tatsuo Ishii is...@postgresql.org writes:
  I noticed that in EXEC_BACKEND builds (ie, Windows) the pg_file_settings
  view doesn't act as its author presumably intended.  Specifically, it
  reads as empty until/unless the current session processes a SIGHUP event.
 
  I'm just wondering why we did not catch this earlier. If this is
  because threre's no regression test case for pg_file_settings view,
 
 Yeah, exactly.  Unfortunately I see no way to add a useful test, at least
 not one that will work in installcheck mode.  There's no way to predict
 what will be in the view in that case.  Even for make check, the output
 would be pretty darn environment-dependent.

A TAP test case is the way to test this thoroughly.  It could feed any number
of specific postgresql.conf variations to a postmaster.  See
src/bin/pg_ctl/t/002_status.pl for a test doing something similar.  (Granted,
I would not have thought to write such a test for this feature.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 11:45 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 6/26/15 1:46 AM, Michael Paquier wrote:
 - k(elt1,elt2,eltN) means that we need for the k elements in the set
 to return true (aka commit confirmation).
 - k[elt1,elt2,eltN] means that we need for the first k elements in the
 set to return true.

 I think the difference between (...) and [...] is not intuitive.  To me,
 {...} would be more intuitive to indicate order does not matter.

When defining a set of elements, {} defines elements one by one, while ()
and [] are used for ranges. Perhaps the difference is clearer this way.

 When k is not defined for a group, k = 1.

 How about putting it at the end?  Like

 [foo,bar,baz](2)

I am less convinced by that, but I won't argue against it either.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error message diff with Perl 5.22.0

2015-07-01 Thread Michael Paquier
On Thu, Jul 2, 2015 at 11:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alex Hunsaker bada...@gmail.com writes:
 (What about the back branches? :D)

 Indeed.  dangomushi is complaining about this in the back branches now.

Yep, perl 5.22 is used there.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-01 Thread Sameer Thakur
Hello,
Thank you for the suggestion. The design with hooks and a separate view was
mainly to keep most of the code outside core, as the proposed feature is
specific to the VACUUM command. Also, a separate view can give more
flexibility in terms of displaying various progress parameters.
FWIW ,there was resistance to include columns in pg_stat_activity earlier
in the following thread,
http://www.postgresql.org/message-id/AANLkTi=TcuMA38oGUKX9p5WVPpY+M3L0XUp7=PLT+LCT@...

Perhaps, as suggested in the link, the progress could be made available via
a function call which does the progress calculation on demand. Then we do
not need a separate view or clutter in pg_stat_activity, and we also get
the benefit of calculating progress only when it's needed.

  



--
View this message in context: 
http://postgresql.nabble.com/PROPOSAL-VACUUM-Progress-Checker-tp5855849p5856192.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-01 Thread Guillaume Lelarge
On 2 Jul 2015 7:28 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 2 July 2015 at 03:00, Rahila Syed rahilasye...@gmail.com wrote:


 Yes, I suggest just a single column on pg_stat_activity called
pct_complete

  Reporting the remaining time can also be crucial for deciding whether to
 continue or abort a VACUUM.
 The same has been suggested  in  the thread below,

 http://www.postgresql.org/message-id/13072.1284826...@sss.pgh.pa.us

 trace_completion_interval = 5s (default)

 Every interval, we report the current % complete for any operation that
supports it. We just show NULL if the current operation has not reported
anything or never will.

 We do this for VACUUM first, then we can begin adding other operations
as we work out how (for that operation).

  Thank you for explaining. This design seems good to me, except that adding
 more than one column (percent_complete, remaining_time)


 It is attractive to have a remaining_time column, or a
predicted_completion_timestamp, but those columns are prediction
calculations rather than actual progress reports. I'm interested in seeing
a report that relates to actual progress made.


Agreed.

 Predicted total work required is also interesting, but is a much less
trustworthy figure.


And it is something a client app or an extension can compute. No need to
put this in core as long as we have the actual progress.

 I think we'll need to get wider input about the user interface for this
feature.



  if required, to pg_stat_activity can be less intuitive for users than
 having a separate view for VACUUM.


 I think it is a mistake to do something just for VACUUM.

 Monitoring software will look at pg_stat_activity. I don't think we
should invent a separate view for progress statistics because it will cause
users to look in two places rather than just one. Reporting progress is
fairly cheap instrumentation; calculating a prediction of completion time
might be expensive.


+1

 Having said that, monitoring systems currently use a polling mechanism to
retrieve status data. They look at information published by the backend. We
don't currently have a mechanism to defer publication of expensive
monitoring information until requested by the monitoring system. If you
have a design for how that might work then say so, otherwise we need to
assume a simple workflow: the backend publishes whatever it chooses,
whenever it chooses, and that is then made available to the monitoring
system via views.


 Your current design completely misses the time taken to scan indexes,
which is significant.

 There might be a justification to put this out of core, but measuring
progress of VACUUM wouldn't be it, IMHO.

 --
  Simon Riggs                http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Expending the use of xlog_internal.h's macros

2015-07-01 Thread Michael Paquier
On Thu, Jul 2, 2015 at 10:37 AM, Fujii Masao wrote:
 On Wed, Jul 1, 2015 at 9:53 PM, Fujii Masao wrote:

 +/* Length of XLog file name */
 +#define XLOG_DATA_FNAME_LEN 24

  Shorten the name of this macro variable to XLOG_FNAME_LEN,
  for better code readability.

 Thanks. You have more talent for naming than I do.

 Comments?

 Just reading it again, I think that XLogFileNameById should use
 MAXFNAMELEN, and that XLogFileName should call directly
 XLogFileNameById as both are doing the same operation like in the
 attached.

 We can refactor the code that way, but it looks a bit messy, at least
 to me. The original coding is simpler and easier to read, so I'd like
 to adopt the original one here.
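
For reference, the refactoring discussed above would look roughly like
this - a sketch based on the description, not the attached patch:

#define XLogFileNameById(fname, tli, log, seg)	\
	snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, log, seg)

#define XLogFileName(fname, tli, logSegNo)	\
	XLogFileNameById(fname, tli,	\
					 (uint32) ((logSegNo) / XLogSegmentsPerXLogId),	\
					 (uint32) ((logSegNo) % XLogSegmentsPerXLogId))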

 It seems also safer to use MAXFNAMELEN instead of MAXPGPATH
 for exclusiveCleanupFileName in pg_standby.c and pg_archivecleanup.c.

 Yep.

 Pushed. Thanks!

Thanks! I was going to send a patch with what you wanted but you just
beat me to it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-01 Thread Rahila Syed
Hello,

Thank you for suggestions.

Yes, I suggest just a single column on pg_stat_activity called pct_complete

Reporting the remaining time can also be crucial for deciding whether to
continue or abort a VACUUM.
The same has been suggested  in  the thread below,

http://www.postgresql.org/message-id/13072.1284826...@sss.pgh.pa.us

trace_completion_interval = 5s (default)

Every interval, we report the current % complete for any operation that
supports it. We just show NULL if the current operation has not reported
anything or never will.

We do this for VACUUM first, then we can begin adding other operations as
we work out how (for that operation).

Thank you for explaining. This design seems good to me, except that adding
more than one column (percent_complete, remaining_time), if required, to
pg_stat_activity can be less intuitive for users than having a separate
view for VACUUM.

-Rahila Syed


On Tue, Jun 30, 2015 at 2:02 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 30 June 2015 at 08:52, Pavel Stehule pavel.steh...@gmail.com wrote:


  I thought about the possibilities of progress visualization - and one
  possibility is one or two special columns in the pg_stat_activity table -
  this info can be interesting for VACUUM started by autovacuum too.


 Yes, I suggest just a single column on pg_stat_activity called pct_complete

 trace_completion_interval = 5s (default)

 Every interval, we report the current % complete for any operation that
 supports it. We just show NULL if the current operation has not reported
 anything or never will.

 We do this for VACUUM first, then we can begin adding other operations as
 we work out how (for that operation).

 --
  Simon Riggs                http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-01 Thread Rahila Syed
Hello,

I thought about the possibilities of progress visualization - and one
possibility is one or two special columns in the pg_stat_activity table -
this info can be interesting for VACUUM started by autovacuum too.

Thank you for the suggestion. The design with hooks and a separate view was
mainly to keep most of the code outside core, as the proposed feature is
specific to the VACUUM command. Also, a separate view can give more
flexibility in terms of displaying various progress parameters.

FWIW ,there was resistance to include columns in pg_stat_activity earlier
in the following thread,
http://www.postgresql.org/message-id/AANLkTi=TcuMA38oGUKX9p5WVPpY+M3L0XUp7=plt+...@mail.gmail.com

Thank you,
Rahila Syed

On Tue, Jun 30, 2015 at 1:22 PM, Pavel Stehule pavel.steh...@gmail.com
wrote:

 Hi

 2015-06-30 9:37 GMT+02:00 Rahila Syed rahilasye...@gmail.com:

 Hello Hackers,

 Following is a proposal for feature to calculate VACUUM progress.


 interesting idea - I like to see it integrated to core.



  Use case: measuring the progress of long-running VACUUMs to help DBAs
  make an informed decision whether to continue running VACUUM or abort it.

 Design:

  A shared preload library to store progress information from different
  backends running VACUUM, calculate the remaining time for each, and
  display progress in the form of a view.


  probably a similar idea can be used for the REINDEX, CREATE INDEX, and
  COPY TO statements

  I thought about the possibilities of progress visualization - and one
  possibility is one or two special columns in the pg_stat_activity table -
  this info can be interesting for VACUUM started by autovacuum too.

 Regards

 Pavel



 VACUUM  needs to be instrumented with a hook to collect progress
 information (pages vacuumed/scanned) periodically.

  The patch attached implements a new hook to store the vacuumed_pages and
  scanned_pages counts at the end of each page scanned by VACUUM.

 This information is stored in a shared memory structure.
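
  The hook could have roughly this shape (a hypothetical signature,
  assuming the usual PostgreSQL headers; the actual patch may differ):

  typedef void (*vacuum_progress_hook_type) (Oid databaseoid, Oid tableoid,
                                             int32 scanned_pages,
                                             int32 vacuumed_pages,
                                             int32 total_pages);
  extern PGDLLIMPORT vacuum_progress_hook_type vacuum_progress_hook;

  /* called from the per-page loop in lazy vacuum, e.g. lazy_scan_heap() */
  if (vacuum_progress_hook)
      (*vacuum_progress_hook) (MyDatabaseId, RelationGetRelid(onerel),
                               scanned_pages, vacuumed_pages, nblocks);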

  In addition to measuring progress, this hook function also
  calculates the remaining time for VACUUM.



  The frequency of collecting progress information can be reduced by
  adding delays between hook function calls.

 Also, a GUC parameter

 log_vacuum_min_duration can be used.

  This will cause VACUUM progress to be calculated only if VACUUM runs for
  more than the specified number of milliseconds.

  A value of zero calculates VACUUM progress for each page processed; -1
  disables logging.


 Progress calculation :


 percent_complete = scanned_pages * 100 / total_pages_to_be_scanned;

 remaining_time = elapsed_time * (total_pages_to_be_scanned -
 scanned_pages) / scanned_pages;
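
 As a standalone sketch, the arithmetic works out as below (illustrative
 names, not from the patch; callers must ensure scanned_pages > 0 before
 extrapolating):

 #include <stdio.h>
 #include <stdint.h>

 static void
 vacuum_progress(double elapsed_time, int32_t scanned_pages,
                 int32_t total_pages_to_be_scanned,
                 int *percent_complete, double *remaining_time)
 {
     /* widen before multiplying to avoid int32 overflow on huge tables */
     *percent_complete = (int) (((int64_t) scanned_pages * 100) /
                                total_pages_to_be_scanned);

     /* linear extrapolation: assumes the remaining pages are scanned at
      * the average rate observed so far */
     *remaining_time = elapsed_time *
         (total_pages_to_be_scanned - scanned_pages) / scanned_pages;
 }

 int
 main(void)
 {
     int     pct;
     double  remaining;

     /* e.g. 120 s elapsed, 30000 of 100000 pages scanned */
     vacuum_progress(120.0, 30000, 100000, &pct, &remaining);
     printf("%d%% done, ~%.0f s remaining\n", pct, remaining); /* 30%, ~280 s */
     return 0;
 }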


 Shared memory struct:

  typedef struct PgStat_VacuumStats
  {
      Oid     databaseoid;
      Oid     tableoid;
      int32   vacuumed_pages;
      int32   total_pages;
      int32   scanned_pages;
      double  elapsed_time;
      double  remaining_time;
  } PgStat_VacuumStats[max_connections];



 Reporting :

  A view named 'pg_maintenance_progress' can be created using the values
 in the struct above.

 pg_stat_maintenance can be called from any other backend and will display
 progress of

 each running VACUUM.


 Other uses of hook in VACUUM:


  The cost of VACUUM in terms of pages hit, missed and dirtied, the same
  as autovacuum reports, can be collected using this hook.
 
  Autovacuum does it at the end of VACUUM for each table. It can be done
  while VACUUM on a table is in progress.
  This can be helpful to track manual VACUUMs as well, not just autovacuum.
 
  Read/write (I/O) rates can be computed along the lines of autovacuum.
  Read-rate patterns can be used to help tune future vacuums on the
  table (like shared_buffers tuning).
 Other resource usages can also be collected using progress checker hook.


  The attached patch is a POC patch of progress calculation for a single
  backend.

 Also attached is a brief snapshot of the output log.




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 Since, buildfarm/quiet inline test issues aside, pademelon is the only
 animal not supporting inlines and varargs, I think we should just go
 ahead and start to use both.

I'm good with using inlines, since as I pointed out upthread, that won't
actually break anything.  I'm much less convinced that varargs macros
represent a winning tradeoff.  Using those *will* irredeemably break
pre-C99 compilers, and AFAICS we do not have an urgent need for them.

(BTW, where are you drawing the conclusion that all these compilers
support varargs?  I do not see a configure test for it.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Andres Freund
On 2015-07-01 19:05:08 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  Since, buildfarm/quiet inline test issues aside, pademelon is the only
  animal not supporting inlines and varargs, I think we should just go
  ahead and start to use both.
 
 I'm good with using inlines, since as I pointed out upthread, that won't
 actually break anything.  I'm much less convinced that varargs macros
 represent a winning tradeoff.  Using those *will* irredeemably break
 pre-C99 compilers, and AFAICS we do not have an urgent need for them.

Well, I'll happily take that.

 (BTW, where are you drawing the conclusion that all these compilers
 support varargs?  I do not see a configure test for it.)

There is, although not in all branches: PGAC_C_VA_ARGS. We optionally
use vararg macros today, for elog (b853eb9), so I assume it works ;)
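
For reference, the optional use in elog.h looks roughly like this when
HAVE__VA_ARGS is defined (sketched from memory of b853eb9, so treat the
details as approximate):

#ifdef HAVE__VA_ARGS
#define elog(elevel, ...)  \
	do { \
		elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
		elog_finish(elevel, __VA_ARGS__); \
	} while(0)
#else
#define elog	\
	elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \
	elog_finish
#endif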

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Andres Freund
On 2015-07-01 23:39:06 +0200, Andres Freund wrote:
 On 2015-07-01 16:33:24 -0400, Tom Lane wrote:
  Andres Freund and...@anarazel.de writes:
   At the very least I think we should start to rely on 'static inline's
   working. There is not, and hasn't been for a while, any buildfarm animal
   that does not support it
 
  pademelon doesn't.

 Oh. I'd gone through all the animals somewhere in the middle of the 9.5
 cycle. pademelon was either dormant or not yet reincarnated back then.

 I'll go through all again then. Interesting that anole's compiler
 supports inlines.

Here is a list of animals and what they support. I left out several
redundant entries (no need for 30 reports of newish linux + gcc
supporting everything).

animal        OS              compiler              inline   quiet inline   varargs

anole         HPUX 11.31      HP C/aC++ A.06.25     y        y              y
axolotl       fed-pi          gcc-4.9               y        y              y
baiji         vista           MSVC 2005             y        y              y
binturong     solaris 10      sun studio 12.3       y        y              y
baiji         win 8           MSVC 2012             y        y              y
brolga        cygwin          gcc-4.3               y        y
castoroides   solaris 10      sun studio 12.1       y        y              y
catamount     OSX 10.8        llvm-gcc 4.2          y        y              y
coypu         netbsd 5.1      gcc-4.1               y        y              y
currawong     win xp          MSVC 2008             y        y              y
damselfly     illumos 201311  gcc-4.7               y        y              y
dingo         solaris 10      gcc-4.9               y        y              y
dromedary     OSX 10.6        gcc-4.2               y        y              y
dunlin        gentoo 13.0     icc 13.1              y        y              y
elk           freebsd 7.1     gcc-4.2               y        y              y
frogmouth     win xp          gcc-4.5               y        y              y
gaur          HP-UX 10.20     gcc-2.95              y        y              y
gharial       HP-UX 11.31     gcc-4.6               y        y              y
locust        OSX 10.5        gcc-4.0               y        y              y
mallard       fedora 21       clang-3.5             y        y              y
mouflon       openbsd 5.7     gcc-4.2               y        n*             y
narwhal       win 2003        gcc-3.4               y        y              y
okapi         gentoo 12.14    icc 11.1              y        y              y
olinguito     openbsd 5.3     gcc-4.2               y        n*             y
opossum       netbsd 5.2      gcc-4.1               y        y              y
orangutan     OSX 10.4        gcc-4.0               y        y              y
pademelon     HP-UX 10.2      HP C Compiler 10.32   n        n              n
pika          netbsd 4.99     gcc-4.1               y        y              ?**
protosciurus  solaris 10      gcc-3.4               y        y              y
rover_firef   OmniOS          gcc-4.6               ?***     ?***           ?***
smew          debian 6.0      clang 2.9             y        y              y
spoonbill     openbsd 3.6     gcc-3.3               y        y              y


* fails due to idiotic warnings we can't do much about:
/usr/local/lib/libxml2.so.15.1: warning: strcpy() is almost always misused, 
please use strlcpy()
/usr/local/lib/libxml2.so.15.1: warning: strcat() is almost always misused, 
please use strlcat()
/usr/local/lib/libxslt.so.3.8: warning: sprintf() is often misused, please use 
snprintf()
** hasn't run since check was added, but gcc-4.1 supports
*** doesn't report any details, too old buildfarm client? All supported by 4.6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-07-01 Thread Andres Freund
On 2015-07-02 00:15:14 +0200, Andres Freund wrote:
 animal        OS              compiler              inline   quiet inline   varargs

 brolga        cygwin          gcc-4.3               y        y

4.3 obviously supports varargs. Human error.

 pademelon     HP-UX 10.2      HP C Compiler 10.32   n        n              n

Since, buildfarm/quiet inline test issues aside, pademelon is the only
animal not supporting inlines and varargs, I think we should just go
ahead and start to use both.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More logging for autovacuum

2015-07-01 Thread Gurjeet Singh
On Fri, Aug 17, 2007 at 3:14 PM, Alvaro Herrera alvhe...@commandprompt.com
wrote:

 Gregory Stark wrote:
 
  I'm having trouble following what's going on with autovacuum and I'm
 finding
  the existing logging insufficient. In particular that it's only logging
 vacuum
  runs *after* the vacuum finishes makes it hard to see what vacuums are
 running
  at any given time. Also, I want to see what is making autovacuum decide
 to
  forgo vacuuming a table and the log with that information is at DEBUG2.

 So did this idea go anywhere?


Assuming the thread stopped here, I'd like to rekindle the proposal.

log_min_messages acts as a single gate for everything headed for the server
logs; controls for per-background process logging do not exist. If one
wants to see DEBUG/INFO messages for just the Autovacuum (or checkpointer,
bgwriter, etc.), they have to set log_min_messages to that level, but the
result would be a lot of clutter from other processes to grovel through, to
see the messages of interest.

The facilities don't yet exist, but it'd be nice if such parameters when
unset (ie NULL) pick up the value of log_min_messages. So by default, the
users will get the same behaviour as today, but can choose to tweak per
background-process logging when needed.

Absent such a feature, one hack is to set the desired log_min_messages
value in the conf file, send SIGHUP to just the process of interest, and
then revert the conf file; but it's a hack.
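
A minimal sketch of the fallback behaviour, assuming a hypothetical
autovacuum_log_min_messages GUC that uses -1 for "unset":

/* hypothetical GUC: -1 means "unset, defer to log_min_messages" */
int		autovacuum_log_min_messages = -1;

static int
autovacuum_effective_min_messages(void)
{
	return (autovacuum_log_min_messages >= 0)
		? autovacuum_log_min_messages
		: log_min_messages;
}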

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: [HACKERS] raw output from copy

2015-07-01 Thread Pavel Golub
Hello Pavel.

I looked through the patch. The sources are OK. However, I didn't find any
docs or test cases. Would you please provide me with a short description of
this feature and why it is important? I didn't manage to find Andrew
Dunstan's old post either.

On Sat, Apr 11, 2015 at 12:26 AM, Pavel Stehule pavel.steh...@gmail.com
wrote:

 Hi

 I wrote a prototype of this patch, and it works well

 postgres=# set client_encoding to 'latin2';
 SET
 Time: 1.488 ms
 postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format
 'raw')
 COPY 1
 Time: 1.108 ms
 postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format
 'raw') ;
  <?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký kůň</xx>Time:
  1.000 ms

 Regards

 Pavel

 2015-04-09 20:48 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

  This thread was finished without real work. I have a real use case -
  exporting an XML doc in a non-UTF8 encoding.

 http://www.postgresql.org/message-id/16174.1319228...@sss.pgh.pa.us

  I propose to implement a new format option, RAW, like Tom proposed.

  It requires a one-row, one-column result - and the result is just raw
  binary data without a size prefix.

 Objections? Ideas?

 Regards

 Pavel




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Nullus est in vitae sensus ipsa vera est sensus.


Re: [HACKERS] Parallel Seq Scan

2015-07-01 Thread Amit Langote
On 2015-07-01 PM 02:37, Amit Kapila wrote:
 
 In terms of completeness, I think we should add some documentation
 for this patch; one way is to describe the execution mechanism in
 src/backend/access/transam/README.parallel and then explain the
 new configuration knobs in the documentation (.sgml files).  Also, we
 can have a separate page in the documentation under the Server
 Programming section (Parallel Query - Parallel Scan;
 Parallel Scan Examples; ...)
 
 Another thing to think about at this stage is whether we need to
 break up this patch and, if yes, how to split it into multiple patches,
 so that it is easier to complete the review. I could see that it
 can be split into 2 or 3 patches.
 a. Infrastructure for parallel execution, like some of the stuff in
 execparallel.c, heapam.c,tqueue.c, etc and all other generic
 (non-nodes specific) code.
 b. Nodes (Funnel and PartialSeqScan) specific code for optimiser
 and executor.
 c. Documentation
 
 Suggestions?

A src/backend/executor/README.parallel?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL-related tools and .paritial WAL file

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 2:52 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Jul 1, 2015 at 9:09 AM, Fujii Masao masao.fu...@gmail.com wrote:

 Also regarding pg_xlogdump, we can just document, for example,
 please get rid of .paritial suffix from the WAL file name if
 you want to dump it by pg_xlogdump.


 Can't we skip such files in pg_xlogdump?

.partial files are at the end of a timeline, so they will be ignored
now (and this even if a node on the old timeline archives the same
segment as the .partial one sent by the promoted standby). And as
pg_xlogdump is not able to follow a timeline jump, there is nothing we
would need to do:
$ pg_xlogdump 00010006 00020008
pg_xlogdump: FATAL:  could not find file 00020006:
No such file or directory
$ pg_xlogdump -t 1 00010006 00020008
pg_xlogdump: FATAL:  could not find file 00020006:
No such file or directory
I am not convinced that it is worth making pg_xlogdump follow
timeline jumps as well. One could just run it twice, on the old and
new timeline segments, to get the output one wants.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in bttext_abbrev_convert()

2015-07-01 Thread Michael Paquier
On Wed, Jul 1, 2015 at 2:38 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Jun 30, 2015 at 10:35 PM, Peter Geoghegan p...@heroku.com wrote:
 The regression tests have zero coverage for this
 tuplesort_performsort() btspool2 case. That's a fairly common case
 to have no coverage for, and that took me all of 5 minutes to find.

 BTW, I looked here because I added a bunch of sortsupport stuff to
 _bt_load() in 9.5. It appears that that new code is entirely without
 coverage.

That's not cool. A patch for the src/test/regress area would be welcome.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] drop/truncate table sucks for large values of shared buffers

2015-07-01 Thread Amit Kapila
On Wed, Jul 1, 2015 at 8:26 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 1 July 2015 at 15:39, Amit Kapila amit.kapil...@gmail.com wrote:


 Okay. I think we can maintain the list in a similar way as we do for
 UNLINK_RELATION_REQUEST in RememberFsyncRequest(), but
 why wait till 64 tables?


 I meant once per checkpoint cycle OR every N tables, whichever is sooner.
N would need to vary according to the size of NBuffers.


That sounds sensible to me; see my reply below on what I think we can
do to make this work.


  We already scan the whole buffer list in each
  checkpoint cycle, so during that scan we can consult this dropped-relation
  list and avoid syncing such buffer contents.  Also, for the ENOENT error
  handling for FileWrite, we can use this list to identify relations for
  which we need to ignore the error.  I think we are already doing
  something similar in mdsync to avoid the problem of dropped tables,
  so it seems okay to
  have it in mdwrite as well.

  The crucial thing to think about in this idea is avoiding reassignment
  of relfilenodes (due to wrapped OIDs) before we have ensured that none
  of the buffers contains a tag for that relfilenode.  Currently we avoid
  this for the fsync case by retaining the first segment of the relation
  (which will avoid reassignment of the relfilenode) till the checkpoint
  ends. I think if we just postpone it till we have validated it in
  shared_buffers, then we can avoid this problem in the new scheme, and
  it should be a delay of at most one checkpoint cycle for unlinking such
  a file, assuming we consult the dropped-relation list during the buffer
  scan in each checkpoint cycle.

 Yes

 So you are keeping more data around for longer, right?

Yes, and we already do it for the sake of fsyncs.

 I think we would need some way to trigger a scan when the amount of
deferred dropped data files hits a certain size.


Okay, I think we can trigger a checkpoint when the number of dropped
relations reaches 64 or 0.01% (or 0.1%) of shared_buffers,
whichever is smaller.
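
A rough sketch of the buffer-scan side of this scheme
(RelFileNodeDropped() is a hypothetical lookup into the dropped-relation
list described above; this is not from any posted patch):

	for (buf_id = 0; buf_id < NBuffers; buf_id++)
	{
		BufferDesc *bufHdr = GetBufferDescriptor(buf_id);

		if (RelFileNodeDropped(bufHdr->tag.rnode))
		{
			/* relation was dropped: discard instead of writing back,
			 * since the underlying file may already be gone */
			InvalidateBuffer(bufHdr);
			continue;
		}
		SyncOneBuffer(buf_id, false);
	}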


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] error message diff with Perl 5.22.0

2015-07-01 Thread Alex Hunsaker
On Sat, Jun 6, 2015 at 7:03 PM, Peter Eisentraut pete...@gmx.net wrote:

 With the recently released Perl 5.22.0, the tests fail thus:

 -ERROR:  Global symbol $global requires explicit package name at line 3.
 -Global symbol $other_global requires explicit package name at line 4.
 +ERROR:  Global symbol $global requires explicit package name (did you
 forget to declare my $global?) at line 3.
 +Global symbol $other_global requires explicit package name (did you
 forget to declare my $other_global?) at line 4.
  CONTEXT:  compilation of PL/Perl function uses_global


FWIW the committed expected file seems fine to me. There is no perl
option to toggle this behavior (and even if there were, I think the
resulting changes to pl/perl functions would be quite ugly).

(What about the back branches? :D)


Re: [HACKERS] Expending the use of xlog_internal.h's macros

2015-07-01 Thread Fujii Masao
On Wed, Jul 1, 2015 at 9:53 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Jul 1, 2015 at 8:58 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Wed, Jul 1, 2015 at 8:18 PM, Fujii Masao wrote:
 On Wed, Jul 1, 2015 at 8:16 PM, Fujii Masao wrote:
 I updated the patch as follows. Patch attached.

 +#define XLogFileNameExtended(fname, tli, log, seg)

 Move this macro to xlog_internal.h because it's used both in
 pg_standby and pg_archivecleanup. There seems no need to
 define it independently.

 OK for me.

 -#define MAXFNAMELEN64
 +#define MAXFNAMELEN64

 Revert this unnecessary change.

 Yes, thanks.


 +/* Length of XLog file name */
 +#define XLOG_DATA_FNAME_LEN 24

  Shorten the name of this macro variable to XLOG_FNAME_LEN,
  for better code readability.

 Thanks. You have more talent for naming than I do.

 Comments?

 Just reading it again, I think that XLogFileNameById should use
 MAXFNAMELEN, and that XLogFileName should call directly
 XLogFileNameById as both are doing the same operation like in the
 attached.

  We can refactor the code that way, but it looks a bit messy,
  at least to me. The original coding is simpler and easier to read,
  so I'd like to adopt the original one here.

 It seems also safer to use MAXFNAMELEN instead of MAXPGPATH
 for exclusiveCleanupFileName in pg_standby.c and pg_archivecleanup.c.

 Yep.

Pushed. Thanks!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers