Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Fujii Masao
On Fri, Jan 20, 2012 at 1:01 PM, Steve Singer ssinger...@sympatico.ca wrote:
 Here is my review of this version of the patch. I think this patch has been
 in every CF for 9.2 and I feel it is getting close to being committed.

Thanks for the review!

 Testing Review
 

 I encountered this on my first replica (the one based on the master).  I am
 not sure if it is related to this patch, it happened after the pg_basebackup
 against the replica finished.

 TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File:
 twophase.c, Line: 1238)
 LOG:  startup process (PID 1) was terminated by signal 6: Aborted

I spent an hour trying to reproduce that issue, but I was not able to :(
For now I have no idea what causes it. But the patch basically doesn't
touch any code related to that area, so I'm guessing that it's a problem in
HEAD rather than in the patch...

I will spend more time diagnosing the issue. If you notice something, please
let me know.

 - set full_page_writes=off and did a checkpoint
 - Started the pg_basebackup
 - set full_page_writes=on and did a HUP + some database activity that might
 have forced a checkpoint.

 I got this message from pg_basebackup.
 ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
 pg_basebackup: could not get WAL end position from server

 I point this out because the message is different from the normal "could not
 initiate base backup: FATAL:  WAL generated with full_page_writes=off" that I
 normally see.

I guess that's because you started pg_basebackup before the checkpoint record
with full_page_writes=off had been replicated and replayed on the standby.
In this case, when you start pg_basebackup, it uses the previous checkpoint
record (possibly with full_page_writes=on) as the backup starting checkpoint, so
pg_basebackup passes the full_page_writes check at the start of the backup.
It then fails the check at the end of the backup, which is why you got that error message.

 We might want to add a PQerrorMessage(conn) to
 pg_basebackup to print the error details.  Since this patch didn't actually
 change pg_basebackup I don't think you're required to improve the error
 messages in it.  I am just mentioning this because it came up in testing.

Agreed.

When PQresultStatus() returns an unexpected status, the error message from
PQerrorMessage() should basically be reported. But in the one case where
pg_basebackup could not get the WAL end position, PQerrorMessage() was
not reported... This looks like an oversight in pg_basebackup... I think
it's better to fix that as a separate patch (attached). Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4007680..873ef64 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -914,7 +914,7 @@ BaseBackup(void)
 	res = PQexec(conn, "IDENTIFY_SYSTEM");
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, _("%s: could not identify system: %s\n"),
+		fprintf(stderr, _("%s: could not identify system: %s"),
 				progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1049,8 +1049,8 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, _("%s: could not get WAL end position from server\n"),
-				progname);
+		fprintf(stderr, _("%s: could not get WAL end position from server: %s"),
+				progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
 	if (PQntuples(res) != 1)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Erik Rijkers
On Fri, January 20, 2012 05:01, Steve Singer wrote:
 On 12-01-17 05:38 AM, Fujii Masao wrote:
 On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masaomasao.fu...@gmail.com  wrote:
 The amount of code changes to allow pg_basebackup to make a backup from
 the standby seems to be small. So I ended up merging those changes and the
 infrastructure patch. WIP patch attached. But I'd be happy to split the patch
 again if you want.
 Attached is the updated version of the patch. I wrote the limitations of
 standby-only backup in the document and changed the error messages.


 Here is my review of this version of the patch. I think this patch has
 been in every CF for 9.2 and I feel it is getting close to being
 committed.  The only issue of significance is a crash I encountered
 while testing; see below.

 I am fine with including the pg_basebackup changes in the patch; it also
 makes testing some of the other changes possible.


 The documentation updates you have are good.

 I don't see any issues looking at the code.



 Testing Review
 

 I encountered this on my first replica (the one based on the master).  I
 am not sure if it is related to this patch, it happened after the
 pg_basebackup against the replica finished.

 TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File:
 twophase.c, Line: 1238)
 LOG:  startup process (PID 1) was terminated by signal 6: Aborted
 LOG:  terminating any other active server processes

 A little earlier this postmaster had printed.

 LOG:  restored log file 0001001F from archive
 LOG:  restored log file 00010020 from archive
 cp: cannot stat
 `/usr/local/pgsql92git/archive/00010021': No such file
 or directory
 LOG:  unexpected pageaddr 0/1900 in log file 0, segment 33, offset 0
 cp: cannot stat
 `/usr/local/pgsql92git/archive/00010021': No such file
 or directory


 I have NOT been able to replicate this error, and I am not sure exactly
 what I had done in my testing prior to that point.


I'm not sure, but this does look like the mystery bug that I encountered
repeatedly back in 9.0devel; I was never able to reproduce it reliably, and
I don't think it was ever solved.

  http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php

Erik Rijkers

 In another test run I had

 - set full_page_writes=off and did a checkpoint
 - Started the pg_basebackup
 - set full_page_writes=on and did a HUP + some database activity that
 might have forced a checkpoint.

 I got this message from pg_basebackup.
 ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
 pg_basebackup: could not get WAL end position from server

 I point this out because the message is different from the normal "could
 not initiate base backup: FATAL:  WAL generated with
 full_page_writes=off" that I normally see.  We might want to add a
 PQerrorMessage(conn) to pg_basebackup to print the error details.
 Since this patch didn't actually change pg_basebackup I don't think you're
 required to improve the error messages in it.  I am just mentioning this
 because it came up in testing.


 The rest of the tests I did involving changing full_page_writes
 with/without checkpoints and sighups and promoting the replica seemed to
 work as expected.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Restore process during recovery

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao masao.fu...@gmail.com wrote:

Requested update


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..469e6d6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
+#include "postmaster/walrestore.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
@@ -187,7 +188,6 @@ static bool InArchiveRecovery = false;
 static bool restoredFromArchive = false;
 
 /* options taken from recovery.conf for archive recovery */
-static char *recoveryRestoreCommand = NULL;
 static char *recoveryEndCommand = NULL;
 static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
@@ -575,8 +575,8 @@ bool reachedConsistency = false;
 
 static bool InRedo = false;
 
-/* Have we launched bgwriter during recovery? */
-static bool bgwriterLaunched = false;
+/* Have we launched background procs during archive recovery yet? */
+static bool ArchRecoveryBgProcsActive = false;
 
 /*
  * Information logged when we detect a change in one of the parameters
@@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
 			 bool randAccess);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
-static bool RestoreArchivedFile(char *path, const char *xlogfname,
-	const char *recovername, off_t expectedSize);
 static void ExecuteRecoveryCommand(char *command, char *commandName,
 	   bool failOnerror);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 
 	XLogFileName(xlogfname, tli, log, seg);
 
+#define TMPRECOVERYXLOG	"RECOVERYXLOG"
+
 	switch (source)
 	{
 		case XLOG_FROM_ARCHIVE:
+			/*
+			 * Check to see if the WALRestore process has already put the
+			 * next file in place while we were working. If so, use that.
+			 * If not, get it ourselves. This makes it easier to handle
+			 * initial state before the WALRestore is active, and also
+			 * handles the stop/start logic correctly when we have both
+			 * streaming and file based replication active.
+			 *
+			 * We queue up the next task for WALRestore after we've begun to
+			 * use this file later in XLogFileRead().
+			 *
+			 * If the WALRestore process is still active, the lock wait makes
+			 * us wait, which is just like we were executing the command
+			 * ourselves and so doesn't alter the logic elsewhere.
+			 */
+			if (XLogFileIsNowFullyRestored(tli, log, seg))
+			{
+				snprintf(path, MAXPGPATH, XLOGDIR "/%s", TMPRECOVERYXLOG);
+				restoredFromArchive = true;
+				break;
+			}
+
 			/* Report recovery progress in PS display */
 			snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
 	 xlogfname);
 			set_ps_display(activitymsg, false);
 
 			restoredFromArchive = RestoreArchivedFile(path, xlogfname,
-	  "RECOVERYXLOG",
+	  TMPRECOVERYXLOG,
 	  XLogSegSize);
+
 			if (!restoredFromArchive)
+			{
+LWLockRelease(WALRestoreCommandLock);
 return -1;
+			}
 			break;
 
 		case XLOG_FROM_PG_XLOG:
@@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 		if (stat(xlogfpath, statbuf) == 0)
 		{
 			if (unlink(xlogfpath) != 0)
+			{
+LWLockRelease(WALRestoreCommandLock);
 ereport(FATAL,
 		(errcode_for_file_access(),
 		 errmsg("could not remove file \"%s\": %m",
 xlogfpath)));
+			}
 			reload = true;
 		}
 
 		if (rename(path, xlogfpath)  0)
+		{
+			LWLockRelease(WALRestoreCommandLock);
 			ereport(ERROR,
 (errcode_for_file_access(),
 errmsg("could not rename file \"%s\" to \"%s\": %m",
 		path, xlogfpath)));
+		}
+
+		/*
+		 * Make sure we recover from the new filename, so we can reuse the
+		 * temporary filename for asynchronous restore actions.
+		 */
+		strcpy(path, xlogfpath);
+
+		/*
+		 * Tell the WALRestore process to get the next file now.
+		 * Hopefully it will be ready for use in time for the next call the
+		 * Startup process makes to XLogFileRead().
+		 *
+		 * It might seem like we should do that earlier but then there is a
+		 * race condition that might lead to replacing RECOVERYXLOG with
+		 * another file before we've copied it.
+		 */
+		SetNextWALRestoreLogSeg(tli, log, seg);
+		LWLockRelease(WALRestoreCommandLock);
 
 		/*
 		 * If the existing segment was replaced, since walsenders might have
@@ -2911,8 +2961,11 @@ XLogFileClose(void)
  * For fixed-size files, the caller may pass the expected size as an
  * additional crosscheck on successful recovery.  If the file size is not
  * known, set expectedSize = 0.
+ *
+ * Must be called with 

Re: [HACKERS] WAL Restore process during recovery

2012-01-20 Thread Fujii Masao
On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao masao.fu...@gmail.com wrote:

 Requested update

Thanks! Will review.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Fujii Masao
On Fri, Jan 20, 2012 at 7:37 PM, Erik Rijkers e...@xs4all.nl wrote:
 I'm not sure, but this does look like the mystery bug that I encountered
 repeatedly back in 9.0devel; I was never able to reproduce it reliably, and
 I don't think it was ever solved.

  http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php

I also encountered the same issue a year ago:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01579.php

At that time, I identified its cause:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01700.php

It was eventually fixed:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01910.php

But Steve has encountered it again, which means that the above fix is not
sufficient. Unless the issue is caused by my patch, we should do
another round of diagnosis.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Simon Riggs
On Tue, Jan 17, 2012 at 10:38 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao masao.fu...@gmail.com wrote:
 The amount of code changes to allow pg_basebackup to make a backup from
 the standby seems to be small. So I ended up merging those changes and the
 infrastructure patch. WIP patch attached. But I'd be happy to split the patch
 again if you want.

 Attached is the updated version of the patch. I wrote the limitations of
 standby-only backup in the document and changed the error messages.


I'm looking at this patch and wondering why we're doing so many
press-ups to ensure full_page_writes parameter is on. This will still
fail if you use a utility that removes the full page writes, but fail
silently.

I think it would be beneficial to explicitly check that all WAL
records have full page writes actually attached to them until we
achieve consistency.

Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
and it was removed. Not sure why? We already track other parameters
when they change, so I don't want to introduce a whole new WAL record
for each new parameter whose change needs tracking.

Please make a note for committer that wal version needs bumping.

I think it's probably time to start a README.recovery to explain why
this works the way it does. Other changes can then start to do that as
well, so we can keep this to sane levels of complexity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IDLE in transaction introspection

2012-01-20 Thread Magnus Hagander
On Thu, Jan 19, 2012 at 15:42, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jan 19, 2012 at 10:31 PM, Magnus Hagander mag...@hagander.net wrote:
 Applied with fairly extensive modifications. I moved things around,
 switched to using an enum instead of int+#define and a few things like
 that. Also changed most of the markup in the docs - I may well have
 broken some previously good language there, so if I did and someone
 spots it, please mention it :-)

 The attached patch seems to need to be committed.

Thanks, applied.


 BTW, the following change in the patch is not required, but ISTM that
 it's better to change that way.

 -SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
 -       pg_stat_get_backend_activity(s.backendid) AS current_query
 +SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 +       pg_stat_get_backend_activity(s.backendid) AS query

Agreed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 11:04 AM, Fujii Masao masao.fu...@gmail.com wrote:

 But Steve has encountered it again, which means that the above fix is not
 sufficient. Unless the issue is caused by my patch, we should do
 another round of diagnosis.

It's my bug, and I've posted a fix but not yet applied it, just added
to open items list. The only reason for that was time pressure, which
is now gone, so I'll look to apply it sooner.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New replication mode: write

2012-01-20 Thread Simon Riggs
On Mon, Jan 16, 2012 at 4:17 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Done. Attached is the updated version of the patch.

 Thanks.

 I'll review this first, but can't start immediately. Please expect
 something back in 2 days.

On initial review this looks fine.

I'll do a more thorough hands-on review now and commit if still OK.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-20 Thread Magnus Hagander
On Tue, Jan 17, 2012 at 21:39, Tomas Vondra t...@fuzzy.cz wrote:
 On 20.12.2011 19:59, Tomas Vondra wrote:
 On 20.12.2011 11:20, Magnus Hagander wrote:
 2011/12/20 Tomas Vondra t...@fuzzy.cz:

 I haven't updated the docs yet - let's see if the patch is acceptable at
 all first.

 Again, without having reviewed the code, this looks like a feature
 we'd want, so please add some docs, and then submit it for the next
 commitfest!

 I've added the docs (see the attachment) and rebased to current head.

 Tomas

 Fixed a failing regression test (check of pg_stat_database structure).

I'm wondering if this (and also my deadlocks stats patch that's in the
queue) should, instead of inventing new pgstats messages, add fields to
the tabstat message. It sounds like that one is just for tables, but
it's already the one collecting info about commits and rollbacks, and
it's already sent on every commit.

Adding two fields to that one would add some extra bytes on every
send, but I wonder if that would ever affect performance, given the
total size of the packet? And it would certainly be lower overhead in
the cases where there *have* been temp files used.

Thoughts?
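
For scale, here is a rough, self-contained C sketch of that size argument. The structs below are simplified stand-ins invented for illustration, not the real PgStat_MsgTabstat layout or entry count; the only point is to compare the cost of two extra int64 counters against the size of a whole tabstat-style message.

/* Hypothetical stand-in structs -- not PostgreSQL's pgstat definitions. */
#include <stdio.h>
#include <stdint.h>

#define SIM_TABSTAT_ENTRIES 8			/* stand-in for PGSTAT_NUM_TABENTRIES */

typedef struct
{
	uint32_t	tableid;
	int64_t		tuples_inserted;
	int64_t		tuples_updated;
	int64_t		tuples_deleted;
	int64_t		blocks_fetched;
	int64_t		blocks_hit;
} SimTabEntry;

typedef struct
{
	uint32_t	databaseid;
	int32_t		nentries;
	int64_t		xact_commit;
	int64_t		xact_rollback;
	int64_t		temp_files;				/* proposed extra counter */
	int64_t		temp_bytes;				/* proposed extra counter */
	SimTabEntry	entries[SIM_TABSTAT_ENTRIES];
} SimTabstatMsg;

int
main(void)
{
	/* Two int64 counters are a small fraction of the whole message. */
	printf("whole message: %zu bytes; two extra counters: %zu bytes\n",
		   sizeof(SimTabstatMsg), 2 * sizeof(int64_t));
	return 0;
}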

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Fujii Masao
Thanks for the review!

On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I'm looking at this patch and wondering why we're doing so many
 press-ups to ensure full_page_writes parameter is on. This will still
 fail if you use a utility that removes the full page writes, but fail
 silently.

 I think it would be beneficial to explicitly check that all WAL
 records have full page writes actually attached to them until we
 achieve consistency.

I agree that it's worth adding such a safeguard. That can be a self-contained
feature, so I'll submit a separate patch for that, to keep each patch small.

 Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
 and it was removed. Not sure why? We already track other parameters
 when they change, so I don't want to introduce a whole new WAL record
 for each new parameter whose change needs tracking.

I revived that because whenever full_page_writes must be WAL-logged
or replayed, there is no need to WAL-log or replay the HS parameters,
and the opposite is also true. Logging or replaying all of them every time
seems a bit useless and makes the code less readable. ISTM that
XLOG_FPW_CHANGE makes the code simpler and avoids adding useless
WAL activity, compared with merging them into one WAL record.

 Please make a note for committer that wal version needs bumping.

Okay, will add the note about bumping XLOG_PAGE_MAGIC.

 I think its probably time to start a README.recovery to explain why
 this works the way it does. Other changes can then start to do that as
 well, so we can keep this to sane levels of complexity.

In this CF, there are other patches which change the recovery code, so
I think it's better to do that after all of them have been committed.
There's no need to hurry.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup is not checking IDENTIFY_SYSTEM numbre of columns

2012-01-20 Thread Magnus Hagander
On Sun, Jan 15, 2012 at 22:00, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Wed, Jan 11, 2012 at 5:11 PM, Magnus Hagander mag...@hagander.net wrote:

 No, no reason. Adding such a check would be a good idea.


 ok. patch attached, it also adds a few PQclear() calls before
 disconnect_and_exit().

I don't think we need to care about all those PQclear() - it does an
exit() right after them anyway, so what's the point? (the disconnect
part is important of course, since otherwise we get a message in the
log on the server)

I've applied the patch without the PQclear()s, and changed it around
so that the error message shown is actually the same in all the
different places.

 btw, in BaseBackup() in line 1149 (after the patch is applied) there
 is an exit instead of disconnect_and_exit() and that is probably a
 typo too

It is, indeed. Fixed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-01-20 Thread Robert Haas
On Sat, Jan 14, 2012 at 9:32 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Here's another version of the patch to make XLogInsert less of a bottleneck
 on multi-CPU systems. The basic idea is the same as before, but several bugs
 have been fixed, and lots of misc. clean up has been done.

This seems to need a rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've taken that idea and used it to build a second Clog cache, known
 as ClogHistory which allows access to the read-only tail of pages in
 the clog. Once a page has been written to for the last time, it will
 be accessed via the ClogHistory Slru in preference to the normal Clog
 Slru. This separates historical accesses by readers from current write
 access by committers. Historical access doesn't force dirty writes,
 nor are commits made to wait when historical access occurs.

This seems to need a rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Robert Haas robertmh...@gmail.com writes:
 I guess the question is: for what purpose?

 Indeed, it seems like such a thing is not an extension at all anymore,
 or at least it gives up many of the useful properties of extensions.

I'm thinking that a common name and version number tracked in the
database for a set of related functions (that usually form an API) is a
useful enough property that I want extensions to support more
use cases than contrib-like “module centric” extensions (meaning, C
coded and shipped with a .so).

 Given the entire lack of demand from the field for such a cut-down
 concept of extension, I think we should not be in a hurry to introduce
 it.  Maybe in a year or two when we have a clearer idea of how people
 are actually using extensions, there will be a better argument for it.

Fair enough I guess (or at least I'm understanding how alone I am here),
let's hear from the field first.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 On 01/19/2012 04:12 PM, Robert Haas wrote:
 On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstanand...@dunslane.net  wrote:
 The spec only allows unescaped Unicode chars (and for our purposes that
 means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
 result in something that's not legal JSON.

 I understand.  I'm proposing that we not care.  In other words, if the
 server encoding is UTF-8, it'll really be JSON.  But if the server
 encoding is something else, it'll be almost-JSON.

 Of course, for data going to the client, if the client encoding is UTF8,
 they should get legal JSON, regardless of what the database encoding is,
 and conversely too, no?

 Yes.  I think this argument has been mostly theologizing, along the
 lines of how many JSON characters can dance on the head of a pin.
 From a user's perspective, the database encoding is only a constraint on
 which characters he can store.

Bingo.

 He does not know or care what the bit
 representation is inside the server.  As such, if we store a non-ASCII
 character in a JSON string, it's valid JSON as far as the user is
 concerned, so long as that character exists in the Unicode standard.
 If his client encoding is UTF8, the value will be letter-perfect JSON
 when it gets to him; and if his client encoding is not UTF8, then he's
 already pretty much decided that he doesn't give a fig about the
 Unicode-centricity of the JSON spec, no?

Also agreed.  Personally, I think it may not have been a great idea to
tie the JSON spec so closely to Unicode, but I understand that it
would have been difficult to define an encoding-agnostic equivalent of
\uXXXX, since it's hard to know for sure whether an arbitrary encoding
even has a (sensible?) definition of code points, and they probably
wanted to avoid ambiguity.  But, it's bound to cause problems for any
system that runs in some other encoding, which, when so requested, we
do.  Even if we had the ability to support multiple encodings in the
same database, I'm not sure I'd be very excited about insisting that
JSON data always be stored in UTF-8, because that would introduce a
lot of unnecessary transcoding for people using other encodings and
basically unnecessarily handicap the functionality provided by the
datatype.  But at least if we had that, people would have the *option*
to use JSON with UTF-8 and get the fully spec-compliant behavior.  As
it is, they don't; the system we have forces the database encoding on
all datatypes whether they like it or not, and that ain't changing for
9.2.

 So I'm with Robert: we should just plain not care.  I would further
 suggest that maybe what we should do with incoming JSON escape sequences
 is convert them to Unicode code points and then to the equivalent
 character in the database encoding (or throw error if there is none).

The code I've written so far does no canonicalization of the input
value of any kind, just as we do for XML.  I'm inclined to leave it
that way.  Eventually, we might want to make the JSON datatype support
equality comparisons and so on, and that will require the system to
knowing that the letter "r" can be encoded as some \uXXXX sequence and
that the escape \r is equivalent to some other escape \uXXXX, but
right now all the code does is try to validate that the JSON is legal,
NOT second-guess the user's choice about how to spell things or where
to insert whitespace.  I think that's a good choice because (1) AFAIK,
there's no official canonicalization method for JSON, so whatever we
pick will be something we think is best, not an official method
sanction by the spec, (2) users might prefer the way they chose to
represent a given value over the way we choose to represent it, and
(3) by simply validating and storing the JSON object, rather than
doing any canonicalization, the input function avoids the need to do
any data copying, hopefully maximizing speed.  Canonicalization can be
added on top of what I've done here and people who want or need it can
use it; I have some ideas around how to make that leverage the
existing code that I intend to pursue for 9.3, but right now I'd
rather not go there.

So, given that framework, what the patch does is this: if you're using
UTF-8, then \uXXXX is accepted, provided that XXXX is something that
equates to a legal Unicode code point.  It isn't converted to the
corresponding character: it's just validated.  If you're NOT using
UTF-8, then it allows \uXXXX for code points up through 127 (which we
assume are the same in all encodings) and anything higher than that is
rejected.  If someone knows an easy way to check whether a \uXXXX
sequence for XXXX > 007F is a legal Unicode code point that has an
equivalent in the current server encoding, then we can add logic to
allow that case also, but personally I'm not that excited about it.
Anyone who is using \uXXXX escapes with a non-Unicode coding 
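
To make that rule concrete, here is a minimal standalone C sketch of the described validation (an illustration of the behavior as described, not code from the patch): in a UTF-8 database any well-formed 4-hex-digit escape passes, while in any other encoding only values up through 007F (plain ASCII) do.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Validate the XXXX part of a \uXXXX escape under the rule described above. */
static bool
json_unicode_escape_ok(const char *hex4, bool server_encoding_is_utf8)
{
	unsigned int cp = 0;
	int			i;

	if (strlen(hex4) != 4)
		return false;
	for (i = 0; i < 4; i++)
	{
		char		c = hex4[i];

		if (c >= '0' && c <= '9')
			cp = (cp << 4) | (c - '0');
		else if (c >= 'a' && c <= 'f')
			cp = (cp << 4) | (c - 'a' + 10);
		else if (c >= 'A' && c <= 'F')
			cp = (cp << 4) | (c - 'A' + 10);
		else
			return false;		/* not a hex digit: reject outright */
	}

	if (server_encoding_is_utf8)
		return true;			/* any well-formed escape is accepted */
	return cp <= 0x7F;			/* otherwise only the ASCII range passes */
}

int
main(void)
{
	printf("0041 non-UTF8: %d, 00E9 non-UTF8: %d, 00E9 UTF8: %d\n",
		   json_unicode_escape_ok("0041", false),
		   json_unicode_escape_ok("00E9", false),
		   json_unicode_escape_ok("00E9", true));
	return 0;
}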

Re: [HACKERS] Command Triggers

2012-01-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 My advice is to forget about trying to provide the command string to
 the user for the first version of this patch.  As you're finding out,
 there's no simple, easy, obvious way of doing it, and there are N0
 useful things that can be done without that functionality.

Actually, none of my current examples and tests are relying on it. For
trigger-based replication Jan said he would need the full parse
tree rather than just the command string anyway, so we're not losing
anything more than what we were already prepared to postpone.

  I continue
 to think that for a first version of this, we should be satisfied to
 pass just the OID.  I know that's not really what you want, but
 there's much to be said for picking a goal that is achievable in the
 limited time available, and I fear that setting your sights too high
 will lead to something that either (a) doesn't get committed, or (b)
 gets committed, but turns out not to work very well, either of which
 would be less than ideal.

Agreed, mostly.

From the code I already have I think we should still pass in the command
tag, the schema name (or null) and the object name (or null) as input
parameters to the trigger's procedure.

I'm now working on that and then the concurrency angle of the command
triggers.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-20 Thread Heikki Linnakangas

On 03.01.2012 17:56, Simon Riggs wrote:

On Tue, Jan 3, 2012 at 3:18 PM, Robert Haasrobertmh...@gmail.com  wrote:


2. When a backend can't find a free buffer, it spins for a long time
while holding the lock. This makes the buffer strategy O(N) in its
worst case, which slows everything down. Notably, while this is
happening the bgwriter sits doing nothing at all, right at the moment
when it is most needed. The Clock algorithm is an approximation of an
LRU, so is already suboptimal as a perfect cache. Tweaking to avoid
worst case behaviour makes sense. How much to tweak? Well,...


I generally agree with this analysis, but I don't think the proposed
patch is going to solve the problem.  It may have some merit as a way
of limiting the worst case behavior.  For example, if every shared
buffer has a reference count of 5, the first buffer allocation that
misses is going to have a lot of work to do before it can actually
come up with a victim.  But I don't think it's going to provide good
scaling in general.  Even if the background writer only spins through,
on average, ten or fifteen buffers before finding one to evict, that
still means we're acquiring ten or fifteen spinlocks while holding
BufFreelistLock. I don't currently have the measurements to prove
that's too expensive, but I bet it is.


I think it's worth reducing the cost of scanning, but that has little
to do with solving the O(N) problem. I think we need both.

I've left the way open for you to redesign freelist management in many
ways. Please take the opportunity and go for it, though we must
realise that major overhauls require significantly more testing to
prove their worth. Reducing spinlocking only sounds like a good way to
proceed for this release.

If you don't have time in 9.2, then these two small patches are worth
having. The bgwriter locking patch needs less formal evidence to show
its worth. We simply don't need to have a bgwriter that just sits
waiting doing nothing.


I'd like to see some benchmarks that show a benefit from these patches, 
before committing something like this that complicates the code. These 
patches are fairly small, but nevertheless. Once we have a test case, we 
can argue whether the benefit we're seeing is worth the extra code, or 
if there's some better way to achieve it.
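
To see the worst case Robert describes above in numbers, here is a self-contained C simulation of a clock sweep over buffers that all start at the maximum usage count. It is not StrategyGetBuffer() and it ignores pinning and locking entirely; it only shows how many buffers get touched before a victim turns up.

#include <stdio.h>

#define NBUFFERS	16
#define MAX_USAGE	5

static int	usage_count[NBUFFERS];
static int	next_victim = 0;

/* Sweep, decrementing usage counts, until a buffer with count 0 is found. */
static int
clock_sweep(long *touched)
{
	for (;;)
	{
		int			buf = next_victim;

		next_victim = (next_victim + 1) % NBUFFERS;
		(*touched)++;

		if (usage_count[buf] == 0)
			return buf;			/* victim found */
		usage_count[buf]--;		/* give it another chance */
	}
}

int
main(void)
{
	long		touched = 0;
	int			i,
				victim;

	for (i = 0; i < NBUFFERS; i++)
		usage_count[i] = MAX_USAGE;

	/* With every count at MAX_USAGE, this takes MAX_USAGE * NBUFFERS + 1 touches. */
	victim = clock_sweep(&touched);
	printf("victim = buffer %d after touching %ld buffers (N = %d)\n",
		   victim, touched, NBUFFERS);
	return 0;
}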


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup option for handling symlinks

2012-01-20 Thread Magnus Hagander
On Mon, Jan 16, 2012 at 19:52, Peter Eisentraut pete...@gmx.net wrote:
 On sön, 2012-01-08 at 22:22 +0100, Magnus Hagander wrote:
 On Sun, Jan 8, 2012 at 21:53, Peter Eisentraut pete...@gmx.net wrote:
  I've recently had a possible need for telling pg_basebackup how to
  handle symlinks in the remote data directory, instead of ignoring them,
  which is what currently happens.  Possible options were recreating the
  symlink locally (pointing to a file on the local system) or copying the
  file the symlink points to.  This is mainly useful in scenarios where
  configuration files are symlinked from the data directory.  Has anyone
  else had the need for this?  Is it worth pursuing?

 Yes.

 I ran into the same issue though - in one case it would've been best
 to copy the link, in the other case it would've been best to copy the
 contents of the file :S Couldn't decide which was most important...
 Maybe a switch would be needed?

 Yes.  Do we need to preserve the third behavior of ignoring symlinks?

I don't think we do.


 tar has an -h option that causes symlinks to be followed.  The default
 there is to archive the symlink itself.

Seems like a reasonable pattern to follow (though I think using -h for
this is a really bad idea, the pattern of archiving the symlink itself
by default seems right)
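
As a concrete illustration of that pattern, here is a small POSIX C sketch (not pg_basebackup code; the function and its behavior are invented for illustration): by default it records the symlink itself via readlink(), and only when asked does it follow the link to the underlying file, roughly the tar default versus tar -h.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

static int
archive_entry(const char *path, bool follow_symlinks)
{
	struct stat st;

	if (lstat(path, &st) != 0)
	{
		perror("lstat");
		return -1;
	}

	if (S_ISLNK(st.st_mode) && !follow_symlinks)
	{
		char		target[4096];
		ssize_t		len = readlink(path, target, sizeof(target) - 1);

		if (len < 0)
		{
			perror("readlink");
			return -1;
		}
		target[len] = '\0';
		printf("would archive symlink %s -> %s\n", path, target);
		return 0;
	}

	/* a regular file, or a symlink we were told to follow */
	if (stat(path, &st) != 0)
	{
		perror("stat");
		return -1;
	}
	printf("would archive contents of %s (%lld bytes)\n",
		   path, (long long) st.st_size);
	return 0;
}

int
main(int argc, char **argv)
{
	bool		follow = (argc > 2 && strcmp(argv[2], "--follow") == 0);

	if (argc < 2)
	{
		fprintf(stderr, "usage: %s path [--follow]\n", argv[0]);
		return 1;
	}
	return archive_entry(argv[1], follow) == 0 ? 0 : 1;
}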

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for the review!

 On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I'm looking at this patch and wondering why we're doing so many
 press-ups to ensure full_page_writes parameter is on. This will still
 fail if you use a utility that removes the full page writes, but fail
 silently.

 I think it would be beneficial to explicitly check that all WAL
 records have full page writes actually attached to them until we
 achieve consistency.

 I agree that it's worth adding such a safeguard. That can be a self-contained
 feature, so I'll submit a separate patch for that, to keep each patch small.

Maybe, but do you mean to do this now as well? I'm not sure I like silent errors.

 Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
 and it was removed. Not sure why? We already track other parameters
 when they change, so I don't want to introduce a whole new WAL record
 for each new parameter whose change needs tracking.

 I revived that because whenever full_page_writes must be WAL-logged
 or replayed, there is no need to WAL-log or replay the HS parameters,
 and the opposite is also true. Logging or replaying all of them every time
 seems a bit useless and makes the code less readable. ISTM that
 XLOG_FPW_CHANGE makes the code simpler and avoids adding useless
 WAL activity, compared with merging them into one WAL record.

I don't agree, but for the sake of getting on with things I'd say this
is minor, so it's no reason to block the patch.

 Please make a note for committer that wal version needs bumping.

 Okay, will add the note about bumping XLOG_PAGE_MAGIC.

 I think it's probably time to start a README.recovery to explain why
 this works the way it does. Other changes can then start to do that as
 well, so we can keep this to sane levels of complexity.

 In this CF, there are other patches which change the recovery code, so
 I think it's better to do that after all of them have been committed.
 There's no need to hurry.

Agreed.

Will proceed to final review and if all OK, commit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 2:11 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 20.01.2012 15:32, Robert Haas wrote:

 On Sat, Jan 14, 2012 at 9:32 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 Here's another version of the patch to make XLogInsert less of a
 bottleneck
 on multi-CPU systems. The basic idea is the same as before, but several
 bugs
 have been fixed, and lots of misc. clean up has been done.


 This seems to need a rebase.


 Here you go.

I put myself down as reviewer for this. I'm planning to review this
early next week, once I've finished Fujii-san's patches.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 8:52 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 Robert Haas robertmh...@gmail.com writes:
 I guess the question is: for what purpose?

 Indeed, it seems like such a thing is not an extension at all anymore,
 or at least it gives up many of the useful properties of extensions.

 I'm thinking that a common name and version number tracked in the
 database for a set of related functions (that usually form an API) is a
 useful enough property that I want extensions to support more
 use cases than contrib-like “module centric” extensions (meaning, C
 coded and shipped with a .so).

I see that there is some usefulness there, but I'm not sure that this
is the best way to get our hands around it.  For one thing, people can
and do schema versioning and schema upgrade scripts entirely in
userland.  My last implementation worked by keeping a schema_versions
table on the server with one column, a UUID.  The deployment tarball
contained a file with a list of UUIDs in it, each one associated to an
SQL script.  At install time, the install script ran through that file
in order and ran any scripts whose UUID didn't yet appear in the
table, and then added the UUIDs of the run scripts to the table.  This
might not be what any given person wants, but there's a lot of
flexibility to do things like that without any particular support from
the DBMS.  (Incidentally, this experience is what convinced me that
CREATE TABLE IF NOT EXISTS and ALTER TABLE IF EXISTS are good things to
have; my system could have been a lot simpler if I'd had those.)
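
For what it's worth, the idea compresses into a fairly small libpq program. The sketch below is an illustration of that scheme rather than the actual tooling described above: the table and column names are assumptions, the scripts are inlined instead of being read from a deployment tarball, and error handling is minimal.

#include <stdio.h>
#include <libpq-fe.h>

struct migration { const char *uuid; const char *sql; };

static const struct migration migrations[] = {
	{"5f0c6d2e-0000-4000-8000-000000000001", "CREATE TABLE IF NOT EXISTS t1 (id int)"},
	{"5f0c6d2e-0000-4000-8000-000000000002", "ALTER TABLE t1 ADD COLUMN note text"},
};

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* connection parameters from environment */
	PGresult   *res;
	size_t		i;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	res = PQexec(conn, "CREATE TABLE IF NOT EXISTS schema_versions (uuid text PRIMARY KEY)");
	PQclear(res);

	for (i = 0; i < sizeof(migrations) / sizeof(migrations[0]); i++)
	{
		const char *params[1] = {migrations[i].uuid};

		res = PQexecParams(conn, "SELECT 1 FROM schema_versions WHERE uuid = $1",
						   1, NULL, params, NULL, NULL, 0);
		if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 0)
		{
			PQclear(res);
			res = PQexec(conn, migrations[i].sql);		/* apply the script */
			PQclear(res);
			res = PQexecParams(conn, "INSERT INTO schema_versions (uuid) VALUES ($1)",
							   1, NULL, params, NULL, NULL, 0);
		}
		PQclear(res);
	}

	PQfinish(conn);
	return 0;
}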

One random design idea I had related to providing this functionality
in the DB core is to have a command that creates an empty extension,
maybe just CREATE EXTENSION foo EMPTY, and an ALTER command that
forcibly changes the DB's notion of what version is installed, like
ALTER EXTENSION foo FORCE VERSION TO '1.1'.  That would allow the
same sort of thing you're after here by using those two features plus
ALTER EXTENSION ADD/DROP, and could also be used to do other things.
For example, suppose you have DB servers A and B.  A is running an old
version of some extension and is in a shared hosting environment where
you can't get access to the box.  B, on the other hand, is your
brand-new, dedicated server.  You could upgrade the extension on the
old machine manually, by issuing SQL commands to forcibly change its
state, and then do a dump and reload onto B.  This might be useful if,
for example, B is also running a newer DB server version that won't
support the very old version of the extension running on A.   This is
probably an unusual situation, but maybe there's some value in
allowing users who want to do such things a cleaner way to do it than
direct catalog hackery.

Anyway, I'm just thinking out loud here - I don't actually have a very
strong feeling that I know what all of the solutions are in this area,
or even all the problems.  I'm interested in hearing about your
experiences with the system, and other people's, because I certainly
do agree that there's room for improvement.  One of my personal pet
peeves is that the system doesn't know how to do an install of v1.1 by
running the v1.0 script followed by the 1.0-1.1 upgrade script, which
I fear is going to lead to a rapid uptick in the number of copies of
almost-identical scripts in our git repository.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-20 Thread Marko Kreen
On Tue, Jan 17, 2012 at 05:53:33PM +0900, Kyotaro HORIGUCHI wrote:
 Hello,  This is a revised and rebased version of the patch.
 
 a. The old term `Add Tuple Function' is changed to 'Store
    Handler'. The reason for not using `storage' is simply the length
    of the symbols.
 
 b. I couldn't find a good place to put PGgetAsCString(), so it is
    removed and storeHandler()@dblink.c touches PGresAttValue
    directly in this new patch. The definition of PGresAttValue stays
    in libpq-fe.h and is provided with a comment.
 
 c. Refined error handling in dblink.c. I think it preserves the
    previous behavior for column number mismatches and type
    conversion exceptions.
 
 d. The documentation is revised.

First, my priority is on-the-fly result processing,
not allocation optimization.  And this patch seems to make
that possible: I can process results row-by-row, without the
need to buffer all of them in PQresult.  Which is great!

But the current API seems clumsy; I guess that's because the
patch grew from trying to replace the low-level allocator.

I would like to propose better one-shot API with:

void *(*RowStoreHandler)(PGresult *res, PGresAttValue *columns);

where the PGresAttValue * is allocated once, inside PQresult.
And the pointers inside point directly to network buffer.
Of course this requires replacing the current per-column malloc+copy
pattern with a per-row parse+handle pattern, but I think the resulting
API will be better:

1) Pass-through processing does not need to care about unnecessary
   per-row allocations.

2) Handlers that want a copy of the row (like regular libpq)
   can optimize allocations by having a global view of the row
   (e.g. one allocation for row header + data).

This also optimizes call patterns - first libpq parses packet,
then row handler processes row, no unnecessary back-and-forth.


Summary: the current API makes various assumptions about how the row is
processed; let's remove those.
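
To make the calling pattern concrete, here is a toy, self-contained C model of it. Nothing here is a real libpq type or the proposed signature verbatim: SimAttValue and SimResult are invented stand-ins for PGresAttValue and PGresult, and the keep-going-on-non-NULL return convention is just an assumption for this toy. The point is only the shape: the library parses one row, hands the handler an array of column pointers into its own buffer, and never accumulates the full result set.

#include <stdio.h>
#include <string.h>

typedef struct { int len; const char *value; } SimAttValue;	/* stand-in, not PGresAttValue */
typedef struct { int ncolumns; } SimResult;						/* stand-in, not PGresult */

typedef void *(*SimRowHandler) (SimResult *res, SimAttValue *columns);

/* A pass-through handler: print each column without copying it anywhere. */
static void *
print_row(SimResult *res, SimAttValue *columns)
{
	int			i;

	for (i = 0; i < res->ncolumns; i++)
		printf("%s%.*s", i ? " | " : "", columns[i].len, columns[i].value);
	printf("\n");
	return res;					/* non-NULL means "keep going" in this toy */
}

/* Stand-in for the library's per-row parse+handle loop over a network buffer. */
static void
deliver_rows(SimRowHandler handler)
{
	static const char netbuf[] = "1\0alice\0" "2\0bob\0";
	SimResult	res = {2};
	SimAttValue	cols[2];
	const char *p = netbuf;
	int			row,
				col;

	for (row = 0; row < 2; row++)
	{
		for (col = 0; col < 2; col++)
		{
			cols[col].value = p;				/* points into the buffer, no copy */
			cols[col].len = (int) strlen(p);
			p += cols[col].len + 1;
		}
		if (handler(&res, cols) == NULL)
			break;
	}
}

int
main(void)
{
	deliver_rows(print_row);
	return 0;
}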

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've taken that idea and used it to build a second Clog cache, known
 as ClogHistory which allows access to the read-only tail of pages in
 the clog. Once a page has been written to for the last time, it will
 be accessed via the ClogHistory Slru in preference to the normal Clog
 Slru. This separates historical accesses by readers from current write
 access by committers. Historical access doesn't force dirty writes,
 nor are commits made to wait when historical access occurs.

 This seems to need a rebase.

OT: It would save lots of time if we had 2 things for the CF app:

1. Emails that go to appropriate people when status changes, e.g. when
someone sets "Waiting for Author" the author gets an email so they
know the reviewer is expecting something. Not knowing that wastes lots
of days, so if we want to do this in fewer days that seems like a great
place to start.

2. Something that automatically tests patches. If you submit a patch
we run up a blank VM and run patch applies on all patches. As soon as
we get a fail, an email goes to patch author. That way authors know as
soon as a recent commit invalidates something.

Those things have wasted time for me in the past, so they're
opportunities to improve the process, not must-haves.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 9:29 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I'd like to see some benchmarks that show a benefit from these patches,
 before committing something like this that complicates the code. These
 patches are fairly small, but nevertheless. Once we have a test case, we can
 argue whether the benefit we're seeing is worth the extra code, or if
 there's some better way to achieve it.

Agreed.  I don't really like either of these patches stylistically:
when you start sometimes locking things and sometimes not locking
things, it gets hard to reason about how the code will behave, and you
limit what you can in the future do inside the critical section
because you have to keep in mind that you don't really have full
mutual exclusion.  There are places where it may be worth making that
trade-off if we can show a large performance benefit, but I am wary of
quick hacks that may get in the way of more complete solutions we want
to implement later, especially if the performance benefit is marginal.

I'm in the process of trying to pull together benchmark numbers for
the various performance-related patches that are floating around this
CommitFest, but a big problem is that many of them need to be tested
in different ways, which are frequently not explicitly laid out in the
emails in which they are submitted.  Some of them are calculated to
improve latency, and others throughput.  Some require pgbench run with
a small scale factor, some require pgbench with a large scale factor,
and some need some completely other type of testing altogether.  Some
need short tests, others need long tests, still others require testing
with very specific parameters, and some don't really need more testing
so much as they need profiling.  Many (like the double-write patch)
need more thought about worst case scenarios, like a sequential scan
on a large, unhinted table.  Some only really need testing in one or
two scenarios, but others could affect a broad variety of workloads
and really ought to be tested in many different ways to fully
understand the behavior.   Some of them may be interdependent in
complex ways.

My current belief, based on the testing I did before Christmas and my
gut feelings about the remaining patches, is that the patches which
have the best chance of making a major difference on general workloads
are your XLOG insert scaling patch and the group commit patch.  Those
are, in fact, virtually the only ones that have had more than zero
test results posted to the list so far, and those test results are
extremely promising.  The various checkpoint-related patches also seem
somewhat promising to me, despite the lack (so far) of any test
results.  Everything else strikes me as likely to make a difference
that is either very small or only applicable in a fairly circumscribed
set of cases.  This preliminary opinion is subject to change as more
evidence comes in, of course.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 2:47 AM, Greg Smith g...@2ndquadrant.com wrote:
 The updated patch looks good, marking as 'Ready for Committer'

 Patches without documentation are never ready for commit.  For this one, I'm
 not sure if that should be in the form of a reference example in contrib, or
 just something that documents that the hook exists and what the ground rules
 are for grabbing it.

Hooks are frequently not documented, and we only sometimes even bother
to include an example in contrib.  We should probably at least have a
working example for testing purposes, though, whether or not we end up
committing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 9:28 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 My advice is to forget about trying to provide the command string to
 the user for the first version of this patch.  As you're finding out,
 there's no simple, easy, obvious way of doing it, and there are N>0
 useful things that can be done without that functionality.

 Actually, none of my current examples and tests are relying on it. For
 trigger-based replication Jan said he would need the full parse
 tree rather than just the command string anyway, so we're not losing
 anything more than what we were already prepared to postpone.

Cool.

  I continue
 to think that for a first version of this, we should be satisfied to
 pass just the OID.  I know that's not really what you want, but
 there's much to be said for picking a goal that is achievable in the
 limited time available, and I fear that setting your sights too high
 will lead to something that either (a) doesn't get committed, or (b)
 gets committed, but turns out not to work very well, either of which
 would be less than ideal.

 Agreed, mostly.

 From the code I already have I think we should still pass in the command
 tag, the schema name (or null) and the object name (or null) as input
 parameters to the trigger's procedure.

I think the OID is better than the name, but if it's easy to pass the
name and schema, then I'm fine with it.  But I do think this is one of
those problems that is complex enough that we should be happy to get
the core infrastructure in in the first commit, even with an extremely
limited amount of functionality, and then build on it.
Enhance-and-extend is so much easier than a monolithic code drop.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 9:44 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've taken that idea and used it to build a second Clog cache, known
 as ClogHistory which allows access to the read-only tail of pages in
 the clog. Once a page has been written to for the last time, it will
 be accessed via the ClogHistory Slru in preference to the normal Clog
 Slru. This separates historical accesses by readers from current write
 access by committers. Historical access doesn't force dirty writes,
 nor are commits made to wait when historical access occurs.

 This seems to need a rebase.

 OT: It would save lots of time if we had 2 things for the CF app:

 1. Emails that go to appropriate people when status changes, e.g. when
 someone sets "Waiting for Author" the author gets an email so they
 know the reviewer is expecting something. Not knowing that wastes lots
 of days, so if we want to do this in fewer days that seems like a great
 place to start.

 2. Something that automatically tests patches. If you submit a patch
 we run up a blank VM and run patch applies on all patches. As soon as
 we get a fail, an email goes to patch author. That way authors know as
 soon as a recent commit invalidates something.

 Those things have wasted time for me in the past, so they're
 opportunities to improve the process, not must-haves.

Yeah, I agree that that would be nice.  I just haven't had time to
implement much of anything for the CF application in a long time.  My
management has been very interested in the performance and scalability
stuff, so that's been my main focus for 9.2.  I'm going to see if I
can carve out some time for this once the dust settles.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-20 Thread Marti Raudsepp
On Fri, Jan 20, 2012 at 17:01, Robert Haas robertmh...@gmail.com wrote:
 We should probably at least have a
 working example for testing purposes, though, whether or not we end up
 committing it.

Martin Pihlak sent a short description of how to test the patch with
his pg_logforward module:
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00790.php

That's what I used when reviewing, too.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 The code I've written so far does no canonicalization of the input
 value of any kind, just as we do for XML.

Fair enough.

 So, given that framework, what the patch does is this: if you're using
 UTF-8, then \uXXXX is accepted, provided that XXXX is something that
 equates to a legal Unicode code point.  It isn't converted to the
 corresponding character: it's just validated.  If you're NOT using
 UTF-8, then it allows \uXXXX for code points up through 127 (which we
 assume are the same in all encodings) and anything higher than that is
 rejected.

This seems a bit silly.  If you're going to leave the escape sequence as
ASCII, then why not just validate that it names a legal Unicode code
point and be done?  There is no reason whatever that that behavior needs
to depend on the database encoding.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-20 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:46 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 19 January 2012 17:40, Robert Haas robertmh...@gmail.com wrote:
 I don't know what you mean by this.  I think removing wal_writer_delay
 is premature, because I think it still may have some utility, and the
 patch removes it.  That's a separate change that should be factored
 out of this patch and discussed separately.

 FWIW, I don't really care too much if we keep wal_writer_delay,
 provided it is only used in place of the patch's
 WALWRITER_NORMAL_TIMEOUT constant. I will note in passing that I doubt
 that the effect with asynchronous commits and hint bits is pronounced
 enough to have ever transferred through to someone making a practical
 recommendation to reduce wal_writer_delay to ameliorate clog
 contention.

It was very visible in some benchmarking Heikki did, and I was able to
reproduce it.

 Yeah, I guess the shutdown sequence could get a bit complex if we try
 to make everyone go through the WAL writer all the time.  But I wonder
 if we could rejigger things somehow so that everything goes through
 the WAL writer if it's dead.

 I'm not sure what you mean by this last bit. It sounds a bit hazardous.

That last if was supposed to say unless, which may contribute to
the confusion.

 My problem with nominating a backend to the status of an auxiliary is
 that no matter what way you cut it, it increases the failure surface
 area, so to speak.

I think that's the wrong way of thinking about it.  Imagine this: we
maintain a queue of people who are waiting on the current WAL flush,
the current-flush-to LSN, and a queue of people who are waiting on the
next WAL flush, and a leader.  All this data is protected by a
spinlock.  When you want to flush WAL, you grab the spinlock.  If the
current-flush-to LSN is greater than the LSN you need, you add
yourself to the waiting-on-current-flush queue, release the spinlock,
and go to sleep.  Otherwise, if there's no leader, you become the
leader, enter your flush LSN as the current-flush-to-LSN, and release
the spinlock.  If there is a leader, you add yourself to the
waiting-on-next-flush queue, release the spinlock, and sleep.

If you become the leader, you perform the flush.  Then you retake the
spinlock, dequeue anyone waiting on the current flush, move all of the
next flush waiters (or those within a certain LSN distance) to the
current flush list, remember who is at the head of that new queue, and
release the spinlock.  You then set a flag in the PGPROC of the
backend now at the head of the next-flush queue and wake him up; he
checks that flag on waking to know whether he is the leader or whether
he's just been woken because his flush is done.  After waking him so
the next flush can get started, you wake all the people who were
waiting on the flush you already did.
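In pseudo-real code, the core of that looks something like the standalone
sketch below.  It uses a pthread mutex and condition variable instead of a
spinlock and PGPROC latches, and it lets awakened waiters simply race to
become the next leader rather than doing the explicit head-of-queue handoff
described above, but the shape is the same (illustrative only, not meant as
actual backend code):

/* gcc -pthread group_flush.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

typedef uint64_t LSN;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flush_done = PTHREAD_COND_INITIALIZER;

static LSN  flushed_up_to = 0;      /* everything <= this is durable */
static LSN  next_target = 0;        /* highest LSN wanted by current waiters */
static bool leader_active = false;  /* is somebody flushing right now? */

/* Stand-in for the real write-and-fsync of WAL up through "upto". */
static void
do_flush(LSN upto)
{
    usleep(1000);
}

/* Flush WAL through "target": become the leader, or wait for one. */
static void
flush_up_to(LSN target)
{
    pthread_mutex_lock(&lock);
    for (;;)
    {
        if (flushed_up_to >= target)
        {
            pthread_mutex_unlock(&lock);    /* already durable */
            return;
        }
        if (!leader_active)
            break;                          /* we get to be the leader */
        /* Someone is flushing; record what we need and sleep. */
        if (target > next_target)
            next_target = target;
        pthread_cond_wait(&flush_done, &lock);
    }

    /* Leader: absorb everything queued so far, then flush. */
    leader_active = true;
    if (next_target > target)
        target = next_target;
    pthread_mutex_unlock(&lock);

    do_flush(target);

    pthread_mutex_lock(&lock);
    flushed_up_to = target;
    leader_active = false;
    pthread_cond_broadcast(&flush_done);    /* wake everyone who was waiting */
    pthread_mutex_unlock(&lock);
}

static void *
worker(void *arg)
{
    LSN my_lsn = (LSN) (uintptr_t) arg;

    flush_up_to(my_lsn);
    printf("LSN %llu flushed\n", (unsigned long long) my_lsn);
    return NULL;
}

int
main(void)
{
    pthread_t th[8];

    for (int i = 0; i < 8; i++)
        pthread_create(&th[i], NULL, worker, (void *) (uintptr_t) (i + 1));
    for (int i = 0; i < 8; i++)
        pthread_join(th[i], NULL);
    return 0;
}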

This may or may not be a good design, but I don't think it has any
more failure surface area than what you have here.  In particular,
whether or not the WAL writer is running doesn't matter; the system
can run just fine without it, and can even still do group commit.  To
look at it another way, it's not a question of whether we're treating
a regular backend as an auxiliary process; there's no rule anywhere
that backends can't be dependent on the proper operation of other
backends for proper functioning - there are MANY places that have that
property, including LWLockAcquire() and LWLockRelease().  Nor is there
any rule that background processes are more reliable than foreground
processes, nor do I believe they are.  Much of the existing patch's
failure surface seems to me to come from the coordination it requires
between ordinary backends and the background writer, and possible race
conditions appertaining thereto: WAL writer dies, backend dies,
postmaster dies, postmaster and WAL writer die together, etc.

 +       if (delta > XLOG_SEG_SIZE * CheckPointSegments ||
 +                       !ProcGlobal->groupCommitAvailable)

 That seems like a gigantic value.  I would think that we'd want to
 forget about group commit any time we're behind by more than one
 segment (maybe less).

 I'm sure that you're right - I myself described it as horrible in my
 original mail. I think that whatever value we set this to ought to be
 well reasoned. Right now, your magic number doesn't seem much better
 than my bogus heuristic (which only ever served as a placeholder
 implementation that hinted at a well-principled formula). What's your
 basis for suggesting that that limit would always be close to optimal?

It's probably not - I suspect even a single WAL segment is still too
large.  I'm pretty sure that it would be safe to always flush up to
the next block boundary, because we always write the whole block
anyway even if it's only partially filled.  So there's no possible
regression there.  Anything larger than the end of the current 8kB
block is going to be a trade-off between latency and throughput, and
it seems to me that there's 

Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 The code I've written so far does no canonicalization of the input
 value of any kind, just as we do for XML.

 Fair enough.

 So, given that framework, what the patch does is this: if you're using
 UTF-8, then \uXXXX is accepted, provided that XXXX is something that
 equates to a legal Unicode code point.  It isn't converted to the
 corresponding character: it's just validated.  If you're NOT using
 UTF-8, then it allows \uXXXX for code points up through 127 (which we
 assume are the same in all encodings) and anything higher than that is
 rejected.

 This seems a bit silly.  If you're going to leave the escape sequence as
 ASCII, then why not just validate that it names a legal Unicode code
 point and be done?  There is no reason whatever that that behavior needs
 to depend on the database encoding.

Mostly because that would prevent us from adding canonicalization in
the future, AFAICS, and I don't want to back myself into a corner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've taken that idea and used it to build a second Clog cache, known
 as ClogHistory which allows access to the read-only tail of pages in
 the clog. Once a page has been written to for the last time, it will
 be accessed via the ClogHistory Slru in preference to the normal Clog
 Slru. This separates historical accesses by readers from current write
 access by committers. Historical access doesn't force dirty writes,
 nor are commits made to wait when historical access occurs.

 This seems to need a rebase.

 Still applies and compiles cleanly for me.

D'oh.  You're right.  Looks like I accidentally tried to apply this to
the 9.1 sources.  Sigh...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-20 Thread Robert Haas
On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith g...@2ndquadrant.com wrote:
 I chewed a bit on Heikki's comment that similarity to the query planning
 parameters might be useful, and Robert's that being able to explain how the
 feature works more easily has value.  I have an initial adjustment of my
 general idea that I think moves usefully in both those directions.

 The existing VACUUM cost constants look like this:

 vacuum_cost_page_hit = 1
 vacuum_cost_page_miss = 10
 vacuum_cost_page_dirty = 20

 These could be adjusted to instead be ratios like the query planner ones
 (seq_page_cost, random_page_cost, etc.), referenced off a value of 1.0 for
 page miss ~= a read is expected:

 vacuum_cost_page_hit = 0.1
 vacuum_cost_page_miss = 1.0
 vacuum_cost_page_dirty = 2.0

 Now add in the new setting, which is explicitly said to be the read value:

 vacuum_cost_read_limit = 8000 # maximum page miss read rate in
 kilobytes/second

 And I can shuffle the numbers around internally such that things still work
 exactly the same, at the default parameters.  And then anyone who spends
 time learning how either the query planning or vacuum cost ratio constants
 work will find the learning curve to pick up the other set easier.

That may be a little better, but I still don't think it's worth
breaking backward compatibility for.  I mean, suppose I don't care
about read rate, but I want to limit my dirty data rate to 1MB/s.
What parameters should I set?

 It might be a bit more straightforward yet if things were renamed so it was
 more obvious that page miss~=read, but I haven't seen a good way to do that
 yet.  Renaming the reference cost value to vacuum_cost_page_read has two
 problems.  It makes the backward compatibility issues larger, and it's not
 quite true.  The way I think this should be explained, they really aren't
 the same; that's why I used ~= above.  A page miss is not guaranteed to be a
 read, it just is expected to be one in the worst case.  The read rate that
 vacuum page misses introduce will not be exactly the same as
 vacuum_cost_read_limit--but it will be below that limit, which is all it
 claims to be.

Maybe, but I still think having the read rate limit the dirty rate or
vice versa is *really* weird.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 3:32 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 20, 2012 at 10:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've taken that idea and used it to build a second Clog cache, known
 as ClogHistory which allows access to the read-only tail of pages in
 the clog. Once a page has been written to for the last time, it will
 be accessed via the ClogHistory Slru in preference to the normal Clog
 Slru. This separates historical accesses by readers from current write
 access by committers. Historical access doesn't force dirty writes,
 nor are commits made to wait when historical access occurs.

 This seems to need a rebase.

 Still applies and compiles cleanly for me.

 D'oh.  You're right.  Looks like I accidentally tried to apply this to
 the 9.1 sources.  Sigh...

No worries. It's Friday.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:38 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 3:32 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 20, 2012 at 10:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've taken that idea and used it to build a second Clog cache, known
 as ClogHistory which allows access to the read-only tail of pages in
 the clog. Once a page has been written to for the last time, it will
 be accessed via the ClogHistory Slru in preference to the normal Clog
 Slru. This separates historical accesses by readers from current write
 access by committers. Historical access doesn't force dirty writes,
 nor are commits made to wait when historical access occurs.

 This seems to need a rebase.

 Still applies and compiles cleanly for me.

 D'oh.  You're right.  Looks like I accidentally tried to apply this to
 the 9.1 sources.  Sigh...

 No worries. It's Friday.

http://www.youtube.com/watch?v=kfVsfOSbJY0

Of course, I even ran git log to check that I had the latest
sources... but what I had, of course, was the latest 9.1 sources,
which still have recently-timestamped commits, and I didn't look
carefully enough.  Sigh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Andrew Dunstan



On 01/20/2012 09:19 AM, Robert Haas wrote:

On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Andrew Dunstan and...@dunslane.net writes:

On 01/19/2012 04:12 PM, Robert Haas wrote:

On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote:

The spec only allows unescaped Unicode chars (and for our purposes that
means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
result in something that's not legal JSON.

I understand.  I'm proposing that we not care.  In other words, if the
server encoding is UTF-8, it'll really be JSON.  But if the server
encoding is something else, it'll be almost-JSON.

Of course, for data going to the client, if the client encoding is UTF8,
they should get legal JSON, regardless of what the database encoding is,
and conversely too, no?

Yes.  I think this argument has been mostly theologizing, along the
lines of how many JSON characters can dance on the head of a pin.
 From a user's perspective, the database encoding is only a constraint on
which characters he can store.

Bingo.


He does not know or care what the bit
representation is inside the server.  As such, if we store a non-ASCII
character in a JSON string, it's valid JSON as far as the user is
concerned, so long as that character exists in the Unicode standard.
If his client encoding is UTF8, the value will be letter-perfect JSON
when it gets to him; and if his client encoding is not UTF8, then he's
already pretty much decided that he doesn't give a fig about the
Unicode-centricity of the JSON spec, no?

Also agreed.  Personally, I think it may not have been a great idea to
tie the JSON spec so closely to Unicode, but I understand that it
would have been difficult to define an encoding-agnostic equivalent of
\uXXXX, since it's hard to know for sure whether an arbitrary encoding
even has a (sensible?) definition of code points, and they probably
wanted to avoid ambiguity.  But, it's bound to cause problems for any
system that runs in some other encoding, which, when so requested, we
do.  Even if we had the ability to support multiple encodings in the
same database, I'm not sure I'd be very excited about insisting that
JSON data always be stored in UTF-8, because that would introduce a
lot of unnecessary transcoding for people using other encodings and
basically unnecessarily handicap the functionality provided by the
datatype.  But at least if we had that, people would have the *option*
to use JSON with UTF-8 and get the fully spec-compliant behavior.  As
it is, they don't; the system we have forces the database encoding on
all datatypes whether they like it or not, and that ain't changing for
9.2.


So I'm with Robert: we should just plain not care.  I would further
suggest that maybe what we should do with incoming JSON escape sequences
is convert them to Unicode code points and then to the equivalent
character in the database encoding (or throw error if there is none).

The code I've written so far does no canonicalization of the input
value of any kind, just as we do for XML.  I'm inclined to leave it
that way.  Eventually, we might want to make the JSON datatype support
equality comparisons and so on, and that will require the system to
know that the letter r can be encoded as some \uXXXX sequence and
that the escape \r is equivalent to some other escape \uXXXX, but
right now all the code does is try to validate that the JSON is legal,
NOT second-guess the user's choice about how to spell things or where
to insert whitespace.  I think that's a good choice because (1) AFAIK,
there's no official canonicalization method for JSON, so whatever we
pick will be something we think is best, not an official method
sanctioned by the spec, (2) users might prefer the way they chose to
represent a given value over the way we choose to represent it, and
(3) by simply validating and storing the JSON object, rather than
doing any canonicalization, the input function avoids the need to do
any data copying, hopefully maximizing speed.  Canonicalization can be
added on top of what I've done here and people who want or need it can
use it; I have some ideas around how to make that leverage the
existing code that I intend to pursue for 9.3, but right now I'd
rather not go there.

So, given that framework, what the patch does is this: if you're using
UTF-8, then \uXXXX is accepted, provided that XXXX is something that
equates to a legal Unicode code point.  It isn't converted to the
corresponding character: it's just validated.  If you're NOT using
UTF-8, then it allows \uXXXX for code points up through 127 (which we
assume are the same in all encodings) and anything higher than that is
rejected.  If someone knows an easy way to check whether a \uXXXX
sequence for XXXX > 007F is a legal Unicode code point that has an
equivalent in the current server encoding, then we can add logic to
allow that case also, but personally I'm not that excited about it.
Anyone who is using \u escapes 
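The check itself is only a few lines.  A standalone sketch of the rule just
described (made-up names, surrogate-pair issues ignored, not the patch's
actual code):

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Validate the XXXX part of a \uXXXX escape: with a UTF-8 server encoding
 * any well-formed 4-hex-digit value is accepted (validated, never converted);
 * with any other encoding only code points up through 127 are accepted.
 */
static bool
json_unicode_escape_ok(const char *hex4, bool server_encoding_is_utf8)
{
    long    cp;

    if (strlen(hex4) != 4 ||
        !isxdigit((unsigned char) hex4[0]) ||
        !isxdigit((unsigned char) hex4[1]) ||
        !isxdigit((unsigned char) hex4[2]) ||
        !isxdigit((unsigned char) hex4[3]))
        return false;               /* malformed escape */

    cp = strtol(hex4, NULL, 16);

    if (server_encoding_is_utf8)
        return true;                /* just validated, never converted */

    return cp <= 0x7F;              /* non-UTF-8: ASCII subset only */
}

int
main(void)
{
    printf("%d\n", json_unicode_escape_ok("0041", false));  /* 1: 'A' */
    printf("%d\n", json_unicode_escape_ok("00E9", false));  /* 0: rejected */
    printf("%d\n", json_unicode_escape_ok("00E9", true));   /* 1: accepted */
    return 0;
}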

Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've taken that idea and used it to build a second Clog cache, known
 as ClogHistory which allows access to the read-only tail of pages in
 the clog. Once a page has been written to for the last time, it will
 be accessed via the ClogHistory Slru in preference to the normal Clog
 Slru. This separates historical accesses by readers from current write
 access by committers. Historical access doesn't force dirty writes,
 nor are commits made to wait when historical access occurs.

 This seems to need a rebase.

Still applies and compiles cleanly for me.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 peeves is that the system doesn't know how to do an install of v1.1 by
 running the v1.0 script followed by the 1.0-1.1 upgrade script, which

Did you try

  CREATE EXTENSION foo FROM 1.0;

Maybe you would want the system to be able to determine the oldest
version to start from to reach the current default_version given in the
control file, but I guess it would be better to add another property
like default_full_version or such (last_stop?).

Looks like a simple enough project; the fact that it helps our own
shipping and script maintenance could maybe allow us to work on it
this late, dunno.
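To make it concrete, a control file could then look something like this
(default_full_version being the new, purely hypothetical property; the other
lines mirror what hstore already ships):

    # hstore.control -- sketch only
    comment = 'data type for storing sets of (key, value) pairs'
    default_version = '1.1'
    default_full_version = '1.0'
    module_pathname = '$libdir/hstore'
    relocatable = true

CREATE EXTENSION hstore would then run hstore--1.0.sql followed by
hstore--1.0--1.1.sql, instead of needing a complete hstore--1.1.sql script.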

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Marti Raudsepp
On Fri, Jan 20, 2012 at 01:33, Erik Rijkers e...@xs4all.nl wrote:
 Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I
 suppose this is normally necessary for performance reasons, but it would be
 useful to be able to compile a test version that allows it.

I believe being interruptible is a requirement for the patch to be accepted.

CHECK_FOR_INTERRUPTS(); should be added to the indeterminate loops.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 11:29 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 peeves is that the system doesn't know how to do an install of v1.1 by
 running the v1.0 script followed by the 1.0-1.1 upgrade script, which

 Did you try

  CREATE EXTENSION foo FROM 1.0;

Well, yes, that works, but I'm going for what you wrote next:

 Maybe you would want the system to be able to determine the oldest
 version to start from to reach the current default_version given in the
 control file, but I guess it would be better to add another property
 like default_full_version or such (last_stop?).

Possibly that might be a better design, yes.  Especially since we
don't order version numbers intrinsically.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote:
 XML's &#XXXX; escape mechanism is more or less the equivalent of JSON's
 \uXXXX. But XML documents can be encoded in a variety of encodings,
 including non-unicode encodings such as Latin-1. However, no matter what the
 document encoding, &#XXXX; designates the character with Unicode code point
 XXXX, whether or not that is part of the document encoding's charset.

OK.

 Given that precedent, I'm wondering if we do need to enforce anything other
 than that it is a valid unicode code point.

 Equivalence comparison is going to be difficult anyway if you're not
 resolving all \u escapes. Possibly we need some sort of canonicalization
 function to apply for comparison purposes. But we're not providing any
 comparison ops today anyway, so I don't think we need to make that decision
 now. As you say, there doesn't seem to be any defined canonical form - the
 spec is a bit light on in this respect.

Well, we clearly have to resolve all \uXXXX to do either comparison or
canonicalization.  The current patch does neither, but presumably we
want to leave the door open to such things.  If we're using UTF-8 and
comparing two strings, and we get to a position where one of them has
a character and the other has \uXXXX, it's pretty simple to do the
comparison: we just turn XXXX into a wchar_t and test for equality.
That should be trivial, unless I'm misunderstanding.  If, however,
we're not using UTF-8, we have to first turn \uXXXX into a Unicode
code point, then convert that to a character in the database encoding,
and then test for equality with the other character after that.  I'm
not sure whether that's possible in general, how to do it, or how
efficient it is.  Can you or anyone shed any light on that topic?
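To make the easy case concrete, here is a standalone sketch of the UTF-8
path: expand the escaped code point into UTF-8 bytes and compare bytes, which
comes to the same thing as the wchar_t comparison (BMP only, no surrogate
handling, made-up names):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Encode a BMP code point as UTF-8; returns the number of bytes (1..3). */
static int
cp_to_utf8(unsigned int cp, unsigned char *buf)
{
    if (cp <= 0x7F)
    {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF)
    {
        buf[0] = 0xC0 | (cp >> 6);
        buf[1] = 0x80 | (cp & 0x3F);
        return 2;
    }
    buf[0] = 0xE0 | (cp >> 12);
    buf[1] = 0x80 | ((cp >> 6) & 0x3F);
    buf[2] = 0x80 | (cp & 0x3F);
    return 3;
}

/* Does UTF-8 string s start with the character for code point cp? */
static bool
starts_with_cp(const char *s, unsigned int cp)
{
    unsigned char buf[4];
    int           len = cp_to_utf8(cp, buf);

    return strncmp(s, (char *) buf, len) == 0;
}

int
main(void)
{
    /* \u00E9 against a literal UTF-8 e-acute (0xC3 0xA9) */
    printf("%d\n", starts_with_cp("\xC3\xA9tat", 0x00E9));  /* 1 */
    printf("%d\n", starts_with_cp("etat", 0x00E9));         /* 0 */
    return 0;
}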

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread David E. Wheeler
On Jan 19, 2012, at 9:07 PM, Tom Lane wrote:

 If his client encoding is UTF8, the value will be letter-perfect JSON
 when it gets to him; and if his client encoding is not UTF8, then he's
 already pretty much decided that he doesn't give a fig about the
 Unicode-centricity of the JSON spec, no?

Don’t entirely agree with this. Some folks are stuck with other encodings and 
cannot change them for one reason or another. That said, they can convert JSON 
from their required encoding into UTF-8 on the client side, so there is a 
workaround.

Not that this changes anything, and I agree with the overall direction of the 
discussion here. I just want to make sure we keep in mind folks who don’t 
necessarily have the freedom to switch to UTF-8. (And I say this as someone who 
*always* uses UTF-8!)

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread David E. Wheeler
On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:

 If, however,
 we're not using UTF-8, we have to first turn \uXXXX into a Unicode
 code point, then convert that to a character in the database encoding,
 and then test for equality with the other character after that.  I'm
 not sure whether that's possible in general, how to do it, or how
 efficient it is.  Can you or anyone shed any light on that topic?

If it’s like the XML example, it should always represent a Unicode code point, 
and *not* be converted to the other character set, no?

At any rate, since the JSON standard requires UTF-8, such distinctions having 
to do with alternate encodings are not likely to be covered, so I suspect we 
can do whatever we want here. It’s outside the spec.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I think the OID is better than the name, but if it's easy to pass the
 name and schema, then I'm fine with it.  But I do think this is one of

It's quite easy to get name and schema from the command yes, here's an
example of how I'm doing it for some commands:

        case T_CreateStmt:
        {
            CreateStmt *node = (CreateStmt *)parsetree;
            cmd->schemaname = RangeVarGetNamespace(node->relation);
            cmd->objectname = node->relation->relname;
            break;
        }

        case T_AlterTableStmt:
        {
            AlterTableStmt *node = (AlterTableStmt *)parsetree;
            cmd->schemaname = RangeVarGetNamespace(node->relation);
            cmd->objectname = node->relation->relname;
            break;
        }

        case T_CreateExtensionStmt:
        {
            cmd->schemaname = NULL;
            cmd->objectname = ((CreateExtensionStmt *)parsetree)->extname;
            break;
        }

Getting the OID on the other hand is much harder, because each command
implements that part as it wants to, and DefineWhatever() functions are
returning void. We could maybe have them return the main Oid of the
object created, or we could have the CommandContext structure I'm using
be a backend global variable that any command would stuff.

In any case, adding support for the OID only works for after trigger
when talking about CREATE commands and for before trigger if talking
about DROP commands, assuming that the trigger procedure is run after
we've been locating said Oid.

It seems to me to be a couple orders of magnitude more work to get the
Oid here compared to just get the schemaname and objectname. And getting
those works in all cases as we take them from the command itself (we
fill in the schema with the first search_path entry when it's not given
explicitly in the command)

 CREATE TABLE foo();
 NOTICE:  tag:  CREATE TABLE
 NOTICE:  enforce_local_style{public.foo}: foo

 those problems that is complex enough that we should be happy to get
 the core infrastructure in in the first commit, even with an extremely
 limited amount of functionality, and then build on it.
 Enhance-and-extend is so much easier than a monolithic code drop.

I'm happy to read that, and I'm preparing next patch version (v6) with
that goal in mind.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remembering bug #6123

2012-01-20 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Tom Lane  wrote:
 
 Well, the bottom line that's concerning me here is whether
 throwing errors is going to push anyone's application into an
 unfixable corner.  I'm somewhat encouraged that your Circuit
 Courts software can adapt to it, since that's certainly one of
 the larger and more complex applications out there. Or at least I
 would be if you had actually verified that the CC code was okay
 with the recently-proposed patch versions. Do you have any
 thorough tests you can run against whatever we end up with?
 
 To test the new version of this patch, we would need to pick an
 application release, and use the patch through the development,
 testing, and staging cycles.  We would need to look for all
 triggers needing adjustment, and make the necessary changes.  We
 would need to figure out which triggers were important to cover,
 and ensure that testing covered all of them.
 
 Given the discussions with my new manager this past week, I'm
 pretty sure we can work this into a release that would complete
 testing and hit pilot deployment in something like three months,
 give or take a little.  I can't actually make any promises on that
 until I talk to her next week.
 
After a couple meetings, I have approval to get this into an
application release currently in development.  Assuming that your
patch from the 13th is good for doing the testing, I think I can
post test results in about three weeks.  I'll also work on a
follow-on patch to add a couple paragraphs and an example of the issue
to the docs by then.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Garick Hamlin
On Fri, Jan 20, 2012 at 09:12:13AM -0800, David E. Wheeler wrote:
 On Jan 19, 2012, at 9:07 PM, Tom Lane wrote:
 
  If his client encoding is UTF8, the value will be letter-perfect JSON
  when it gets to him; and if his client encoding is not UTF8, then he's
  already pretty much decided that he doesn't give a fig about the
  Unicode-centricity of the JSON spec, no?
 
 Don’t entirely agree with this. Some folks are stuck with other encodings and
 cannot change them for one reason or another. That said, they can convert
 JSON from their required encoding into UTF-8 on the client side, so there is
 a workaround.

Perhaps in addition to trying to just 'do the right thing by default',
it makes sense to have two canonicalization functions?

Say: json_utf8() and json_ascii().

They could give the same output no matter what encoding was set? 

json_utf8() would give nice output, with characters canonicalized to
native UTF-8, and json_ascii() would output only non-control ASCII
characters literally and escape everything else, or something
like that?
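As a strawman, the guts of json_ascii() could be as simple as this
(standalone sketch, BMP-only, assumes the input is already valid UTF-8, no
claim this is how it should really be spelled):

#include <stdio.h>

/* Print a UTF-8 string as ASCII-only JSON text: printable ASCII is copied
 * through, everything else becomes \uXXXX. */
static void
emit_json_ascii(const unsigned char *s)
{
    while (*s)
    {
        unsigned int cp;

        if (*s < 0x80)                  /* one-byte (ASCII) sequence */
            cp = *s++;
        else if ((*s & 0xE0) == 0xC0)   /* two-byte sequence */
        {
            cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
            s += 2;
        }
        else                            /* assume a three-byte sequence */
        {
            cp = ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
            s += 3;
        }

        if (cp >= 0x20 && cp < 0x7F)
            putchar((int) cp);          /* printable ASCII: copy literally */
        else
            printf("\\u%04X", cp);      /* anything else: escape */
    }
    putchar('\n');
}

int
main(void)
{
    emit_json_ascii((const unsigned char *) "d\xC3\xA9j\xC3\xA0 vu");
    /* prints: d\u00E9j\u00E0 vu */
    return 0;
}

json_utf8() would then be the inverse direction: resolve \uXXXX escapes back
into the corresponding UTF-8 bytes.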

Garick

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Andrew Dunstan



On 01/20/2012 11:58 AM, Robert Haas wrote:

On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote:

XML's &#XXXX; escape mechanism is more or less the equivalent of JSON's
\uXXXX. But XML documents can be encoded in a variety of encodings,
including non-unicode encodings such as Latin-1. However, no matter what the
document encoding, &#XXXX; designates the character with Unicode code point
XXXX, whether or not that is part of the document encoding's charset.

OK.


Given that precedent, I'm wondering if we do need to enforce anything other
than that it is a valid unicode code point.

Equivalence comparison is going to be difficult anyway if you're not
resolving all \u escapes. Possibly we need some sort of canonicalization
function to apply for comparison purposes. But we're not providing any
comparison ops today anyway, so I don't think we need to make that decision
now. As you say, there doesn't seem to be any defined canonical form - the
spec is a bit light on in this respect.

Well, we clearly have to resolve all \uXXXX to do either comparison or
canonicalization.  The current patch does neither, but presumably we
want to leave the door open to such things.  If we're using UTF-8 and
comparing two strings, and we get to a position where one of them has
a character and the other has \uXXXX, it's pretty simple to do the
comparison: we just turn XXXX into a wchar_t and test for equality.
That should be trivial, unless I'm misunderstanding.  If, however,
we're not using UTF-8, we have to first turn \uXXXX into a Unicode
code point, then convert that to a character in the database encoding,
and then test for equality with the other character after that.  I'm
not sure whether that's possible in general, how to do it, or how
efficient it is.  Can you or anyone shed any light on that topic?



We know perfectly well how to turn two strings from encoding x to utf8 
(see mb_utils.c::pg_do_encoding_conversion() ). Once we've done that 
ISTM we have reduced this to the previous problem, as the mathematicians 
like to say.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-20 Thread Greg Smith

On 01/20/2012 10:37 AM, Robert Haas wrote:

On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith g...@2ndquadrant.com wrote:

vacuum_cost_page_hit = 0.1
vacuum_cost_page_miss = 1.0
vacuum_cost_page_dirty = 2.0

Now add in the new setting, which is explicitly said to be the read value:

vacuum_cost_read_limit = 8000 # maximum page miss read rate in
kilobytes/second

That may be a little better, but I still don't think it's worth
breaking backward compatibility for.  I mean, suppose I don't care
about read rate, but I want to limit my dirty data rate to 1MB/s.
What parameters should I set?


vacuum_cost_page_dirty = 8.0

The resulting maximum rates will then be:

hit = 80MB/s
miss = 8MB/s
dirty = 1MB/s
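To spell out the arithmetic behind those numbers (each cost is a ratio
relative to a page miss, and the limit caps the page-miss-equivalent rate):

    hit:   8000 kB/s / 0.1 = 80000 kB/s ~= 80 MB/s
    miss:  8000 kB/s / 1.0 =  8000 kB/s  =  8 MB/s
    dirty: 8000 kB/s / 8.0 =  1000 kB/s ~=  1 MB/s

Those are the ceilings when only one kind of page is being touched; a mixed
workload shares the single budget, so the individual rates come out somewhat
lower in practice.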

The question you should ask yourself next is how do I limit my dirty data rate to 
1MB/s in 9.1?  Working that out by hand is a good exercise, to show just how much 
less complicated this proposal is than the current state of things.  Show me how it's
possible to do that in a way we can expect new DBAs to follow, then the idea of keeping
strong backwards compatibility here would have some weight.  I see sticking too closely
to the current scheme as being more bug-level compatibility; it's fundamentally broken,
being too difficult for most people to use in its current form.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-20 Thread Heikki Linnakangas

On 04.01.2012 13:14, Simon Riggs wrote:

On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Jim Nasby j...@nasby.net writes:

On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:

This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.



Couldn't we just leave the buffers alone? Once an index is dropped and that's 
pushed out through the catalog then nothing should be trying to access them and 
they'll eventually just get aged out.


No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file.  It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.


My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.


So, you're proposing that we remove freelist altogether? Sounds 
reasonable, but that needs to be performance tested somehow. I'm not 
sure what exactly the test should look like, but it probably should 
involve an OLTP workload, and large tables being created, populated to 
fill the cache with pages from the table, and dropped, while the OLTP 
workload is running. I'd imagine that the worst case scenario looks 
something like that.


Removing the freelist should speed up the drop, but the question is how 
costly are the extra clock sweeps that the backends have to do.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:14 PM, David E. Wheeler da...@kineticode.com wrote:
 On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:

 If, however,
 we're not using UTF-8, we have to first turn \uXXXX into a Unicode
 code point, then convert that to a character in the database encoding,
 and then test for equality with the other character after that.  I'm
 not sure whether that's possible in general, how to do it, or how
 efficient it is.  Can you or anyone shed any light on that topic?

 If it’s like the XML example, it should always represent a Unicode code 
 point, and *not* be converted to the other character set, no?

Well, you can pick which way you want to do the conversion.  If the
database encoding is SJIS, and there's an SJIS character in a string
that gets passed to json_in(), and there's another string which also
gets passed to json_in() which contains \uXXXX, then any sort of
canonicalization or equality testing is going to need to convert the
SJIS character to a Unicode code point, or the Unicode code point to
an SJIS character, to see whether they match.

Err, actually, now that I think about it, that might be a problem:
what happens if we're trying to test two characters for equality and
the encoding conversion fails?  We really just want to return false -
the strings are clearly not equal if either contains even one
character that can't be converted to the other encoding - so it's not
good if an error gets thrown in there anywhere.

 At any rate, since the JSON standard requires UTF-8, such distinctions having 
 to do with alternate encodings are not likely to be covered, so I suspect we 
 can do whatever we want here. It’s outside the spec.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:14 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think the OID is better than the name, but if it's easy to pass the
 name and schema, then I'm fine with it.  But I do think this is one of

 It's quite easy to get name and schema from the command yes, here's an
 example of how I'm doing it for some commands:

                case T_CreateStmt:
                {
                        CreateStmt *node = (CreateStmt *)parsetree;
                         cmd->schemaname = RangeVarGetNamespace(node->relation);
                         cmd->objectname = node->relation->relname;
                        break;
                }

                case T_AlterTableStmt:
                {
                        AlterTableStmt *node = (AlterTableStmt *)parsetree;
                         cmd->schemaname = RangeVarGetNamespace(node->relation);
                         cmd->objectname = node->relation->relname;
                        break;
                }

                case T_CreateExtensionStmt:
                {
                         cmd->schemaname = NULL;
                         cmd->objectname = ((CreateExtensionStmt
 *)parsetree)->extname;
                        break;
                }

 Getting the OID on the other hand is much harder, because each command
 implements that part as it wants to, and DefineWhatever() functions are
 returning void. We could maybe have them return the main Oid of the
 object created, or we could have the CommandContext structure I'm using
 be a backend global variable that any command would stuff.

 In any case, adding support for the OID only works for after trigger
 when talking about CREATE commands and for before trigger if talking
 about DROP commands, assuming that the trigger procedure is run after
 we've been locating said Oid.

 It seems to me to be a couple orders of magnitude more work to get the
 Oid here compared to just get the schemaname and objectname. And getting
 those works in all cases as we take them from the command itself (we
 fill in the schema with the first search_path entry when it's not given
 explicitly in the command)

  CREATE TABLE foo();
  NOTICE:  tag:  CREATE TABLE
  NOTICE:  enforce_local_style{public.foo}: foo

Hmm, OK.  But what happens if the user doesn't specify a schema name explicitly?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:35 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 01/20/2012 10:37 AM, Robert Haas wrote:

 On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith g...@2ndquadrant.com wrote:

 vacuum_cost_page_hit = 0.1

 vacuum_cost_page_miss = 1.0
 vacuum_cost_page_dirty = 2.0

 Now add in the new setting, which is explicitly said to be the read
 value:

 vacuum_cost_read_limit = 8000 # maximum page miss read rate in
 kilobytes/second

 That may be a little better, but I still don't think it's worth

 breaking backward compatibility for.  I mean, suppose I don't care
 about read rate, but I want to limit my dirty data rate to 1MB/s.
 What parameters should I set?

 vacuum_cost_page_dirty = 8.0

 The resulting maximum rates will then be:

 hit = 80MB/s
 miss = 8MB/s
 dirty = 1MB/s

That is, of course, not quite what I asked for.  In fact it's likely
that my actual rate will be less than 1MB/s, because there will
probably be a miss for every dirty page.  It will probably be about 8/9ths
of a MB/s.

 The question you should ask yourself next is how do I limit my dirty data
 rate to 1MB/s in 9.1?  Working that out by hand is a good exercise, to show
 just how much less complicated this proposal is over the current state of
 things.

OK, sure.  Our block size is 8kB, so we need every 128 blocks to
involve 1000 ms of delay.   Obviously there are many combinations of
parameters that will make that work, but here's one: delay for 125
milliseconds after each 16 blocks:

vacuum_cost_page_hit = 0
vacuum_cost_page_miss = 0
vacuum_cost_page_dirty = 1
vacuum_cost_limit = 16
autovacuum_vacuum_cost_delay = 125ms

Maybe that strikes you as worse than what you're proposing; it strikes
me as better.  Either way I think it's not going to be a good day for
people who are bad at math.  :-(
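For the record, the back-of-the-envelope behind those settings, ignoring the
time the vacuum work itself takes (which only pushes the real rate lower):

    16 cost units per nap / 1 unit per dirty page = 16 dirty pages per nap
    16 pages * 8 kB = 128 kB per 125 ms nap
    128 kB / 0.125 s = 1024 kB/s, i.e. roughly 1 MB/s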

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Maybe you would want the system to be able to determine the oldest
 version to start from to reach the current default_version given in the
 control file, but I guess it would be better to add another property
 like default_full_version or such (last_stop?).

 Possibly that might be a better design, yes.  Especially since we
 don't order version numbers intrinsically.

It's quite simple to implement too, see attached.  As for the version
number ordering, that's a giant rat hole I don't want to be digging in,
and I think we can bypass that problem entirely.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index e9e5e53..2d624d3 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -5,7 +5,7 @@ OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
 	crc32.o
 
 EXTENSION = hstore
-DATA = hstore--1.0.sql hstore--1.1.sql hstore--1.0--1.1.sql \
+DATA = hstore--1.0.sql hstore--1.1.sql.orig hstore--1.0--1.1.sql \
 	hstore--unpackaged--1.0.sql
 
 REGRESS = hstore
diff --git a/contrib/hstore/hstore--1.1.sql b/contrib/hstore/hstore--1.1.sql
deleted file mode 100644
index e95ad32..000
--- a/contrib/hstore/hstore--1.1.sql
+++ /dev/null
@@ -1,524 +0,0 @@
-/* contrib/hstore/hstore--1.1.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use CREATE EXTENSION hstore to load this file. \quit
-
-CREATE TYPE hstore;
-
-CREATE FUNCTION hstore_in(cstring)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_out(hstore)
-RETURNS cstring
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_recv(internal)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_send(hstore)
-RETURNS bytea
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE TYPE hstore (
-INTERNALLENGTH = -1,
-INPUT = hstore_in,
-OUTPUT = hstore_out,
-RECEIVE = hstore_recv,
-SEND = hstore_send,
-STORAGE = extended
-);
-
-CREATE FUNCTION hstore_version_diag(hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_version_diag'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION fetchval(hstore,text)
-RETURNS text
-AS 'MODULE_PATHNAME','hstore_fetchval'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR -> (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = fetchval
-);
-
-CREATE FUNCTION slice_array(hstore,text[])
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_slice_to_array'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR -> (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = slice_array
-);
-
-CREATE FUNCTION slice(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_slice_to_hstore'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION isexists(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION exist(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ? (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = exist,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_any(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_any'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ?| (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = exists_any,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_all(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_all'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ?& (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = exists_all,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION isdefined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION defined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_array'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_hstore'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = delete
-);
-
-CREATE FUNCTION hs_concat(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_concat'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR || (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = hs_concat
-);
-
-CREATE FUNCTION hs_contains(hstore,hstore)
-RETURNS 

Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Err, actually, now that I think about it, that might be a problem:
 what happens if we're trying to test two characters for equality and
 the encoding conversion fails?

This is surely all entirely doable given the encoding infrastructure
we already have.  We might need some minor refactoring, eg to have
a way of not throwing an error, but it's not going to be that hard
to achieve if somebody wants to do it.  So I still see little reason
for making the JSON type behave visibly differently in non-UTF8 database
encodings.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Hmm, OK.  But what happens if the user doesn't specify a schema name 
 explicitly?

For the creation case, RangeVarGetCreationNamespace should handle that.
The code Dimitri quoted is wrong, but not that hard to fix.

Unfortunately, the code he quoted for the ALTER case is also wrong,
and harder to fix.  Until you've done the lookup you don't know which
schema the referenced object is in.  And I don't care for the idea of
doing the lookup twice, as (a) it'll be slower and (b) there are race
cases where it will give the wrong answer, ie return an object other
than the one the ALTER eventually acts on.

Really I think there is not any single point where you can put the
command-trigger hook and be done.  In almost every case, the right
place is going to be buried somewhere within the execution of the
command.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 For the creation case, RangeVarGetCreationNamespace should handle that.
 The code Dimitri quoted is wrong, but not that hard to fix.

Ok.

 Unfortunately, the code he quoted for the ALTER case is also wrong,
 and harder to fix.  Until you've done the lookup you don't know which
 schema the referenced object is in.  And I don't care for the idea of
 doing the lookup twice, as (a) it'll be slower and (b) there are race
 cases where it will give the wrong answer, ie return an object other
 than the one the ALTER eventually acts on.

Oh. Yes.

 Really I think there is not any single point where you can put the
 command-trigger hook and be done.  In almost every case, the right
 place is going to be buried somewhere within the execution of the
 command.

I'm finally realizing it. I already introduced a structure called
CommandContextData to keep track of the current command elements we want
to pass down to command triggers; I think we're saying that this should
be a static variable that commands will need to be editing.

The main problem with that approach is that we will need to spread
calling the command trigger entry points to every supported command, I
wanted to avoid that first. It's no big deal though, as the API is
simple enough.

Expect a new patch made this way early next week.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checkpoint sync pause

2012-01-20 Thread Robert Haas
On Mon, Jan 16, 2012 at 8:59 PM, Greg Smith g...@2ndquadrant.com wrote:
 [ interesting description of problem scenario and necessary conditions for 
 reproducing it ]

This is about what I thought was happening, but I'm still not quite
sure how to recreate it in the lab.

Have you had a chance to test whether Linux 3.2 does any better in this
area?  As I understand it, it doesn't do anything particularly
interesting about the willingness of the kernel to cache gigantic
amounts of dirty data, but (1) supposedly it does a better job not
yanking the disk head around by just putting foreground processes to
sleep while writes happen in the background, rather than having the
foreground processes compete with the background writer for control of
the disk head; and (2) instead of having a sharp edge where background
writing kicks in, it tries to gradually ratchet up the pressure to get
things written out.

Somehow I can't shake the feeling that this is fundamentally a Linux
problem, and that it's going to be nearly impossible to work around in
user space without some help from the kernel.  I guess in some sense
it's reasonable that calling fsync() blasts the data at the platter at
top speed, but if that leads to starving everyone else on the system
then it starts to seem a lot less reasonable: part of the kernel's job
is to guarantee all processes fair access to shared resources, and if
it doesn't do that, we're always going to be playing catch-up.

 Just one random thought: I wonder if it would make sense to cap the
 delay after each sync to the time spending performing that sync.  That
 would make the tuning of the delay less sensitive to the total number
 of files, because we won't unnecessarily wait after each sync when
 they're not actually taking any time to complete.

 This is one of the attractive ideas in this area that didn't work out so
 well when tested.  The problem is that writes into a battery-backed write
 cache will show zero latency for some time until the cache is filled...and
 then you're done.  You have to pause anyway, even though it seems write
 speed is massive, to give the cache some time to drain to disk between syncs
 that push data toward it.  Even though it absorbed your previous write with
 no delay, that doesn't mean it takes no time to process that write.  With
 proper write caching, that processing is just happening asynchronously.

Hmm, OK.  Well, to borrow a page from one of your other ideas, how
about keeping track of the number of fsync requests queued for each
file, and make the delay proportional to that number?  We might have
written the same block more than once, so it could be an overestimate,
but it rubs me the wrong way to think that a checkpoint is going to
finish late because somebody ran a CREATE TABLE statement that touched
5 or 6 catalogs, and now we've got to pause for 15-18 seconds because
they've each got one dirty block.  :-(
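
A toy sketch of that idea (names and constants are made up; this is not the
checkpointer code) -- the pause after syncing a file scales with the number
of fsync requests that were absorbed for it, so a file with a single dirty
block costs almost nothing:

#include <unistd.h>

#define USEC_PER_REQUEST    3000L   /* pause per absorbed fsync request */
#define MAX_PAUSE_USEC   3000000L   /* never pause more than 3 s per file */

static void
pause_after_sync(int nrequests_for_file)
{
    long    pause = nrequests_for_file * USEC_PER_REQUEST;

    if (pause > MAX_PAUSE_USEC)
        pause = MAX_PAUSE_USEC;
    if (pause > 0)
        usleep(pause);      /* one dirty block => a 3 ms pause, not seconds */
}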

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 2:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Err, actually, now that I think about it, that might be a problem:
 what happens if we're trying to test two characters for equality and
 the encoding conversion fails?

 This is surely all entirely doable given the encoding infrastructure
 we already have.  We might need some minor refactoring, eg to have
 a way of not throwing an error, but it's not going to be that hard
 to achieve if somebody wants to do it.  So I still see little reason
 for making the JSON type behave visibly differently in non-UTF8 database
 encodings.

OK.  It feels a little grotty to me, but I'll go with the flow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Remembering bug #6123

2012-01-20 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 After a couple meetings, I have approval to get this into an
 application release currently in development.  Assuming that your
 patch from the 13th is good for doing the testing, I think I can
 post test results in about three weeks.  I'll also work on a
 follow-on patch to add couple paragraphs and an example of the issue
 to the docs by then.

Well, the issues about wording of the error message are obviously just
cosmetic, but I think we'd better do whatever we intend to do to the
callers of heap_lock_tuple before putting the patch through testing.
I'm inclined to go ahead and make them throw errors, just to see if
that has any effects we don't like.

I'm up to my elbows in planner guts at the moment, but will try to
fix up the patch this weekend if you want.

regards, tom lane



Re: [HACKERS] Remembering bug #6123

2012-01-20 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I'm up to my elbows in planner guts at the moment, but will try to
 fix up the patch this weekend if you want.
 
They have scheduled testers to check on this issue next week, so it
would be great to get as close as we can on the stuff that matters.
 
-Kevin



Re: [HACKERS] Inline Extension

2012-01-20 Thread Daniel Farina
On Thu, Jan 19, 2012 at 8:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Frankly I don't see the point of this. If the extension is an independent
 piece of (SQL) code, developed separately from an application, with its own
 lifecycle, a .sql file seems like the best way to distribute it. If it's
 not, ie. if it's an integral part of the database schema, then why package
 it as an extension in the first place?
 I'm with Heikki on not believing that this is a good idea.  If you are
 trying to do careful versioning of a set of object definitions, you want
 to stick the things in a file, you don't want them just flying by in
 submitted SQL.  Also, a large part of the point of the extension
 facility is to be able to do uninstall/reinstall and version
 upgrades/downgrades, none of which are possible unless the extension
 scripts are stored somewhere.

 ISTM your distribution concern would be much better addressed by
 installing contrib/adminpack and then just using pg_file_write()
 to put the new extension script into the remote server's library.

I think this is somewhat rube-goldberg-esque, and denies non-superuser
roles the ability to get more version management of schema and
operators.  As-is many organizations are submitting migrations via
plain SQL that include committing to a version management table that
is maintained by convention, and as-is that is considered a modern-day
best-practice.

Unless someone comes up with something fundamentally new in how to
handle the problem being solved in user-land via 'migrations' and
version tables in Postgres, I think tying completely safe, sandboxed,
non-superuser operators to super-user-only (and awkward besides) file
management via FEBE on the server-side is not going to do anything but
redelegate the problem of soundness to every migration/change
management regime, and decrease reuse of useful operators that do not
require superuser.

The ship has sailed.  Encouraging use of files and .sql buys no
soundness, because everyone is moving towards overlaying version
management via pure FEBE anyway.  At best, this means to me that
Postgres is completely neutral about what version management regime
you want to use for operators and schemas.  At worst, it means
Postgres is frustratingly unhelpful in common use cases.  And somewhere
in the middle is "this may have value, but not enough to be worth
maintaining."  Given the lack of fast and widely deployed trusted PL
integrations, my interpretation of the truth falls somewhere between
the latter two.

-- 
fdr



Re: [HACKERS] Group commit, revised

2012-01-20 Thread Heikki Linnakangas
I spent some time cleaning this up. Details below, but here are the 
highlights:


* Reverted the removal of wal_writer_delay
* Doesn't rely on WAL writer. Any process can act as the leader now.
* Removed heuristic on big flushes
* Uses PGSemaphoreLock/Unlock instead of latches

On 20.01.2012 17:30, Robert Haas wrote:

On Thu, Jan 19, 2012 at 10:46 PM, Peter Geogheganpe...@2ndquadrant.com  wrote:

+   if (delta > XLOG_SEG_SIZE * CheckPointSegments ||
+   !ProcGlobal->groupCommitAvailable)

That seems like a gigantic value.  I would think that we'd want to
forget about group commit any time we're behind by more than one
segment (maybe less).


I'm sure that you're right - I myself described it as horrible in my
original mail. I think that whatever value we set this to ought to be
well reasoned. Right now, your magic number doesn't seem much better
than my bogus heuristic (which only ever served as a placeholder
implementation that hinted at a well-principled formula). What's your
basis for suggesting that that limit would always be close to optimal?


It's probably not - I suspect even a single WAL segment is still too
large.  I'm pretty sure that it would be safe to always flush up to
the next block boundary, because we always write the whole block
anyway even if it's only partially filled.  So there's no possible
regression there.  Anything larger than the end of the current 8kB
block is going to be a trade-off between latency and throughput, and
it seems to me that there's probably only one way to figure out what's
reasonable: test a bunch of different values and see which ones
perform well in practice.  Maybe we could answer part of the question
by writing a test program which does repeated sequential writes of
increasingly-large chunks of data.  We might find out for example that
64kB is basically the same as 8kB because most of the overhead is in
the system call anyway, but beyond that the curve kinks.  Or it may be
that 16kB is already twice as slow as 8kB, or that you can go up to
1MB without a problem.  I don't see a way to know that without testing
it on a couple different pieces of hardware and seeing what happens.
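
A rough sketch of the kind of test program described above -- plain portable
C, with arbitrary chunk sizes and totals, and a write()+fdatasync() per chunk
to mimic a WAL flush:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int
main(void)
{
    const size_t chunk_sizes[] = {8192, 16384, 65536, 262144, 1048576};
    const size_t total = 32 * 1024 * 1024;      /* 32 MB per chunk size */
    char       *buf = malloc(1048576);
    int         i;

    if (buf == NULL)
        return 1;
    memset(buf, 'x', 1048576);
    for (i = 0; i < 5; i++)
    {
        int     fd = open("bench.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        struct timeval start, stop;
        size_t  written = 0;

        gettimeofday(&start, NULL);
        while (written < total)
        {
            if (write(fd, buf, chunk_sizes[i]) < 0)
                return 1;
            fdatasync(fd);                      /* flush each chunk */
            written += chunk_sizes[i];
        }
        gettimeofday(&stop, NULL);
        close(fd);
        printf("%zu-byte chunks: %.3f s\n", chunk_sizes[i],
               (stop.tv_sec - start.tv_sec) +
               (stop.tv_usec - start.tv_usec) / 1000000.0);
    }
    unlink("bench.tmp");
    free(buf);
    return 0;
}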


I ripped away that heuristic for a flush that's too large. After 
pondering it for a while, I came to the conclusion that as implemented 
in the patch, it was pointless. The thing is, if the big flush doesn't 
go through the group commit machinery, it's going to grab the 
WALWriteLock straight away. Before any smaller commits can proceed, they 
will need to grab that lock anyway, so the effect is the same as if the 
big flush had just joined the queue.


Maybe we should have a heuristic to split a large flush into smaller 
chunks. The WAL segment boundary would be a quite natural split point, 
for example, because when crossing the file boundary you have to issue 
separate fsync()s for the files anyway. But I'm happy with leaving that 
out for now, it's not any worse than it used to be without group commit 
anyway.



Ugh.  Our current system doesn't even require this code to be
interruptible, I think: you can't receive cancel or die interrupts
either while waiting for an LWLock or while holding it.


Right. Or within HOLD/RESUME_INTERRUPTS blocks.

The patch added some CHECK_FOR_INTERRUPTS() calls into various places in 
the commit/abort codepaths, but that's all dead code; they're all within 
a HOLD/RESUME_INTERRUPTS blocks.


I replaced the usage of latches with the more traditional 
PGSemaphoreLock/Unlock. The semaphore model works just fine in this case, 
where we have an lwlock to guard the wait queue, and when a process is 
waiting we know it will be woken up, or something is messed up at a pretty 
low level. We don't need a timeout or to wake up at other signals while 
waiting. Furthermore, the WAL writer didn't have a latch before this 
patch; it's not many lines of code to initialize the latch and set up 
the signal handler for it, but it already has a semaphore that's ready 
to use.


I wonder if we should rename the file into xlogflush.c or something 
like that, to make it clear that this works with any XLOG flushes, not 
just commits? Group commit is the usual term for this feature, so we 
should definitely mention that in the comments, but it might be better 
to rename the files/functions to emphasize that this is about WAL 
flushing in general.


This probably needs some further cleanup to fix things I've broken, and 
I haven't done any performance testing, but it's progress. Do you have a 
shell script or something that you used for the performance tests that I 
could run? Or would you like to re-run the tests you did earlier with 
this patch?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 1760,1809  SET ENABLE_SEQSCAN TO OFF;
/listitem
   /varlistentry
  
-  varlistentry id=guc-commit-delay xreflabel=commit_delay
-   

Re: [HACKERS] Inline Extension

2012-01-20 Thread Heikki Linnakangas

On 21.01.2012 00:00, Daniel Farina wrote:

I think this is somewhat rube-goldberg-esque, and denies non-superuser
roles the ability to get more version management of schema and
operators.  As-is many organizations are submitting migrations via
plain SQL that include committing to a version management table that
is maintained by convention, and as-is that is considered a modern-day
best-practice.


Even if you give the version number in the CREATE EXTENSION command, 
it's by convention that people actually maintain a sane versioning 
policy. If people don't take version management seriously, you will 
quickly end up with five different versions of an extension, all with 
version number 0.1.


Another approach is to use comments on the objects saying version 
1.23. Those generally move together with the objects themselves; they 
are included in pg_dump schema-only dump, for example, while the 
contents of a table are not.



The ship has sailed.  Encouraging use of files and .sql buy no
soundness, because everyone is moving towards is overlaying version
management via pure FEBE anyway.


What is FEBE?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] gistVacuumUpdate

2012-01-20 Thread Heikki Linnakangas

On 18.01.2012 23:38, YAMAMOTO Takashi wrote:

i'm wondering because what gistVacuumUpdate used to do does not seem to
be necessarily tied to the old-style VACUUM FULL.
currently, no one will re-union keys after tuple removals, right?


Right. I believe gistVacuumUpdate needed an AccessExclusiveLock, so now 
that VACUUM FULL is gone, there is no good moment to do it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Inline Extension

2012-01-20 Thread Daniel Farina
On Fri, Jan 20, 2012 at 2:48 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Even if you give the version number in the CREATE EXTENSION command, it's by
 convention that people actually maintain a sane versioning policy. If people
 don't take version management seriously, you will quickly end up with five
 different versions of an extension, all with version number 0.1.

Projects are taking it seriously, and invest a lot of effort in it.
There is no shortage of schema versioning frameworks, of varying
levels of maturity, but some are quite complete by the standards of
their users.  However, there is little knowledge shared between them,
and no database gives them much support, so idiosyncrasy becomes
inevitable.

Unless one makes loading regular, unpackaged operators via SQL
impossible (which, on the face of it, doesn't sound like a good
idea to me), the only people who will bother with CREATE EXTENSION
with inline content will be those who care and want to maintain
version control.

I stick to my original assessment: the ship has sailed, or perhaps,
more accurately, is sailing.  Perhaps one could gain wisdom by
thinking of this problem differently: "I'm going to write a migration
tool to help developers in my framework or workflow deal with change,
and I really would like it if my database helped me by ..."

I don't think we should fool ourselves that we'll be letting people
shoot themselves in the foot with regard to versioning if we give them
versions.  People have been painfully and assiduously trying to avoid
version problems for some time now, with no help from the database,
and they do it entirely via pure SQL statements from Postgres' point
of view.

 Another approach is to use comments on the objects saying version 1.23.
 Those generally move together with the objects themselves; they are included
 in pg_dump schema-only dump, for example, while the contents of a table are
 not.

Idiosyncratic, but perhaps useful for some people.  Again, workarounds
are possible, and per my previous statements, even pervasive.  That
doesn't make it a desirable thing to avoid assisting with.

 What is FEBE?

FrontEnd BackEnd Protocol, I thought? The one libpq speaks. Is there a
better name?

Here are some reasons not do this feature, in my mind:

* Is the world ready yet to think about versioning operators in a
database the same way software is versioned?

* Related to that: If the feature is written, it will have to be
supported.  Does the feature have enough impact at this time?

* Do we know how to design the feature correctly so it will attract
use, especially as a Postgres-ism, which is related in some part to
the decaying importance (my perception) of 'database agnostic'?

* Are the awkward user-land versioning strategies good enough? Would
it be better to focus elsewhere where things are even *more* awkward?

My assessment is: I know some people who would use this feature, but a
broad swathe are not loading too many operators into their database.
It's hard to know if that's because packaging operators in databases
has been so abysmally bad in the industry at large, or because the
choice of sandboxed languages is just not appealing for, say, a
Python or Ruby developer to write things that are not a series of SQL
statements, ala pgsql, and a lot of that (sans triggers) can be done
over the wire anyway.  Maybe embedded Javascript can help with this,
but it's not the Here and Now.

If I had to invest in complexity with regard to versioning, I'd
rather invest in Postgres being able to tell me a hashed version of
all or a controllable subset of the tables, types, and attributes in the
database, so as to quickly pick out drift between a freshly created
database (produced by the migrations on-disk against development) and
any cruft that sneaks into a production server that is thought to be
the same as the others.  That would have huge impact with common use
cases at this very moment in time, so it provides a reasonable
backdrop to evaluate the cost/impact of the proposed feature.

-- 
fdr



Re: [HACKERS] gistVacuumUpdate

2012-01-20 Thread Jaime Casanova
On Wed, Jan 18, 2012 at 10:05 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 13.01.2012 06:24, YAMAMOTO Takashi wrote:

 hi,

 gistVacuumUpdate was removed when old-style VACUUM FULL was removed.
 i wonder why.
 it was not practical and REINDEX is preferred?

 anyway, the removal seems incomplete and there are some leftovers:
        F_TUPLES_DELETED
        F_DELETED
        XLOG_GIST_PAGE_DELETE


 Hmm, in theory we might bring back support for deleting pages in the future;
 I'm guessing F_DELETED and the WAL record type were left in place because of
 that.

I guess it was an oversight, because GistPageSetDeleted() is being
used in gistRedoPageDeleteRecord() and GistPageIsDeleted() in a few
other places.

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Measuring relation free space

2012-01-20 Thread Jaime Casanova
On Wed, Jan 18, 2012 at 7:01 PM, Noah Misch n...@leadboat.com wrote:
 On Wed, Jan 18, 2012 at 09:46:20AM -0500, Jaime Casanova wrote:

 ignoring all non-leaf pages still gives a considerable difference
 between pgstattuple and relation_free_space()

 pgstattuple() counts the single B-tree meta page as always-full, while
 relation_free_space() skips it for all purposes.  For tiny indexes, that can
 shift the percentage dramatically.


ok, i will reformulate the question: why is it fine to ignore non-leaf
pages but not fine to ignore the meta page?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



[HACKERS] REVIEW: pg_stat_statements with query tree based normalization

2012-01-20 Thread Kääriäinen Anssi
This is a short review of pg_stat_statements based on quick testing of the 
feature.

1. Installation: after managing to actually build PostgreSQL and contrib 
modules + changing
shared_preload_libraries to include pg_stat_statements I got this error:
FATAL:  could not create shared memory segment: Invalid argument
DETAIL:  Failed system call was shmget(key=5431001, size=34627584, 03600)

So, I needed to raise my SHM limits. I guess there isn't anything surprising
or erroneous about this, but figured it is worth mentioning.

2. Usability:
  - If you have two similarly named tables in different schemas, for example
public.tbl and some_schema.tbl, these tables will get different entries in
pg_stat_statements. However, the table names are not schema-qualified, so it
is impossible to see which table is which.

# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     4
 select * from test; |     2

# select * from tmp.test;
# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     5
 select * from test; |     2

# select * from test;
# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     5
 select * from test; |     3

- It would be nice from a user perspective to always transform "where id in
(list of values)" to "where id in (?)", regardless of the length of the list.
Now "where id in (1, 2)" is grouped into a different pool than "where id in
(1, 2, 3)".

3. I tried to run Django's test suite a few times and see if there would be any 
unexpected
behavior. Some results (note that I haven't tried to reproduce this error on 
master
without the patch):

test_django_testdb_default=# SELECT aggregation_publisher.id, 
aggregation_publisher.name, aggregation_publisher.num_awards, 
MIN(aggregation_book.pubdate) AS earliest_book FROM 
aggregation_publisher LEFT OUTER JOIN aggregation_book ON 
(aggregation_publisher.id = aggregation_book.publisher_id) GROUP BY 
aggregation_publisher.id, aggregation_publisher.name, 
aggregation_publisher.num_awards HAVING NOT 
(MIN(aggregation_book.pubdate) IS NULL) ORDER BY earliest_book ASC;

ERROR:  unrecognized node type for havingclause node: 315
test_django_testdb_default=# \d aggregation_publisher
                    Table "public.aggregation_publisher"
   Column   |          Type          |                             Modifiers
------------+------------------------+---------------------------------------------------------------------
 id         | integer                | not null default nextval('aggregation_publisher_id_seq'::regclass)
 name       | character varying(255) | not null
 num_awards | integer                | not null
Indexes:
    "aggregation_publisher_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "aggregation_book" CONSTRAINT "aggregation_book_publisher_id_fkey"
    FOREIGN KEY (publisher_id) REFERENCES aggregation_publisher(id) DEFERRABLE
    INITIALLY DEFERRED
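
For what it's worth, that error has the shape of PostgreSQL's usual
"walk every node type" switch; here is a guess at the code it comes from
(not verified against the patch), where a node type missing from the
jumbling walker falls through to the default branch:

#include "postgres.h"
#include "nodes/nodes.h"
#include "nodes/primnodes.h"

static void
jumble_expr_sketch(Node *node)      /* illustrative only */
{
    switch (nodeTag(node))
    {
        case T_Var:
            /* fold varno/varattno into the query fingerprint */
            break;
        case T_Const:
            /* fold in the constant's location, not its value */
            break;
        /* ... many more node types ... */
        default:
            elog(ERROR, "unrecognized node type for havingclause node: %d",
                 (int) nodeTag(node));
            break;
    }
}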


The time used for insert statements seems suspiciously low. Maybe PostgreSQL is 
just faster than I thought :)

query  | INSERT INTO django_content_type (id, name, app_label, 
model) VALUES (?, ?, ?, ?)
calls  | 5490
total_time | 0.8231193

Multi-value inserts do not seem to be normalized:
query  | INSERT INTO custom_pk_business_employees (business_id, 
employee_id) VALUES ('Sears', 456), ('Sears', 123)
calls  | 1256
total_time | 0.619693

I did not see any noticeable difference in runtimes with pg_stat_statements 
installed or uninstalled (as extension).
Not tested on master without the patch at all.

Overall the feature seems to be really useful.

 - Anssi


Re: [HACKERS] Measuring relation free space

2012-01-20 Thread Noah Misch
On Fri, Jan 20, 2012 at 07:03:22PM -0500, Jaime Casanova wrote:
 On Wed, Jan 18, 2012 at 7:01 PM, Noah Misch n...@leadboat.com wrote:
  On Wed, Jan 18, 2012 at 09:46:20AM -0500, Jaime Casanova wrote:
 
  ignoring all non-leaf pages still gives a considerable difference
  between pgstattuple and relation_free_space()
 
  pgstattuple() counts the single B-tree meta page as always-full, while
  relation_free_space() skips it for all purposes.  For tiny indexes, that can
  shift the percentage dramatically.
 
 
 ok, i will reformulate the question: why is it fine to ignore non-leaf
 pages but not fine to ignore the meta page?

pgstattuple() figures the free_percent by adding up all space available to
hold tuples and dividing that by the simple size of the relation.  Non-leaf
pages and the meta page get identical treatment: both never hold tuples, so
they do not contribute to the free space.
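
In other words, roughly (a tiny sketch of the arithmetic, with hypothetical
variable names):

#include <stdint.h>

/* free space on tuple-holding pages divided by the raw relation size;
 * the meta page and non-leaf pages add 0 to free_space but still count
 * in the denominator */
static double
free_percent(uint64_t free_space, uint64_t relpages, uint64_t block_size)
{
    if (relpages == 0)
        return 0.0;
    return 100.0 * (double) free_space / (double) (relpages * block_size);
}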



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-20 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Can I just check with you that the only review comment is a one line
 change? Seems better to make any additional review comments in one go.

 No, I haven't done a full review yet - I just noticed that right at
 the outset and wanted to be sure that got addressed.  I can look it
 over more carefully, and test it.

 Corrected your point, and another found during retest.

 Works as advertised, docs corrected.

This comment needs updating:

/*
 * To drop an index safely, we must grab exclusive lock on its parent
 * table.  Exclusive lock on the index alone is insufficient because
 * another backend might be about to execute a query on the parent table.
 * If it relies on a previously cached list of index OIDs, then it could
 * attempt to access the just-dropped index.  We must therefore take a
 * table lock strong enough to prevent all queries on the table from
 * proceeding until we commit and send out a shared-cache-inval notice
 * that will make them update their index lists.
 */

Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table?  If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there.  There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.

On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough.  In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid.  There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it.  That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen.  I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere.  I wonder if we should only do the *first* phase of the
drop with ShareUpdateExclusiveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work.  If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready.  That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that counts on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.
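
To make that last point concrete, here is a hedged sketch (not from the
patch) of the kind of check such code paths would need to learn -- skip
indexes that are not flagged ready/valid instead of assuming AccessShareLock
keeps the underlying file around:

#include "postgres.h"
#include "access/genam.h"
#include "nodes/pg_list.h"
#include "storage/lock.h"
#include "utils/rel.h"
#include "utils/relcache.h"

static void
scan_usable_indexes(Relation heapRel)
{
    List       *indexoids = RelationGetIndexList(heapRel);
    ListCell   *lc;

    foreach(lc, indexoids)
    {
        Relation    indexRel = index_open(lfirst_oid(lc), AccessShareLock);

        if (!indexRel->rd_index->indisvalid ||
            !indexRel->rd_index->indisready)
        {
            /* possibly being dropped concurrently: leave it alone */
            index_close(indexRel, AccessShareLock);
            continue;
        }

        /* ... examine or use the index here ... */

        index_close(indexRel, AccessShareLock);
    }
    list_free(indexoids);
}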

In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:

DROP INDEX
   [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
[, ...] [ CASCADE | RESTRICT ]
   CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict.  After that you must write
CONCURRENTLY, followed by exactly one name.  I think what you want is:

DROP INDEX [ IF EXISTS ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY.  It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code.  So I think we should have
the parser accept:

DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]

And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time.  That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob
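
For illustration, the execution-time rejection being proposed might look
roughly like this; DropStmt's "concurrent" flag is assumed from the patch,
and the error wording here is made up:

#include "postgres.h"
#include "nodes/parsenodes.h"

static void
check_drop_index_concurrently(DropStmt *stmt)
{
    if (stmt->concurrent)       /* field assumed to be added by the patch */
    {
        if (list_length(stmt->objects) != 1)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("DROP INDEX CONCURRENTLY does not support dropping multiple objects")));
        if (stmt->behavior == DROP_CASCADE)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("DROP INDEX CONCURRENTLY does not support CASCADE")));
    }
}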

Re: [HACKERS] REVIEW: pg_stat_statements with query tree based normalization

2012-01-20 Thread Peter Geoghegan
On 21 January 2012 00:24, Kääriäinen Anssi anssi.kaariai...@thl.fi wrote:
 I did not see any noticeable difference in runtimes with pg_stat_statements 
 installed or uninstalled (as extension).
 Not tested on master without the patch at all.

 Overall the feature seems to be really useful.

Thanks for taking the time to review the patch!

I am going to produce another revision in response to feedback already
received. I intend to outline the steps that it will take to resolve
some outstanding issues in the next day or so. It would be nice if you
could take a look at the revised patch that is ultimately produced.
Should I keep you posted?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Group commit, revised

2012-01-20 Thread Peter Geoghegan
On 20 January 2012 22:30, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Maybe we should have a heuristic to split a large flush into smaller chunks.
 The WAL segment boundary would be a quite natural split point, for example,
 because when crossing the file boundary you have to issue separate fsync()s
 for the files anyway. But I'm happy with leaving that out for now, it's not
 any worse than it used to be without group commit anyway.

Let's quantify how much of a problem that is first.

 The patch added some CHECK_FOR_INTERRUPTS() calls into various places in the
 commit/abort codepaths, but that's all dead code; they're all within a
 HOLD/RESUME_INTERRUPTS blocks.

Fair enough, but do you think it's acceptable to say "well, we can't
have errors within critical blocks anyway, and nor can the driver, so
just assume that the driver will successfully service the request"?

 Furthermore, the WAL writer didn't have a latch before this patch; it's not
 many lines of code to initialize the latch and set up the signal handler for
 it, but it already has a semaphore that's ready to use.

Uh, yes it did - WalWriterDelay is passed to WaitLatch(). It didn't
use the process latch as I did (which is initialised anyway), though I
believe it should have (on general principal, to avoid invalidation
issues when generic handlers are registered, plus because the process
latch is already initialised and available), which is why I changed
it. Whatever you do with group commit, you're going to want to look at
the changes made to the WAL Writer in my original patch outside of the
main loop, because there are one or two fixes for it included
(registering a usr1 signal handler and saving errno in existing
handlers), and because we need an alternative way of power saving if
you're not going to include the mechanism originally proposed - maybe
something similar to what has been done for the BGWriter in my patch
for that. At 5 wake-ups per second by default, the process is by a
wide margin the biggest culprit (except BGWriter, which is also 5 by
default, but that is addressed by my other patch that you're
reviewing). I want to fix that problem too, and possibly investigate
if there's something to be done about the checkpointer (though that
only has a 5 second timeout, so it's not a major concern). In any
case, we should encourage the idea that auxiliary processes will use
the proc latch, unless perhaps they only use a local latch like the
avlauncher does, imho.

Why did you remove the new assertions in unix_latch.c/win32_latch.c? I
think you should keep them, as well as my additional comments on latch
timeout invalidation issues in latch.h which are untouched in your
revision (though this looks to be a rough revision, so I shouldn't
read anything into that either way I suppose). In general, we should
try and use the process latch whenever we can.

 I wonder if we should rename the file into xlogflush.c or something like
 that, to make it clear that this works with any XLOG flushes, not just
 commits? Group commit is the usual term for this feature, so we should
 definitely mention that in the comments, but it might be better to rename
 the files/functions to emphasize that this is about WAL flushing in general.

Okay.

 This probably needs some further cleanup to fix things I've broken, and I
 haven't done any performance testing, but it's progress. Do you have a shell
 script or something that you used for the performance tests that I could
 run? Or would you like to re-run the tests you did earlier with this patch?

No, I'm using pgbench-tools, and there's no reason to think that you
couldn't get similar results on ordinary hardware, which is all I used
- obviously you'll want to make sure that you're using a file system
that supports granular fsyncs, like ext4. All of the details,
including the config for pgbench-tools, are in my original e-mail. I
have taken the time to re-run the benchmark and update the wiki with
that new information - I'd call it a draw.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Group commit, revised

2012-01-20 Thread Peter Geoghegan
On 21 January 2012 03:13, Peter Geoghegan pe...@2ndquadrant.com wrote:
 I have taken the time to re-run the benchmark and update the wiki with
 that new information - I'd call it a draw.

On second thought, the occasional latency spikes that we see with my
patch (which uses the poll() based latch in the run that is
benchmarked) might be significant - difficult to say. The averages are
about the same though.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Alexander Korotkov
Hi!

Thank you for your feedback!

On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers e...@xs4all.nl wrote:

 The patch yields spectacular speedups with small, simple-enough regexen.
  But it does not do a
 good enough job when guessing where to use the index and where fall back
 to Seq Scan.  This can
 lead to (also spectacular) slow-downs, compared to Seq Scan.

Could you give some examples of regexes where index scan becomes slower
than seq scan?


 I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in
 your patch), is that
 true?   I can understand you want that value to be low to limit the above
 risk, but now it reduces
 the usability of the feature a bit: one has to split up larger
 char-classes into several smaller
 ones to make a statement use the index: i.e.:

Yes, MAX_COLOR_CHARS is the maximum number of characters in an automaton
color when that color is expanded into separate characters. And it's likely
there could be a better solution than just having this hard limit.


 Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I
 suppose this is
 normally necessary for perfomance reasons, but it would be useful te be
 able to compile a test
 version that allows it.  I don't know how hard that would be.

It seems that Ctrl-C was impossible because the trigram extraction
procedure takes so long while it is not interruptible. It's not difficult
to make this procedure interruptible, but actually it just shouldn't take so
long.


 There is also a minor bug, I think, when running with  'set
 enable_seqscan=off'  in combination
 with a too-large regex:

Thanks for pointing. Will be fixed.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Alexander Korotkov
On Fri, Jan 20, 2012 at 8:45 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Fri, Jan 20, 2012 at 01:33, Erik Rijkers e...@xs4all.nl wrote:
  Btw, it seems impossible to Ctrl-C out of a search once it is submitted;
 I suppose this is
  normally necessary for perfomance reasons, but it would be useful te be
 able to compile a test
  version that allows it.

 I believe being interruptible is a requirement for the patch to be
 accepted.

 CHECK_FOR_INTERRUPTS(); should be added to the indeterminate loops.

Sure, it's easy to fix. But it seems that in this case it's the GIN
extract_query method that becomes slow (because the index scan itself is
already interruptible). So it just shouldn't take so long.
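
For the record, a minimal illustration of the fix being discussed: a
CHECK_FOR_INTERRUPTS() dropped into the long-running extraction loop (the
surrounding function is a placeholder, not the patch's code):

#include "postgres.h"
#include "miscadmin.h"

static void
extract_trigrams_from_paths(int npaths)
{
    int     i;

    for (i = 0; i < npaths; i++)
    {
        CHECK_FOR_INTERRUPTS();     /* makes a long extraction cancellable */
        /* ... expand the next automaton path into trigrams ... */
    }
}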

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Alexander Korotkov
On Fri, Jan 20, 2012 at 12:54 AM, Alexander Korotkov
aekorot...@gmail.comwrote:

 On Fri, Jan 20, 2012 at 12:30 AM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:

 Apart from that, the multibyte issue seems like the big one. Any way
 around that?

 Conversion of pg_wchar to multibyte character is the only way I found to
 avoid serious hacking of existing regexp code. Do you think an additional
 function in pg_wchar_tbl which converts pg_wchar back to a multibyte
 character is a possible solution?

Do you have any notes on it? I could make a patch which adds such a
function into core.
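
For the UTF-8 case at least, the inverse mapping is mechanical; here is a
standalone sketch (the typedef is only a stand-in for the backend's, and a
real solution would need a per-encoding entry in pg_wchar_tbl with whatever
name and signature core settles on):

#include <stdint.h>

typedef uint32_t pg_wchar_sketch;   /* stand-in for the backend typedef */

static int
unicode_to_utf8_char(pg_wchar_sketch c, unsigned char *out)
{
    if (c <= 0x7F)
    {
        out[0] = (unsigned char) c;
        return 1;
    }
    if (c <= 0x7FF)
    {
        out[0] = (unsigned char) (0xC0 | (c >> 6));
        out[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    if (c <= 0xFFFF)
    {
        out[0] = (unsigned char) (0xE0 | (c >> 12));
        out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (c >> 18));
    out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (c & 0x3F));
    return 4;
}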

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Erik Rijkers
On Sat, January 21, 2012 06:26, Alexander Korotkov wrote:
 Hi!

 Thank you for your feedback!

 On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers e...@xs4all.nl wrote:

 The patch yields spectacular speedups with small, simple-enough regexen.
  But it does not do a
 good enough job when guessing where to use the index and where fall back
 to Seq Scan.  This can
 lead to (also spectacular) slow-downs, compared to Seq Scan.

 Could you give some examples of regexes where index scan becomes slower
 than seq scan?



'x[aeio]+q' takes many minutes, uninterruptible.  It's now running for almost
30 minutes; I'm sure it will come back eventually, but I think I'll kill it
later on. I suppose you get the point  ;-)

Even with {n,m} quantifiers it's easy to hit slowdowns:

   (table azjunk6 is 112 MB, the index 693 MB.)
   (MAX_COLOR_CHARS=4 => your original patch)


instance     table    regex          plan              time

HEAD         azjunk6  x[aeio]{1,3}q  Seq Scan          3566.088 ms
HEAD         azjunk6  x[aeio]{1,3}q  Seq Scan          3540.606 ms
HEAD         azjunk6  x[aeio]{1,3}q  Seq Scan          3495.034 ms
HEAD         azjunk6  x[aeio]{1,3}q  Seq Scan          3510.403 ms
trgm_regex   azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3724.131 ms
trgm_regex   azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3844.999 ms
trgm_regex   azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3835.190 ms
trgm_regex   azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3724.016 ms

HEAD         azjunk6  x[aeio]{1,4}q  Seq Scan          3617.997 ms
HEAD         azjunk6  x[aeio]{1,4}q  Seq Scan          3644.215 ms
HEAD         azjunk6  x[aeio]{1,4}q  Seq Scan          3636.976 ms
HEAD         azjunk6  x[aeio]{1,4}q  Seq Scan          3625.493 ms
trgm_regex   azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  7885.247 ms
trgm_regex   azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  8799.082 ms
trgm_regex   azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  7754.152 ms
trgm_regex   azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  7721.332 ms


This is with your patch as is; in instances compiled with higher 
MAX_COLOR_CHARS (I did 6 and 9),
it is of course even easier to dream up a slow regex...



Erik Rijkers








Re: [HACKERS] Finer Extension dependencies

2012-01-20 Thread Hitoshi Harada
On Sun, Dec 18, 2011 at 6:36 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Hi,

 The extensions work we began in 9.1 is not yet finished entirely
 (*cough*), so I'm opening a new patch series here by attacking the
 dependency problems.

 Some people want us to manage extension version numbers with sorting
 semantics so that we are able to depend on foo >= 1.2 and crazy things
 like this, and I think the need is reasonable and easier than that to
 address.

The patch applies with one reject, which I could fix easily. The make
check passed.

haradh1-mac:postgresql-head haradh1$ patch -p1 < ~/Downloads/extension-provides.v1.patch
patching file contrib/pg_upgrade_support/pg_upgrade_support.c
patching file doc/src/sgml/catalogs.sgml
Hunk #2 succeeded at 3063 (offset 11 lines).
Hunk #3 succeeded at 6865 (offset 11 lines).
patching file doc/src/sgml/extend.sgml
patching file src/backend/catalog/Makefile
patching file src/backend/catalog/dependency.c
patching file src/backend/catalog/system_views.sql
patching file src/backend/commands/extension.c
patching file src/include/catalog/dependency.h
patching file src/include/catalog/indexing.h
patching file src/include/catalog/pg_extension_feature.h
patching file src/include/catalog/pg_proc.h
Hunk #1 succeeded at 4341 (offset 25 lines).
patching file src/include/commands/extension.h
patching file src/test/regress/expected/rules.out
patching file src/test/regress/expected/sanity_check.out
Hunk #1 succeeded at 103 (offset 1 line).
Hunk #2 FAILED at 163.
1 out of 2 hunks FAILED -- saving rejects to file
src/test/regress/expected/sanity_check.out.rej

What this patch does is basically:
- add pg_extension_feature to store the features (names) provided by extensions
- the extension control file now has a provide parameter to indicate
features, which is comma-separated
- when creating an extension, the backend looks for the required features
in the control file
- the installed extension has dependencies on features

So, the first thing is catalog.

db1=# \d pg_extension_feature;
   Table "pg_catalog.pg_extension_feature"
   Column   | Type | Modifiers
------------+------+-----------
 extoid     | oid  | not null
 extfeature | name | not null
Indexes:
    "pg_extension_feature_index" UNIQUE, btree (extoid, extfeature)
    "pg_extension_feature_oid_index" UNIQUE, btree (oid)

* I'm not quite sure why pg_extension_feature_index needs the extoid column.
* I have a big question about adding a two-column catalog. I don't mind the
actual number of columns, but if the table has only two columns, it
implies the design may be bad. The only other two-column catalog
is pg_largeobject_metadata.

Next, some questions:
- Why is the finer dependency needed? Do you have a tangible example
that struggles with the current dependency granularity? As a developer of
several extensions, I feel fine about the existing dependency on the
extension as a whole.
- What happens if DROP EXTENSION ... CASCADE? Does it work?
- How does pg_upgrade interact with this? The dependency changes.

A minor memo.
list_extension_features(): I guess the size argument for repalloc is bogus.
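
For reference, the usual pattern is that repalloc's size argument is the new
total allocation size in bytes, not an element count; an illustrative sketch
with hypothetical variable names:

#include "postgres.h"

/* grow a string array: the size passed to repalloc is the new size in bytes */
static char **
grow_feature_array(char **features, int *maxfeatures)
{
    *maxfeatures *= 2;
    return (char **) repalloc(features, *maxfeatures * sizeof(char *));
}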

So, that's pretty much what I've reviewed quickly. I'll need more time to
look at it in detail, but I'd like more input on the high-level design and
direction.

Thanks,
-- 
Hitoshi Harada
