Re: [HACKERS] leakproof

2012-02-23 Thread Christian Ullrich

* Andrew Dunstan wrote:


Returning to the original point, I've come to the conclusion that
"pure" isn't the right way to go. The trouble with "leakproof" is that
it doesn't point to what it is that's not leaking, which is
information, not memory as many might imagine (and I did)
without further hints. I'm not sure any single English word would be
as descriptive as I'd like.


Jumping into the bikeshedding here, I'm not convinced that all that 
many users would immediately jump to the wrong conclusion (that being 
free of memory leaks). Rather the opposite, indeed.


IMHO, you may be looking at this through C-developer-colored 
glasses, where any leak must immediately and without doubt be a 
resource leak of some kind. As Don Baccus pointed out, it would be a 
highly unusual function that was not at least intended to be free of 
memory leaks.


A DBA, on the other hand, might -- and, again, this is MHO only -- not 
decide what the attribute must mean without consulting the 
documentation. If she was especially concerned about information 
security/data protection, she might even guess right about what kind 
of leak is meant. There is no chance of that with terms like SILENT 
or PURE.


Of all the suggestions I have seen in this thread, I think LEAKPROOF 
is actually the best fit for the purpose. My favorite alternative, 
just to suggest one, would be NONDISCLOSING/NOT DISCLOSING, but I 
prefer LEAKPROOF even over that, not just because it's shorter.


--
Christian Ullrich





Re: [HACKERS] Commit a445cb92 not tested without OpenSSL support?

2012-02-23 Thread Peter Eisentraut
On Thu, 2012-02-23 at 00:26 +, Affan Salman wrote:
 Hey folks
 
 It appears that the commit a445cb92ef5b3a31313ebce30e18cc1d6e0bdecb
 causes ld to bail when building *without* OpenSSL support:
 
 utils/misc/guc.o:(.data+0x4d80): undefined reference to `ssl_cert_file'
 utils/misc/guc.o:(.data+0x4ddc): undefined reference to `ssl_key_file'
 utils/misc/guc.o:(.data+0x4e38): undefined reference to `ssl_ca_file'
 utils/misc/guc.o:(.data+0x4e94): undefined reference to `ssl_crl_file'
 
 Tested; the conditional compilation predicated upon #ifdef USE_SSL in
 be-secure.c is satisfied when configured --with-openssl, and the build
 then succeeds.
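
For context, a minimal reduction of this kind of link failure
(illustrative code, not the actual PostgreSQL sources):

    /* be-secure.c: the definition exists only in SSL builds */
    #ifdef USE_SSL
    char *ssl_cert_file;
    #endif

    /* guc.c: compiled unconditionally, so a build without USE_SSL
     * leaves the linker with no definition to resolve */
    extern char *ssl_cert_file;
    static void *guc_entry = &ssl_cert_file;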

Fixed, thanks.




Re: [HACKERS] Triggers with DO functionality

2012-02-23 Thread Thom Brown
On 23 February 2012 07:15, Gianni Ciolli gianni.cio...@2ndquadrant.it wrote:
 On Fri, Feb 17, 2012 at 11:43:53AM -0500, Andrew Dunstan wrote:
 On 02/17/2012 11:29 AM, David E. Wheeler wrote:
 On Feb 17, 2012, at 5:22 AM, Thom Brown wrote:
 The purpose being to only have a single statement to set up the
 trigger rather than setting up a separate trigger function which is
 unlikely to be re-used by other triggers... or is this of dubious
 benefit?
 +1, though I imagine it would just give it a generated name and save it 
 anyway, eh?
 Before we rush into this, let's consider all the wrinkles. For
 example, what if you need to change the function? And how would you
 edit the function in psql? It might be a bit more involved than it
 seems at first glance, although my initial reaction was the same as
 David's.

 Another complication: anonymous triggers would either have to be
 alone, or provide a mechanism to manage a sequence of anonymous
 triggers on the same table (such as replace the third trigger with
 ... or move trigger #4 in position #2, or deciding their order of
 execution).

Isn't the order of execution alphabetical by trigger name in
PostgreSQL?  The triggers themselves wouldn't be anonymous; we'd still
be naming them.  It's the referenced functions that would no longer
need defining, and even those probably won't technically be anonymous
as they'll need cataloguing somewhere.

-- 
Thom



Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?

2012-02-23 Thread Simon Riggs
On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The industry accepted description for non-sequential access is random
 access whether or not the function that describes the movement is
 entirely random. To argue otherwise is merely hairsplitting.

 I don't think so.

PostgreSQL already uses a parameter called random_page_cost to
describe non-sequential access. Perhaps that is wrong and we need a
third parameter?

 For example, a bitmap index scan contrives to speed
 things up by arranging for the table I/O to happen in ascending block
 number order, with skips, rather than in random order, as a plain
 index scan would do, and that seems to be a pretty effective
 technique.  Except to the extent that it interferes with the kernel's
 ability to do readahead, it really can't be cheaper to read blocks 1, 2, 3, 4,
 and 5 than to read blocks 1, 2, 4, and 5.  Not reading block 3 can't
 require more effort than reading it.

By that argument, ANALYZE never could run longer than VACUUM ANALYZE,
so you disagree with Tom and me, and you can't explain Pavel's
results

cost_bitmap_heap_scan() uses random_page_cost to evaluate the cost
of accessing blocks, even though the author knew the access was in
ascending block number order. Why was that?

Note that the cost_bitmap_heap_scan() cost can be greater than
cost_seqscan() for certain parameter values.
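
For reference, the per-page charge there is not a flat random_page_cost;
as I read cost_bitmap_heap_scan() in costsize.c (paraphrased, with T the
total pages in the table), it slides between the two parameters:

    /* denser bitmaps get charged progressively closer to sequential cost */
    cost_per_page = spc_random_page_cost -
        (spc_random_page_cost - spc_seq_page_cost) *
        sqrt(pages_fetched / T);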

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] REASSIGN OWNED lacks support for FDWs

2012-02-23 Thread Etsuro Fujita

(2012/02/23 5:32), Alvaro Herrera wrote:

My only concern on the patch is

+static void
+AlterForeignServerOwner_internal(Relation rel, HeapTuple tup, Oid
newOwnerId)
+{
+Form_pg_foreign_server form;

-srvId = HeapTupleGetOid(tup);
  form = (Form_pg_foreign_server) GETSTRUCT(tup);

  if (form->srvowner != newOwnerId)
@@ -366,10 +388,15 @@ AlterForeignServerOwner(const char *name, Oid
newOwnerId)
  /* Superusers can always do it */
  if (!superuser())
  {

I wonder if superusers can always do it.  For example, is it OK for
superusers to change the ownership of a foreign server owned by old_role
to new_role that doesn't have USAGE privilege on its foreign data wrapper?


Well, the permission checks are just what they were before the patch.  I
did not change them here.  I didn't participate in the discussions that
led to the current behavior, but as far as I know the guiding principle
here is that superusers always can do whatever they please.  Maybe what
you point out is a bug in the behavior (both before and after my patch),
but if so, please raise it separately.


OK.  Thanks.

Best regards,
Etsuro Fujita



Re: [HACKERS] Triggers with DO functionality

2012-02-23 Thread Gianni Ciolli
On Thu, Feb 23, 2012 at 08:26:47AM +, Thom Brown wrote:
 On 23 February 2012 07:15, Gianni Ciolli gianni.cio...@2ndquadrant.it wrote:
  Another complication: anonymous triggers would either have to be
  alone, or provide a mechanism to manage a sequence of anonymous
  triggers on the same table (such as replace the third trigger with
  ... or move trigger #4 in position #2, or deciding their order of
  execution).
 
 Isn't the order of execution alphabetical by trigger name in
 PostgreSQL?  The triggers themselves wouldn't be anonymous; we'd still
 be naming them.  It's the referenced functions that would no longer
 need defining, and even those probably won't technically be anonymous
 as they'll need cataloguing somewhere.

You're right, sorry.

I misread the proposal as anonymous triggers when instead it is
(named) triggers each implemented via an anonymous function.

Cheers,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it



[HACKERS] incompatible pointer types with newer zlib

2012-02-23 Thread Peter Eisentraut
According to my update logs, somewhere between zlib versions 1.2.3.4 and
1.2.6, the definition of the gzFile type was changed from void * to
struct gzFile_s *, an opaque struct.  Note that gzFile is already the
pointer in either case.  Our code has assumed, however, that you use
gzFile like FILE, namely that the APIs take a pointer to the type, but
that is not the case.  So code like

gzFile *handle = gzopen(...)

is wrong.
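
A minimal sketch of the corrected usage, under either zlib definition
(hypothetical file name; error handling kept to the essentials):

    #include <zlib.h>

    /* gzFile is itself a pointer type, so no extra '*' */
    gzFile handle = gzopen("dump.gz", "wb");
    if (handle != NULL)
    {
        gzwrite(handle, "hello", 5);
        gzclose(handle);
    }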

This used to pass silently because you can assign a void* to a void**,
but with the newer definition you get a bunch of warnings like

pg_backup_files.c: In function ‘_StartData’:
pg_backup_files.c:256:11: warning: assignment from incompatible pointer type 
[enabled by default]
pg_backup_files.c: In function ‘_WriteData’:
pg_backup_files.c:271:2: warning: passing argument 1 of ‘gzwrite’ from 
incompatible pointer type [enabled by default]
/usr/include/zlib.h:1318:12: note: expected ‘gzFile’ but argument is of type 
‘struct gzFile_s **’

Affected are pg_dump and pg_basebackup.

Fixing most of this is not difficult; see the attached patch.  The only
ugliness is in pg_backup_archiver.h

FILE   *FH; /* General purpose file handle */

which is used throughout pg_dump as sometimes a real FILE* and sometimes
a gzFile handle.  There are also some fileno() calls on this, so just
replacing this with an #ifdef isn't going to work.  This might need some
more restructuring to make the code truly type-safe.  My quick patch
replaces the type with void*, thus sort of restoring the original
situation that allowed this to work.

Note that these are only warnings, so we probably don't need to worry
about backpatching this in a hurry.

*** i/src/bin/pg_basebackup/pg_basebackup.c
--- w/src/bin/pg_basebackup/pg_basebackup.c
***
*** 82,88  static bool segment_callback(XLogRecPtr segendpos, uint32 timeline);
  
  #ifdef HAVE_LIBZ
  static const char *
! get_gz_error(gzFile *gzf)
  {
  	int			errnum;
  	const char *errmsg;
--- 82,88 
  
  #ifdef HAVE_LIBZ
  static const char *
! get_gz_error(gzFile gzf)
  {
  	int			errnum;
  	const char *errmsg;
***
*** 450,456  ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
  	FILE	   *tarfile = NULL;
  
  #ifdef HAVE_LIBZ
! 	gzFile	   *ztarfile = NULL;
  #endif
  
  	if (PQgetisnull(res, rownum, 0))
--- 450,456 
  	FILE	   *tarfile = NULL;
  
  #ifdef HAVE_LIBZ
! 	gzFile		ztarfile = NULL;
  #endif
  
  	if (PQgetisnull(res, rownum, 0))
*** i/src/bin/pg_dump/pg_backup_archiver.h
--- w/src/bin/pg_dump/pg_backup_archiver.h
***
*** 247,253  typedef struct _archiveHandle
  	int			blobCount;		/* # of blobs restored */
  
  	char	   *fSpec;			/* Archive File Spec */
! 	FILE	   *FH;/* General purpose file handle */
  	void	   *OF;
  	int			gzOut;			/* Output file */
  
--- 247,253 
  	int			blobCount;		/* # of blobs restored */
  
  	char	   *fSpec;			/* Archive File Spec */
! 	void	   *FH;/* General purpose file handle */
  	void	   *OF;
  	int			gzOut;			/* Output file */
  
*** i/src/bin/pg_dump/pg_backup_files.c
--- w/src/bin/pg_dump/pg_backup_files.c
***
*** 60,66  typedef struct
  typedef struct
  {
  #ifdef HAVE_LIBZ
! 	gzFile	   *FH;
  #else
  	FILE	   *FH;
  #endif
--- 60,66 
  typedef struct
  {
  #ifdef HAVE_LIBZ
! 	gzFile		FH;
  #else
  	FILE	   *FH;
  #endif
*** i/src/bin/pg_dump/pg_backup_tar.c
--- w/src/bin/pg_dump/pg_backup_tar.c
***
*** 58,73  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
  #define K_STD_BUF_SIZE 1024
  
  
  #ifdef HAVE_LIBZ
!  /* typedef gzFile	 ThingFile; */
! typedef FILE ThingFile;
  #else
! typedef FILE ThingFile;
  #endif
- 
- typedef struct
- {
- 	ThingFile  *zFH;
  	FILE	   *nFH;
  	FILE	   *tarFH;
  	FILE	   *tmpFH;
--- 58,70 
  #define K_STD_BUF_SIZE 1024
  
  
+ typedef struct
+ {
  #ifdef HAVE_LIBZ
! 	gzFile		zFH;
  #else
! 	FILE	   *zFH;
  #endif
  	FILE	   *nFH;
  	FILE	   *tarFH;
  	FILE	   *tmpFH;



Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:

 All in all, I think this is in pretty much final shape.  Only pg_upgrade
 bits are still missing.  If sharp eyes could give this a critical look
 and knuckle-cracking testers could give it a spin, that would be
 helpful.

 Lack of pg_upgrade support leaves this version incomplete, because that
 omission would constitute a blocker for beta 2.  This version changes as much
 code compared to the version I reviewed at the beginning of the CommitFest as
 that version changed overall.  In that light, it's time to close the books on
 this patch for the purpose of this CommitFest; I'm marking it Returned with
 Feedback.  Thanks for your efforts thus far.

My view is that with 90 files touched this is a very large patch;
that alone makes me wonder whether we should commit it. So I agree
with Noah and compliment him on an excellent, detailed review.

However, review of such a large patch should not be simply pass or
fail. We should be looking back at the original problem and ask
ourselves whether some subset of the patch could solve a useful subset
of the problem. For me, that seems quite likely and this is very
definitely an important patch.

Even if we can't solve some part of the problem we can at least commit
some useful parts of infrastructure to allow later work to happen more
smoothly and quickly.

So please let's not focus on the 100%; let's focus on the 80/20.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



[HACKERS] pg_stat_statements normalization: re-review

2012-02-23 Thread Daniel Farina
Hello again,

I'm reviewing the revised version of pg_stat_statements again.  In
particular, this new version has done away with the mechanical but
invasive change to the lexing that preserved *both* the position and
length of constant values. (See
http://archives.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn+O1VX3+OGrc4Bscn4=a...@mail.gmail.com)

I did the review front-matter again (check against a recent version,
make sure it does what it says it'll do, et al.) and did trigger an
assertion failure that seems to be an oversight, already reported to
Peter.  I found that oversight by running the SQLAlchemy Python
query-generation toolkit's tests, which are voluminous.  The
functionality is seemingly mostly unchanged.

What I'm going to do here is focus on the back-end source changes that
are not part of the contrib.  The size of the changes are much
reduced, but their semantics are also somewhat more complex...so, here
goes.  Conclusion first:

* The small changes to hashing are probably not strictly required,
unless collisions are known to get terrible.

* The hook into analyze is pretty straightforward and seems like the other hooks

* The addition of a field to scribble upon in the Query and Plan nodes
is something I'd like to understand better, as these leave me with a
bit of disquiet.

What follows is in much more detail, and broken up by functionality
and file grouping in the backend.

 src/include/access/hash.h|4 ++-
 src/backend/access/hash/hashfunc.c   |   21 ++-

This adjusts the hash_any procedure to support returning two possible
precisions (64 bit and 32 bit) by tacking on a Boolean flag to make
the precision selectable.  The hash_any procedure is never used
directly, but via a macro, and a second macro to support 64-bit
precision has been added.  It seems a bit wonky to me to use a flag to
select this rather than encapsulating the common logic in a procedure
and just breaking this up into three symbols, though.  I think the longer
precision is used to try to get fewer collisions with not too much
slowdown.  Although the extra bits may meaningfully improve the quality
of the hashing for pg_stat_statements, and living without them might
lead to unusable results (?), per the usual semantics of hashing they
are probably not *strictly* necessary.
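
To make the three-symbol alternative concrete, a sketch of that shape
(invented names, not what the patch actually does):

    /* one worker holds the common logic and always computes 64 bits */
    extern uint64 hash_any_internal(const unsigned char *k, int keylen);

    /* two thin entry points select the precision */
    #define hash_any32(k, len) ((uint32) hash_any_internal((k), (len)))
    #define hash_any64(k, len) hash_any_internal((k), (len))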

A smidgen of avoidable formatting niggles (> 80 columns) are in this section.

 src/backend/nodes/copyfuncs.c|5 
 src/backend/nodes/equalfuncs.c   |4 +++
 src/backend/nodes/outfuncs.c |   10 +
 src/backend/optimizer/plan/planner.c |1 +
 src/backend/nodes/readfuncs.c|   12 +++
 src/include/nodes/parsenodes.h   |3 ++
 src/include/nodes/plannodes.h|2 +

These have all been changed in the usual manner to support one new
field, the queryId, on the toplevel Plan and Query nodes.  The queryId
is a 64-bit field copied from queries to plans, tying plans back to
queries, to be used by plugins (quoth the source) as they flow through
the system.  pg_stat_statements fills it with the 64-bit hash of the
query text -- a reasonable functionality to possess, I think, but the
abstraction seems iffy: plugins cannot compose well if they all need
to use the queryId for different reasons.  Somehow I get the feeling
that this field can be avoided or its use case can be satisfied in a
more satisfying way.

Mysteriously, in the Query representation the symbol uses an
underscored-notation (query_id) and in the planner it uses a camelcase
one, queryId.  Overall, the other fields in those structs all use
camelcase, so I'd recommend normalizing it.

 src/backend/parser/analyze.c |   37 ++---
 src/include/parser/analyze.h |   12 +++

These changes implement hooks for the once-non-hookable symbols
parse_analyze_varparams and parse_analyze, in seemingly the same way
the other hooks in the system are implemented.  These are neatly
symmetrical with the planner hook.  I only ask if there is a way to
have one hook and not two, but I suppose that would be much the same
question as asking why there are two entry points into parsing, and
not one.
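
For readers who haven't met the pattern, hooks of this kind are just
overridable function pointers; schematically (a sketch, not the patch's
exact declarations):

    /* the hook type mirrors the function it wraps */
    typedef Query *(*parse_analyze_hook_type) (Node *parseTree,
                                               const char *sourceText,
                                               Oid *paramTypes,
                                               int numParams);
    extern parse_analyze_hook_type parse_analyze_hook;

    /* a module saves the previous value in _PG_init() so that
     * several modules can chain */
    static parse_analyze_hook_type prev_parse_analyze_hook = NULL;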

-- 
fdr



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-02-23 Thread Kyotaro HORIGUCHI
Hello, this is a new version of the patch.

# This patch is based on the commit
# 2bbd88f8f841b01efb073972b60d4dc1ff1f6fd0 @ Feb 13 to avoid the
# compile error caused by undeclared LEAKPROOF in kwlist.h.

 It must free the PGresults that PQgetResult() returns.

 I'm sorry, it slipped my mind. I've added PQclear() for the
 return value.

 Also, please fix 2 issues mentioned here:

- PQsetRowProcessorErrMsg() now handles msg as const string.

- Changed the order of the parameters of the type PQrowProcessor.
  New order is (PGresult *res, PGrowValue *columns, void *params); a
  minimal example callback is sketched below.

# PQsetRowProcessorErrMsg outside of callback is not implemented.

- Documentation and dblink are modified according to the changes
  above.
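
For illustration, a minimal callback with the new signature (modeled
on the built-in pqAddRow; my_state stands for whatever the application
threads through the void pointer):

    static int
    my_row_processor(PGresult *res, PGrowValue *columns, void *param)
    {
        /* copy the column values out of the network buffer here;
         * return 1 on success, 0 on failure (e.g. out of memory) */
        return 1;
    }

    ...
    PQsetRowProcessor(conn, my_row_processor, my_state);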


By the way, I would like to ask you one question. What is the
reason why void* should be at the head or tail of the parameter list?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..7e02497 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams  157
 PQping158
 PQpingParams  159
 PQlibVersion  160
+PQsetRowProcessor	  161
+PQgetRowProcessor	  162
+PQsetRowProcessorErrMsg	  163
+PQskipResult		  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(conn->errorMessage);
 	initPQExpBuffer(conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(conn->errorMessage) ||
 		PQExpBufferBroken(conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(conn->errorMessage);
 	termPQExpBuffer(conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..cd287cd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, PGrowValue *columns, void *param);
 
 
 /* 
@@ -160,6 +161,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessorErrMsg = NULL;
 
 	if (conn)
 	{
@@ -701,7 +703,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +757,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,6 +828,87 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get current row processor of conn. set pointer to current parameter for
+ *   row processor to param if not NULL.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+	if (param)
+		*param = conn->rowProcessorParam;
+
+	return conn->rowProcessor;
+}
+
+/*
+ * PQsetRowProcessorErrMsg
+ *    Set the error message passed back to the caller of RowProcessor.
+ *
+ *    You can replace the previous message with an alternative msg, or
+ *    clear it with NULL.
+ */
+void
+PQsetRowProcessorErrMsg(PGresult *res, const char *msg)
+{
+	if (msg)
+		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+	else
+		res->rowProcessorErrMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+	

[HACKERS] overriding current_timestamp

2012-02-23 Thread Peter Eisentraut
For (unit) testing, I have often had the need to override the current
timestamp in the database system.  For example, a column default,
function, or view would make use of the current timestamp in some way,
and to test the behavior, it's sometimes useful to tweak the current
timestamp.

What might be a good way to do that?

Just overwrite xactStartTimestamp?  Is that safe?  If it weren't static,
a user-loaded function could do it.

Overwrite pg_catalog.now() in the test database?

Other ideas?

Some semi-official support for this sort of thing would be good.
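
To illustrate the first idea: if xactStartTimestamp were not static, a
tiny loadable C function would be enough (hypothetical sketch; that
export does not exist today, which is exactly the problem):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/timestamp.h"

    PG_MODULE_MAGIC;

    extern TimestampTz xactStartTimestamp;   /* hypothetical export */

    PG_FUNCTION_INFO_V1(set_test_timestamp);

    Datum
    set_test_timestamp(PG_FUNCTION_ARGS)
    {
        xactStartTimestamp = PG_GETARG_TIMESTAMPTZ(0);
        PG_RETURN_VOID();
    }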





Re: [HACKERS] pg_stat_statements normalization: re-review

2012-02-23 Thread Peter Geoghegan
On 23 February 2012 09:58, Daniel Farina dan...@heroku.com wrote:
 What I'm going to do here is focus on the back-end source changes that
 are not part of the contrib.  The size of the changes is much
 reduced, but their semantics are also somewhat more complex... so, here
 goes.  Conclusion first:

 * The small changes to hashing are probably not strictly required,
 unless collisions are known to get terrible.

I imagine that collisions would be rather difficult to demonstrate at
all with a 32-bit value.

 * The addition of a field to scribble upon in the Query and Plan nodes
 is something I'd like to understand better, as these leave me with a
 bit of disquiet.

 What follows is in much more detail, and broken up by functionality
 and file grouping in the backend.

  src/include/access/hash.h            |    4 ++-
  src/backend/access/hash/hashfunc.c   |   21 ++-

 This adjusts the hash_any procedure to support returning two possible
 precisions (64 bit and 32 bit) by tacking on a Boolean flag to make
 the precision selectable.  The hash_any procedure is never used
 directly, but via a macro, and a second macro to support 64-bit
 precision has been added.  It seems a bit wonky to me to use a flag to
 select this rather than encapsulating the common logic in a procedure
 and just breaking this up into three symbols, though.  I think the longer
 precision is used to try to get fewer collisions with not too much
 slowdown.  Although the extra bits may meaningfully improve the quality
 of the hashing for pg_stat_statements, and living without them might
 lead to unusable results (?), per the usual semantics of hashing they
 are probably not *strictly* necessary.

It might be unnecessary. It's difficult to know where to draw the line.

 A smidgen of avoidable formatting niggles (> 80 columns) are in this section.

  src/backend/nodes/copyfuncs.c        |    5 
  src/backend/nodes/equalfuncs.c       |    4 +++
  src/backend/nodes/outfuncs.c         |   10 +
  src/backend/optimizer/plan/planner.c |    1 +
  src/backend/nodes/readfuncs.c        |   12 +++
  src/include/nodes/parsenodes.h       |    3 ++
  src/include/nodes/plannodes.h        |    2 +

I'll look into those.

 These have all been changed in the usual manner to support one new
 field, the queryId, on the toplevel Plan and Query nodes.  The queryId
 is a 64-bit field copied from queries to plans, tying plans back to
 queries, to be used by plugins (quoth the source) as they flow through
 the system.  pg_stat_statements fills it with the 64-bit hash of the
 query text -- a reasonable functionality to possess, I think, but the
 abstraction seems iffy: plugins cannot compose well if they all need
 to use the queryId for different reasons.  Somehow I get the feeling
 that this field can be avoided or its use case can be satisfied in a
 more satisfying way.

Maybe the answer here is to have a simple mechanism for modules to
claim the exclusive right to be able to set that flag?

 Mysteriously, in the Query representation the symbol uses an
 underscored-notation (query_id) and in the planner it uses a camelcase
 one, queryId.  Overall, the other fields in those structs all use
 camelcase, so I'd recommend normalizing it.

Fair enough.

There is one other piece of infrastructure that I think is going to
need to go into core that was not in the most recent revision. I
believe that it can be justified independently of this work.
Basically, there are some very specific scenarios in which the
location of Const nodes is incorrect, because the core system makes
that location the same location as another coercion-related node. Now,
this actually doesn't matter to the positioning of the caret in most
error messages, as they usually ought to be the same anyway, as in
this scenario:

select 'foo'::integer;

Here, both the Const and coercion node's location start from the
SCONST token - no problem there.

However, consider this query:

select cast('foo' as integer);

and this query:

select integer 'foo';

Now, the location of the Const and Coercion nodes is *still* the same,
but starting from the location of the cast in the first example and
integer in the second.

I believe that this is a bug. They should have different locations
here, so that I can always rely on the location of a Const having a
single token that is one of only a handful of possible CONST tokens,
as I currently can in all other scenarios.

Of course, this didn't particularly matter before now, as the Const
location field only dictated caret position in messages generated from
outside the parser. Any error message should be blaming the Coercion
node if there's a problem with coercion, so there's no reason why this
needs to break any error context messages. If a message wants to blame
a Const on something, surely it should have a caret placed in the
right location - the location of the const token. If it wants to blame
the Coercion node on something, that won't change, 

Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-02-23 Thread Kohei KaiGai
2012/2/20 Yeb Havinga yebhavi...@gmail.com:
 On 2012-02-05 10:09, Kohei KaiGai wrote:

 The attached part-1 patch moves related routines from hooks.c to label.c
 because of references to static variables. The part-2 patch implements above
 mechanism.


 I took a short look at this patch but am stuck getting the regression test
 to run properly.

 First, patch 2 misses the file sepgsql.sql.in and therefore the creation
 function command for sepgsql_setcon is missing.

Thanks for your comments.

I added the definition of the sepgsql_setcon function to the
sepgsql.sql.in file, in addition to rebasing the patch.

 So maybe this is because my start domain is not s0-s0:c0.c1023

 However, when trying to run bash or psql in domain
 unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 I get permission
 denied.

 Distribution is FC15, sestatus
 SELinux status:                 enabled
 SELinuxfs mount:                /selinux
 Current mode:                   enforcing
 Mode from config file:          enforcing
 Policy version:                 24
 Policy from config file:        targeted

The default security policy does not permit dynamic domain transition
even for the unconfined domain, in contradiction to its name.
(IMO, that is a fair enough design, avoiding a single point of failure like the root user.)

The security policy of regression test contains a set of rules to reduce
categories assigned to unconfined domain.
So, could you try the following steps:
1. Build the latest policy
% make -f /usr/share/selinux/devel/Makefile -C contrib/sepgsql
2. Install the policy module
% sudo semodule -i contrib/sepgsql/sepgsql-regtest.pp
3. Turn on the sepgsql_regression_test_mode
% sudo setsebool -P sepgsql_regression_test_mode=1

I believe it allows switching the security label of the client, as long as
we only try to reduce categories.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-v9.2-sepgsql-setcon.part-2.v3.patch
Description: Binary data



Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-23 Thread Greg Smith
I've updated http://highperfpostgres.com/results-write-9.2-cf4/index.htm 
with more data including two alternate background writer configurations. 
 Since the sensitive part of the original results was scales of 500 and 
1000, I've also gone back and added scale=750 runs to all results. 
Quick summary is that I'm not worried about 9.2 performance now; I'm 
increasingly confident that the earlier problems I reported on are just 
bad interactions between the reinvigorated background writer and 
workloads that are tough to write to disk.  I'm satisfied I understand 
these test results well enough to start evaluating the pending 9.2 
changes in the CF queue I wanted to benchmark.


Attached are now useful client and scale graphs.  All of 9.0, 9.1, and 
9.2 have been run now with exactly the same scales and clients loads, so 
the graphs of all three versions can be compared.  The two 9.2 
variations with alternate parameters were only run at some scales, which 
means you can't compare them usefully on the clients graph; only on the 
scaling one.  They are very obviously in a whole different range of that 
graph, just ignore the two that are way below the rest.


Here's a repeat of the interesting parts of the data set with new 
points.  Here 9.2N is with no background writer, while 9.2H has a 
background writer set to half strength (bgwriter_lru_maxpages = 50).  I 
picked one middle client level out of the scale=750 results just to 
focus better; relative results are not sensitive to that:


scale=500, db is 46% of RAM
Version Avg TPS
9.0  1961
9.1  2255
9.2  2525
9.2N 2267
9.2H 2300

scale=750, db is 69% of RAM; clients=16
Version Avg TPS
9.0  1298
9.1  1387
9.2  1477
9.2N 1489
9.2H 943

scale=1000, db is 94% of RAM; clients=4
Version TPS
9.0 535
9.1 491 (-8.4% relative to 9.0)
9.2 338 (-31.2% relative to 9.1)
9.2N 516
9.2H 400

The fact that almost all the performance regression against 9.2 goes 
away if the background writer is disabled is an interesting point.  That 
results actually get worse at scale=500 without the background writer is 
another.  That pair of observations makes me feel better that there's a 
tuning trade-off here being implicitly made by having a more active 
background writer in 9.2; it helps in some cases, hurts in others.  That I 
can deal with.  Everything lines up perfectly at scale=500:  if I 
reorder on TPS:


scale=500, db is 46% of RAM
Version Avg TPS
9.2  2525
9.2H 2300
9.2N 2267
9.1  2255
9.0  1961

That makes you want to say "the more background writer, the better", right?

The new scale=750 numbers are weird though, and they keep this from 
being so clear.  I ran the parts that were most weird twice just because 
they seemed so odd, and it was repeatable.  Just like scale=500, with 
scale=750 the 9.2/no background writer has the best performance of any 
run.  But the half-intensity one has the worst!  It would be nice if it 
fell between the 9.2 and 9.2N results, instead it's at the other edge.


The only lesson I can think to draw here is that once we're in the area 
where performance is dominated by the trivia around exactly how writes 
are scheduled, the optimal ordering of writes is just too complicated to 
model that easily.  The rest of this is all speculation on how to fit 
some ideas to this data.


Going back to 8.3 development, one of the design trade-offs I was very 
concerned about was not wasting resources by having the BGW run too 
often.  Even then it was clear that for these simple pgbench tests, 
there were situations where letting backends do their own writes was 
better than speculative writes from the background writer.  The BGW 
constantly risks writing a page that will be re-dirtied before it goes 
to disk.  That can't be too common though in the current design, since 
it avoids things with high usage counts.  (The original BGW wrote things 
that were used recently, and that was a measurable problem by 8.3.)


I think an even bigger factor now is that the BGW writes can disturb 
write ordering/combining done at the kernel and storage levels.  It's 
painfully obvious now how much PostgreSQL relies on that to get good 
performance.  All sorts of things break badly if we aren't getting 
random writes scheduled to optimize seek times, in as many contexts as 
possible.  It doesn't seem unreasonable that background writer writes 
can introduce some delay into the checkpoint writes, just by adding more 
random components to what is already a difficult-to-handle write/sync 
series.  That, I think, is what these results are showing: background 
writer writes can deoptimize other forms of write.


A second fact that's visible from the TPS graphs over the test run, and 
obvious if you think about it, is that BGW writes force data to physical 
disk earlier than they otherwise might go there.  That's a subtle 
pattern in the graphs.  I expect that though, given one element to "do I 
write this?" in Linux is how old the write is.  Wondering about this 
really emphasises that I need to either add graphing of vmstat/iostat 
data to these graphs or switch to a benchmark that does that already.  I 
think I've got just enough people using pgbench-tools to justify the 
feature even if I plan to use the program less.

Re: [HACKERS] Finer Extension dependencies

2012-02-23 Thread Hitoshi Harada
On Mon, Feb 13, 2012 at 3:18 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Hi,

 Sorry for the delays, I'm back on PostgreSQL related work again.

 Hitoshi Harada umi.tan...@gmail.com writes:
 I just tried DROP EXTENSION now, and found it broken :(

 Please find v2 of the patch.  I did change the dependency management in
 between the simple cases and the more challenging ones and forgot that I
 had to retest it all in between, which is what happen on a tight
 schedule and when working at night, I guess.


The patch is partially rejected due to the pg_proc column changes from
leakproof, but I could apply it manually.

I confirmed DROP EXTENSION is fixed now.  In turn, it seems to me
"requires" doesn't work.  My test ext2.control looks like:

comment = 'sample1'
default_version = '1.0'
requires = 'featZ'
relocatable = true

And this extension can simply be installed against a cleanly-initialized
database.  I double-checked there's no entry for featz in
pg_extension_feature.

Also, I found that if the control file has duplicate names in provides,
the error is not friendly (duplicate entry for pg_extension_feature,
or something).  The same happens if provides contains the extension
name itself.

I'll have a look more but give comments so far so that you can find
solutions to them soon.

Thanks,
-- 
Hitoshi Harada



[HACKERS] Option for pg_dump to dump tables in clustered index order

2012-02-23 Thread Timothy Garnett
Hi All,

Having pg_dump dump tables in clustered index order is something we've
found we've needed a fair number of times (for ex. when copying large
logging tables, or sets of tables, out of one database where the order is
not maintained into another for running a bunch of backend analysis) as it
saves us the clustering step, which is often longer than the copy step
itself.

I wanted to gauge the interest in adding an option for this to pg_dump.  A
(not production ready) patch that we've been using off of the 9.1.2 tag to
implement this is attached, or can be viewed here:
https://github.com/tgarnett/postgres/commit/d4412aa4047e7a0822ee93fa47a1c0d282cb7925
 It adds a --cluster-order option to pg_dump. If people have any
suggestions on better ways of pulling out the order clause or other
improvements, that would be great too.

Tim
From d4412aa4047e7a0822ee93fa47a1c0d282cb7925 Mon Sep 17 00:00:00 2001
From: Timothy Garnett tgarn...@panjiva.com
Date: Fri, 10 Feb 2012 16:21:32 -0500
Subject: [PATCH] Support for pg_dump to dump tables in cluster order if a
 clustered index is defined on the table, a little hacked in
 with how the data is passed around and how the order is
 pulled out of the db. The latter is the only
 semi-problematic part as you might be able to generate
 (very odd) table column names that would break the regex
 used there which would cause the sql query to be invalid
 and therefore not dump data for that table.  But as long as
 you don't name a clustered column/function something like
 "foo ) WHERE" or the like, it should be ok.

---
 src/bin/pg_dump/pg_dump.c |   48 +---
 src/bin/pg_dump/pg_dump.h |1 +
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 57f2ed3..9ef9a71 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -134,6 +134,7 @@
 static int	binary_upgrade = 0;
 static int	disable_dollar_quoting = 0;
 static int	dump_inserts = 0;
+static int	cluster_order = 0;
 static int	column_inserts = 0;
 static int	no_security_labels = 0;
 static int	no_unlogged_table_data = 0;
@@ -319,6 +320,7 @@ static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
 		 */
 		{"attribute-inserts", no_argument, &column_inserts, 1},
 		{"binary-upgrade", no_argument, &binary_upgrade, 1},
+		{"cluster-order", no_argument, &cluster_order, 1},
 		{"column-inserts", no_argument, &column_inserts, 1},
 		{"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1},
 		{"disable-triggers", no_argument, &disable_triggers, 1},
@@ -849,6 +851,7 @@ static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
 	printf(_("  -T, --exclude-table=TABLE   do NOT dump the named table(s)\n"));
 	printf(_("  -x, --no-privileges         do not dump privileges (grant/revoke)\n"));
 	printf(_("  --binary-upgrade            for use by upgrade utilities only\n"));
+	printf(_("  --cluster-order             dump table data in clustered index order (>= 8.2)\n"));
 	printf(_("  --column-inserts            dump data as INSERT commands with column names\n"));
 	printf(_("  --disable-dollar-quoting    disable dollar quoting, use SQL standard quoting\n"));
 	printf(_("  --disable-triggers          disable triggers during data-only restore\n"));
@@ -1245,7 +1248,7 @@ static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
 		 classname),
 		  column_list);
 	}
-	else if (tdinfo->filtercond)
+	else if (tdinfo->filtercond || tbinfo->ordercond)
 	{
 		/* Note: this syntax is only supported in 8.2 and up */
 		appendPQExpBufferStr(q, "COPY (SELECT ");
@@ -1257,10 +1260,14 @@ static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
 		}
 		else
 			appendPQExpBufferStr(q, "* ");
-		appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
-		  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
-		 classname),
-		  tdinfo->filtercond);
+		appendPQExpBuffer(q, "FROM %s ",
+		  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
+		 classname));
+		if (tdinfo->filtercond)
+		  appendPQExpBuffer(q, "%s ", tdinfo->filtercond);
+		if (tbinfo->ordercond)
+		  appendPQExpBuffer(q, "%s", tbinfo->ordercond);
+		appendPQExpBuffer(q, ") TO stdout;");
 	else
 	{
@@ -1388,6 +1395,8 @@ static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
 	}
 	if (tdinfo->filtercond)
 		appendPQExpBuffer(q, " %s", tdinfo->filtercond);
+	if (tbinfo->ordercond)
+		appendPQExpBuffer(q, " %s", tbinfo->ordercond);
 
 	res = PQexec(g_conn, q->data);
 	check_sql_result(res, g_conn, q->data, PGRES_COMMAND_OK);
@@ -4400,6 +4409,7 @@ static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
 i_oid,
 i_indexname,
 i_indexdef,
+i_indexdeforderclause,
 i_indnkeys,
 i_indkey,
 i_indisclustered,
@@ -4451,6 +4461,14 @@ static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
 			  "SELECT t.tableoid, t.oid, "
 			  "t.relname AS indexname, "
 	 

Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-02-23 Thread Marko Kreen
On Thu, Feb 23, 2012 at 07:14:03PM +0900, Kyotaro HORIGUCHI wrote:
 Hello, this is new version of the patch.

Looks good.

 By the way, I would like to ask you one question. What is the
 reason why void* should be head or tail of the parameter list?

Aesthetical reasons:

1) PGresult and PGrowValue belong together.

2) void* is probably the context object for handler.  When doing
   object-oriented programming in C the main object is usually first.
   Like libpq does - PGconn is always first argument.

But as libpq does not know the actual meaning of void* for handler,
it can be the last param as well.

When I wrote the demo code, I noticed that it is unnatural to have
void* in the middle.


Last comment - if we drop the plan to make PQsetRowProcessorErrMsg()
usable outside of handler, we can simplify internal usage as well:
the PGresult->rowProcessorErrMsg can be dropped and let's use
->errMsg to transport the error message.

The PGresult is long-lived structure and adding fields for such
temporary usage feels wrong.  There is no other libpq code between
row processor and getAnotherTuple, so the use is safe.

-- 
marko




Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 11:17 AM, Greg Smith g...@2ndquadrant.com wrote:

 A second fact that's visible from the TPS graphs over the test run, and
 obvious if you think about it, is that BGW writes force data to physical
 disk earlier than they otherwise might go there.  That's a subtle pattern in
 the graphs.  I expect that though, given one element to "do I write this?"
 in Linux is how old the write is.  Wondering about this really emphasises
 that I need to either add graphing of vmstat/iostat data to these graphs or
 switch to a benchmark that does that already.  I think I've got just enough
 people using pgbench-tools to justify the feature even if I plan to use the
 program less.

For me, that is the key point.

For the test being performed there is no value in things being written
earlier, since doing so merely overexercises the I/O.

We should note that there is no feedback process in the bgwriter to do
writes only when the level of dirty writes by backends is high enough
to warrant the activity. Note that Linux has a demand paging
algorithm; it doesn't just clean all of the time. That's the reason
you still see some swapping, because that activity is what wakes the
pager. We don't count the number of dirty writes by backends, we just
keep cleaning even when nobody wants it.

Earlier, I pointed out that bgwriter is being woken any time a user
marks a buffer dirty. That is overkill. The bgwriter should stay
asleep until a threshold number (TBD) of dirty writes is reached, then
it should wake up and do some cleaning. Having a continuously active
bgwriter is pointless, for some workloads whereas for others, it
helps. So having a sleeping bgwriter isn't just a power management
issue its a performance issue in some cases.

/*
 * Even in cases where there's been little or no buffer allocation
 * activity, we want to make a small amount of progress through the buffer
 * cache so that as many reusable buffers as possible are clean after an
 * idle period.
 *
 * (scan_whole_pool_milliseconds / BgWriterDelay) computes how many times
 * the BGW will be called during the scan_whole_pool time; slice the
 * buffer pool into that many sections.
 */

Since scan_whole_pool_milliseconds is set to 2 minutes we scan the
whole bufferpool every 2 minutes, no matter how big the bufferpool,
even when nothing else is happening. Not cool.
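
The arithmetic behind that, roughly as BgBufferSync() computes it
(paraphrased from memory, not a verbatim quote of the code):

    /* bgwriter wakeups available during one full sweep of the pool */
    int ticks_per_sweep = scan_whole_pool_milliseconds / BgWriterDelay;

    /* minimum buffers examined per wakeup, so the whole pool is
     * visited every 2 minutes even with zero allocation activity */
    int min_scan_buffers = NBuffers / ticks_per_sweep;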

I think it would be sensible to have bgwriter stop when 10% of
shared_buffers are clean, rather than keep going even when no dirty
writes are happening.

So my suggestion is that we put in an additional clause into
BgBufferSync() to allow min_scan_buffers to fall to zero when X% of
shared buffers is clean. After that bgwriter should sleep. And be
woken again only by a dirty write by a user backend. That sounds like
clean ratio will flip between 0 and X% but first dirty write will
occur long before we hit zero, so that will cause bgwriter to attempt
to maintain a reasonably steady state clean ratio.



I would also take a wild guess that the 750 results are due to
freelist contention. To assess that, I post again the patch shown on
other threads designed to assess the overall level of freelist lwlock
contention.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..36b0160 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -17,6 +17,7 @@
 
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
+#include "utils/timestamp.h"
 
 
 /*
@@ -41,6 +42,21 @@ typedef struct
 	 */
 	uint32		completePasses; /* Complete cycles of the clock sweep */
 	uint32		numBufferAllocs;	/* Buffers allocated since last reset */
+
+	/*
+	 * Wait Statistics
+	 */
+	long	waitBufferAllocSecs;
+	int		waitBufferAllocUSecs;
+	int		waitBufferAlloc;
+
+	long	waitBufferFreeSecs;
+	int		waitBufferFreeUSecs;
+	int		waitBufferFree;
+
+	long	waitSyncStartSecs;
+	int		waitSyncStartUSecs;
+	int		waitSyncStart;
 } BufferStrategyControl;
 
 /* Pointers to shared state */
@@ -125,7 +141,29 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 
 	/* Nope, so lock the freelist */
 	*lock_held = true;
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+	{
+		TimestampTz waitStart = GetCurrentTimestamp();
+		TimestampTz waitEnd;
+		long		wait_secs;
+		int			wait_usecs;
+
+		LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+
+		waitEnd = GetCurrentTimestamp();
+
+		TimestampDifference(waitStart, waitEnd,
+		&wait_secs, &wait_usecs);
+
+		StrategyControl->waitBufferAllocSecs += wait_secs;
+		StrategyControl->waitBufferAllocUSecs += wait_usecs;
+		if (StrategyControl->waitBufferAllocUSecs >= 1000000)
+		{
+			StrategyControl->waitBufferAllocUSecs -= 1000000;
+			

Re: [HACKERS] pg_stat_statements normalization: re-review

2012-02-23 Thread Peter Geoghegan
On 23 February 2012 11:09, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 23 February 2012 09:58, Daniel Farina dan...@heroku.com wrote:
 * The small changes to hashing are probably not strictly required,
 unless collisions are known to get terrible.

 I imagine that collisions would be rather difficult to demonstrate at
 all with a 32-bit value.

Assuming that the hash function exhibits a perfectly uniform
distribution of values (FWIW, hash_any is said to exhibit the
avalanche effect), the birthday problem provides a mathematical basis
for estimating the probability of some 2 queries being alike. Assuming
a population of 1,000 queries in play at any given time (i.e. the
default value of pg_stat_statements.max) and 2 ^ 32 "days of the
year", that puts the probability of a collision at a vanishingly small
number. I cannot calculate the number with Gnome calculator. Let's
pretend that 2 ^ 32 is exactly 42 million, rather than approximately
4.2 *billion*. Even then, the probability of collision is a minuscule
0.01857.
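
For anyone who wants to redo the arithmetic, the standard birthday
approximation is easy to evaluate directly (a self-contained check
using the real 2 ^ 32; my numbers, not from the thread):

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double n = 1000.0;          /* pg_stat_statements.max default */
        double d = 4294967296.0;    /* 2^32 distinct hash values */

        /* P(collision) ~= 1 - exp(-n(n-1) / 2d) */
        printf("%g\n", 1.0 - exp(-n * (n - 1.0) / (2.0 * d)));
        return 0;
    }

This prints about 0.000116, i.e. roughly a 1-in-8600 chance across the
whole population.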

I'd have to agree that a uint32 hash is quite sufficient here.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Jeroen Vermeulen

On 2012-02-23 10:18, Simon Riggs wrote:


However, review of such a large patch should not be simply pass or
fail. We should be looking back at the original problem and ask
ourselves whether some subset of the patch could solve a useful subset
of the problem. For me, that seems quite likely and this is very
definitely an important patch.

Even if we can't solve some part of the problem we can at least commit
some useful parts of infrastructure to allow later work to happen more
smoothly and quickly.

So please let's not focus on the 100%; let's focus on the 80/20.


The suggested immutable-column constraint was meant as a potential 
80/20 workaround.  Definitely not a full solution, helpful to some, 
probably easier to do.  I don't know if an immutable key would actually 
be enough to elide foreign-key locks though.


Simon, I think you had a reason why it couldn't work, but I didn't quite 
get your meaning and didn't want to distract things further at that 
stage.  You wrote that it "doesn't do what KEY LOCKS are designed to 
do"...  any chance you might recall what the problem was?


I don't mean to be pushy about my pet idea, and heaven knows I don't 
have time to implement it, but it'd be good to know whether I should put 
the whole thought to rest.



Jeroen



Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:

 Making pg_multixact persistent across clean shutdowns is no bridge to cross
 lightly, since it means committing to an on-disk format for an indefinite
 period.  We should do it; the benefits of this patch justify it, and I haven't
 identified a way to avoid it without incurring worse problems.

I can't actually see anything in the patch that explains why this is
required. (That is something we should reject more patches on, since
it creates a higher maintenance burden).

Can someone explain? We might think of a way to avoid that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 1:08 PM, Jeroen Vermeulen j...@xs4all.nl wrote:

 Simon, I think you had a reason why it couldn't work, but I didn't quite get
 your meaning and didn't want to distract things further at that stage.  You
 wrote that it doesn't do what KEY LOCKS are designed to do...  any chance
 you might recall what the problem was?

The IMMUTABLE idea would work, but it requires all users to recode
their apps. By the time they've done that we'll have probably fixed
the problem in full anyway, so then we have to ask them to stop again,
which is hard, so we'll be stuck with a performance tweak that applies
to just one release. So its the fully automatic solution we're looking
for. I don't object to someone implementing IMMUTABLE, I'm just saying
its not a way to get this patch simpler and therefore acceptable.

If people are willing to recode apps to avoid this then hire me and
I'll tell you how ;-)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] overriding current_timestamp

2012-02-23 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 For (unit) testing, I have often had the need to override the current
 timestamp in the database system.  For example, a column default,
 function, or views would make use of the current timestamp in some way,
 and to test the behavior, it's sometimes useful to tweak the current
 timestamp.

 What might be a good way to do that?

 Just overwrite xactStartTimestamp?  Is that safe?  If it weren't static,
 a user-loaded function could do it.

I think it's safe enough if you can figure out where/when to do it.  Do
you need this to happen invisibly, or is it okay to require the test
script to call a set-the-timestamp function in each transaction?
If the former, it'd likely be necessary to hook into the transaction
start/end hooks.

 Overwrite pg_catalog.now() in the test database?

Yeah, that would work too if you'd rather do it at that end.

 Some semi-official support for this sort of thing would be good.

Mumble.  It's not hard to think of applications where monkeying with the
system clock would amount to a security breach.  So I'm not that excited
about providing a way to do it even semi-officially.

regards, tom lane



Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?

2012-02-23 Thread Robert Haas
On Thu, Feb 23, 2012 at 3:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The industry accepted description for non-sequential access is random
 access whether or not the function that describes the movement is
 entirely random. To argue otherwise is merely hairsplitting.

 I don't think so.

 PostgreSQL already uses  a parameter called random_page_cost to
 describe non-sequential access. Perhaps that is wrong and we need a
 third parameter?

 For example, a bitmap index scan contrives to speed
 things up by arranging for the table I/O to happen in ascending block
 number order, with skips, rather than in random order, as a plain
 index scan would do, and that seems to be a pretty effective
 technique.  Except to the extent that it interferes with the kernel's
 ability to do readahead, it really can't be more expensive to read
 blocks 1, 2, 4, and 5 than to read blocks 1, 2, 3, 4, and 5.  Not
 reading block 3 can't require more effort than reading it.

 By that argument, ANALYZE never could run longer than VACUUM ANALYZE,
 so you disagree with Tom and me, and you can't explain Pavel's
 results

 cost_bitmap_heap_scan() uses random_page_cost to evaluate the cost
 of accessing blocks, even though the author knew the access was in
 ascending block number order. Why was that?

 Note that the cost_bitmap_heap_scan() cost can be higher than
 cost_seqscan() for certain parameter values.

I think all three of us are saying more or less the same thing in
slightly different words, so I'd rather not have an argument about
this one.  But you're right: I can't explain Pavel's results, unless
doing ANALYZE before VACUUM is causing skip-block reads that defeat
the kernel's read-ahead detection.  I think it's fairly self-evident
that reading a fixed-size subset of the pages in ascending order can't
*in general* be more expensive than reading all of an arbitrarily
large table, and so I believe we're all in agreement that the behavior
he observed is unusual.  As to the cost estimation stuff, we use
random_page_cost as an approximation: there may be a head seek
involved, but to do better we'd have to estimate the likely length of
the seek based on the number of blocks skipped, something we currently
view as irrelevant, and it's not clear that it would improve the
quality of the estimate very much - there are other, probably larger
sources of error, such as the fact that the sequential logical block
number doesn't imply sequential physical position on the platter,
since the OS often fragments the file, especially (I think) on
Windows.
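
For reference, the two operations being compared upthread, with a
hypothetical table name:

ANALYZE accounts;          -- reads a random sample of blocks,
                           -- visited in ascending block order
VACUUM ANALYZE accounts;   -- full sequential pass first, then analyze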

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012:
 On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:
 
  Making pg_multixact persistent across clean shutdowns is no bridge to cross
  lightly, since it means committing to an on-disk format for an indefinite
  period.  We should do it; the benefits of this patch justify it, and I 
  haven't
  identified a way to avoid it without incurring worse problems.
 
 I can't actually see anything in the patch that explains why this is
 required. (That is something we should reject more patches on, since
 it creates a higher maintenance burden).
 
 Can someone explain? We might think of a way to avoid that.

Sure.  The problem is that we are allowing updated rows to be locked (and
locked rows to be updated).  This means that we need to store extended
Xmax information in tuples that goes beyond mere locks, which is what we
were doing previously -- they may now have locks and updates simultaneously.

(In the previous code, a multixact never meant an update, it always
signified only shared locks.  After a crash, all backends that could
have been holding locks must necessarily be gone, so the multixact info
is not interesting and can be treated like the tuple is simply live.)

This means that this extended Xmax info needs to be able to survive, so
that it's possible to retrieve it after a crash; because even if the
lockers are all gone, the updater might have committed and this means
the tuple is dead.  If we failed to keep this, the tuple would be
considered live which would be wrong because the other version of the
tuple, which was created by the update, is also live.
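
A concrete illustration of a row being both locked and updated (table
and column names here are hypothetical):

CREATE TABLE parent (id int PRIMARY KEY, name text);
CREATE TABLE child  (pid int REFERENCES parent (id));
INSERT INTO parent VALUES (1, 'one');

-- session 1:
BEGIN;
INSERT INTO child VALUES (1);   -- the FK check locks the parent row

-- session 2, concurrently:
UPDATE parent SET name = 'uno' WHERE id = 1;
-- without the patch this blocks on the FOR SHARE lock; with it, the
-- FK check takes only FOR KEY SHARE, the non-key update proceeds, and
-- the parent row ends up both locked and updated at once: exactly the
-- state the extended Xmax information must be able to represent.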

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012:
 On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:

  Making pg_multixact persistent across clean shutdowns is no bridge to cross
  lightly, since it means committing to an on-disk format for an indefinite
  period.  We should do it; the benefits of this patch justify it, and I 
  haven't
  identified a way to avoid it without incurring worse problems.

 I can't actually see anything in the patch that explains why this is
 required. (That is something we should reject more patches on, since
 it creates a higher maintenance burden).

 Can someone explain? We might think of a way to avoid that.

 Sure.  The problem is that we are allowing updated rows to be locked (and
 locked rows to be updated).  This means that we need to store extended
 Xmax information in tuples that goes beyond mere locks, which is what we
 were doing previously -- they may now have locks and updates simultaneously.

 (In the previous code, a multixact never meant an update, it always
 signified only shared locks.  After a crash, all backends that could
 have been holding locks must necessarily be gone, so the multixact info
 is not interesting and can be treated like the tuple is simply live.)

 This means that this extended Xmax info needs to be able to survive, so
 that it's possible to retrieve it after a crash; because even if the
 lockers are all gone, the updater might have committed and this means
 the tuple is dead.  If we failed to keep this, the tuple would be
 considered live which would be wrong because the other version of the
 tuple, which was created by the update, is also live.

OK, thanks.

So why do we need pg_upgrade support?

If pg_multixact is not persistent now, surely there is no requirement
to have pg_upgrade do any form of upgrade? The only time we'll need to
do this is from 9.2 to 9.3, which can of course occur any time in next
year. That doesn't sound like a reason to block a patch now, because
of something that will be needed a year from now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Option for pg_dump to dump tables in clustered index order

2012-02-23 Thread Christopher Browne
On Wed, Feb 22, 2012 at 6:17 PM, Timothy Garnett tgarn...@panjiva.com wrote:
 I wanted to gauge the interest in adding an option for this to pg_dump.

I was thinking about an application for much the same feature.

Consider the case where you have a relatively small database such as
the accounting records for a not-hugely-active business.

And you'd like to handle backups via checking them into an SCM repository.

Thus...

#!/bin/sh
cd $HOME/GitBackup/Databases
pg_dump -h dbserver -p 5432 accounting > accounting.sql
git add accounting.sql
git commit -m "Latest backup" accounting.sql

If the database's tables have gotten clustered, then the order of data
will tend to be consistent, and differences between versions of
accounting.sql will generally represent the actual differences.

If, on the other hand, tables are not clustered, then dumps will find
tuples ordered in rather less predictable fashions, and the backups
will have more differences indicated.

I was thinking about writing a script to run CLUSTER before doing
backups.  For that step to be part of pg_dump is certainly an
interesting idea.
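
For what it's worth, the per-database step can be a one-liner, since
CLUSTER with no arguments re-clusters every table the calling user owns
that has previously been clustered on an index:

CLUSTER;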
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incompatible pointer types with newer zlib

2012-02-23 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Fixing most of this is not difficult, see attached patch.  The only
 ugliness is in pg_backup_archiver.h

 FILE   *FH; /* General purpose file handle */

 which is used throughout pg_dump as sometimes a real FILE* and sometimes
 a gzFile handle.  There are also some fileno() calls on this, so just
 replacing this with an #ifdef isn't going to work.  This might need some
 more restructuring to make the code truly type-safe.  My quick patch
 replaces the type with void*, thus sort of restoring the original
 situation that allowed this to work.

void * seems entirely reasonable given the two different usages, but
I would be happier if the patch added explicit casts whereever FH is
set to or used as one of these two types.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Sure.  The problem is that we are allowing updated rows to be locked (and
 locked rows to be updated).  This means that we need to store extended
 Xmax information in tuples that goes beyond mere locks, which is what we
 were doing previously -- they may now have locks and updates simultaneously.

 (In the previous code, a multixact never meant an update, it always
 signified only shared locks.  After a crash, all backends that could
 have been holding locks must necessarily be gone, so the multixact info
 is not interesting and can be treated like the tuple is simply live.)

Ugh.  I had not been paying attention to what you were doing in this
patch, and now that I read this I wish I had objected earlier.  This
seems like a horrid mess that's going to be unsustainable both from a
complexity and a performance standpoint.  The only reason multixacts
were tolerable at all was that they had only one semantics.  Changing
it so that maybe a multixact represents an actual updater and maybe
it doesn't is not sane.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue feb 23 06:18:57 -0300 2012:
 
 On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:
 
  All in all, I think this is in pretty much final shape.  Only pg_upgrade
  bits are still missing.  If sharp eyes could give this a critical look
  and knuckle-cracking testers could give it a spin, that would be
  helpful.
 
  Lack of pg_upgrade support leaves this version incomplete, because that
  omission would constitute a blocker for beta 2.  This version changes as 
  much
  code compared to the version I reviewed at the beginning of the CommitFest 
  as
  that version changed overall.  In that light, it's time to close the books 
  on
  this patch for the purpose of this CommitFest; I'm marking it Returned with
  Feedback.  Thanks for your efforts thus far.

Now this is an interesting turn of events.  I must thank you for your
extensive review effort in the current version of the patch, and also
thank you and credit you for the idea that initially kicked this patch
forward from the older, smaller, simpler version I wrote during the 9.1 timeframe
(which you also reviewed exhaustively).  Without your and Simon's
brilliant ideas, this patch wouldn't exist at all.

I completely understand that you don't want to review this latest
version of the patch; it's a lot of effort and I wouldn't inflict it on
anybody who hasn't volunteered.  However, it doesn't seem to me that
this is reason to boot the patch from the commitfest.  I think the thing
to do would be to remove yourself from the reviewers column and set it
back to needs review, so that other reviewers can pick it up.

As for the late code churn, it mostly happened as a result of your
own feedback; I would have left most of it in the original state, but as
I went ahead it seemed much better to refactor things.  This is mostly
in heapam.c.  As for multixact.c, it also had a lot of churn, but that
was mostly to restore it to the state it has in the master branch,
dropping much of the code I had written to handle multixact truncation.
The new code there and in the vacuum code path (relminmxid and so on) is
a lot smaller than that other code was, and it's closely based on
relfrozenxid which is a known piece of technology.

 My view would be that with 90 files touched this is a very large
 patch, so that alone makes me wonder whether we should commit this
 patch, so I agree with Noah and compliment him on an excellent
 detailed review.

I note, however, that the bulk of the patch is in three files --
multixact.c, tqual.c, heapam.c, as is clearly illustrated in the diff
stats I posted.  The rest of them are touched mostly to follow their new
APIs (and of course to add tests and docs).

To summarize, of 94 files touched in total:
* 22 files are in src/test/isolation/
  (new and updated tests and expected files)
* 19 files are in src/include/
* 10 files are in contrib/
* 39 files are in src/backend;
 * in that subdir, there are 3097 insertions and 1006 deletions
 * 3047 (83%) of which are in heapam.c multixact.c tqual.c
 * one is a README

 However, review of such a large patch should not be simply pass or
 fail. We should be looking back at the original problem and ask
 ourselves whether some subset of the patch could solve a useful subset
 of the problem. For me, that seems quite likely and this is very
 definitely an important patch.
 
 Even if we can't solve some part of the problem we can at least commit
 some useful parts of infrastructure to allow later work to happen more
 smoothly and quickly.
 
 So please let's not focus on the 100%, lets focus on 80/20.

Well, we have the patch I originally posted in the 9.1 timeframe.
That's a lot smaller and simpler.  However, that solves only part of the
blocking problem, and in particular it doesn't fix the initial deadlock
reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case
you wonder about his change of email address) that started this effort
in the first place.  I don't think we can cut down to that and still
satisfy the users that requested this; and Glue was just the first one,
because after I started blogging about this, some more people started
asking for it.

I don't think there's any useful middle ground between that one and the
current one, but maybe I'm wrong.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue feb 23 12:12:13 -0300 2012:
 On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

  Sure.  The problem is that we are allowing updated rows to be locked (and
  locked rows to be updated).  This means that we need to store extended
  Xmax information in tuples that goes beyond mere locks, which is what we
  were doing previously -- they may now have locks and updates simultaneously.

 OK, thanks.
 
 So why do we need pg_upgrade support?

Two reasons.  One is that in upgrades from a version that contains this
patch to another version that also contains this patch (i.e. future
upgrades), we need to copy the multixact files from the old cluster to
the new.

The other is that in upgrades from a version that doesn't contain this
patch to a version that does, we need to set the multixact limit values
so that values that were used in the old cluster are returned as empty
values (keeping the old semantics); otherwise they would cause errors
trying to read the member Xids from disk.

 If pg_multixact is not persistent now, surely there is no requirement
 to have pg_upgrade do any form of upgrade? The only time we'll need to
 do this is from 9.2 to 9.3, which can of course occur any time in next
 year. That doesn't sound like a reason to block a patch now, because
 of something that will be needed a year from now.

I think there's a policy that we must allow upgrades from one beta to
the next, which is why Noah says this is a blocker starting from beta2.

The pg_upgrade code for this is rather simple however.  There's no
rocket science there.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Tom Lane's message of jue feb 23 12:28:20 -0300 2012:
 
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Sure.  The problem is that we are allowing updated rows to be locked (and
  locked rows to be updated).  This means that we need to store extended
  Xmax information in tuples that goes beyond mere locks, which is what we
  were doing previously -- they may now have locks and updates simultaneously.
 
  (In the previous code, a multixact never meant an update, it always
  signified only shared locks.  After a crash, all backends that could
  have been holding locks must necessarily be gone, so the multixact info
  is not interesting and can be treated like the tuple is simply live.)
 
 Ugh.  I had not been paying attention to what you were doing in this
 patch, and now that I read this I wish I had objected earlier.

Uhm, yeah, a lot earlier -- I initially blogged about this in August
last year:
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/

and in several posts in pgsql-hackers.

 This
 seems like a horrid mess that's going to be unsustainable both from a
 complexity and a performance standpoint.  The only reason multixacts
 were tolerable at all was that they had only one semantics.  Changing
 it so that maybe a multixact represents an actual updater and maybe
 it doesn't is not sane.

As far as complexity, yeah, it's a lot more complex now -- no question
about that.

Regarding performance, the good thing about this patch is that if you
have an operation that used to block, it might now not block.  So maybe
a multixact-related operation is a bit slower than before, but if it
allows you to continue operating rather than sit waiting until some
other transaction releases you, it's much better.

As for sanity -- I regard multixacts as a way to store extended Xmax
information.  The original idea was obviously much more limited in that
the extended info was just locking info.  We've extended it but I don't
think it's such a stretch.

I have been posting about (most? all of?) the ideas that I've been
following to make this work at all, so that people had plenty of chances
to disagree with them -- and Noah and others did disagree with many of
them, so I changed the patch accordingly.  I'm not closed to further
rework, but I'm not going to entirely abandon the idea too lightly.

I'm sure there are bugs too, but hopefully, given enough interested
reviewer eyeballs, they are all shallow.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote:
 
 As for sanity -- I regard multixacts as a way to store extended
 Xmax information.  The original idea was obviously much more
 limited in that the extended info was just locking info.  We've
 extended it but I don't think it's such a stretch.
 
Since the limitation on what can be stored in xmax was the killer
for Florian's attempt to support SELECT FOR UPDATE in a form which
was arguably more useful (and certainly more convenient for those
converting from other database products), I'm wondering whether
anyone has determined whether this new scheme would allow Florian's
work to be successfully completed.  The issues seem very similar. 
If this approach also provides a basis for the other work, I think
it helps bolster the argument that this is a good design; if not, I
think it suggests that maybe it should be made more general or
extensible in some way.  Once this has to be supported by pg_upgrade
it will be harder to change the format, if that is needed for some
other feature.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of jue feb 23 13:31:36 -0300 2012:
 
 Alvaro Herrera alvhe...@commandprompt.com wrote:
  
  As for sanity -- I regard multixacts as a way to store extended
  Xmax information.  The original idea was obviously much more
  limited in that the extended info was just locking info.  We've
  extended it but I don't think it's such a stretch.
  
 Since the limitation on what can be stored in xmax was the killer
 for Florian's attempt to support SELECT FOR UPDATE in a form which
 was arguably more useful (and certainly more convenient for those
 converting from other database products), I'm wondering whether
 anyone has determined whether this new scheme would allow Florian's
 work to be successfully completed.  The issues seem very similar. 
 If this approach also provides a basis for the other work, I think
 it helps bolster the argument that this is a good design; if not, I
 think it suggests that maybe it should be made more general or
 extensible in some way.  Once this has to be supported by pg_upgrade
 it will be harder to change the format, if that is needed for some
 other feature.

I have no idea what improvements Florian was seeking, but multixacts now
have plenty of bit flag space to indicate whatever we want for each
member transaction, so most likely the answer is yes.  However, we need
to make clear that a single SELECT FOR UPDATE of a tuple does not
currently use a multixact; if we wish to always store flags then we are
forced to use one, which incurs a performance hit.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Greg Smith

On 02/23/2012 10:43 AM, Alvaro Herrera wrote:

I completely understand that you don't want to review this latest
version of the patch; it's a lot of effort and I wouldn't inflict it on
anybody who hasn't volunteered.  However, it doesn't seem to me that
this is reason to boot the patch from the commitfest.  I think the thing
to do would be to remove yourself from the reviewers column and set it
back to needs review, so that other reviewers can pick it up.


This feature made Robert's list of serious CF concerns, too, and the 
idea that majorly revised patches might be punted isn't a new one.  Noah 
is certainly justified in saying you're off his community support list, 
after all the review work he's been doing for this CF.


We here think it would be a shame to get all of these other performance 
bits sorted but still leave this one loose, though, if it's possible to 
keep going on it.  It's been a well-known item on Simon's peeve list 
for some time now.  Just yesterday I was reading someone ranting about 
how this foreign key locking issue proves Postgres isn't "enterprise 
scale"; I think it was part of an article arguing why DB2 is worth 
paying for.  This change crosses over into the advocacy area because of 
that, albeit only for the people who have already been burned by it.


If the main problem is pg_upgrade complexity, eventually progress on 
that front needs to be made.  I'm surprised the project has survived 
this long without needing anything beyond catalog conversion for 
in-place upgrade.  That luck won't hold forever.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Greg Smith's message of jue feb 23 14:48:13 -0300 2012:
 On 02/23/2012 10:43 AM, Alvaro Herrera wrote:
  I completely understand that you don't want to review this latest
  version of the patch; it's a lot of effort and I wouldn't inflict it on
  anybody who hasn't volunteered.  However, it doesn't seem to me that
  this is reason to boot the patch from the commitfest.  I think the thing
  to do would be to remove yourself from the reviewers column and set it
  back to needs review, so that other reviewers can pick it up.
 
 This feature made Robert's list of serious CF concerns, too, and the 
 idea that majorly revised patches might be punted isn't a new one.

Well, this patch (or rather, a previous incarnation of it) got punted
from 9.1's fourth commitfest; I intended to have the new version in
9.2's first CF, but business reasons (which I will not discuss in
public) forced me otherwise.  So here we are again -- as I said to Tom,
I don't intend to let go of this one easily, though of course I will
concede to whatever the community decides.

 Noah 
 is certainly justified in saying you're off his community support list, 
 after all the review work he's been doing for this CF.

Yeah, I can't blame him.  I've been trying to focus most of my review
availability on his own patches precisely due to that, but it's very
clear to me that his effort is larger than mine.

  We here think it would be a shame to get all of these other performance 
  bits sorted but still leave this one loose, though, if it's possible to 
  keep going on it.  It's been a well-known item on Simon's peeve list 
  for some time now.  Just yesterday I was reading someone ranting about 
  how this foreign key locking issue proves Postgres isn't "enterprise 
  scale"; I think it was part of an article arguing why DB2 is worth 
  paying for.  This change crosses over into the advocacy area because of 
  that, albeit only for the people who have already been burned by it.

Yeah, Simon's been on this particular issue for quite some time -- which
is probably why the initial idea that kickstarted this patch was his.
Personally I've been in the "not enterprise strength" camp for a long
time, mostly unintentionally; you can see that by tracing how my major
patches close holes in that kind of area ("cluster loses indexes", "we
don't have subtransactions", "foreign key concurrency sucks" (-> SELECT
FOR SHARE), "manual vacuum is teh sux0r", and now this one about FKs
again).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Greg Smith

On 02/23/2012 01:04 PM, Alvaro Herrera wrote:

manual vacuum is teh sux0r


I think you've just named my next conference talk submission.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 4:01 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

As far as complexity goes, would it be easier if we treated the UPDATE
of a primary key column as a DELETE plus an INSERT?

There's not really a logical reason why updating a primary key has
meaning, so allowing an ExecPlanQual to follow the chain across
primary key values doesn't seem valid to me.

That would make all primary keys IMMUTABLE to updates.

No primary key, no problem.
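
(For comparison, an application can already approximate an immutable key
with a trigger; a sketch, with hypothetical names:

CREATE FUNCTION forbid_pk_update() RETURNS trigger
    LANGUAGE plpgsql AS $$
BEGIN
    IF NEW.id IS DISTINCT FROM OLD.id THEN
        RAISE EXCEPTION 'primary key column is immutable';
    END IF;
    RETURN NEW;
END
$$;

CREATE TRIGGER parent_id_immutable
    BEFORE UPDATE OF id ON parent
    FOR EACH ROW EXECUTE PROCEDURE forbid_pk_update();

The value of a declared attribute would be that the system itself could
rely on it to elide locking, which a trigger cannot promise.)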

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote:
 Excerpts from Kevin Grittner's message:
 
 Since the limitation on what can be stored in xmax was the killer
 for Florian's attempt to support SELECT FOR UPDATE in a form
 which was arguably more useful (and certainly more convenient for
 those converting from other database products), I'm wondering
 whether anyone has determined whether this new scheme would allow
 Florian's work to be successfully completed.  The issues seem
 very similar. If this approach also provides a basis for the
 other work, I think it helps bolster the argument that this is a
 good design; if not, I think it suggests that maybe it should be
 made more general or extensible in some way.  Once this has to be
 supported by pg_upgrade it will be harder to change the format,
 if that is needed for some other feature.
 
 I have no idea what improvements Florian was seeking, but
 multixacts now have plenty of bit flag space to indicate whatever
 we want for each member transaction, so most likely the answer is
 yes.  However we need to make clear that a single SELECT FOR
 UPDATE in a tuple does not currently use a multixact; if we wish
 to always store flags then we are forced to use one which incurs a
 performance hit.
 
Well, his effort really started to go into a tailspin on the related
issues here:
 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01743.php
 
... with a summary of the problem and possible directions for a
solution here:
 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01833.php
 
One of the problems that Florian was trying to address is that
people often have a need to enforce something with a lot of
similarity to a foreign key, but with more subtle logic than
declarative foreign keys support.  One example would be the case
Robert has used in some presentations, where the manager column in
each row in a project table must contain the id of a row in a person
table *which has the project_manager boolean column set to TRUE*. 
Short of using the new serializable transaction isolation level in
all related transactions, hand-coding enforcement of this useful
invariant through trigger code (or application code enforced through
some framework) is very tricky.  The change to SELECT FOR UPDATE
that Florian was working on would make it pretty straightforward.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] overriding current_timestamp

2012-02-23 Thread David E. Wheeler
On Feb 23, 2012, at 3:08 AM, Peter Eisentraut wrote:

 For (unit) testing, I have often had the need to override the current
 timestamp in the database system.  For example, a column default,
 function, or view would make use of the current timestamp in some way,
 and to test the behavior, it's sometimes useful to tweak the current
 timestamp.
 
 What might be a good way to do that?
 
 Just overwrite xactStartTimestamp?  Is that safe?  If it weren't static,
 a user-loaded function could do it.
 
 Overwrite pg_catalog.now() in the test database?
 
 Other ideas?
 
 Some semi-official support for this sort of thing would be good.

I create a mock schema, add the function to it, and then put it in the 
search_path ahead of pg_catalog. See the example starting at slide 48 on 
http://www.slideshare.net/justatheory/pgtap-best-practices.
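
In miniature (the schema name and the frozen value are arbitrary):

CREATE SCHEMA mock;
CREATE FUNCTION mock.now() RETURNS timestamptz
    LANGUAGE sql STABLE
    AS $$ SELECT timestamptz '2012-01-01 00:00:00+00' $$;

SET search_path = mock, public, pg_catalog;
SELECT now();   -- resolves to mock.now() and returns the frozen value

Note this intercepts unqualified now() calls; anything that reaches
pg_catalog.now() directly may still get the real clock.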

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] overriding current_timestamp

2012-02-23 Thread David E. Wheeler
On Feb 23, 2012, at 10:54 AM, David E. Wheeler wrote:

 I create a mock schema, add the function to it, and then put it in the 
 search_path ahead of pg_catalog. See the example starting at slide 48 on 
 http://www.slideshare.net/justatheory/pgtap-best-practices.

Sorry, starting at slide 480.

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c

2012-02-23 Thread Daniele Varrazzo
On Mon, Feb 20, 2012 at 12:09 AM, Jan Urbański wulc...@wulczer.org wrote:

 BTW, that tool is quite handy, I'll have to try running it over psycopg2.

Indeed. I'm having a play with it. It is reporting several issues to
clean up (mostly on failure at module import). It's also tracebacking
here and there: I'll send the author some feedback/patches.

I'm patching psycopg in the gcc-python-plugin branch in my dev repos
(https://github.com/dvarrazzo/psycopg).

-- Daniele

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-23 Thread Greg Smith

On 02/23/2012 07:36 AM, Simon Riggs wrote:

Since scan_whole_pool_milliseconds is set to 2 minutes we scan the
whole bufferpool every 2 minutes, no matter how big the bufferpool,
even when nothing else is happening. Not cool.


It's not quite that bad.  Once the BGW has circled around the whole 
buffer pool, such that it's swept so far ahead it's reached the clock 
sweep strategy point, it stops.  So when the system is idle, it creeps 
forward until it's scanned the pool once.  Then, it still wakes up 
regularly, but the computation of the bufs_to_lap lap number will reach 
0.  That aborts running the main buffer scanning loop, so it only burns 
a bit of CPU time and a lock on BufFreelistLock each time it wakes--both 
of which are surely to spare if the system is idle.  I can agree with 
your power management argument, I don't see much of a performance win 
from eliminating this bit.


The goal was to end up with a fully cleaned pool, ready to absorb a 
jump from idle to a traffic spike.  The logic behind where the magic 
constants controlling it came from was all laid out at 
http://archives.postgresql.org/pgsql-hackers/2007-09/msg00214.php 
There's a bunch of code around that whole computation that only executes 
if you enable BGW_DEBUG.  I left that in there in case somebody wanted 
to fiddle with this specific tuning work again, since it took so long to 
get right.  That was the last feature change made to the 8.3 background 
writer tuning work.


I was content at that time to cut the minimal activity level in half 
relative to what it was in 8.2, and that measured well enough.  It's 
hard to find compelling benchmark workloads where the background writer 
really works well though.  I hope to look at this set of interesting 
cases I found here more, now that I seem to have both positive and 
negative results for background writer involvement.


As for free list contention, I wouldn't expect that to be a major issue 
in the cases I was testing.  The background writer is just one of many 
backends all contending for that.  When there's dozens of backends all 
grabbing, I'd think that its individual impact would be a small slice of 
the total activity.  I will of course reserve arguing that point until 
I've benchmarked to support it though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] swapcache-style cache?

2012-02-23 Thread Greg Smith

On 02/22/2012 05:31 PM, james wrote:

Has anyone considered managing a system like the DragonFLY swapcache for
a DBMS like PostgreSQL?

ie where the admin can assign drives with good random read behaviour
(but perhaps also-ran random write) such as SSDs to provide a cache for
blocks that were dirtied, with async write that hopefully writes them
out before they are forcibly discarded.


We know that battery-backed write caches are extremely effective for 
PostgreSQL writes.  I see most of these tiered storage ideas as acting 
like a big one of those, which seems to hold in things like SAN storage 
that have adopted this sort of technique already.  A SSD is quite large 
relative to a typical BBWC.


There are a few reasons that doesn't always give the win hoped for though:

-Database writes have write durability requirements that require safe 
storage more often than most other applications.  One of the reasons the 
swapcache helps is that it aims to bundle writes into 64K chunks, very 
SSD friendly.  The database may force them more often than that.  The 
fact that all the Dragonfly documentation uses Intel drives for its 
examples that don't write reliably doesn't make me too optimistic about 
that being a priority of the design.  The SSDs that have safe, 
battery-backed write buffers >=64KB make that win go away.


-Ultimately all this data needs to make it out to real disk.  The funny 
thing about caches is that no matter how big they are, you can easily 
fill them up if doing something faster than the underlying storage can 
handle.


-If you have something like a BBWC in front of traditional storage, as 
well as a few gigabytes of operating system write buffering, that really 
helps traditional storage a lot already.  Those two things do so much 
write reordering that some of the random seek gain gap between spinning 
disk and SSD shrinks.  And sequential throughput is usually not sped up 
very much by SSD, except at the high end (using lots of banks).


One reaction to all this is to point out that it's sometimes easier to 
add a SSD to a system than a BBWC.  That is true.  The thing that 
benefits most from this are the WAL writes though, and since they're 
both sequential and very high volume they're really smacking into the 
worst case scenario for SSD vs. spinning disk too.



I'd been thinking that swapcache would help where the working set won't
fit in RAM, also L2ARC on Solaris - but it seems to me that there is no
reason not to allow the DBMS to manage the set-aside area itself where
it is given either access to the raw device or to a pre-sized file on
the device it can map in segments.


Well, you could argue that if we knew what to do with it, we'd have 
already built that logic into a superior usage of shared_buffers. 
Instead we punt a lot of this work toward the kernel, often usefully. 
Write cache reordering and read-ahead are the two biggest things storage 
does that we'd have to reinvent inside PostgreSQL if more direct disk 
I/O was attempted.


I don't think the idea of a swapcache is without merit; there's surely 
some applications that will benefit from it.  It's got a lot of 
potential as a way to absorb short-term bursts of write activity.  And 
there are some applications that could benefit from having a second tier 
of read cache, not as fast as RAM but larger and faster than real disk 
seeks.  In all of those potential win cases, though, I don't see why the 
OS couldn't just manage the whole thing for us.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Noah Misch
On Thu, Feb 23, 2012 at 02:08:28PM +0100, Jeroen Vermeulen wrote:
 On 2012-02-23 10:18, Simon Riggs wrote:

 However, review of such a large patch should not be simply pass or
 fail. We should be looking back at the original problem and ask
 ourselves whether some subset of the patch could solve a useful subset
 of the problem. For me, that seems quite likely and this is very
 definitely an important patch.

 Even if we can't solve some part of the problem we can at least commit
 some useful parts of infrastructure to allow later work to happen more
 smoothly and quickly.

 So please let's not focus on the 100%, lets focus on 80/20.

 The suggested immutable-column constraint was meant as a potential  
 80/20 workaround.  Definitely not a full solution, helpful to some,  
 probably easier to do.  I don't know if an immutable key would actually  
 be enough to elide foreign-key locks though.

That alone would not simplify the patch much.  INSERT/UPDATE/DELETE on the
foreign side would still need to take some kind of tuple lock on the primary
side to prevent primary-side DELETE.  You then still face the complicated case
of a tuple that's both locked and updated (non-key/immutable columns only).
Updates that change keys are relatively straightforward, following what we
already do today.  It's the non-key updates that complicate things.

If you had both an immutable column constraint and a never-deleted table
constraint, that combination would be sufficient to simplify the picture.
(Directly or indirectly, it would not actually be a "never-deleted"
constraint so much as a "you must take AccessExclusiveLock to DELETE"
constraint.)
Foreign-side DML would then take an AccessShareLock on the parent table with
no tuple lock at all.

By then, though, that change would share little or no code with the current
patch.  It may have its own value, but it's not a means for carving a subset
from the current patch.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Noah Misch's message of mié feb 22 14:00:07 -0300 2012:
 
 On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote:

 On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote:
  On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
   * Columns that are part of the key
 Noah thinks the set of columns should only consider those actually 
   referenced
 by keys, not those that *could* be referenced.
  
  Well, do you disagree?  To me it's low-hanging fruit, because it isolates 
  the
  UPDATE-time overhead of this patch to FK-referenced tables rather than all
  tables having a PK or PK-like index (often just all tables).
 
 You have not answered my question above.

Sorry.  The reason I didn't research this is that at the very start of
the discussion it was said that having heapam.c figure out whether
columns are being used as FK destinations or not would be more of a
modularity violation than indexed columns already are for HOT support
(this was a contentious issue for HOT, so I don't take it lightly.  I
don't think I need any more reasons for Tom to object to this patch, or
to add more bulk to it.  Both are already serious issues.)

In any case, with the way we've defined FOR KEY SHARE locks (in the docs
it's explicitly said that the set of columns considered could vary in
the future), it's a relatively easy patch to add on top of what I've
submitted.  Just as the ALTER TABLE bits to add columns to the set of
columns considered, it could be left for a second pass on the issue.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 8:44 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 02/23/2012 07:36 AM, Simon Riggs wrote:

 Since scan_whole_pool_milliseconds is set to 2 minutes we scan the
 whole bufferpool every 2 minutes, no matter how big the bufferpool,
 even when nothing else is happening. Not cool.


 It's not quite that bad.  Once the BGW has circled around the whole buffer
 pool, such that it's swept so far ahead it's reached the clock sweep
 strategy point, it stops.  So when the system is idle, it creeps forward
 until it's scanned the pool once.  Then, it still wakes up regularly, but
 the computation of the bufs_to_lap lap number will reach 0.  That aborts
 running the main buffer scanning loop, so it only burns a bit of CPU time
 and a lock on BufFreelistLock each time it wakes--both of which are surely
 to spare if the system is idle.  I can agree with your power management
 argument, I don't see much of a performance win from eliminating this bit.

The behaviour is wrong though, because we're scanning for too long
when the system goes quiet and then we wake up again too quickly - as
soon as a new buffer allocation happens.

We don't need to clean the complete bufferpool in 2 minutes. That's
exactly what checkpoint does, and we slowed that down so it didn't
do that. So we're still writing way too much.

So the proposal was to make it scan only 10% of the bufferpool, not
100%, then sleep. We only need some clean buffers, we don't need *all*
buffers clean, especially on very large shared_buffers. And we should
wake up only when we see an effect on user backends, i.e. a dirty
write - which is the event the bgwriter is designed to avoid.

The last bit is the key - waking up only when a dirty write occurs. If
they aren't happening we literally don't need the bgwriter - as your
tests show.
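
For reference, the knobs that exist today (defaults shown; the proposal
above would change the scan and wakeup behavior rather than add a new
setting):

SHOW bgwriter_delay;            -- 200ms by default
SHOW bgwriter_lru_maxpages;     -- 100 by default
SHOW bgwriter_lru_multiplier;   -- 2.0 by default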

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Coverity Scan

2012-02-23 Thread Josh Berkus
All,

If you are one of the developers who has used the Coverity Scan tool
(static analysis) on PostgreSQL in the past, or if you would like to use
it in the future, they've updated Scan and are looking to get people set
up with access.

Please contact me offlist.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-23 Thread Peter Eisentraut
On fre, 2012-02-17 at 10:19 -0500, Tom Lane wrote:
  What if you did this ONCE and wrote the results to a file someplace?
 
 That's still a cache, you've just defaulted on your obligation to think
 about what conditions require the cache to be flushed.  (In the case at
 hand, the trigger for a cache rebuild would probably need to be a glibc
 package update, which we have no way of knowing about.) 

We basically hardwire locale behavior at initdb time, so computing this
then and storing it somewhere for eternity would be consistent.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql \i tab completion initialization problem on HEAD

2012-02-23 Thread Peter van Hardenberg
While testing Noah's filename quoting patch on my local development
machine, I noticed some strange behaviour around tab completion with
\i; it doesn't appear to be present in 9.1, but it is present on 9.2
HEAD and appears to be present with and without readline.

It manifests as the client preferring statement completion over
filename completion until the first time \i is forced to check
something on disk, after which it begins to work as expected.

Here's a reliable reproduction on my OS X laptop.

- % bin/psql
psql (9.2devel, server 9.0.4)
WARNING: psql version 9.2, server version 9.0.
 Some psql features might not work.
Type help for help.

pvh=# \i <TAB><TAB>
ABORT           ALTER           ANALYZE         BEGIN           CHECKPOINT
CLOSE           CLUSTER         COMMENT         COMMIT          COPY
CREATE          DEALLOCATE      DECLARE         DELETE FROM     DISCARD
DO              DROP            END             EXECUTE         EXPLAIN
FETCH           GRANT           INSERT          LISTEN          LOAD
LOCK            MOVE            NOTIFY          PREPARE         REASSIGN
REINDEX         RELEASE         RESET           REVOKE          ROLLBACK
SAVEPOINT       SECURITY LABEL  SELECT          SET             SHOW
START           TABLE           TRUNCATE        UNLISTEN        UPDATE
VACUUM          VALUES          WITH
pvh=# \i asdf
asdf: No such file or directory
pvh=# \i ./<TAB><TAB>
./bin  ./include  ./lib  ./oh hai   ./share
pvh=# \i

I don't see this regression with the 9.1 client I have here, so I
suspect it has something to do with whatever patch introduced the
relative paths by default.

-p

-- 
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt." -- Kurt Vonnegut

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Noah Misch
On Thu, Feb 23, 2012 at 06:36:42PM -0300, Alvaro Herrera wrote:
 
 Excerpts from Noah Misch's message of mié feb 22 14:00:07 -0300 2012:
  
  On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote:
 
  On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote:
   On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
* Columns that are part of the key
  Noah thinks the set of columns should only consider those actually 
referenced
  by keys, not those that *could* be referenced.
   
   Well, do you disagree?  To me it's low-hanging fruit, because it isolates 
   the
   UPDATE-time overhead of this patch to FK-referenced tables rather than all
   tables having a PK or PK-like index (often just all tables).
  
  You have not answered my question above.
 
 Sorry.  The reason I didn't research this is that at the very start of
 the discussion it was said that having heapam.c figure out whether
 columns are being used as FK destinations or not would be more of a
 modularity violation than indexed columns already are for HOT support
 (this was a contentious issue for HOT, so I don't take it lightly.  I
 don't think I need any more reasons for Tom to object to this patch, or
 more bulk into it.  Both are already serious issues.)

That's fair.

 In any case, with the way we've defined FOR KEY SHARE locks (in the docs
 it's explicitly said that the set of columns considered could vary in
 the future), it's a relatively easy patch to add on top of what I've
 submitted.  Just as the ALTER TABLE bits to add columns to the set of
 columns considered, it could be left for a second pass on the issue.

Agreed.  Let's have that debate another day, as a follow-on patch.

Thanks for shedding this light.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-23 Thread Robert Haas
On Thu, Feb 23, 2012 at 3:44 PM, Greg Smith g...@2ndquadrant.com wrote:
 It's not quite that bad.  Once the BGW has circled around the whole buffer
 pool, such that it's swept so far ahead it's reached the clock sweep
 strategy point, it stops.  So when the system is idle, it creeps forward
 until it's scanned the pool once.  Then, it still wakes up regularly, but
 the computation of the bufs_to_lap lap number will reach 0.  That aborts
 running the main buffer scanning loop, so it only burns a bit of CPU time
 and a lock on BufFreelistLock each time it wakes--both of which are surely
 to spare if the system is idle.  I can agree with your power management
 argument, I don't see much of a performance win from eliminating this bit

I think that goal of ending up with a clean buffer pool is a good one,
and I'm loath to give it up.  On the other hand, I agree with Simon
that it does seem a bit wasteful to scan the entire buffer arena
because there's one dirty buffer somewhere.  But maybe we should look
at that as a reason to improve the way we find dirty buffers, rather
than a reason not to worry about writing them out.  There's got to be
a better way than scanning the whole buffer pool.  Honestly, though,
that feels like 9.3 material.  So far there's no evidence that we've
introduced any regressions that can't be compensated for by tuning,
and this doesn't feel like the right time to embark on a bunch of new
engineering projects.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-23 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On fre, 2012-02-17 at 10:19 -0500, Tom Lane wrote:
 That's still a cache, you've just defaulted on your obligation to think
 about what conditions require the cache to be flushed.  (In the case at
 hand, the trigger for a cache rebuild would probably need to be a glibc
 package update, which we have no way of knowing about.) 

 We basically hardwire locale behavior at initdb time, so computing this
 then and storing it somewhere for eternity would be consistent.

Well, only if we could cache every locale-related libc inquiry we ever
make.  Locking down only part of the behavior does not sound like a
plan.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \i tab completion initialization problem on HEAD

2012-02-23 Thread Tom Lane
Peter van Hardenberg p...@pvh.ca writes:
 While testing Noah's filename quoting patch on my local development
 machine, I noticed some strange behaviour around tab completion with
 \i; it doesn't appear to be present in 9.1, but it is present on 9.2
 HEAD and appears to be present with and without readline.

 It manifests as the client preferring statement completion over
 filename completion until the first time \i is forced to check
 something on disk, after which it begins to work as expected.

Hm, I don't see that here.  I get filename completion immediately.

 Here's a reliable reproduction on my OS X laptop.

OS X?  Are you using GNU readline, or Apple's libedit?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Reviewing patch URI connection string support for libpq

2012-02-23 Thread Harold Giménez
Hello hackers,

I've been a reader of this list for some time, but have never posted.

I have interest in the URI connection string support patch[1], so I'm in
the process of reviewing it. I have a couple of comments and questions:

1. I see no tests in the patch. I'd like to start putting together a set of
tests, likely based on the connection string permutations found in Greg
Smith's response[2]. However, I don't see an obvious place to put them.
They could possibly live in the test/examples directory. Another thought is
to use dblink in a test, although it may be problematic to depend on a
contrib package for a test, to say the least. Any thoughts on how to test
this are most welcome.
2. The documentation/manual was not updated as part of this patch, so this
is pending.
3. I for one prefer the `postgres` prefix, as opposed to `postgresql`, for
the reasons stated in an earlier thread [3]. In my opinion the best way to
move forward is to support them both.

The good news is that the patch still applies cleanly to 9.2 HEAD and seems
to work well.
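
For what it's worth, here is the trivial libpq smoke test I've been using
while poking at the patch. The URI below is just an example with made-up
credentials; the precise set of accepted forms is whatever the patch
defines, so treat this as a sketch rather than a specification:

/* Minimal libpq connection test for the URI syntax under review. */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb(
        "postgresql://user:secret@localhost:5432/mydb?sslmode=prefer");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }
    printf("connected to database %s\n", PQdb(conn));
    PQfinish(conn);
    return 0;
}

(Build with something like: cc uri_test.c -I$(pg_config --includedir)
-L$(pg_config --libdir) -lpq.)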

Regards,

-Harold

[1] https://commitfest.postgresql.org/action/patch_view?id=720
[2]
http://archives.postgresql.org/message-id/4f45253c.4030...@2ndquadrant.com
[3] http://archives.postgresql.org/pgsql-hackers/2011-12/msg00499.php


[HACKERS] row_to_json() Bug

2012-02-23 Thread David E. Wheeler
Looks like row_to_json() thinks 0s are nulls:

postgres=# select row(0);
 row 
-----
 (0)
(1 row)

postgres=# SELECT row_to_json(row(0));
 row_to_json 
-------------
 {"f1":null}
(1 row)

Best,

David




Re: [HACKERS] row_to_json() Bug

2012-02-23 Thread Andrew Dunstan



On 02/23/2012 08:35 PM, David E. Wheeler wrote:

Looks like row_to_json() thinks 0s are nulls:

 postgres=# select row(0);
  row
 -----
  (0)
 (1 row)

 postgres=# SELECT row_to_json(row(0));
  row_to_json
 -------------
  {"f1":null}
 (1 row)




Yeah, ouch, will fix.

cheers

andrew



Re: [HACKERS] Runtime SHAREDIR for testing CREATE EXTENSION

2012-02-23 Thread Daniel Farina
On Tue, Feb 21, 2012 at 1:34 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Sandro Santilli s...@keybit.net writes:
 Please see the inline extension thread where answers to your problem
 have been discussed.

I'm pretty sure Sandro is hacking PostGIS, so inline extensions are of
no help here.

Can you tell us why alternative approaches that don't require adding a
knob (such as copying or symlinking the compiled artifacts) are such a
huge pain for you?

--
fdr



Re: [HACKERS] patch for parallel pg_dump

2012-02-23 Thread Joachim Wieland
On Thu, Feb 16, 2012 at 1:29 PM, Robert Haas robertmh...@gmail.com wrote:
 Can you provide an updated patch?

Robert, updated patch is attached.


parallel_pg_dump_2.diff.gz
Description: GNU Zip compressed data



Re: [HACKERS] row_to_json() Bug

2012-02-23 Thread Andrew Dunstan



On 02/23/2012 09:09 PM, Andrew Dunstan wrote:



On 02/23/2012 08:35 PM, David E. Wheeler wrote:

Looks like row_to_json() thinks 0s are nulls:

 postgres=# select row(0);
  row
 -----
  (0)
 (1 row)

 postgres=# SELECT row_to_json(row(0));
  row_to_json
 -------------
  {"f1":null}
 (1 row)




Yeah, ouch, will fix.




Fixed, thanks for the report. (Also fixed in my 9.1 backport.)

cheers

andrew



Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Noah Misch
On Thu, Feb 23, 2012 at 12:43:11PM -0300, Alvaro Herrera wrote:
 Excerpts from Simon Riggs's message of Thu Feb 23 06:18:57 -0300 2012:
  On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:
  
    All in all, I think this is in pretty much final shape.  Only pg_upgrade
    bits are still missing.  If sharp eyes could give this a critical look
    and knuckle-cracking testers could give it a spin, that would be
    helpful.
   
    Lack of pg_upgrade support leaves this version incomplete, because that
    omission would constitute a blocker for beta 2.  This version changes as
    much code compared to the version I reviewed at the beginning of the
    CommitFest as that version changed overall.  In that light, it's time
    to close the books on this patch for the purpose of this CommitFest;
    I'm marking it Returned with Feedback.  Thanks for your efforts thus far.
 
 Now this is an interesting turn of events.  I must thank you for your
 extensive review effort in the current version of the patch, and also
 thank you and credit you for the idea that initially kicked this patch
 from the older, smaller, simpler version I wrote during the 9.1 timeline
 (which you also reviewed exhaustively).  Without your and Simon's
 brilliant ideas, this patch wouldn't exist at all.
 
 I completely understand that you don't want to review this latest
 version of the patch; it's a lot of effort and I wouldn't inflict it on
 anybody who hasn't volunteered.  However, it doesn't seem to me that
 this is reason to boot the patch from the commitfest.  I think the thing
 to do would be to remove yourself from the reviewers column and set it
 back to needs review, so that other reviewers can pick it up.

It would indeed be wrong to change any patch from Needs Review to Returned
with Feedback on account of a personal distaste for reviewing the patch.  I
hope I did not harbor such a motive here.  Rather, this CommitFest has given
your patch its fair shake, and I and other reviewers would better serve the
CF's needs by reviewing, say, ECPG FETCH readahead instead of your latest
submission.  Likewise, you would better serve the CF by evaluating one of the
four non-committer patches that have been Ready for Committer since January.
That's not to imply that the goals of the CF align with my goals, your goals,
or broader PGDG goals.  The patch status on commitfest.postgresql.org does
exist solely for the advancement of the CF, and I have set it accordingly.

 As for the late code churn, it mostly happened as a result of your
 own feedback; I would have left most of it in the original state, but as
 I went ahead it seemed much better to refactor things.  This is mostly
 in heapam.c.  As for multixact.c, it also had a lot of churn, but that
 was mostly to restore it to the state it has in the master branch,
 dropping much of the code I had written to handle multixact truncation.
 The new code there and in the vacuum code path (relminmxid and so on) is
 a lot smaller than that other code was, and it's closely based on
 relfrozenxid which is a known piece of technology.

I appreciate that.

  However, review of such a large patch should not be simply "pass" or
  "fail".  We should be looking back at the original problem and asking
  ourselves whether some subset of the patch could solve a useful subset
  of the problem. For me, that seems quite likely and this is very
  definitely an important patch.

Incidentally, I find it harmful to think of "Returned with Feedback" as
"fail".  For large patches, it's healthier to think of a CF as a bimonthly
project status meeting with stakeholders.  When the project is done,
wonderful!  When there's work left, that's no great surprise.

  Even if we can't solve some part of the problem we can at least commit
  some useful parts of infrastructure to allow later work to happen more
  smoothly and quickly.
  
  So please let's not focus on the 100%; let's focus on the 80/20.
 
 Well, we have the patch I originally posted in the 9.1 timeframe.
 That's a lot smaller and simpler.  However, that solves only part of the
 blocking problem, and in particular it doesn't fix the initial deadlock
 reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case
 you wonder about his change of email address) that started this effort
 in the first place.  I don't think we can cut down to that and still
 satisfy the users that requested this; and Glue was just the first one,
 because after I started blogging about this, some more people started
 asking for it.
 
 I don't think there's any useful middle ground between that one and the
 current one, but maybe I'm wrong.

Nothing additional comes to my mind, either.  This patch is monolithic.

Thanks,
nm



Re: [HACKERS] row_to_json() Bug

2012-02-23 Thread David E. Wheeler
On Feb 23, 2012, at 8:49 PM, Andrew Dunstan wrote:

 Fixed, Thanks for the report. (Also fixed in my 9.1 backport).

Awesome, thanks, will try it tomorrow.

David





[HACKERS] ISO8601 nitpicking

2012-02-23 Thread Daniel Farina
As it turns out, the evidence suggests that the "ISO" output in
Postgres isn't actually ISO, unless it is referring to some ISO date
and time standard other than 8601.  The standard does not permit a
space between the date and the time, as seen in:

SELECT now();
              now              
-------------------------------
 2012-02-23 23:31:59.580915-08
(1 row)

Thanks to Vasily Chekalkin for digging that up.  He was annoyed at the
time, so someone actually does care now and again; sample size of one.

It is true that many common adaptations of ISO 8601 do allow the space,
and many pages on the web that summarize the standard include that
variant, but it is not the letter of the standard, as far as I can
tell.  RFC 3339, on the other hand, does permit applications to
substitute a space.  Unless one actually digs into the ISO document
itself -- this is the third edition(!) -- one may well be misinformed,
or perhaps the detail is simply buried in the standard's PDF.

Here's a link to the standard:

http://dotat.at/tmp/ISO_8601-2004_E.pdf

I'm not sure if there's anything to be done here other than an errata
note in the manual.  Alternatively, datestyle = 'sql' and datestyle =
'iso' may reasonably differ here, after all.
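
For comparison, here's a minimal standalone C sketch (mine, illustrative
only) of the strictly conforming form, with the mandatory 'T' separator
and a numeric UTC offset:

/* Print the current time in strict ISO 8601 extended format. */
#include <stdio.h>
#include <time.h>

int
main(void)
{
    char buf[64];
    time_t t = time(NULL);
    struct tm *tm = localtime(&t);

    strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%S%z", tm);
    printf("%s\n", buf);   /* e.g. 2012-02-23T23:31:59-0800 */
    return 0;
}

One wrinkle: %z yields an offset like -0800, while the extended format
strictly wants -08:00, so even this output needs post-processing to be
fully pedantic.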

-- 
fdr
