Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result

2012-03-26 Thread Magnus Hagander
On Fri, Mar 23, 2012 at 18:05, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Mar 24, 2012 at 1:49 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 23, 2012 at 12:42 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 Might it be a good idea to put it on its own row instead of changing
 the format of an existing row, in order not to break scripts and
 programs that are parsing the previous output?

 Good idea! What row name should we use for the WAL file containing the
 REDO record? "Latest checkpoint's REDO file"?

 Sounds good to me.  I like the idea, too.  The status quo is an
 unnecessary nuisance, so this will be a nice usability improvement.

 Attached patch adds a new row "Latest checkpoint's REDO WAL segment:" to
 the result of pg_controldata. I used the term "WAL segment" for the row name
 instead of "file" because "WAL segment" is already used in another row,
 "Bytes per WAL segment:". Is there a better name?

 s/segment/file/g?

 Yep, "file" might be more intuitive for a user than "segment". Attached is the
 "file" version of the patch.

We're already using "file" to mean something different *internally*,
don't we? And since pg_controldata shows fairly internal information,
I'm not sure this is the best idea.

Maybe compromise and call it "segment file" - that is both easier to
understand than "segment", and not actually using a term that means
something else...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_terminate_backend for same-role

2012-03-26 Thread Jeff Davis
On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote:
 On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote:
  Parallel to pg_cancel_backend, it'd be nice to allow the user to just
  outright kill a backend that they own (politely, with a SIGTERM),
  aborting any transactions in progress, including the idle transaction,
  and closing the socket.
 
  +1
 
 Here's a patch implementing the simple version, with no more guards
 against signal racing than have been seen previously.  The more
 elaborate variants to close those races are being discussed in a
 parallel thread, but I thought I'd get this simple version out there.

Review:

After reading through the threads, it looks like there was no real
objection to this approach -- pid recycling is not something we're
concerned about.

I think you're missing a doc update though, in func.sgml:

For the less restrictive <function>pg_cancel_backend</>, the role of an
active backend can be found from
the <structfield>usename</structfield> column of the
<structname>pg_stat_activity</structname> view.

Also, high-availability.sgml makes reference to pg_cancel_backend(), and
it might be worthwhile to add an "...and pg_terminate_backend()" there.

Other than that, it looks good to me.
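As a quick way to exercise the behavior under review, something like the
following lets a role terminate its own other sessions (a sketch only; it
assumes the 9.2 pg_stat_activity column name pid, which is procpid in
earlier releases):

    -- terminate every other session belonging to the current role
    select pg_terminate_backend(pid)
      from pg_stat_activity
     where usename = current_user
       and pid <> pg_backend_pid();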

Regards,
Jeff Davis





[HACKERS] Re: [COMMITTERS] pgsql: Replace empty locale name with implied value in CREATE DATABASE

2012-03-26 Thread Stefan Kaltenbrunner
On 03/26/2012 03:47 AM, Tom Lane wrote:
 Replace empty locale name with implied value in CREATE DATABASE and initdb.
 
 setlocale() accepts locale name "" as meaning the locale specified by the
 process's environment variables.  Historically we've accepted that for
 Postgres' locale settings, too.  However, it's fairly unsafe to store an
 empty string in a new database's pg_database.datcollate or datctype fields,
 because then the interpretation could vary across postmaster restarts,
 possibly resulting in index corruption and other unpleasantness.
 
 Instead, we should expand "" to whatever it means at the moment of calling
 CREATE DATABASE, which we can do by saving the value returned by
 setlocale().
 
 For consistency, make initdb set up the initial lc_xxx parameter values the
 same way.  initdb was already doing the right thing for empty locale names,
 but it did not replace non-empty names with setlocale results.  On a
 platform where setlocale chooses to canonicalize the spellings of locale
 names, this would result in annoying inconsistency.  (It seems that popular
 implementations of setlocale don't do such canonicalization, which is a
 pity, but the POSIX spec certainly allows it to be done.)  The same risk
 of inconsistency leads me to not venture back-patching this, although it
 could certainly be seen as a longstanding bug.
 
 Per report from Jeff Davis, though this is not his proposed patch.

hmm, seems like this commit broke quagga:


http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2012-03-26%2002%3A03%3A04


Stefan



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-03-26 Thread Kyotaro HORIGUCHI
Hello. This is a new version of the patch for dblink using the row processor.

 - Use palloc to allocate temporary memory blocks.

 - Memory allocation is now done at once: a block of initial size is
   preallocated, and palloc simplified the reallocation code.

 - Resurrected the code path for command invocation, with a small
   adjustment of the behavior on error.

 - The modification that fixes the missing-connection-name bug has been
   split out into a separate patch.

 - Comments on the functions skipped over by longjmp are withheld,
   per the discussion with Marko.

 - Rebased to e8476f46fc847060250c92ec9b310559293087fc.

dblink_use_rowproc_20120326.patch - dblink row processor patch.
dblink_connname_20120326.patch - dblink connname fix patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..dd73aa5 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char* valbuf;
+	int valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
    int2vector *pkattnums_arg, int32 pknumatts_arg,
    int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -111,6 +127,9 @@ typedef struct remoteConnHashEnt
 /* initial number of connection hashes */
 #define NUMCONN 16
 
+/* Initial block size for value buffer in storeHandler */
+#define INITBUFLEN 64
+
 /* general utility */
 #define xpfree(var_) \
 	do { \
@@ -503,6 +522,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +579,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of the
+	 * result returned by PQexec below.
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+	(errcode(ERRCODE_DATATYPE_MISMATCH),
+	 errmsg("remote query result rowtype does not match "
+			"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +635,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_INVALID_CURSOR_NAME),
  errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +696,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -660,6 +717,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		{
 			/* text,text,bool */
 			DBLINK_GET_CONN;
+			conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			fail = PG_GETARG_BOOL(2);
 		}
@@ -715,164 +773,229 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
-	/* synchronous query, or async result 

Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Robert Haas
On Sun, Mar 25, 2012 at 12:15 PM, Andres Freund and...@2ndquadrant.com wrote:
 On Friday, March 16, 2012 10:40:46 AM Dimitri Fontaine wrote:
  This will have the effect of calling triggers outside of alphabetic
  order. I don't think that's a good idea even if one part is ANY and the
  other a specific command.
  I don't think there is any reason anymore to separate the two? The only

  callsite seems to look like:
 The idea is to have a predictable ordering of command triggers. The code
 changed in the patch v16 (you pasted code from git in between v15 and
 v16, I cleaned it up) and is now easier to read:

                 case CMD_TRIGGER_FIRED_BEFORE:
                         whenstr = "BEFORE";
                         procs[0] = cmd->before_any;
                         procs[1] = cmd->before;
                         break;

                 case CMD_TRIGGER_FIRED_AFTER:
                         whenstr = "AFTER";
                         procs[0] = cmd->after;
                         procs[1] = cmd->after_any;
                         break;

 So it's BEFORE ANY, then BEFORE command, then AFTER command, then AFTER
 ANY. That's an arbitrary choice I made and we can easily reconsider. Triggers
 are called in alphabetical order in each “slot” here.

 In my mind it makes sense to have ANY triggers around the specific
 triggers, but it's hard to explain why that feels better.
 I still think this would be a mistake. I don't have a hard time imagining
 use cases where a specific trigger should be called before or after an ANY
 trigger because e.g. it wants to return a more specific error or doesn't want
 to re-check preconditions already checked by the ANY trigger... All that would
 be precluded by enforcing a strict ordering between ANY and specific triggers.
 I don't see a use-case that would benefit from the current behaviour...

Dimitri's proposed behavior would be advantageous if you have an ANY
trigger that wants to take over the world and make sure that nobody
else can run before it.  I think, though, that's not a case we want to
cater to - all of this stuff requires superuser privileges anyway, so
we should assume that the DBA knows what he's doing.  So +1 for making
it strictly alphabetical, as we do with other triggers.  Everything
that can be done under Dimitri's proposal can also be done in that
scheme, but the reverse is not true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result

2012-03-26 Thread Robert Haas
On Mon, Mar 26, 2012 at 2:50 AM, Magnus Hagander mag...@hagander.net wrote:
 s/segment/file/g?

  Yep, "file" might be more intuitive for a user than "segment". Attached is
  the "file" version of the patch.

 We're already using file to mean something different *internally*,
 don't we? And since pg_controldata shows fairly internal information,
 I'm not sure this is the best idea.

 Maybe compromise and call it segment file - that is both easier to
 understand than segment, and not actually using a term that means
 something else...

It's also kind of wordy.  I think file is fine.  There are a few
references to xlogid indicating a file number, but the actual field
name is just xlogid.  We also use the term file to mean the other
thing, as in XLOGfileslop, and I have a hard time believing anyone's
really going to get confused about what is meant here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-03-26 Thread Gabriele Bartolini

Hello Tom,


 I started to look at this patch a bit.  I'm quite confused by the fact
 that some, but not all, of the possible FK action types now come in an
 EACH variant.  This makes no sense at all to me.  ISTM that EACH is a
 property of the FK constraint as a whole, that is that it says the
 constraint is from array elements on the referencing side to column
 values on the referenced side, rather than the normal case of column
 values to column values.


The specification that Gianni posted applies only to v5 of the patch.
The original idea was indeed to have the whole foreign key defined 
with an EACH property (initially we were actually thinking of the ARRAY 
keyword, following your advice, but for grammar reasons we opted for EACH).
However, during the actual development we faced some difficulties with 
multi-column foreign keys.
Through discussions on this list and with the reviewer we opted to allow 
the EACH keyword at column level.
We started with the case where at most one column is EACH, which is 
easier to understand.
The case of two or more EACH columns in the same foreign key has been 
left open for future development.



 Why would the possible actions be affected, and why only these?


We had to add the EACH variant to two actions (EACH CASCADE and EACH 
SET NULL), in order to give users the flexibility to choose the 
operation to be performed when one or more rows of the referenced 
table are deleted or updated.
Some users might indeed prefer that, when a referenced row is 
deleted, the whole referencing row is deleted (therefore they'd use the 
standard CASCADE action). Others might simply require that references 
to that row are removed from the referencing array (therefore they'd 
use the EACH CASCADE variant). The same concept applies to SET NULL 
(the whole array is set to NULL) and EACH SET NULL (referencing 
elements are set to NULL).
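To make the difference concrete, here is a small illustration of the
semantics described above (illustrative only, not taken from the patch):
assume a referencing row whose array value is {1,2,3}, and the referenced
row with key 2 is deleted.

    -- on delete CASCADE        => the whole referencing row is deleted
    -- on delete EACH CASCADE   => the array becomes {1,3}
    -- on delete SET NULL       => the whole array column becomes NULL
    -- on delete EACH SET NULL  => the array becomes {1,NULL,3}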


Thank you.

Cheers,
Gabriele

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it



Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result

2012-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 26, 2012 at 2:50 AM, Magnus Hagander mag...@hagander.net wrote:
 s/segment/file/g?

 We're already using file to mean something different *internally*,
 don't we? And since pg_controldata shows fairly internal information,
 I'm not sure this is the best idea.
 
 Maybe compromise and call it segment file - that is both easier to
 understand than segment, and not actually using a term that means
 something else...

 It's also kind of wordy.  I think file is fine.

+1 for file.  I think the internal usage of file to mean roughly
4GB worth of WAL is going to go away soon anyway, as there won't be
much reason to worry about the concept once LSN arithmetic is 64-bit.
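For reference, with the "file" naming the relevant portion of pg_controldata
output would look roughly like this (values are illustrative only):

    Latest checkpoint's REDO location:    0/2000020
    Latest checkpoint's REDO WAL file:    000000010000000000000002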

regards, tom lane



Re: [HACKERS] Re: [COMMITTERS] pgsql: Replace empty locale name with implied value in CREATE DATABASE

2012-03-26 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 03/26/2012 03:47 AM, Tom Lane wrote:
 Replace empty locale name with implied value in CREATE DATABASE and initdb.

 hmm, seems like this commit broke quagga:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2012-03-26%2002%3A03%3A04

Hm, it's hard to see how that patch would have affected the
create-conversions step but not any preceding one.  Can you provide
a stack trace from the core dump?

regards, tom lane



Re: [HACKERS] Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)

2012-03-26 Thread Robert Haas
On Sat, Mar 24, 2012 at 1:01 PM, Michael Tautschnig m...@debian.org wrote:
 Here, the two writes on Worker 0 correspond to lines 15 and 16. And indeed
 line 16 is exactly the call to SetLatch. For solving problem 1, the mp idiom,
 the following options are possible (in all cases stronger synchronisation
 primitives may be used, i.e., the strongest Power barrier, sync, may be used,
 or lwsync may be used instead of an address dependency):

 1. An lwsync at the beginning of SetLatch, and lwsync in ResetLatch
 (preferably after the write).
 2. An lwsync at the beginning of SetLatch, and an address dependency in
 ResetLatch.

 To address the second problem, the lb idiom, an address dependency has to be
 put either in WaitLatch or SetLatch.

 To fix both problems, the performance-wise cheapest option would thus be
 placing an address dependency in ResetLatch and an lwsync in SetLatch. For
 practical reasons, however, placing an lwsync in both places (at the beginning
 of SetLatch and after the write in ResetLatch) might be preferable, as address
 dependencies may be optimised away by the C compiler or require inline
 assembly in a form not as easy to factor out as lwsync, plus the interface of
 ResetLatch would have to be amended.

 In summary, we were thus able to show that both points marked with "XXX there
 really ought to be a memory barrier" in

 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4e15a4db5e65e43271f8d20750d6500ab12632d0

 are the appropriate points to place memory synchronisation primitives, and
 picking an lwsync-equivalent in both cases is sound and does not require any
 other modifications.

It's interesting that you've concluded that memory barriers are needed
in exactly the two places that Tom concluded they were needed.  I
think your analysis of what type of memory barrier is required is
faulty, however.  In your version of the code, setting the latch is
represented by a single store.  However, SetLatch() is more
complicated than that - it does a load, and depending on the results
of the load, it does a store and then maybe a system call.  I suspect
that if you change line 16 to read:

if (!latch[i+1 % WORKERS]) latch[i+1 % WORKERS] = 1;

...then your tool will tell you that you need a full barrier before
that rather than just a store/store barrier.

After staring at this for a while, I am fairly certain that
ResetLatch() needs a full barrier, too.  It seems to me that the
formal hazard you're guarding against by inserting an lwsync here is
the load/load dependency between loading the latch and loading the
flag.  In your version, that's probably OK, but in real life, it's
not, because WaitLatch() can wake up for a variety of reasons, not
just because the latch has been set.  So the following is possible:
worker #1 executes line 14, worker #2 wakes up after line 19 and
executes line 10 and then performs the load on line 12 before the
store on line 11 (since the lwsync you propose doesn't act as a
store/load barrier), worker #1 executes lines 15 and 16, worker #2 now
does the store on line 11.  At this point we are hosed, because worker
#2 has clobbered worker #1's attempt to set the latch without seeing
that the flag is set, and everybody goes into the tank and waits
forever (or until worker #2 receives another wake-up from some other
source, at which point we'll be back in business).

I proposed before that a barrier was also needed at the start of
WaitLatch(), to guard against a load/load dependency between loading
the flag (line 12) and loading the latch (line 19).  Our version of
WaitLatch doesn't busy-wait, so the concern is approximately that we
could do the load at line 19, conclude that no waiting is needed, then
do the load at line 12, do the store at line 14, and then go to sleep.
 That actually can't happen with this exact code, because if we were
to execute line 14 then we'd also hit the proposed lwsync at line 16
and so the loads would happen in order.  But lines 15-16 need not be
there: the most common use of this machinery is to hand off requests
from a foreground process to a background process, and the background
process need not be friendly enough to supply a barrier after checking
the flag and before calling WaitLatch().  I think a load/load barrier
here would be enough, but we've actually got a full barrier, because
WaitLatch calls drainSelfPipe() before checking the flag, and as noted
in the comments we assume that a system call acts as a full barrier.
So there's no live bug here, but I think it's worth noting that you
can't conclude that WaitLatch() doesn't need a barrier on the basis of
this simplified example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Odd out of memory problem.

2012-03-26 Thread Andrew Dunstan


I'm not sure if this is a bug, but I have been wrestling with this problem 
for a client.


Platform is Windows Server 2003 64-bit, PostgreSQL 8.4.8, 4GB RAM, 
running on an Amazon VM.


Shared buffers: 512MB, work_mem: 25MB. There are only a handful of 
connections to the database, and no other activity.


We are seeing the error shown below. The table in question has two 
columns (Oid, int) and roughly 43m rows. The only other thing remarkable 
about the settings is that effective_cache_size is set to 5GB, which is 
clearly too high, but surely that shouldn't cause a memory error.


I'm really perplexed as to why this fairly simple query should cause an 
out of memory error:


   select loid, max(pageno) from ldata group by loid order by 2 desc
   limit 10;

I can't see what I might be missing.


cheers

andrew

   TopMemoryContext: 49816 total in 6 blocks; 5384 free (7 chunks);
   44432 used
  TopTransactionContext: 8192 total in 1 blocks; 7696 free (0
   chunks); 496 used
  Record information cache: 8192 total in 1 blocks; 1800 free (0
   chunks); 6392 used
  Type information cache: 8192 total in 1 blocks; 1800 free (0
   chunks); 6392 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0
   chunks); 4344 used
  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6
   chunks); 10504 used
  MessageContext: 40960 total in 3 blocks; 29920 free (6 chunks);
   11040 used
  smgr relation table: 8192 total in 1 blocks; 2816 free (0
   chunks); 5376 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0
   chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 920 free (0 chunks);
   104 used
  ExecutorState: 8192 total in 1 blocks; 2144 free (1 chunks);
   6048 used
TupleSort: 40984 total in 3 blocks; 24208 free (10 chunks);
   16776 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
AggContext: 864018432 total in 127 blocks; 3400 free (110
   chunks); 864015032 used
  TupleHashTable: 619175960 total in 95 blocks; 821528 free
   (331 chunks); 618354432 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks);
   4816 used
  CacheMemoryContext: 667696 total in 20 blocks; 169960 free (2
   chunks); 497736 used
pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free
   (0 chunks); 784 used
pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0
   chunks); 872 used
pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0
   chunks); 872 used
pg_largeobject_loid_pn_index: 1024 total in 1 blocks; 280 free
   (0 chunks); 744 used
pg_database_datname_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used
pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free
   (0 chunks); 872 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344
   free (0 chunks); 680 used
pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks);
   680 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0
   chunks); 784 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free
   (0 chunks); 680 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free
   (0 chunks); 784 used
pg_cast_source_target_index: 1024 total in 1 blocks; 240 free
   (0 chunks); 784 used
pg_language_name_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used
pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0
   chunks); 936 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free
   (0 chunks); 744 used
pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0
   chunks); 832 used
pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 344
   free (0 chunks); 680 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 280
   free (0 chunks); 744 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used
pg_conversion_default_index: 1024 total in 1 blocks; 128 free
   (0 chunks); 896 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88
   free (0 chunks); 936 used
 

Re: [HACKERS] heap_freeze_tuple locking requirements

2012-03-26 Thread Robert Haas
On Fri, Mar 23, 2012 at 11:02 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mié mar 21 21:50:24 -0300 2012:
 heap_freeze_tuple() was apparently designed at one point to cope with
 being called with either a shared or exclusive buffer lock.  But none
 of the current callers call it with a shared lock; they all call it
 with an exclusive lock, except for the heap-rewrite code which doesn't
 take (or need) a lock at all.

 Since this is just dead code removal, I propose to apply this to 9.2.

 +1

Thanks, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Hans-Jürgen Schönig
hello,

does the problem show up on 2% of all boxes after 2 weeks or so?
we had a similar problem on UNIX as well. it even materialized on 100 identical 
boxes (on 2% of them). it pops up randomly and never stops …
i checked some code paths. some of those messages are direct output via stderr 
(not even elog).
unfortunately i did not manage to find a box where i could use GDB to attack 
the problem.
it was 8.4.8 as well.

do you see a certain workload which would make the problem reproducible?


regards,

hans



On Mar 26, 2012, at 5:03 PM, Andrew Dunstan wrote:

 
 I'm not sure if this is a bug, but I have been wrestling with this problem for a 
 client.
 
 Platform is Windows Server 2003 64-bit, PostgreSQL 8.4.8, 4GB RAM, running 
 on an Amazon VM.
 
 Shared buffers: 512MB, work_mem: 25MB. There are only a handful of 
 connections to the database, and no other activity.
 
 We are seeing the error shown below. The table in question has two columns 
 (Oid, int) and roughly 43m rows. The only other thing remarkable about the 
 settings is that effective_cache_size is set to 5GB, which is clearly too 
 high, but surely that shouldn't cause a memory error.
 
 I'm really perplexed as to why this fairly simple query should cause an out 
 of memory error:
 
   select loid, max(pageno) from ldata group by loid order by 2 desc
   limit 10;
 
 I can't see what I might be missing.
 
 
 cheers
 
 andrew
 

Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I'm really perplexed as to why this fairly simple query should cause an 
 out of memory error:
 select loid, max(pageno) from ldata group by loid order by 2 desc
 limit 10;

Looks like the group by/aggregate step is eating lots of memory:

  AggContext: 864018432 total in 127 blocks; 3400 free (110
 chunks); 864015032 used
TupleHashTable: 619175960 total in 95 blocks; 821528 free
 (331 chunks); 618354432 used

A guess is that there are a huge number of distinct values of loid but
the planner fails to realize that and tries to use a hash aggregation.
Could we see EXPLAIN output for this query?

regards, tom lane



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Andrew Dunstan



On 03/26/2012 11:18 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

I'm really perplexed as to why this fairly simple query should cause an
out of memory error:
 select loid, max(pageno) from ldata group by loid order by 2 desc
 limit 10;

Looks like the group by/aggregate step is eating lots of memory:


  AggContext: 864018432 total in 127 blocks; 3400 free (110
 chunks); 864015032 used
TupleHashTable: 619175960 total in 95 blocks; 821528 free
 (331 chunks); 618354432 used

A guess is that there are a huge number of distinct values of loid but
the planner fails to realize that and tries to use a hash aggregation.
Could we see EXPLAIN output for this query?


Currently it shows:

Limit  (cost=19443025.87..19443025.89 rows=10 width=8)
  ->  Sort  (cost=19443025.87..19446451.29 rows=1370168 width=8)
        Sort Key: (max(pageno))
        ->  GroupAggregate  (cost=18537785.99..19413417.03 rows=1370168 width=8)
              ->  Sort  (cost=18537785.99..18823953.97 rows=114467192 width=8)
                    Sort Key: loid
                    ->  Seq Scan on ldata  (cost=0.00..1651163.92 rows=114467192 width=8)




The table might have been analysed since I ran the query, though.

To answer Hans' question, we have seen the problem in other contexts. We 
first noticed this problem in a failure to restore large objects when 
running pg_restore.  The database has 43,998,486 LOs on 114,467,137 
pages. The largest of these is 2160 pages. We're currently running a 
test to see if we can successfully restore LOs by doing them in smaller 
batches rather than in a single transaction. However, this one seemed 
even odder than the LO problem.



cheers

andrew






Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 03/26/2012 11:18 AM, Tom Lane wrote:
 Could we see EXPLAIN output for this query?

 Currently it shows:

  Limit  (cost=19443025.87..19443025.89 rows=10 width=8)
    ->  Sort  (cost=19443025.87..19446451.29 rows=1370168 width=8)
          Sort Key: (max(pageno))
          ->  GroupAggregate  (cost=18537785.99..19413417.03 rows=1370168 width=8)
                ->  Sort  (cost=18537785.99..18823953.97 rows=114467192 width=8)
                      Sort Key: loid
                      ->  Seq Scan on ldata  (cost=0.00..1651163.92 rows=114467192 width=8)

 The table might have been analysed since I ran the query, though.

That plan should not create a tuple hash table, so I think it's almost
certain that the plan changed.  It might be interesting to remove the
pg_statistic rows for the table and then see what plan you get.

 To answer Hans' question, we have seen the problem in other contexts. We 
 first noticed this problem in a failure to restore large objects when 
 running pg_restore.

[ scratches head... ]  I don't understand how or why pg_restore would be
executing such a query.

regards, tom lane



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Greg Stark
On Mon, Mar 26, 2012 at 4:03 PM, Andrew Dunstan and...@dunslane.net wrote:
              TupleHashTable: 619175960 total in 95 blocks; 821528 free
   (331 chunks); 618354432 used

I think the plan you showed isn't the plan that's running out of
memory. I think it's running out of memory because it's using a Hash
Aggregate and underestimating the number of distinct elements. I would
set enable_hashagg=false for this query (not for the whole
server; hash aggregates are quite useful in general).
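Concretely, a session-local sketch of that workaround for the query from
upthread (assuming the same table) would be:

    set enable_hashagg = off;   -- affects only this session
    select loid, max(pageno) from ldata group by loid order by 2 desc limit 10;
    reset enable_hashagg;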

This is a long-standing problem with hash aggregates. The problem is
that we don't have a clever way to fall back to a non-hash-aggregate
if the original estimated memory usage turns out to be way off. Just
spilling to disk the way hash joins do would be horrendously
expensive, we think. If we had an approach to doing so it would be
very useful because simply running out of memory is kind of rude.


-- 
greg



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Andrew Dunstan



On 03/26/2012 12:11 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 03/26/2012 11:18 AM, Tom Lane wrote:

Could we see EXPLAIN output for this query?

Currently it shows:
Limit  (cost=19443025.87..19443025.89 rows=10 width=8)
  ->  Sort  (cost=19443025.87..19446451.29 rows=1370168 width=8)
        Sort Key: (max(pageno))
        ->  GroupAggregate  (cost=18537785.99..19413417.03 rows=1370168 width=8)
              ->  Sort  (cost=18537785.99..18823953.97 rows=114467192 width=8)
                    Sort Key: loid
                    ->  Seq Scan on ldata  (cost=0.00..1651163.92 rows=114467192 width=8)
The table might have been analysed since I ran the query, though.

That plan should not create a tuple hash table, so I think it's almost
certain that the plan changed.  It might be interesting to remove the
pg_statistic rows for the table and then see what plan you get.



Yeah, that gets us:


Limit  (cost=2223492.78..2223492.81 rows=10 width=8)
  ->  Sort  (cost=2223492.78..2223493.28 rows=200 width=8)
        Sort Key: (max(pageno))
        ->  HashAggregate  (cost=2223485.96..2223488.46 rows=200 width=8)
              ->  Seq Scan on ldata  (cost=0.00..1651154.64 rows=114466264 width=8)






To answer Hans' question, we have seen the problem in other contexts. We
first noticed this problem in a failure to restore large objects when
running pg_restore.

[ scratches head... ]  I don't understand how or why pg_restore would be
executing such a query.





It's not. I was explaining that we have seen memory failures in *other* 
contexts, not just this query. The restore fails after many hours on a 
call to lo_write().


cheers

andrew



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Andrew Dunstan



On 03/26/2012 12:20 PM, Greg Stark wrote:

On Mon, Mar 26, 2012 at 4:03 PM, Andrew Dunstan and...@dunslane.net wrote:

  TupleHashTable: 619175960 total in 95 blocks; 821528 free
   (331 chunks); 618354432 used

I think the plan you showed isn't the plan that's running out of
memory. I think it's running out of memory because it's using a Hash
Aggregate and underestimating the number of distinct elements. I would
set enable_hashagg=false for this query (not for the whole
server; hash aggregates are quite useful in general).

This is a long-standing problem with hash aggregates. The problem is
that we don't have a clever way to fall back to a non-hash-aggregate
if the original estimated memory usage turns out to be way off. Just
spilling to disk the way hash joins do would be horrendously
expensive, we think. If we had an approach to doing so it would be
very useful because simply running out of memory is kind of rude.



Ugh, Ok, thanks for the explanation.

So it looks like I'm running into two separate memory problems. Here's a 
trace from the other one:


cheers

andrew

   TopMemoryContext: 49816 total in 6 blocks; 6840 free (9 chunks);
   42976 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0
   chunks); 4344 used
  Filesystem: 8192 total in 1 blocks; 7872 free (0 chunks); 320 used
  TopTransactionContext: 213901336 total in 10 blocks; 1796344 free
   (25 chunks); 212104992 used
Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381
   chunks); 750329768 used
  MessageContext: 73728 total in 4 blocks; 44920 free (15 chunks);
   28808 used
  smgr relation table: 8192 total in 1 blocks; 2816 free (0
   chunks); 5376 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0
   chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 920 free (0 chunks);
   104 used
  ExecutorState: 8192 total in 1 blocks; 5880 free (1 chunks);
   2312 used
ExprContext: 24576 total in 2 blocks; 13664 free (8
   chunks); 10912 used
  Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks);
   4816 used
  CacheMemoryContext: 405552 total in 19 blocks; 14296 free (5
   chunks); 391256 used
pg_largeobject_loid_pn_index: 1024 total in 1 blocks; 240 free
   (0 chunks); 784 used
pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 192 free
   (0 chunks); 832 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344
   free (0 chunks); 680 used
pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks);
   680 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 280 free (0
   chunks); 744 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free
   (0 chunks); 680 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 280 free
   (0 chunks); 744 used
pg_cast_source_target_index: 1024 total in 1 blocks; 240 free
   (0 chunks); 784 used
pg_language_name_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used
pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0
   chunks); 936 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free
   (0 chunks); 744 used
pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0
   chunks); 832 used
pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0
   chunks); 720 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 344
   free (0 chunks); 680 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 280
   free (0 chunks); 744 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used
pg_conversion_default_index: 1024 total in 1 blocks; 128 free
   (0 chunks); 896 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 128
   free (0 chunks); 896 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240
   free (0 chunks); 784 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0
   chunks); 744 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used
pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free
   (0 chunks); 832 used
pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
   720 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 344 free (0
   chunks); 680 used

Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 03/26/2012 12:11 PM, Tom Lane wrote:
 That plan should not create a tuple hash table, so I think it's almost
 certain that the plan changed.  It might be interesting to remove the
 pg_statistic rows for the table and then see what plan you get.

 Yeah, that gets us:

  Limit  (cost=2223492.78..2223492.81 rows=10 width=8)
    ->  Sort  (cost=2223492.78..2223493.28 rows=200 width=8)
          Sort Key: (max(pageno))
          ->  HashAggregate  (cost=2223485.96..2223488.46 rows=200 width=8)
                ->  Seq Scan on ldata  (cost=0.00..1651154.64 rows=114466264 width=8)

Hm.  This illustrates that it's not too prudent to rely on a default
numdistinct estimate to decide that a hash aggregation is safe :-(.
We had probably better tweak the cost estimation rules to not trust
that.  Maybe, if we have a default estimate, we should take the worst
case estimate that the column might be unique?  That could still burn
us if the rowcount estimate was horribly wrong, but those are not nearly
as shaky as numdistinct estimates ...

 [ scratches head... ]  I don't understand how or why pg_restore would be
 executing such a query.

 It's not. I was explaining that we have seen memory failures in *other* 
 contexts, not just this query. The restore fails after many hours on a 
 call to lo_write().

Seems probably unrelated then.  Have you got a memory-usage dump for
that case?

regards, tom lane



Re: [HACKERS] patch: autocomplete for functions

2012-03-26 Thread Peter Eisentraut
On mån, 2012-03-19 at 15:53 -0400, Tom Lane wrote:
 This connects somewhat to the discussions we've had in the past about
 trying to get not-intended-for-public-use functions out of the
 standard search path.  Not that you want to put a full visibility
 check into the tab-completion query, but if it could simply exclude a
 pg_private namespace, that probably wouldn't be too expensive.

I wonder if this could be worked out using pg_depend.  For example, give
me all functions that are not referenced by some object in pg_catalog.
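A rough sketch of that idea (a simplification: it only checks whether any
pg_depend entry references the function at all, without resolving whether
the referencing object actually lives in pg_catalog):

    select p.oid::regprocedure
      from pg_proc p
      join pg_namespace n on n.oid = p.pronamespace
     where n.nspname not in ('pg_catalog', 'information_schema')
       and not exists (select 1
                         from pg_depend d
                        where d.refclassid = 'pg_proc'::regclass
                          and d.refobjid = p.oid);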




Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Greg Stark
On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net wrote:
        Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381
   chunks); 750329768 used

I think you'll have to catch Heikki's attention to get a good answer to this.

Is it possible this job is inserting and then updating (or deleting)
the row it just inserted and doing a large number of such
insert/update operations all within the same transaction? Or perhaps
it's updating the same row over and over again?

-- 
greg



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Simon Riggs
On Mon, Mar 26, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Hm.  This illustrates that it's not too prudent to rely on a default
 numdistinct estimate to decide that a hash aggregation is safe :-(.
 We had probably better tweak the cost estimation rules to not trust
 that.  Maybe, if we have a default estimate, we should take the worst
 case estimate that the column might be unique?  That could still burn
 us if the rowcount estimate was horribly wrong, but those are not nearly
 as shaky as numdistinct estimates ...

The selectivity API should include some way of indicating the accuracy
of the answer, as well as the answer itself.

That way we could respond better in a wide range of circumstances.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Heikki Linnakangas

On 26.03.2012 19:51, Greg Stark wrote:

On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net wrote:

Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381
   chunks); 750329768 used


I think you'll have to catch Heikki's attention to get a good answer to this.

Is it possible this job is inserting and then updating (or deleteing)
the row it just inserted and doing a large number of such
insert/update operations all within the same transaction? Or perhaps
it's updating the same row over and over again?


.. and all that in different subtransactions.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Greg Stark
On Mon, Mar 26, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hm.  This illustrates that it's not too prudent to rely on a default
 numdistinct estimate to decide that a hash aggregation is safe :-(.
 We had probably better tweak the cost estimation rules to not trust
 that.  Maybe, if we have a default estimate, we should take the worst
 case estimate that the column might be unique?  That could still burn
 us if the rowcount estimate was horribly wrong, but those are not nearly
 as shaky as numdistinct estimates ...

Perhaps we should have two work_mem settings -- one for the target to
aim for and one for a hard(er) limit that we should ensure the worst
case falls under?

I have a sketch for how to handle spilling hash aggregates to disk in
my head. I'm not sure if it's worth the amount of complexity it would
require but I'll poke around a bit and see if it works out well.

-- 
greg



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Heikki Linnakangas

On 26.03.2012 19:59, Heikki Linnakangas wrote:

On 26.03.2012 19:51, Greg Stark wrote:

On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net
wrote:

Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381
chunks); 750329768 used


I think you'll have to catch Heikki's attention to get a good answer
to this.

Is it possible this job is inserting and then updating (or deleteing)
the row it just inserted and doing a large number of such
insert/update operations all within the same transaction? Or perhaps
it's updating the same row over and over again?


.. and all that in different subtransactions.


sorry, scratch that, they don't need to be in different subtransactions.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 I have a sketch for how to handle spilling hash aggregates to disk in
 my head. I'm not sure if it's worth the amount of complexity it would
 require but I'll poke around a bit and see if it works out well.

It'd be awfully nice if those could spill to disk.  I think that
currently that's the only plan type where a misestimate can lead to
hard failure rather than just slower-than-you'd-like.  Which is not
nice considering that the estimates are necessarily just estimates.

Could you give us a brain dump on the sketch?  I've never seen how to
do it without unreasonable overhead.

regards, tom lane



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Andrew Dunstan



On 03/26/2012 01:06 PM, Heikki Linnakangas wrote:

On 26.03.2012 19:59, Heikki Linnakangas wrote:

On 26.03.2012 19:51, Greg Stark wrote:

On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net
wrote:

Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381
chunks); 750329768 used


I think you'll have to catch Heikki's attention to get a good answer
to this.

Is it possible this job is inserting and then updating (or deleteing)
the row it just inserted and doing a large number of such
insert/update operations all within the same transaction? Or perhaps
it's updating the same row over and over again?


.. and all that in different subtransactions.


sorry, scratch that, they don't need to be in different subtransactions.



It's all in a single transaction. In fact the solution I'm currently 
testing, which seems to be working, involves breaking it up into batches of 
a few thousand LOs restored per batch.


cheers

andrew



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 03/26/2012 01:06 PM, Heikki Linnakangas wrote:
 Is it possible this job is inserting and then updating (or deleteing)
 the row it just inserted and doing a large number of such
 insert/update operations all within the same transaction? Or perhaps
 it's updating the same row over and over again?

 It's all in a single transaction. In fact the solution I'm currently 
 testing, which seems to be working, involves breaking it up into batches of 
 a few thousand LOs restored per batch.

Hm.  The test case is just a straight pg_restore of lots and lots of LOs?
What pg_dump version was the dump made with?

regards, tom lane



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Andrew Dunstan



On 03/26/2012 01:34 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 03/26/2012 01:06 PM, Heikki Linnakangas wrote:

Is it possible this job is inserting and then updating (or deleteing)
the row it just inserted and doing a large number of such
insert/update operations all within the same transaction? Or perhaps
it's updating the same row over and over again?

It's all in a single transaction. In fact the solution I'm currently
testing and seems to be working involves breaking it up into batches of
a few thousand LOs restored per batch.

Hm.  The test case is just a straight pg_restore of lots and lots of LOs?
What pg_dump version was the dump made with?





8.4.8, same as the target. We get the same issue whether we restore 
direct to the database from pg_restore or via a text dump.


cheers

andrew




Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 03/26/2012 01:34 PM, Tom Lane wrote:
 Hm.  The test case is just a straight pg_restore of lots and lots of LOs?
 What pg_dump version was the dump made with?

 8.4.8, same as the target. We get the same issue whether we restore 
 direct to the database from pg_restore or via a text dump.

I believe I see the issue: when creating/loading LOs, we first do a
lo_create (which in 8.4 makes a page zero tuple in pg_largeobject
containing zero bytes of data) and then lo_write, which will do a
heap_update to overwrite that tuple with data.  This is at the next
command in the same transaction, so the original tuple has to receive a
combo CID.  Net result: we accumulate one new combo CID per large object
loaded in the same transaction.  You can reproduce this without any
pg_dump involvement at all, using something like

create table mylos (id oid);
insert into mylos select lo_import('/tmp/junk') from generate_series(1,100);

The problem is gone in 9.0 and up because now we use a
pg_largeobject_metadata entry instead of a pg_largeobject row to flag
the existence of an empty large object.  I don't see any very practical
backend fix for the problem in 8.x.

regards, tom lane
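
As an illustration, here is a minimal sketch of the batching workaround discussed above, built on the reproducer just given; the batch sizes and the file path are arbitrary, and committing between batches is what lets each transaction's accumulated combo CIDs be released:

CREATE TABLE mylos (id oid);
-- Each lo_import() creates and then fills a large object, so it accumulates
-- combo CIDs in the transaction just like the pg_restore case; committing
-- per batch keeps that bounded.
BEGIN;
INSERT INTO mylos SELECT lo_import('/tmp/junk') FROM generate_series(1, 5000);
COMMIT;
BEGIN;
INSERT INTO mylos SELECT lo_import('/tmp/junk') FROM generate_series(5001, 10000);
COMMIT;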



Re: [HACKERS] Re: [COMMITTERS] pgsql: Replace empty locale name with implied value in CREATE DATABASE

2012-03-26 Thread Stefan Kaltenbrunner
On 03/26/2012 04:29 PM, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 03/26/2012 03:47 AM, Tom Lane wrote:
 Replace empty locale name with implied value in CREATE DATABASE and initdb.
 
 hmm, seems like this commit broke quagga:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2012-03-26%2002%3A03%3A04
 
 Hm, it's hard to see how that patch would have affected the
 create-conversions step but not any preceding one.  Can you provide
 a stack trace from the core dump?

hmm weird - I cannot reproduce the core dump using the binaries created
by the failing build at all.
Given that the current build also went through just fine I wonder if
that was some sort of false alarm created by a cosmic ray or a hardware
issue (though I have no indication that anything is wrong on the box).


Stefan



Re: [HACKERS] [COMMITTERS] pgsql: Remove dead assignment

2012-03-26 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Remove dead assignment
 found by Coverity

init_sequence(seq_relid, &elm, &seq_rel);
-   seq = read_info(elm, seq_rel, &buf);
+   read_info(elm, seq_rel, &buf);


I have to object to this patch.  In the blind service of eliminating
warnings from some tool or other, you will introduce warnings from
other tools?  It's traditional for lint to complain about code that
sometimes ignores the return value of a function, for instance.
I also do not think it does anything for readability for this call
of read_info() to be unexpectedly unlike all the others.

I think we should institute a project policy that we will ignore dead
assignment coverity warnings.  I have not seen one of those changes
yet that seemed to me like a good idea.  Any optimizing compiler is
perfectly capable of figuring out that an assignment is dead and
eliminating it, so there is no code size advantage from doing this
manually; and even the gcc boys have not (yet?) decided they should
warn about dead assignments.

regards, tom lane



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Dimitri's proposed behavior would be advantageous if you have an ANY
 trigger that wants to take over the world and make sure that nobody
 else can run before it.  I think, though, that's not a case we want to
 cater to - all of this stuff requires superuser privileges anyway, so
 we should assume that the DBA knows what he's doing.  So +1 for making

What about extensions?

One use case would be for londiste/slony/bucardo to rewrite the command
and queue its text; that will be done in C and should probably be done
first. Using an ANY command trigger means that code will run before user
specific code (or likewise after it).

As I said it's not that clear in my head, but when thinking about
command triggers and extensions, it could be better to impose an
arbitrary order here.

 it strictly alphabetical, as we do with other triggers.  Everything
 that can be done under Dimitri's proposal can also be done in that
 scheme, but the reverse is not true.

That's true too. I'm just not sure how much it applies to code installed
by the DBA as opposed to written by the DBA. I'll be happy to edit the
patch to merge both lists if that's the decision; it's not hard to do.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] [COMMITTERS] pgsql: Remove dead assignment

2012-03-26 Thread Peter Eisentraut
On mån, 2012-03-26 at 15:15 -0400, Tom Lane wrote:
 init_sequence(seq_relid, &elm, &seq_rel);
 -   seq = read_info(elm, seq_rel, &buf);
 +   read_info(elm, seq_rel, &buf);
 
 
 I have to object to this patch.  In the blind service of eliminating
 warnings from some tool or other, you will introduce warnings from
 other tools?  It's traditional for lint to complain about code that
 sometimes ignores the return value of a function, for instance.

Yes, but the return value is ignored in this case as well.  Just
assigning it doesn't change that.

 I also do not think it does anything for readability for this call
 of read_info() to be unexpectedly unlike all the others. 

I do not think that it is good code quality to assign something to a
variable and then assign something different to a variable later in the
same function.  It is better, on the other hand, if a function call
looks different if what it's supposed to do is different.

But I don't want to get hung up on this.  I thought it was just an
oversight.




Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Thom Brown
Can someone clarify whether this will be reviewed by a committer?
Will there be time to get this reviewed before the commitfest closes?
I get the impression the commitfest closure is fairly imminent.

Thom



Re: [HACKERS] [COMMITTERS] pgsql: Remove dead assignment

2012-03-26 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On mån, 2012-03-26 at 15:15 -0400, Tom Lane wrote:
 I also do not think it does anything for readability for this call
 of read_info() to be unexpectedly unlike all the others. 

 I do not think that it is good code quality to assign something to a
 variable and then assign something different to a variable later in the
 same function.

Well, that's a totally different issue, because if we had used a
different variable for the other purpose, this assignment would
still be dead and coverity would still be whinging about it, no?

The problem that I have with this change (and the similar ones you've
made elsewhere) is really that it's only chance that the code isn't
fetching anything from the result of read_info.  If we subsequently
wanted to change the logic so it did do that, we'd have to put back the
assignment.  That sort of code churn benefits nobody.

regards, tom lane



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Robert Haas
On Mon, Mar 26, 2012 at 3:24 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 One use case would be for londiste/slony/bucardo to rewrite the command
 and queue its text; that will be done in C and should probably be done
 first. Using an ANY command trigger means that code will run before user
 specific code (or likewise after it).

And, if the user wants it to be run first, he or she can do that.  But
suppose he wants to run it first, and also forbid users whose username
starts with the letter b from running the ANALYZE command.  Well,
then, he now wants that trigger to come before the other one, even
though the Slony trigger is for all commands (maybe) and the other
just for ANALYZE (maybe).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Robert Haas robertmh...@gmail.com writes:
 Dimitri's proposed behavior would be advantageous if you have an ANY
 trigger that wants to take over the world and make sure that nobody
 else can run before it.  I think, though, that's not a case we want to
 cater to - all of this stuff requires superuser privileges anyway, so
 we should assume that the DBA knows what he's doing.  So +1 for making

 What about extensions?

 One use case would be for londiste/slony/bucardo to rewrite the command
 and queue its text; that will be done in C and should probably be done
 first. Using an ANY command trigger means that code will run before user
 specific code (or likewise after it).

Unless you intend a restriction that there be only one ANY trigger,
I don't see how that follows.  AFAICS, the only way to guarantee my
trigger runs first is to give it a name alphabetically before anything
else in the system.

Also, it strikes me that in most of the trigger ordering cases I've
seen, it's actually more interesting to want to be sure that a
particular trigger runs *last* so that its effects can't be modified
by some other, hypothetically less trusted, trigger.

So I don't think that the mere fact of being an ANY trigger rather than
a command-specific trigger should be taken to mean that a particular
ordering is desirable.  Trigger name order isn't the greatest solution
by any means, but it's more flexible than hard-wiring according to
trigger type.

regards, tom lane
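
For reference, the name-order precedent being appealed to can already be seen with ordinary triggers; a minimal sketch (table, function, and trigger names are made up):

CREATE TABLE t (id int);

CREATE FUNCTION report_trigger() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Report which trigger is running, so the firing order is visible.
    RAISE NOTICE 'trigger % fired', TG_NAME;
    RETURN NEW;
END;
$$;

CREATE TRIGGER a_first BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE report_trigger();
CREATE TRIGGER b_second BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE report_trigger();

INSERT INTO t VALUES (1);
-- NOTICE:  trigger a_first fired
-- NOTICE:  trigger b_second fired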



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Tom Lane
Thom Brown thombr...@gmail.com writes:
 Can someone clarify whether this will be reviewed by a committer?
 Will there be time to get this reviewed before the commitfest closes?
 I get the impression the commitfest closure is fairly imminent.

I don't have that impression.  (I wish I did.)

regards, tom lane



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Robert Haas
On Mon, Mar 26, 2012 at 3:36 PM, Thom Brown thombr...@gmail.com wrote:
 Can someone clarify whether this will be reviewed by a committer?
 Will there be time to get this reviewed before the commitfest closes?
 I get the impression the commitfest closure is fairly imminent.

Well, I have been holding off for two reasons:

1. It sure seems like there is an awful lot of code churn and design
work going on.

2. I'm not sure which patches Tom is planning to look at or in what
order, so I've been avoiding the ones he seems to be taking an
interest in.

Personally, I am about at the point where I'd like to punt everything
and move on.  As nice as it would be to squeeze a few more things into
9.2, there WILL be a 9.3.  If a few less people had submitted
half-baked code at the last minute and a few more people had helped
with review, we'd be done by now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 So I don't think that the mere fact of being an ANY trigger rather than
 a command-specific trigger should be taken to mean that a particular
 ordering is desirable.  Trigger name order isn't the greatest solution
 by any means, but it's more flexible than hard-wiring according to
 trigger type.

The ANY sandwich idea is dead, then; I will fix it tomorrow to just
handle a single list of BEFORE triggers and a single list of AFTER
triggers (two lists total), ordered by trigger name.

v19 will also integrate the latest doc comments from Thom and most of
those from Andres; I don't know how to fix the plpython specifics he's
talking about.

About the reviewing and the commit fest closing, even if that patch is
big it's a principled implementation: the integration of the facility is
done in the same way in lots of different places, and is not rocket
science either (we removed all the complexity). So I guess it's not
really a herculean job here, just a certain amount of mechanical editing:
we just support too many commands ;)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Andres Freund
On Monday, March 26, 2012 10:18:59 PM Dimitri Fontaine wrote:
  don't know how to fix the plpython specifics he's talking
 about.
Just copy what is done in the normal trigger handling facility (the decref 
both in the CATCH and in the normal path). Ping me some other way if you need 
help...

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 1. It sure seems like there is an awful lot of code churn and design
 work going on.

There have only been minor adjustments for a while now, and they have
been visible because Thom was doing lots of testing for me and it was
way easier for me to publish a new version and have a test result the
next day (thanks again Thom).

 Personally, I am about at the point where I'd like to punt everything
 and move on.  As nice as it would be to squeeze a few more things into
 9.2, there WILL be a 9.3.  If a few less people had submitted
 half-baked code at the last minute and a few more people had helped
 with review, we'd be done by now.

Well, wait a minute. There's a difference between half-baked and
reacting to a review that changes the goal of a patch. My idea of the
code I wanted to write here is extremely different from what we as a
community decided to do. The main part of the code churn has been
responding to review, removing features, and cleaning up the code afterwards.

The only major design decision that I had to change here has been
where to call the command trigger code from in the existing command
implementations, and that was done before entering this CF, IIRC.

If you want to punt this patch out of 9.2 after all the changes I had to
make for it to be a candidate for 9.2, I think it would be only fair for
you to find a show stopper in my current implementation. The trigger
firing order is about an hour of work, so not a stopper I believe.

And as soon as we're done here, you know I'll put the same hours and
energy into reviewing other people patches :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] checkpoint patches

2012-03-26 Thread Robert Haas
On Sun, Mar 25, 2012 at 4:29 PM, Jim Nasby j...@nasby.net wrote:
 I wouldn't be too quick to dismiss increasing checkpoint frequency (ie:
 decreasing checkpoint_timeout).

I'm not dismissing that, but my tests show only a very small gain in
that area.  Now there may be another test where it shows a bigger
gain, but finding such a test is the patch author's job, not mine.
Patches that supposedly improve performance should be submitted with
test results showing that they do.  This patch was submitted more than
two months ago with no performance results, no documentation, and
bugs.  Because I feel that this is an important area for us to try to
improve, I devoted a substantial amount of time to trying to
demonstrate that the patch does something good, but I didn't find
anything very convincing, so I think it's time to punt this one for
now and let Greg pursue the strategy he originally intended, which was
to leave this whole area alone until 9.3[1].  I think he only
submitted something at all because I kicked out a somewhat random
attempt to solve the same problem[2].  Well, it turns out that, on the
test cases I have available, neither one is any great shakes.
Considering Greg Smith's long track record of ridiculously diligent
research, I feel pretty confident he'll eventually come back to the
table either with more evidence that this is the right approach (in
which case we'll be able to document it in a reasonable way, which is
currently impossible, since we have only the vaguest idea of when
setting it to a non-zero value might be useful, or what value to set)
or with another proposed approach that he thinks is better and test
results to back it up.  Had someone other than Greg proposed this, I
probably wouldn't have spent time on it at all, because by his own
admission it's not really ready for prime time, but as it was I
thought I'd try to see if I could fill in the gaps.

Long story short, this may yet prove to be useful in some
as-yet-unknown set of circumstances, but that's not a sufficient
reason to commit it, so I've marked it (and my patch, which doesn't
win either) Returned with Feedback.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://archives.postgresql.org/message-id/4f13d856.60...@2ndquadrant.com
[2] https://commitfest.postgresql.org/action/patch_view?id=752



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Thom Brown
On 26 March 2012 21:36, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Personally, I am about at the point where I'd like to punt everything
 and move on.  As nice as it would be to squeeze a few more things into
 9.2, there WILL be a 9.3.  If a few less people had submitted
 half-baked code at the last minute and a few more people had helped
 with review, we'd be done by now.

 Well, wait a minute. There's a difference between half-baked and
 reacting to a review that changes the goal of a patch. My idea of the

I think Robert was referring to patches in general.

Thom



Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-26 Thread Magnus Hagander
On Tue, Mar 20, 2012 at 18:48, Daniel Farina dan...@heroku.com wrote:
 On Tue, Mar 20, 2012 at 10:13 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Maybe we should just not worry about this.

 That's been my reaction right along.  There's no evidence that PID
 recycling is a problem in the real world.

 I'm entirely willing to acquiesce to that point of view.  I only
 thought this was the blocker as to why pg_terminate_backend was left
 out of the pg_cancel_backend patch.

Late back into this thread.

I wasn't aware that was the reason there. I think it was the general
leftovers from previous times. When we first created
pg_terminate_backend() there was a general thought that it might not
be safe to just SIGTERM a backend to make it quit. A bunch of fixes
were put in place to make it more safe, but I'm not sure anybody
actually declared it fully safe. And I think it's a lot of legacy from
that time that just steers people towards the baby-steps approach.

I'm not sure - perhaps we're past that worry these days?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Robert Haas
On Mon, Mar 26, 2012 at 4:36 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Well, wait a minute. There's a difference between half-baked and
 reacting to a review that changes the goal of a patch. My idea of the
 code I wanted to write here is extremely different from what we as a
 community decided to be doing. The main part of the code churn has been
 answering to review, removing features and cleaning the code afterwards.

Sure.  And if this weren't the final CommitFest, we would have bumped
this to the next CommitFest two months ago.  Since it is the final
CommitFest, we're going to go on and on and on.  Already we have
almost as many patches queued up for the next CommitFest as we do
remaining in this one.  So my question is: how long should we keep all
those people waiting for the purpose of squeezing more stuff into 9.2?
 At the beginning of this CommitFest, various people offered time
frames ranging from 6 weeks to 2 months.  We're past that now.  If
you don't think that was a long enough time frame, then how long do
you think we should wait?   It doesn't seem to me to matter very much
whether this got stretched out due to fundamental design deficiencies
or just because it takes a while to beat a major feature into
committable form; surely it's not a shock that this patch wasn't going
to go in overnight.

 If you want to punt this patch out of 9.2 after all the changes I had to
 make for it to be a candidate for 9.2, I think it would be only fair for
 you to find a show stopper in my current implementation. The trigger
 firing order is about an hour of work, so not a stopper I believe.

I don't think there is a show-stopper.  I do think there is probably a
lot more cleaning up, tidying, and adjusting needed.

 And as soon as we're done here, you know I'll put the same hours and
 energy into reviewing other people patches :)

As soon as we're done here, the CommitFest will end, and there won't
be any other people's patches to review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-26 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 I wasn't aware that was the reason there. I think it was the general
 leftovers from previous times. When we first created
 pg_terminate_backend() there was a general thought that it might not
 be safe to just SIGTERM a backend to make it quit.

Not just "might not be safe" - it was demonstrably buggy in the
beginning.

 A bunch of fixes
 were put in place to make it more safe, but I'm not sure anybody
 actually declared it fully safe.

We never did, we only said we didn't know of more bugs.

 I'm not sure - perhaps we're past that worry these days?

I'm not.  I still wouldn't trust SIGTERMing an individual backend in a
production database.  It'll probably work, but what if it doesn't?
Best-case scenario is you'll need to do a panic shutdown to clear the
stuck lock or whatever that the backend left behind.  (Once you've
diagnosed the problem, that is.)  Now, in a case where the alternative
is a database shutdown anyway, you might as well try it.  But it's the
kind of tool you only want to hand to responsible adults, which is why
it's superuser-only at the moment.  I'm not sure we should be
encouraging people to fire that weapon indiscriminately.

regards, tom lane



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Christopher Browne
On Mon, Mar 26, 2012 at 4:45 PM, Robert Haas robertmh...@gmail.com wrote:
 As soon as we're done here, the CommitFest will end, and there won't
 be any other people's patches to review.

Hurray?  :-)
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 2. I'm not sure which patches Tom is planning to look at or in what
 order, so I've been avoiding the ones he seems to be taking an
 interest in.

Well, I think I'm definitely on the hook for the pg_stat_statements,
pgsql_fdw, foreign table stats, and caching-stable-subexpressions
patches, and I should look at the libpq alternate row returning
mechanism because I suspect I was the last one to mess with that libpq
code in any detail.  I don't claim any special insight into the other
stuff on the list.  In particular I've not been paying much attention
to command triggers.

 Personally, I am about at the point where I'd like to punt everything
 and move on.  As nice as it would be to squeeze a few more things into
 9.2, there WILL be a 9.3.  If a few less people had submitted
 half-baked code at the last minute and a few more people had helped
 with review, we'd be done by now.

The main reason I proposed setting a schedule a few weeks ago was that
I was afraid the commitfest would otherwise end precisely in a "we're
tired out, we're punting everything to 9.3" moment.  Without some
definite goal to work towards, it'll just keep stretching out until
we've had enough.  I'd prefer it end in a more orderly fashion than
that.  The end result will be the same, in the sense that some of the
stuff that's still-not-ready-for-committer is going to get punted,
but people might have a less bad taste in their mouths about why.

regards, tom lane



Re: [HACKERS] Command Triggers, v16

2012-03-26 Thread Robert Haas
On Mar 26, 2012, at 5:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 2. I'm not sure which patches Tom is planning to look at or in what
 order, so I've been avoiding the ones he seems to be taking an
 interest in.
 
 Well, I think I'm definitely on the hook for the pg_stat_statements,
 pgsql_fdw, foreign table stats, and caching-stable-subexpressions
 patches, and I should look at the libpq alternate row returning
 mechanism because I suspect I was the last one to mess with that libpq
 code in any detail.  I don't claim any special insight into the other
 stuff on the list.  In particular I've not been paying much attention
 to command triggers.

How long will that all take?

I guess I'll work on command triggers, pg_archivecleanup, and buffer I/O 
timings next.

 
 Personally, I am about at the point where I'd like to punt everything
 and move on.  As nice as it would be to squeeze a few more things into
 9.2, there WILL be a 9.3.  If a few less people had submitted
 half-baked code at the last minute and a few more people had helped
 with review, we'd be done by now.
 
 The main reason I proposed setting a schedule a few weeks ago was that
 I was afraid the commitfest would otherwise end precisely in a "we're
 tired out, we're punting everything to 9.3" moment.  Without some
 definite goal to work towards, it'll just keep stretching out until
 we've had enough.  I'd prefer it end in a more orderly fashion than
 that.  The end result will be the same, in the sense that some of the
 stuff that's still-not-ready-for-committer is going to get punted,
 but people might have a less bad taste in their mouths about why.

Fine. What do you propose, specifically?

...Robert


[HACKERS] 9.2 commitfest closure (was Command Triggers, v16)

2012-03-26 Thread Thom Brown
On 26 March 2012 23:16, Robert Haas robertmh...@gmail.com wrote:
 On Mar 26, 2012, at 5:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, I think I'm definitely on the hook for the pg_stat_statements,
 pgsql_fdw, foreign table stats, and caching-stable-subexpressions
 patches, and I should look at the libpq alternate row returning
 mechanism because I suspect I was the last one to mess with that libpq
 code in any detail.  I don't claim any special insight into the other
 stuff on the list.  In particular I've not been paying much attention
 to command triggers.

 How long will that all take?

 I guess I'll work on command triggers, pg_archivecleanup, and buffer I/O 
 timings next.

This is probably a dumb question but... surely there's more than 2
committers able to review?

-- 
Thom



Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-26 Thread Daniel Farina
On Mon, Mar 26, 2012 at 1:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not.  I still wouldn't trust SIGTERMing an individual backend in a
 production database.  It'll probably work, but what if it doesn't?
 Best-case scenario is you'll need to do a panic shutdown to clear the
 stuck lock or whatever that the backend left behind.  (Once you've
 diagnosed the problem, that is.)  Now, in a case where the alternative
 is a database shutdown anyway, you might as well try it.  But it's the
 kind of tool you only want to hand to responsible adults, which is why
 it's superuser-only at the moment.  I'm not sure we should be
 encouraging people to fire that weapon indiscriminately.

Okay, it was my precise intention to hand this to users so that not
only could they cancel their queries, but also force the transaction
to be aborted and the connection to be closed in case there is a
client run amok.  Is there a good injection site -- perhaps
immediately after query cancellation -- where we can put in a
rollback-and-disconnect behavior?

Given this information, my understanding is that even the superuser is
coerced into undertaking an action that is unnecessarily dangerous if
they think the backend would respond to cancellation, but they also
wish to close the connection.

-- 
fdr



Re: [HACKERS] 9.2 commitfest closure (was Command Triggers, v16)

2012-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mar 26, 2012, at 5:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 2. I'm not sure which patches Tom is planning to look at or in what
 order, so I've been avoiding the ones he seems to be taking an
 interest in.

 Well, I think I'm definitely on the hook for the pg_stat_statements,
 pgsql_fdw, foreign table stats, and caching-stable-subexpressions
 patches, and I should look at the libpq alternate row returning
 mechanism because I suspect I was the last one to mess with that libpq
 code in any detail.

 How long will that all take?

Dunno, but surely at least a day apiece if they're to be pushed to
commit.  On the other hand, considering that none of them is actually
Ready For Committer right now, we possibly shouldn't expect that they'll
all get committed.

 Personally, I am about at the point where I'd like to punt everything
 and move on.  As nice as it would be to squeeze a few more things into
 9.2, there WILL be a 9.3.  If a few less people had submitted
 half-baked code at the last minute and a few more people had helped
 with review, we'd be done by now.

 The main reason I proposed setting a schedule a few weeks ago was that
 I was afraid the commitfest would otherwise end precisely in a "we're
 tired out, we're punting everything to 9.3" moment.  Without some
 definite goal to work towards, it'll just keep stretching out until
 we've had enough.  I'd prefer it end in a more orderly fashion than
 that.  The end result will be the same, in the sense that some of the
 stuff that's still-not-ready-for-committer is going to get punted,
 but people might have a less bad taste in their mouths about why.

 Fine. What do you propose, specifically?

The end of the month is coming up.  How about we propose to close the
'fest on April 1st?  Anything that's not committable by then goes to
the 9.3 list.  If one week seems too short, how about 2 weeks?

Thom Brown t...@linux.com writes:
 This is probably a dumb question but... surely there's more than 2
 committers able to review?

Able and willing might be two different things.  Alvaro, Heikki, and
Magnus have all been looking at stuff, but I think they may be getting
burned out too.

regards, tom lane



Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-26 Thread Robert Haas
On Mon, Mar 26, 2012 at 4:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not sure - perhaps we're past that worry these days?

 I'm not.  I still wouldn't trust SIGTERMing an individual backend in a
 production database.  It'll probably work, but what if it doesn't?
 Best-case scenario is you'll need to do a panic shutdown to clear the
 stuck lock or whatever that the backend left behind.  (Once you've
 diagnosed the problem, that is.)  Now, in a case where the alternative
 is a database shutdown anyway, you might as well try it.  But it's the
 kind of tool you only want to hand to responsible adults, which is why
 it's superuser-only at the moment.  I'm not sure we should be
 encouraging people to fire that weapon indiscriminately.

I don't think we should be overly afraid of bugs in this code path.  I
mean, there could very well be residual bugs, but that can be said of
anything.  Moreover, if there are bugs, I'd like to find them and fix
them rather than living forever in a state of fear.

And frankly, if we're going to pick a feature to give the hairy
eyeball, this one wouldn't make my top ten list.

I think the more important question is a policy question: do we want
it to work like this?  It seems like a policy question that ought to
be left to the DBA, but we have no policy management framework for
DBAs to configure what they do or do not wish to allow.  Still, if
we've decided it's OK to allow cancelling, I don't see any real reason
why this should be treated differently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
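
For what it's worth, a sketch of the sort of thing the patch would let an ordinary role do for its own sessions (today this still requires superuser); the column names are the pre-9.2 ones (procpid, current_query) and the predicate is only illustrative:

-- Close this role's own idle-in-transaction sessions with SIGTERM.
SELECT pg_terminate_backend(procpid)
  FROM pg_stat_activity
 WHERE usename = current_user
   AND current_query = '<IDLE> in transaction';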



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Greg Stark
On Mon, Mar 26, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 I have a sketch for how to handle spilling hash aggregates to disk in
 my head. I'm not sure if it's worth the amount of complexity it would
 require but I'll poke around a bit and see if it works out well.

 It'd be awfully nice if those could spill to disk.  I think that
 currently that's the only plan type where a misestimate can lead to
 hard failure rather than just slower-than-you'd-like.  Which is not
 nice considering that the estimates are necessarily just estimates.

 Could you give us a brain dump on the sketch?  I've never seen how to
 do it without unreasonable overhead.

Hm. So my original plan was dependent on adding the state-merge
function we've talked about in the past. Not all aggregate functions
necessarily can support such a function but I think all or nearly all
the builtin aggregates can. Certainly min, max, count, sum, avg,
stddev, and array_agg can, which covers most of what people do. That would be
a function which can take two state variables and produce a new state
variable.

If you have this then you can sort and spill the hash table and start
a new hash table and keep going. When you're done you merge the hash
tables using something like heap merge but applying the state merge
function.

However now that I've started thinking about it further I think you
could solve it with less complexity by cheating in various ways. For
example, if you limit the hash size to 1/2 of work_mem, then when
you reach that limit you could just stuff any tuple that doesn't match
a hash entry into a tuplesort with 1/2 of work_mem and do the regular
level break logic on the output of that.

Or if you can count on tuplesort to be a stable sort (and I think it
might be already) then you could just spill the sorted hash entries
and then switch to doing a tapesort for the rest. When you finish
merging you can read them back and do level break logic like normal.
If there is a partially computed state it will be the first one in the
equal group. If you can't count on tuplesort to be stable you could
dump the partially computed states to a different tape and do a merge
between it and the sorted output of the main data set.

The holy grail of this kind of merging of the two algorithms would be
something that keeps the hash table going and maintains an lru of hash
entries. When it grows too large it would spill the oldest partial
state. Then it would sort those states and merge them (possibly in a
single step).  That might be too complex to pull its weight given some
of the above sketches are probably simple enough.

-- 
greg
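
For context, today an aggregate is declared with just a transition function and a state type; the two-state merge function sketched above would be an additional, so far hypothetical, support function. A minimal example of what exists now:

CREATE AGGREGATE my_sum (int4) (
    sfunc    = int4pl,   -- transition function: state + next value
    stype    = int4,
    initcond = '0'
);

SELECT my_sum(g) FROM generate_series(1, 10) AS g;  -- returns 55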



Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-26 Thread Tom Lane
Daniel Farina dan...@heroku.com writes:
 On Mon, Mar 26, 2012 at 1:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not.  I still wouldn't trust SIGTERMing an individual backend in a
 production database.

 Okay, it was my precise intention to hand this to users so that not
 only could they cancel their queries, but also force the transaction
 to be aborted and the connection to be closed in case there is a
 client run amok.  Is there a good injection site -- perhaps
 immediately after query cancellation -- where we can put in a
 rollback-and-disconnect behavior?

 Given this information, my understanding is that even the superuser is
 coerced into undertaking an action that is unnecessarily dangerous if
 they think the backend would respond to cancellation, but they also
 wish to close the connection.

Hrm, I think we're talking at cross-purposes here.

Me: This mechanism hasn't been tested enough, and may still have nasty bugs.

You: Then let's invent some entirely new mechanism.

I'm not seeing how that responds to the concern.

The original rationale for exposing pg_terminate_backend as a
superuser-only function was, as Magnus said, to take a baby step toward
providing the functionality generally.  At this point we can either
think of another baby step, or cross our fingers and open the
floodgates.  I'm just expressing discomfort with the second option.
I have to admit that I don't have a concrete proposal for a second baby
step (IOW an intermediate level of functionality that might provide
scope for more testing while not opening it up all the way yet).

regards, tom lane



Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Mon, Mar 26, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Could you give us a brain dump on the sketch?  I've never seen how to
 do it without unreasonable overhead.

 Hm. So my original plan was dependent on adding the state-merge
 function we've talked about in the past. Not all aggregate functions
 necessarily can support such a function but I think all or nearly all
 the builtin aggregates can. Certainly min,max, count, sum, avg,
 stddev, array_agg can which are most of what people do. That would be
 a function which can take two state variables and produce a new state
 variable.

I'd rather not invent new requirements for aggregate implementations
if we can avoid it.

 However now that I've started thinking about it further I think you
 could solve it with less complexity by cheating in various ways. For
 example if you limit the hash size to 1/2 of work_mem then you when
 you reach that limit you could just stuff any tuple that doesn't match
 a hash entry into a tuplesort with 1/2 of work_mem and do the regular
 level break logic on the output of that.

Or just start dumping such tuples into a tuplestore, while continuing to
process tuples that match the hashagg entries that are already in
existence.  Once the input is exhausted, read out the hashagg entries we
have, flush the hashagg table, start reading from the tuplestore.
Repeat as needed.

I like this idea because the only thing you give up is predictability of
the order of output of aggregated entries, which is something that a
hashagg isn't guaranteeing anyway.  In particular, we would still have a
guarantee that any one aggregate evaluation processes the matching
tuples in arrival order, which is critical for some aggregates.

The main problem I can see is that if we start to flush after work_mem
is X% full, we're essentially hoping that the state values for the
existing aggregates won't grow by more than 1-X%, which is safe for many
common aggregates but fails for some like array_agg().  Ultimately, for
ones like that, it'd probably be best to never consider hashing at all.
I guess we could invent an "unsafe for hash aggregation" flag for
aggregates that have unbounded state-size requirements.

regards, tom lane
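
Until something along these lines exists, the only recourse when a hash aggregate misestimate threatens to exhaust memory is to force a sort-based plan for the query at hand; a sketch (table and column names are hypothetical):

SET enable_hashagg = off;   -- planner falls back to a sort-based GroupAggregate
SELECT x, array_agg(y) FROM big_table GROUP BY x;
RESET enable_hashagg;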



Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think the more important question is a policy question: do we want
 it to work like this?  It seems like a policy question that ought to
 be left to the DBA, but we have no policy management framework for
 DBAs to configure what they do or do not wish to allow.  Still, if
 we've decided it's OK to allow cancelling, I don't see any real reason
 why this should be treated differently.

Right now the only thing you can do to lock down pg_cancel_backend is
to make sure non-mutually-trusting users aren't given the same user ID.
Which, well, duh.  Somebody with your user ID can probably do a lot more
damage than just cancelling your queries/sessions.

I do wonder though (and am too lazy to go look) whether the
pg_cancel_backend check is a strict user ID match or it allows
member-of-role matches.  We might want to think a bit more carefully
about the implications if it's the latter.

regards, tom lane



Re: [HACKERS] Command Triggers patch v18

2012-03-26 Thread Robert Haas
On Tue, Mar 20, 2012 at 1:49 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Hi,

 I guess I sent v17 a little early considering that we now already have
 v18 including support for CREATE TABLE AS and SELECT INTO, thanks to the
 work of Andres and Tom.

 There was some spurious tags in the sgml files in v17 that I did clean
 up too.

There are spurious whitespace changes in the documentation.  Some of
these are of the following form:

- return_arr
+ return_arr

create_trigger.sgml adds a stray blank line as well.

I think that the syntax for enabling or disabling a command trigger
should not use the keyword SET.  So just:

ALTER COMMAND TRIGGER name ENABLE [ ALWAYS | REPLICA ];
ALTER COMMAND TRIGGER name DISABLE;

That way is more parallel with the existing syntax for ordinary triggers.

+  The name to give the new trigger. This must be distinct from the name
+  of any other trigger for the same table. The name cannot be
+  schema-qualified.

"For the same table" is a copy-and-pasteo.

-   /* Look up object address. */
+   /* Look up object address. */

Spurious diff.

I think that get_object_address_cmdtrigger should be turned into a
case within get_object_address_unqualified; I can't see a reason for
it to have its own function.

+   elog(ERROR, "could not find tuple for command trigger %u", trigOid);

The standard phrasing is "cache lookup failed for command trigger %u".

+/*
+ * Functions to execute the command triggers.
+ *
+ * We call the functions that matches the command triggers definitions in
+ * alphabetical order, and give them those arguments:
+ *
+ *   command tag, text
+ *   objectId, oid
+ *   schemaname, text
+ *   objectname, text
+ *
+ */

This will not survive pgindent, unless you surround it with dashed lines
(the usual /*------ ... ------*/ style).  More generally, it would be nice if you could pgindent the
whole patch and then fix anything that breaks in such a way that it
will survive the next pgindent run.

+ * if (CommandFiresTriggers(cmd)

Missing paren.

+   /* before or instead of command trigger might have cancelled
the command */

Leftovers.

@@ -169,7 +169,7 @@ ExplainQuery(ExplainStmt *stmt, const char *queryString,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
-
+
/* if the timing was not set explicitly, set default value */
es.timing = (timing_set) ? es.timing : es.analyze;

Spurious diff.

All the changes to ruleutils.c appear spurious.  Ditto the change to
src/test/regress/sql/triggers.sql.

In the pg_dump support, it seems you're going to try to execute an
empty query if this is run against a pre-9.2 server.  It seems like
you should just return, or something like that.  What do we do in
comparable cases elsewhere?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-26 Thread Greg Stark
On Tue, Mar 27, 2012 at 12:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hrm, I think we're talking at cross-purposes here.

 Me: This mechanism hasn't been tested enough, and may still have nasty bugs.

 You: Then let's invent some entirely new mechanism.

 I'm not seeing how that responds to the concern.

I assume the intention was that the entirely new mechanism would be
a less risky one.

I may be forgetting something obvious here but is there even a
function to send an interrupt signal? That would trigger the same
behaviour as a user hitting C-c, and it would only be handled at the
next CHECK_FOR_INTERRUPTS, which seems like it would be
non-controversial. IIRC we don't currently have a function to do this
for other connections the user may have if he doesn't have access to
the original terminal and doesn't have raw shell access to run
arbitrary commands.

-- 
greg
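
For reference, the existing cancel function already behaves this way: pg_cancel_backend() delivers SIGINT, which the target backend acts on at its next CHECK_FOR_INTERRUPTS(), much as if the client had pressed C-c. The pid below is made up, and calling it on another user's backend is currently superuser-only:

SELECT pg_cancel_backend(12345);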



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-03-26 Thread Kyotaro HORIGUCHI
I'm sorry to have introduced a silly bug.

The previous patch had a bug in the realloc size calculation, and the
separation of the 'connname patch' was incomplete in the regression
test. Both are fixed in this patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
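
For anyone wanting to exercise the patch, a minimal sketch of the dblink cursor path that dblink_fetch() uses (connection string, cursor name, and row counts are arbitrary):

SELECT dblink_connect('conn', 'dbname=postgres');
SELECT dblink_open('conn', 'cur', 'SELECT g FROM generate_series(1, 100000) g');
SELECT * FROM dblink_fetch('conn', 'cur', 10000) AS t(g int);   -- first batch of rows
SELECT dblink_close('conn', 'cur');
SELECT dblink_disconnect('conn');
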
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..4de28ef 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char* valbuf;
+	int valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
    int2vector *pkattnums_arg, int32 pknumatts_arg,
    int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -111,6 +127,9 @@ typedef struct remoteConnHashEnt
 /* initial number of connection hashes */
 #define NUMCONN 16
 
+/* Initial block size for value buffer in storeHandler */
+#define INITBUFLEN 64
+
 /* general utility */
 #define xpfree(var_) \
 	do { \
@@ -503,6 +522,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +579,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+	(errcode(ERRCODE_DATATYPE_MISMATCH),
+	 errmsg("remote query result rowtype does not match "
+			"the specified FROM clause rowtype")));
+		}
+
		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +635,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_INVALID_CURSOR_NAME),
 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +696,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -660,6 +717,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		{
 			/* text,text,bool */
 			DBLINK_GET_CONN;
+			conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			fail = PG_GETARG_BOOL(2);
 		}
@@ -715,164 +773,234 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
	rsinfo->setResult = NULL;
	rsinfo->setDesc = NULL;
 
-	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		/* synchronous query, or async result retrieval */
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res 

[HACKERS] Can param evaluation use same estate as executor?

2012-03-26 Thread limaozeng
 Hi,
I find that the params for an EXECUTE statement are evaluated before
PortalRun, using a separate temporary EState. Is that necessary? Or could we
evaluate the params during execution, that is, in the ExecEvalParam function?
Then we could use the same EState as the executor, saving the cost of
initializing another one. Looking forward to your reply. Thanks.

Best Regards
3.26
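
For clarity, the case being described is the evaluation of the parameter expressions supplied to a prepared statement, e.g.:

PREPARE q(int) AS SELECT $1 + 1;
EXECUTE q(40 + 2);   -- the expression 40 + 2 is evaluated in a temporary EState before PortalRun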


Re: [HACKERS] Command Triggers patch v18

2012-03-26 Thread Robert Haas
On Mon, Mar 26, 2012 at 9:11 PM, Robert Haas robertmh...@gmail.com wrote:
 [ various trivial issues ]

OK, now I got that out of my system.  Now on to bigger topics.

I am extremely concerned about the way in which this patch arranges to
invoke command triggers.  You've got call sites spattered all over the
place, and I think that's going to be, basically, an unfixable mess
and a breeding ground for bugs.  On a first read-through:

1. BEFORE ALTER TABLE triggers are fired in AlterTable().  However, an
ALTER TABLE statement does not necessarily call AlterTable() once and
only once.  The real top-level logic for AlterTable is in
ProcessUtility(), which runs transformAlterTableStmt() to generate a
list of commands and then either calls AlterTable() or recursively
invokes ProcessUtility() for each one.  This means that if IF EXISTS
is used and the table does not exist, then BEFORE command triggers
won't get invoked at all.  On the other hand, if multiple commands are
specified, then I think AlterTable() may get invoked either once or
more than once, depending on exactly which commands are specified; and
we might manage to fire some CREATE INDEX command triggers or whatnot
as well, again depending on exactly what that ALTER TABLE command is
doing.

2. BEFORE CREATE TABLE triggers seem to have similar issues; see
transformCreateStmt.

rhaas=# create table foo (a serial primary key);
NOTICE:  CREATE TABLE will create implicit sequence "foo_a_seq" for
serial column "foo.a"
NOTICE:  snitch: BEFORE CREATE SEQUENCE public.foo_a_seq [NULL]
NOTICE:  snitch: AFTER CREATE SEQUENCE public.foo_a_seq [16392]
NOTICE:  snitch: BEFORE CREATE TABLE public.foo [NULL]
NOTICE:  snitch: BEFORE CREATE INDEX public.foo_pkey [NULL]
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
NOTICE:  snitch: AFTER CREATE INDEX public.foo_pkey [16398]
NOTICE:  snitch: BEFORE ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE:  snitch: AFTER ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE:  snitch: AFTER CREATE TABLE public.foo [16394]
CREATE TABLE

3. RemoveRelations() and RemoveObjects() similarly take the position
that when the object does not exist, command triggers need not fire.
This seems entirely arbitrary.  CREATE EXTENSION IF NOT EXISTS,
however, takes the opposite (and probably correct) position that even
if we decide not to create the extension, we should still fire command
triggers.  In a similar vein, AlterFunctionOwner_oid() skips firing
the command triggers if the old and new owners happen to be the same,
but other forms of ALTER FUNCTION (e.g. ALTER FUNCTION .. COST) fire
triggers even if the old and new parameters are the same; and
AlterForeignDataWrapperOwner_internal does NOT skip firing command
triggers just because the old and new owners are the same.

4. RemoveRelations() and RemoveObjects() also take the position that a
statement like DROP TABLE foo, bar should fire each relevant BEFORE
command trigger twice, then drop both objects, then fire each relevant
AFTER command trigger twice.  I think that's wrong.  It's 100% clear
that the user executed one, and only one, command.  The only real
argument for it is that if we were to only fire command triggers once,
we wouldn't be able to pass identifying information about the objects
being dropped, since the argument-passing mechanism only has room to
pass details about a single object.  I think that means that the
argument-passing mechanism needs improvement, not that we should
redefine one command as two commands.  Something like CREATE SCHEMA
foo CREATE VIEW bar ... has the same problem: the user only entered
one command, but we're firing command triggers as if they entered
multiple commands.  This case is possibly more defensible than the
other one, but note that the two aren't consistent with each other as
regards the order of trigger firing, and I actually think that they're
both wrong, and that only the toplevel command should fire triggers.
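
To make the ordering concrete: with the snitch trigger from the CREATE TABLE
example above, the behaviour being criticized would presumably come out along
these lines (a reconstructed illustration, not captured output, with the OID
column elided):

rhaas=# drop table foo, bar;
NOTICE:  snitch: BEFORE DROP TABLE public.foo [...]
NOTICE:  snitch: BEFORE DROP TABLE public.bar [...]
NOTICE:  snitch: AFTER DROP TABLE public.foo [...]
NOTICE:  snitch: AFTER DROP TABLE public.bar [...]
DROP TABLE

That is, a single user command surfaces as two BEFORE and two AFTER firings.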

5. It seems desirable for BEFORE command triggers to fire at a
consistent point during command execution, but currently they don't.
For example, BEFORE DROP VIEW triggers don't fire until we've verified
that q exists, is a view, and that we have permission to drop it, but
LOAD triggers fire much earlier, before we've really checked anything
at all.  And ALTER TABLE is somewhere in the middle: we fire the
BEFORE trigger after checking permissions on the main table, but
before all permissions checks are done, viz:

rhaas=> alter table p add foreign key (a) references p2 (a);
NOTICE:  snitch: BEFORE ALTER TABLE public.p [16418]
ERROR:  permission denied for relation p2

6. Another pretty hideous aspect of the CREATE TABLE behavior is that
AFTER triggers are fired from a completely different place than BEFORE
triggers.   It's not this patch's fault that our CREATE TABLE and
ALTER TABLE code is a Rube Goldberg machine, but finding some place to
jam each trigger invocation in that happens to (mostly) work as the

Re: [HACKERS] 9.2 commitfest closure (was Command Triggers, v16)

2012-03-26 Thread Magnus Hagander
On Tue, Mar 27, 2012 at 01:39, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Thom Brown t...@linux.com writes:
 This is probably a dumb question but... surely there's more than 2
 committers able to review?

 Able and willing might be two different things.  Alvaro, Heikki, and
 Magnus have all been looking at stuff, but I think they may be getting
 burned out too.

Can't honestly claim I've been burned out by it - more that I feel bad
for not having done my part :-O

It's not the CF itself, but a whole lot of other things (non-postgres)
that have been taking my time.

I'll try to get my head around the pg_stat_bgwriter patch - which I
think actually has the wrong status on the CF page after talking to
Greg about it tonight.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers