Re: [HACKERS] Collect frequency statistics for arrays

2012-01-17 Thread Alexander Korotkov
Hi!

Thanks for your fixes to the patch. They look correct to me. I did some
further fixes in the patch. The proof of some concepts is still needed; I'm
going to provide it in a few days.

On Thu, Jan 12, 2012 at 3:06 PM, Noah Misch n...@leadboat.com wrote:

  I'm not sure about a shared lossy counting module, because the amount of
  shared code would be relatively small. The part of the compute_array_stats
  function which takes care of array decompression, distinct occurrence
  calculation, the distinct element count histogram, packing statistics slots
  etc. is much larger than the lossy counting algorithm itself. Maybe there
  are other opinions in the community?

 True; it would probably increase total lines of code.  The benefit, if any,
 lies in separation of concerns; the business of implementing this
 algorithm is
 quite different from the other roles of these typanalyze functions.  I
 won't
 insist that you try it, though.

I'd prefer to try it as a separate patch.


+ /*
+  * The probability of no occurrence of the events which form the "rest"
+  * probability has a limit of exp(-rest) when the number of events goes to
+  * infinity. Another simplification is to replace those events with one
+  * event of probability (1 - exp(-rest)).
+  */
+ rest = 1.0f - exp(-rest);
  
   What is the name of the underlying concept in probability theory?
 
  The closest concept to the calculated distribution is the multinomial
  distribution. But it's not exactly the same, because the multinomial
  distribution gives the probability of a particular count of each event
  occurrence, not the probability of summary occurrence. Actually, the
  distribution is calculated just from the assumption of event independence.
  The closest concept to the "rest" probability is approximation by an
  exponential distribution. It's a quite rough approximation, but I can't
  invent something better with low calculation complexity.

 Do you have a URL of a tutorial or paper that explains the method in more
 detail?  If, rather, this is a novel synthesis, could you write a proof to
 include in the comments?

Unfortunately I don't have a relevant paper for it. I'll try to search
for one. Otherwise I'll try to write up a proof.
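Roughly, the argument I have in mind is this (a sketch only, assuming the
events are independent and individually rare): with probabilities p_1 ... p_n
summing to rest,

    P(no event occurs) = prod_i (1 - p_i)
                       = exp(sum_i ln(1 - p_i))
                       ~ exp(-sum_i p_i) = exp(-rest),

since ln(1 - p) ~ -p for small p, and the approximation becomes exact as n
goes to infinity while the sum stays fixed at rest. Replacing the remaining
events with a single event of probability 1 - exp(-rest) matches that limit.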


 + /*
 +  * Array selectivity estimation based on most common element statistics for
 +  * the "column <@ const" case. The assumption that element occurrences are
 +  * independent implies a certain distribution of array lengths. Typically
 +  * the real distribution of lengths is significantly different from it. For
 +  * example, even if we have a set of arrays each holding 1 integer element
 +  * in the range [0;10], element occurrences are not independent. Because in
 +  * the case of independence we
  
   Do you refer to a column where '{1,12,46}' and '{13,7}' may appear, but
   '{6,19,4}' cannot appear?
 
  I refer to a column where each array has only one element, i.e. the only
  possible values are '{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}', '{7}',
  '{8}', '{9}', '{10}'. That is a corner case. But a similar situation occurs
  when, for example, we have a distribution of distinct element counts between
  1 and 3. It significantly differs from the distribution implied by
  independent occurrence.

 Oh, I think I see now.  If each element 1..10 had frequency 0.1
 independently,
 column values would have exactly one distinct element just 39% of the time?

Yes, it's right.
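Spelled out: each of the 10 elements would be present in a row independently
with probability 0.1, so

    P(exactly one distinct element) = C(10,1) * 0.1 * 0.9^9 = 0.387... ~ 39%,

even though in the real column every value is a 1-element array.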


 If probability theory has a prototypical problem resembling this, it would
 be
 nice to include a URL to a thorough discussion thereof.  I could not think
 of
 the search terms to find one, though.

Actually, using both the distinct element count histogram and the element
occurrence frequencies produces some probability distribution (which is more
complex than just independent element occurrence). If the real distribution is
close to this distribution, then the estimate is accurate. I haven't met such
distributions in papers; actually, I just invented this in my attempts to do
accurate "column <@ const" estimation at least in simple cases. I'll try to
search for similar things in papers, but I doubt I'll succeed. Otherwise
I'll try to write a more detailed proof.


  *** /dev/null
  --- b/src/backend/utils/adt/array_selfuncs.c

  + Selectivity
  + calc_scalararraysel(VariableStatData *vardata, Datum constval, bool
 orClause,
  + Oid operator)
  + {
  + Oid elemtype;
  + Selectivity selec;
  + TypeCacheEntry *typentry;
  + Datum  *hist;
  + int nhist;
  + FunctionCallInfoData cmpfunc;
  +
  + elemtype = get_base_element_type(vardata->vartype);
  +
  +
  + /* Get default comparison function */
  + typentry = lookup_type_cache(elemtype,
  +TYPECACHE_CMP_PROC | TYPECACHE_CMP_PROC_FINFO |
 TYPECACHE_EQ_OPR);
  +
  + /* Handle only = operator. Return default selectivity in other
 cases. */
  + if (operator != 

Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-17 Thread Kyotaro HORIGUCHI
Hello. This is a revised and rebased version of the patch.

a. The old term `Add Tuple Function' is changed to `Store
   Handler'. The reason for not using `storage' is simply the length of
   the symbols.

b. I couldn't find a good place to put PGgetAsCString(), so it is
   removed, and storeHandler() in dblink.c touches PGresAttValue
   directly in this new patch. The definition of PGresAttValue stays
   in libpq-fe.h and is provided with a comment.

c. Refined the error handling of dblink.c. I think it preserves the
   previous behavior for column number mismatches and type
   conversion exceptions.

d. The documentation is revised.

 It jumped from 332K tuples/sec to 450K, a 35% gain, and had a
 lower memory footprint too.  Test methodology and those results
 are at
 http://archives.postgresql.org/pgsql-hackers/2011-12/msg8.php

It is a disappointment that, on re-measuring, the gain turned out to be
lower than that.

On CentOS 6.2, with the other conditions the same as in the previous
testing, the overall performance became higher; the loss from the
libpq patch was 1.8% and the gain of the full patch fell
to 5.6%. But the reduction in memory usage was unchanged.

Original : 3.96s  100.0%
w/libpq patch: 4.03s  101.8%
w/libpq+dblink patch : 3.74s   94.4%


The attachments are listed below.

libpq_altstore_20120117.patch
  - Allow alternative result storage for libpq.

dblink_perf_20120117.patch
  - Modify dblink to use alternative storage mechanism.
 
libpq_altstore_doc_20120117.patch
  - Documentation for libpq_altstore. Shown in section 31.19, Alternative result storage.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..83525e1 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams  157
 PQping158
 PQpingParams  159
 PQlibVersion  160
+PQregisterStoreHandler	  161
+PQgetStoreHandlerParam	  163
+PQsetStoreHandlerErrMes	  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..5559f0b 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,7 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->storeHandler = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5076,3 +5077,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterStoreHandler(PGconn *conn, StoreHandler func, void *param)
+{
+	conn->storeHandler = func;
+	conn->storeHandlerParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..96e5974 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -67,6 +67,10 @@ static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
 
+static void *pqDefaultStoreHandler(PGresult *res, PQStoreFunc func,
+   int id, size_t len);
+static void *pqAddTuple(PGresult *res, PGresAttValue *tup);
+
 
 /* 
  * Space management for PGresult.
@@ -160,6 +164,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->storeHandler = pqDefaultStoreHandler;
+	result->storeHandlerParam = NULL;
+	result->storeHandlerErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +201,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
+
+		if (conn->storeHandler)
+		{
+			result->storeHandler = conn->storeHandler;
+			result->storeHandlerParam = conn->storeHandlerParam;
+		}
 	}
 	else
 	{
@@ -487,6 +500,33 @@ PQresultAlloc(PGresult *res, size_t nBytes)
 	return pqResultAlloc(res, nBytes, TRUE);
 }
 
+void *
+pqDefaultStoreHandler(PGresult *res, PQStoreFunc func, int id, size_t len)
+{
+	void *p;
+
+	switch (func)
+	{
+		case PQSF_ALLOC_TEXT:
+			return pqResultAlloc(res, len, TRUE);
+
+		case PQSF_ALLOC_BINARY:
+			p = pqResultAlloc(res, len, FALSE);
+
+			if (id == -1)
+				res->storeHandlerParam = p;
+
+			return p;
+
+		case PQSF_ADD_TUPLE:
+			return pqAddTuple(res, res->storeHandlerParam);
+
+		default:
+			/* Ignore */
+			break;
+	}
+	return NULL;
+}
 /*
  * pqResultAlloc -
  *		Allocate subsidiary storage for a PGresult.
@@ -830,9 +870,9 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
- *	  Returns TRUE if OK, FALSE if not enough memory to add the row
+ *	  Returns tup if OK, NULL if not enough memory to add the row.
  */
-int
+static void *
 pqAddTuple(PGresult *res, PGresAttValue *tup)
 {
 	if (res->ntups >= res->tupArrSize)
@@ -858,13 +898,13 @@ pqAddTuple(PGresult *res, 

Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-17 Thread Noah Misch
On Sun, Jan 15, 2012 at 01:49:54AM -0300, Alvaro Herrera wrote:
 - I'm not sure that the multixact truncation code is sane on
 checkpoints.  It might be that I need to tweak more the pg_control info
 we keep about truncation.  The whole truncation thing needs more
 testing, too.

My largest outstanding concern involves the possibility of MultiXactId
wraparound.  From my last review:

This raises a notable formal hazard: it's possible to burn through the
MultiXactId space faster than the regular TransactionId space.  We 
could get
into a situation where pg_clog is covering 2B xids, and yet we need 4B
MultiXactId to cover that period.  We had better at least notice this 
and
halt, if not have autovacuum actively prevent it.

(That should have been 2B rather than 4B, since MultiXactId uses the same
2B-in-past, 2B-in-future behavior as regular xids.)

Are we willing to guess that this will never happen and make recovery
minimally possible?  If so, we could have GetNewMultiXactId() grow defenses
similar to GetNewTransactionId() and leave it at that.  If not, we need to
involve autovacuum.


The other remaining high-level thing is to have key_attrs contain only columns
actually referenced by FKs.

 - Go over Noah's two reviews again and see if I missed anything; also
 make sure I haven't missed anything from other reviewers.

There are some, yes.

 *** a/src/backend/access/heap/heapam.c
 --- b/src/backend/access/heap/heapam.c

 ***
 *** 2773,2783  l2:
   }
  	else if (result == HeapTupleBeingUpdated && wait)
   {
 ! TransactionId xwait;
   uint16  infomask;
   
   /* must copy state data before unlocking buffer */
 ! xwait = HeapTupleHeaderGetXmax(oldtup.t_data);
  		infomask = oldtup.t_data->t_infomask;
   
   LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 --- 2848,2871 
   }
  	else if (result == HeapTupleBeingUpdated && wait)
   {
 ! TransactionId   xwait;
   uint16  infomask;
 +		bool		none_remain = false;

Nothing can ever set this variable to anything different.  It seems that
keep_xmax == InvalidTransactionId substitutes well enough, though.


  	/*
  	 * We may overwrite if previous xmax aborted, or if it committed but
 !	 * only locked the tuple without updating it, or if we are going to
 !	 * keep it around in Xmax.
  	 */

The second possibility is just a subset of the third.

 ! if (TransactionIdIsValid(keep_xmax) ||
 ! none_remain ||
  !			(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID))
   result = HeapTupleMayBeUpdated;
   else
   result = HeapTupleUpdated;

 ***
 *** 3314,3323  heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
*/
   static bool
   HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs,
 !HeapTuple oldtup, HeapTuple newtup)
   {
   int attrnum;
   
  	while ((attrnum = bms_first_member(hot_attrs)) >= 0)
   {
   /* Adjust for system attributes */
 --- 3537,3549 
*/
   static bool
   HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs,
 !HeapTuple oldtup, HeapTuple newtup, 
 bool empty_okay)
   {
   int attrnum;
   
  +	if (!empty_okay && bms_is_empty(hot_attrs))
 + return false;

When a table contains no key attributes, it seems arbitrary whether we call
the key revoked or not.  What is the motivation for this change?

 ! 		/*
 ! 		 * If we're requesting KeyShare, and there's no update present, we
 ! 		 * don't need to wait.  Even if there is an update, we can still
 ! 		 * continue if the key hasn't been modified.
 ! 		 *
 ! 		 * However, if there are updates, we need to walk the update chain
 ! 		 * to mark future versions of the row as locked, too.  That way, if
 ! 		 * somebody deletes that future version, we're protected against the
 ! 		 * key going away.  This locking of future versions could block
 ! 		 * momentarily, if a concurrent transaction is deleting a key; or it
 ! 		 * could return a value to the effect that the transaction deleting
 ! 		 * the key has already committed.  So we do this before re-locking
 ! 		 * the buffer; otherwise this would be prone to deadlocks.  Note that
 ! 		 * the TID we're locking was grabbed before we unlocked the buffer.
 ! 		 * For it to change while we're not looking, the other properties
 ! 		 * we're testing for below after re-locking the buffer would also
 ! 		 * change, in which
 ! 		 

Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-17 Thread Noah Misch
On Mon, Jan 16, 2012 at 04:52:36PM -0300, Alvaro Herrera wrote:
 Excerpts from Heikki Linnakangas's message of Mon Jan 16 16:17:42 -0300 2012:
  On 15.01.2012 06:49, Alvaro Herrera wrote:
    --- 164,178 
      #define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
      #define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
      #define HEAP_HASOID             0x0008  /* has an object-id field */
    ! #define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
      #define HEAP_COMBOCID           0x0020  /* t_cid is a combo cid */
      #define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
    ! #define HEAP_XMAX_IS_NOT_UPDATE 0x0080  /* xmax, if valid, is only a locker.
    !                                          * Note this is not set unless
    !                                          * XMAX_IS_MULTI is also set. */
    !
    ! #define HEAP_LOCK_BITS  (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_IS_NOT_UPDATE | \
    !                          HEAP_XMAX_KEYSHR_LOCK)
      #define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
      #define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
      #define HEAP_XMAX_COMMITTED     0x0400  /* t_xmax committed */
  
   HEAP_XMAX_IS_NOT_UPDATE is a pretty opaque name for that. From the name,
   I'd say that a DELETE would set that, but the comment says it has to do
   with locking. After going through all the combinations in my mind, I
   think I finally understood it: HEAP_XMAX_IS_NOT_UPDATE is set if xmax is
   a multi-xact that represents only locking xids, no updates. How about
   calling it HEAP_XMAX_LOCK_ONLY, and setting it whenever xmax is only a
   locker, whether or not xmax is a multi-xid?
 
 Hm, sounds like a good idea.  Will do.

+1

  Why are you renaming HeapTupleHeaderGetXmax() into 
  HeapTupleHeaderGetRawXmax()? Any current callers of 
  HeapTupleHeaderGetXmax() should already check that HEAP_XMAX_IS_MULTI is 
  not set.
 
 I had this vague impression that it'd be better to break existing
 callers so that they ensure they decide between
 HeapTupleHeaderGetRawXmax and HeapTupleHeaderGetUpdateXid.  Noah
 suggested changing the macro name, too.  It's up to each caller to
 decide what's the semantics they want.  Most want the latter; and callers
 outside core are more likely to want that one.  If we kept the old name,
 they would get the wrong value.

My previous suggestion was to have both macros:

#define HeapTupleHeaderGetXmax(tup) \
( \
AssertMacro(!((tup)->t_infomask & HEAP_XMAX_IS_MULTI)), \
HeapTupleHeaderGetRawXmax(tup) \
)

#define HeapTupleHeaderGetRawXmax(tup) \
( \
(tup)->t_choice.t_heap.t_xmax \
)

No strong preference, though.



Re: [HACKERS] BGWriter latch, power saving

2012-01-17 Thread Heikki Linnakangas

On 04.01.2012 17:05, Peter Geoghegan wrote:

On 4 January 2012 07:24, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

I think SetBufferCommitInfoNeedsSave() needs the same treatment as
MarkBufferDirty(). And it would probably be good to only set the latch if
the buffer wasn't dirty already. Setting a latch that's already set is fast,
but surely it's even faster to not even try.


That seems reasonable. Revised patch is attached.


Thanks! It occurs to me that it's still a bad idea to call SetLatch() 
while holding the buffer header spinlock. At least when it's trivial to 
just move the call after releasing the lock. See attached.


Could you do the sleeping/hibernating logic all in BgWriterNap()?


Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a bit
worried about the impact on systems with a lot of CPUs. If you have a lot of
CPUs writing to the same cache line that contains the latch's flag, that
might get expensive.


Also reasonable, but I don't think that I'll get around to it until
after the final commitfest deadline.


That's still pending. And I admit I haven't tested my updated version 
besides make check.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***
*** 51,56 
--- 51,58 
  #include storage/ipc.h
  #include storage/lwlock.h
  #include storage/pmsignal.h
+ #include storage/proc.h
+ #include storage/procsignal.h
  #include storage/shmem.h
  #include storage/smgr.h
  #include storage/spin.h
***
*** 73,83  static volatile sig_atomic_t shutdown_requested = false;
  /*
   * Private state
   */
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
  
! static void BgWriterNap(void);
  
  /* Signal handlers */
  
--- 75,87 
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS			1
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
  
! static void BgWriterNap(bool hibernating);
  
  /* Signal handlers */
  
***
*** 97,102  BackgroundWriterMain(void)
--- 101,107 
  {
  	sigjmp_buf	local_sigjmp_buf;
  	MemoryContext bgwriter_context;
+ 	bool		hibernating;
  
  	am_bg_writer = true;
  
***
*** 123,129  BackgroundWriterMain(void)
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);			/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 128,134 
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
***
*** 139,144  BackgroundWriterMain(void)
--- 144,155 
  	sigdelset(BlockSig, SIGQUIT);
  
  	/*
+ 	 * Advertise our latch that backends can use to wake us up while we're
+ 	 * sleeping.
+ 	 */
+ 	ProcGlobal->bgwriterLatch = &MyProc->procLatch;
+ 
+ 	/*
  	 * Create a resource owner to keep track of our resources (currently only
  	 * buffer pins).
  	 */
***
*** 235,242  BackgroundWriterMain(void)
--- 246,256 
  	/*
  	 * Loop forever
  	 */
+ 	hibernating = false;
  	for (;;)
  	{
+ 		bool lapped;
+ 
  		/*
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
***
*** 264,281  BackgroundWriterMain(void)
  		/*
  		 * Do one cycle of dirty-buffer writing.
  		 */
! 		BgBufferSync();
  
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
! BgWriterNap(void)
  {
  	long		udelay;
  
--- 278,333 
  		/*
  		 * Do one cycle of dirty-buffer writing.
  		 */
! 		lapped = BgBufferSync();
! 
! 		if (lapped && !hibernating)
! 		{
! 			/*
! 			 * Initiate hibernation by arming the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from client
! 			 * backends while the bgwriter is idle, and not during productive
! 			 * cycles where buffers are written, or shortly thereafter. It's
! 			 * important that the SetLatch() call within the buffer manager
! 			 * usually inexpensively returns having found that the latch is
! 			 * already set. For write-heavy workloads, it's quite possible that
! 			 * those calls will never actually get the opportunity to really
! 			 * set the latch.
! 			 *
! 			 * By only giving backends the opportunity to really set the
! 			 * latch at this point via ResetLatch(), we avoid the overhead of
! 			 * sending a signal perhaps as much 

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-17 Thread Noah Misch
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
 Thanks for your fixes to the patch. They look correct to me. I did some
 further fixes in the patch. The proof of some concepts is still needed; I'm
 going to provide it in a few days.

Your further fixes look good.  Could you also answer my question about the
header comment of mcelem_array_contained_selec()?

/*
 * Estimate selectivity of "column <@ const" based on most common element
 * statistics.  Independent element occurrence would imply a particular
 * distribution of distinct element counts among matching rows.  Real data
 * usually falsifies that assumption.  For example, in a set of 1-element
 * integer arrays having elements in the range [0;10], element occurrences are
 * not independent.  If they were, a sufficiently-large set would include all
 * distinct element counts 0 through 11.  We correct for this using the
 * histogram of distinct element counts.
 *
 * In the "column @> const" and "column && const" cases, we usually have
 * "const" with low summary frequency of elements (otherwise we have
 * selectivity close to 0 or 1 correspondingly).  That's why the effect of
 * dependence related to distinct element counts distribution is negligible
 * there.  In the "column <@ const" case, summary frequency of elements is
 * high (otherwise we have selectivity close to 0).  That's why we should do
 * correction due to array distinct element counts distribution.
 */

By summary frequency of elements, do you mean literally P_0 + P_1 ... + P_N?
If so, I can follow the above argument for column && const and column <@
const, but not for column @> const.  For column @> const, selectivity
cannot exceed the smallest frequency among const elements.  A number of
high-frequency elements will drive up the sum of the frequencies without
changing the true selectivity much at all.
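For example (made-up numbers): if const = '{a,b}' with freq(a) = 0.60 and
freq(b) = 0.01, the summary frequency is 0.61, yet every row satisfying
column @> const must contain b, so the true selectivity is at most 0.01.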

Thanks,
nm



Re: [HACKERS] WAL Restore process during recovery

2012-01-17 Thread Simon Riggs
On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
 WALRestore process asynchronously executes restore_command while
 recovery continues working.

 Overlaps downloading of next WAL file to reduce time delays in file
 based archive recovery.

 Handles cases of file-only and streaming/file correctly.

 Though I've not reviewed the patch deeply yet, I observed the following
 two problems when I tested the patch.

 When I set up streaming replication + archive (i.e., restore_command is set)
 and started the standby, I got the following error:

    FATAL:  all AuxiliaryProcs are in use
    LOG:  walrestore process (PID 18839) exited with exit code 1

 When I started an archive recovery without setting restore_command,
 it successfully finished.

Oh, I did have NUM_AUXILIARY_PROCS increased at one point, but I
realised it wasn't needed and removed it. Will change that. Thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] VACUUM in SP-GiST

2012-01-17 Thread YAMAMOTO Takashi
hi,

 I started reading the spgist vacuum code last night, and didn't like it
 at all.  I found a number of smaller issues, but it seems to me that
 the design is just fundamentally wrong.  Unless I'm misunderstanding it,
 the approach is to recursively traverse the tree in sort of the same way
 that a full-index search would do, except that it continues to hold lock
 on the immediate parent page of whichever page is currently being
 visited, so that it can update the downlink if it has to delete the
 first leaf tuple of a chain.  I've got several complaints about that:
 
 1. Holding exclusive locks on upper pages while we visit
 possibly-hundreds of leaf pages seems pretty bad from a concurrency
 standpoint.  It doesn't hold locks all the way up to the root, so it's
 not a complete loss, but even a first-level inner page can have a lot of
 children.
 
 2. I do not trust this code to visit every leaf page (which, if it
 failed to do so and thereby missed deleting a heap reference, would
 result in a silently corrupt index).  Unlike a regular index search,
 which makes a list of lower-level targets while examining an inner tuple
 just once, this code depends on the assumption that it can reacquire
 lock on an upper page and then resynchronize itself with whatever's been
 done meanwhile to the inner tuple it was looking at.  I don't trust
 that.  The mechanism that's being used relies on page LSN having been
 changed if anything was changed on the page, which does not work in
 unlogged tables.  Furthermore, if the inner tuple did change, it has to
 rescan *everything* that the inner tuple leads to.  That's horrendously
 inefficient, and it's not even clear that it would ever terminate in the
 face of a constant stream of concurrent updates.
 
 3. Even if it all worked, it would be slow as can be, because it
 requires a whole lot of nonsequential accesses to the index.  For
 instance, the same leaf page might be visited many times (once for
 each leaf chain on the page), and not necessarily close together either.
 
 Also, the code doesn't seem to perform any cleanup at all on inner
 pages.  I am not expecting it to try to get rid of childless inner
 tuples, but surely it ought to at least convert old redirects to dead
 and get rid of dead tuples at the end of the page, much as for leaf
 pages?
 
 What I'm envisioning doing instead is making a single sequential scan
 over the entire index.  On both leaf and inner pages we could clean
 redirects and remove end dead tuples.  On leaf pages we'd also check for
 tuples to be deleted.  There's no real difficulty in removing deleted
 tuples as long as they are not at the heads of their chains.  (The code
 would have to reverse-engineer which tuples are at the heads of their
 chains, but that's easily done while scanning the page; we just make a
 reverse-lookup map for the nextOffset pointers.)  If we have to delete a
 tuple at the head of its chain, but there's still at least one live
 tuple in its chain, we could reshuffle the tuples to bring another live
 tuple to that same offset number, thereby not invalidating the parent
 downlink.  The only hard part is what to do when we delete all members
 of a chain: we can't reset the parent downlink because we do not know
 where it is.  What I propose to do in that case is install a variant
 form of dead tuple that just indicates this chain is empty.  One way
 to represent that would be a redirect tuple pointing nowhere, but I
 think it would be cleaner to repurpose the isDead and isRedirect bits as
 a two-bit field with four states.  We'd have LIVE, DEAD, REDIRECT, and
 this fourth state for a dead tuple that is not recyclable because we
 know there's a link to it somewhere.  A scan would ignore such a tuple.
 An insertion that arrives at such a tuple could either replace it (if
 there's enough room on the page) or convert it to a redirect (if not).
 
 Last night I was thinking the fourth state could be named TOMBSTONE,
 but maybe it'd be better to use DEAD for this case and rename the
 existing removable dead tuple state to PLACEHOLDER, since that case
 only exists to preserve the offset numbers of other tuples on the
 page.
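 In code form, the four states might look something like this (names are
 placeholders only, nothing is settled):
 
     /* hypothetical two-bit leaf-tuple state, replacing isDead/isRedirect */
     typedef enum
     {
         SPGIST_LIVE,        /* ordinary live tuple */
         SPGIST_REDIRECT,    /* redirection stub pointing to the tuple's new home */
         SPGIST_DEAD,        /* dead, but a parent downlink still leads here,
                              * so the slot cannot be recycled yet */
         SPGIST_PLACEHOLDER  /* dead and ignorable; kept only so other tuples
                              * on the page keep their offset numbers */
     } SpGistTupleState;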
 
 Comments?

with a concurrent split moving leaf tuples between pages, can't it make
bulkdelete fail to report some TIDs and make CREATE INDEX CONCURRENTLY
create duplicates?

YAMAMOTO Takashi

 
   regards, tom lane
 



Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Alvaro Herrera

Excerpts from Greg Smith's message of Mon Jan 16 19:25:50 -0300 2012:
 On 01/16/2012 03:48 PM, Josh Berkus wrote:

  I'll also point out that the process for *applying* a patch, if you
  don't subscribe to hackers and keep archives around on your personal
  machine for months, is also very cumbersome and error-prone.  Copy and
  paste from a web page?  Really?
 
 The most reasonable answer to this is for people to publish a git repo 
 URL in addition to the official submission of changes to the list in 
 patch form.

It's expected that we'll get a more reasonable interface to attachments,
one that will allow you to download patches separately.  (Currently,
attachments that have mime types other than text/plain are already
downloadable separately).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Generate call graphs in run-time

2012-01-17 Thread Joel Jacobson
On Mon, Jan 16, 2012 at 2:23 PM, Martin Pihlak martin.pih...@gmail.com wrote:
 My approach was to add parent oid to the per-backend function stats
 structure - PgStat_BackendFunctionEntry. Also, I changed the hash key
 for that structure to (oid, parent) pair. This means that within the
 backend the function usage is always tracked with the context of
 calling function. This has the nice property that you get the per-parent
 usage stats as well. Also the additional lists for parent tracking are
 avoided.

 During pgstat_report_stat() the call graph (with stats) is output
 to logs and the statistics uploaded to collector -- with the parent oid
 removed.

Since you only care about the parentfuncid in one level, it looks like
you will only be able to get a total call graph of all possible
function calls, and not each unique call graph per transaction.
If you have two separate transactions:
T1: a->b, b->c
T2: b->d
You would have two unique call graphs {a->b, b->c} and {b->d}.
The global call graph, where you only keep track of all unique
parentfuncid->funcid pairs, would be {a->b, b->c, b->d}, which lacks
the information on what different call graphs are actually being
executed per transaction.

Also, why remove the parent oid when uploading the statistics to the collector?
It would be nice to have the statistics for each function per parent,
to see where you have a bottleneck which might only be occurring in a
function when called from a specific parent.
Even more fine-grained would be to have the statistics per unique
call-graph, i.e. the entire tree of functions called in the
transactions.

 There is a patch for this and we do use it in production for occasional
 troubleshooting and dependency analysis. Can't attach immediately
 though -- it has some extra cruft in it that needs to be cleaned up.

I would highly appreciate a patch, don't worry about cleaning up, I
can do that, unless it's some code you can't share for other reasons.

 Indeed. Something like a pg_stat_user_function_details view would be
 very useful. Something along the lines of:

      Column     |  Type  |
  ---------------+--------+
   funcid        | oid    |
   parentfuncid  | oid    |  -- new
   schemaname    | name   |
   funcname      | name   |
   calls         | bigint |
   total_time    | bigint |
   self_time     | bigint |

funcid->parentfuncid might be sufficient for performance
optimizations, but to automatically generate directed graphs of all
unique call graphs at run-time, you would need all the unique pairs of
funcid->parentfuncid as a single column, probably a sorted array of
oids[][], for example [[1,2],[1,3],[2,4],[2,5]] if the call graph were
{1->2, 1->3, 2->4, 2->5}.


 And then rewrite pg_stat_user_functions by aggregating the detailed
 view. That'd make the individual pg_stat_get_function* functions a
 bit slower, but that is probably a non-issue - at least not if the
 pg_stat_user_functions view is rewritten to use a SRF.

 regards,
 Martin



Re: [HACKERS] BGWriter latch, power saving

2012-01-17 Thread Heikki Linnakangas

On 17.01.2012 12:16, Heikki Linnakangas wrote:

On 04.01.2012 17:05, Peter Geoghegan wrote:

On 4 January 2012 07:24, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

I think SetBufferCommitInfoNeedsSave() needs the same treatment as
MarkBufferDirty(). And it would probably be good to only set the
latch if
the buffer wasn't dirty already. Setting a latch that's already set
is fast,
but surely it's even faster to not even try.


That seems reasonable. Revised patch is attached.


Thanks! It occurs to me that it's still a bad idea to call SetLatch()
while holding the buffer header spinlock. At least when it's trivial to
just move the call after releasing the lock. See attached.

Could you do the sleeping/hibernating logic all in BgWriterNap()?


(sorry, forgot to update the above question before sending..)

In the patch I sent, I did rearrange the sleeping logic. I think it's 
more readable the way it is now.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] review: psql tab completion for GRANT role

2012-01-17 Thread Pavel Stehule
Hello

I did review of this small patch
http://archives.postgresql.org/message-id/1326055692.15293.15.ca...@vanquo.pezone.net

* This patch applied with only some fuzz in one hunk and compiled without warnings

bash-4.2$ patch -p1 < autocompleta.patch
patching file src/bin/psql/tab-complete.c
Hunk #2 succeeded at 2335 with fuzz 1.

* all regression tests passed

* this patch respects Pg coding standards
* the code is simple and has no impact on core functionality or core performance
* regression tests are not required for an interactive feature
* I checked it, and the autocompletion behaves as expected.

It is ready for commit

Regards

Pavel Stehule



Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-17 Thread Susanne Ebrecht

Am 13.01.2012 22:50, schrieb Josh Berkus:

Hackers,

It occurs to me that I would find it quite personally useful if the
vacuumdb utility was multiprocess capable.

For example, just today I needed to manually analyze a database with
over 500 tables, on a server with 24 cores.   And I needed to know when
the analyze was done, because it was part of a downtime.  I had to
resort to a python script.

I'm picturing doing this in the simplest way possible: get the list of
tables and indexes, divide them by the number of processes, and give
each child process its own list.

Any reason not to hack on this for 9.3?



Hello,

I like the idea - but ...
I would prefer an option that lets the user say how many cores it should be
spread over. Something like --share-cores=N.

The default would be the total number of cores of the machine, but users
should be able to say: OK, my machine has 24 cores, but I want vacuumdb to
use just 12 of them.

Especially at startups you find machines that aren't database-only machines.
Often the database and the web server share a single machine.

You could also run several clusters on the same machine to offer your
business in different languages (one cluster per language). I have already
seen such a setup.

There might be side businesses on the cores - so it should be possible for
the user to decide how many cores vacuumdb gets to use.

Susanne


--
Dipl. Inf. Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com




Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-17 Thread Andres Freund
On Tuesday, January 17, 2012 01:18:53 PM Susanne Ebrecht wrote:
 I would prefer to have an option that the user is able to tell on how much
 cores it should be shared. Something like --share-cores=N.
Uhm. -j # does exactly that or am I missing your point?

Andres



Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-17 Thread Jaime Casanova
On Tue, Jan 17, 2012 at 7:23 AM, Andres Freund and...@anarazel.de wrote:
 On Tuesday, January 17, 2012 01:18:53 PM Susanne Ebrecht wrote:
 I would prefer to have an option that the user is able to tell on how much
 cores it should be shared. Something like --share-cores=N.
 Uhm. -j # does exactly that or am I missing your point?


not really.

if you have 12 cores and you say -j 12 you would have 1 process per
core, with Susanne's suggestion, AFAIUI, you can say -j 12
--shared-cores=6... so you would only use 6 cores of the 12 and have 2
processes per core

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] BGWriter latch, power saving

2012-01-17 Thread Peter Geoghegan
On 17 January 2012 11:24, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 In the patch I sent, I did rearrange the sleeping logic. I think it's more
 readable the way it is now.

I have no objection to your refinement of the sleeping logic, nor to
your moving some things, in both the existing code and my patch,
so that they occur when no spinlock is held.

Should I proceed with a benchmark on V3, so that we can get this
committed? I imagine that a long pgbench-tools run is appropriate
(after all, it was used to justify the rewrite of the BGWriter for
8.3), at various scale factors, from smallish to quite large.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-17 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Anyway, I'm hoping to keep hacking at this for a couple more days before
 the CF gets into full swing, and hopefully arrive at something committable
 for 9.2.  I'd really like to get this fixed in this cycle, since it's
 been a problem for several releases now.

 Comments?

Go Tom go ! :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-17 Thread Andrew Dunstan



On 01/17/2012 07:33 AM, Jaime Casanova wrote:

On Tue, Jan 17, 2012 at 7:23 AM, Andres Freundand...@anarazel.de  wrote:

On Tuesday, January 17, 2012 01:18:53 PM Susanne Ebrecht wrote:

I would prefer to have an option that the user is able to tell on how much
cores it should be shared. Something like --share-cores=N.

Uhm. -j # does exactly that or am I missing your point?


not really.

if you have 12 cores and you say -j 12 you would have 1 process per
core, with Susanne's suggestion, AFAIUI, you can say -j 12
--shared-cores=6... so you would only use 6 cores of the 12 and have 2
processes per core



That looks messy. IMNSHO it should work just like pg_restore's -j.

cheers

andrew



Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-17 Thread Andres Freund
On Tuesday, January 17, 2012 01:33:06 PM Jaime Casanova wrote:
 On Tue, Jan 17, 2012 at 7:23 AM, Andres Freund and...@anarazel.de wrote:
  On Tuesday, January 17, 2012 01:18:53 PM Susanne Ebrecht wrote:
  I would prefer to have an option that the user is able to tell on how
  much cores it should be shared. Something like --share-cores=N.
  
  Uhm. -j # does exactly that or am I missing your point?
 
 not really.
 
 if you have 12 cores and you say -j 12 you would have 1 process per
 core, with Susanne's suggestion, AFAIUI, you can say -j 12
 --shared-cores=6... so you would only use 6 cores of the 12 and have 2
 processes per core
I don't really get what that should do. If vacuumdb itself is a bottleneck in
any form here, we did something *very* wrong. (In my opinion, using processes
for this is pointless anyway; using async queries seems much easier for this
special case, especially for distributing individual commands.)
I don't really see how you could enforce sharing cores on the server side
(well, there are cpusets, but we're surely not going to introduce usage of
those just for vacuumdb).
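To illustrate the async-query idea, an untested sketch (the table list,
connection string and the "-j 2" worker count are made up, error checks of
the VACUUM results are omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/select.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        const char *tables[] = {"t1", "t2", "t3", "t4", "t5"};  /* made up */
        int         ntables = 5;
        int         nconn = 2;      /* the "-j 2" case */
        PGconn     *conns[2];
        int         next = 0;       /* next table to hand out */
        int         busy = 0;       /* connections with a command in flight */
        int         i;
        char        sql[128];

        for (i = 0; i < nconn; i++)
        {
            conns[i] = PQconnectdb("dbname=postgres");
            if (PQstatus(conns[i]) != CONNECTION_OK)
            {
                fprintf(stderr, "%s", PQerrorMessage(conns[i]));
                exit(1);
            }
        }

        /* prime every connection with its first command */
        for (i = 0; i < nconn && next < ntables; i++, next++)
        {
            snprintf(sql, sizeof(sql), "VACUUM ANALYZE %s", tables[next]);
            PQsendQuery(conns[i], sql);
            busy++;
        }

        while (busy > 0)
        {
            fd_set      rfds;
            int         maxfd = -1;

            /* wait for activity on any of the connections */
            FD_ZERO(&rfds);
            for (i = 0; i < nconn; i++)
            {
                FD_SET(PQsocket(conns[i]), &rfds);
                if (PQsocket(conns[i]) > maxfd)
                    maxfd = PQsocket(conns[i]);
            }
            select(maxfd + 1, &rfds, NULL, NULL, NULL);

            for (i = 0; i < nconn; i++)
            {
                PGresult   *res;
                int         finished = 0;

                PQconsumeInput(conns[i]);
                if (PQisBusy(conns[i]))
                    continue;       /* command still running */

                /* drain results of a finished command, if there was one */
                while ((res = PQgetResult(conns[i])) != NULL)
                {
                    finished = 1;
                    PQclear(res);
                }

                if (finished)
                {
                    busy--;
                    if (next < ntables)
                    {
                        /* hand this connection the next table */
                        snprintf(sql, sizeof(sql), "VACUUM ANALYZE %s",
                                 tables[next++]);
                        PQsendQuery(conns[i], sql);
                        busy++;
                    }
                }
            }
        }

        for (i = 0; i < nconn; i++)
            PQfinish(conns[i]);
        return 0;
    }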

Andres



Re: [HACKERS] Group commit, revised

2012-01-17 Thread Peter Geoghegan
On 16 January 2012 08:11, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Impressive results. How about uploading the PDF to the community wiki?

Sure. http://wiki.postgresql.org/wiki/Group_commit .

 I think it might be simpler if it wasn't the background writer that's
 responsible for driving the group commit queue, but the backends
 themselves. When a flush request comes in, you join the queue, and if
 someone else is already doing the flush, sleep until the driver wakes you
 up. If no-one is doing the flush yet (ie. the queue is empty), start doing
 it yourself. You'll need a state variable to keep track who's driving the
 queue, but otherwise I think it would be simpler as there would be no
 dependency on WAL writer.

I think this replaces one problem with another. You've now effectively
elevated a nominated backend to the status of an auxiliary process -
do you intend to have the postmaster look after it, as with any other
auxiliary process? I'm not sure that that is a more difficult problem
to solve, but I suspect so. At least my proposal can have any one of
the backends, both currently participating in group commit and yet to,
wake up the WAL Writer.

 I tend think of the group commit facility as a bus. Passengers can hop on
 board at any time, and they take turns on who drives the bus. When the first
 passengers hops in, there is no driver so he takes the driver seat. When the
 next passenger hops in, he sees that someone is driving the bus already, so
 he sits down, and places a big sign on his forehead stating the bus stop
 where he wants to get off, and goes to sleep. When the driver has reached
 his own bus stop, he wakes up all the passengers who wanted to get off at
 the same stop or any of the earlier stops [1]. He also wakes up the
 passenger whose bus stop is the farthest from the current stop, and gets off
 the bus. The woken-up passengers who have already reached their stops can
 immediately get off the bus, and the one who has not notices that no-one is
 driving the bus anymore, so he takes the driver seat.

 [1] in a real bus, a passenger would not be happy if he's woken up too late
 and finds himself at the next stop instead of the one where he wanted to go,
 but for group commit, that is fine.

 In this arrangement, you could use the per-process semaphore for the
 sleep/wakeups, instead of latches. I'm not sure if there's any difference,
 but semaphores are more tried and tested, at least.
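 In pseudo-code, that hand-off could look roughly like this (plain pthreads
 instead of PostgreSQL latches/semaphores, WAL flushing stubbed out, names
 invented):

    #include <pthread.h>
    #include <stdint.h>
    #include <stdbool.h>

    static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  gc_wakeup = PTHREAD_COND_INITIALIZER;

    static uint64_t flushed_upto = 0;   /* WAL known flushed up to here */
    static uint64_t request_upto = 0;   /* farthest "bus stop" requested */
    static bool     driver_active = false;

    static void
    flush_wal(uint64_t upto)
    {
        /* stand-in for writing and fsyncing WAL up to "upto" */
        (void) upto;
    }

    /* Called by a backend that wants its commit record flushed to my_lsn. */
    void
    group_commit_flush(uint64_t my_lsn)
    {
        pthread_mutex_lock(&gc_lock);

        if (my_lsn > request_upto)
            request_upto = my_lsn;      /* put our sign on the bus */

        while (flushed_upto < my_lsn)
        {
            if (!driver_active)
            {
                /* nobody is driving: flush on behalf of everyone on board */
                uint64_t    upto = request_upto;

                driver_active = true;
                pthread_mutex_unlock(&gc_lock);

                flush_wal(upto);

                pthread_mutex_lock(&gc_lock);
                flushed_upto = upto;
                driver_active = false;
                /* wake passengers; an unsatisfied one becomes the next driver */
                pthread_cond_broadcast(&gc_wakeup);
            }
            else
            {
                /* someone is driving: sleep until woken at (or past) our stop */
                pthread_cond_wait(&gc_wakeup, &gc_lock);
            }
        }

        pthread_mutex_unlock(&gc_lock);
    }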

Yes, and I expect that this won't be the last time someone uses a bus
analogy in relation to this!

The proposed patch is heavily based on sync rep, which I'd have
imagined was more tried and tested than any proposed completely
alternative implementation, as it is basically a generalisation of
exactly the same principle, WAL Writer changes notwithstanding. I
would have imagined that that aspect would be particularly approved
of.

 wal_writer_delay is still needed for controlling how often asynchronous
 commits are flushed to disk.

That had occurred to me of course, but has anyone ever actually
tweaked wal_writer_delay with adjusting the behaviour of asynchronous
commits in mind? I'm pretty sure that the answer is no. I have a
slight preference for obsoleting it as a consequence of introducing
group commit, but I don't think that it matters that much.

 Auxiliary processes cannot use group commit. The changes made prevent
 them from availing of commit_siblings/commit_delay parallelism,
 because it doesn't exist anymore.

 Auxiliary processes have PGPROC entries too. Why can't they participate?

It was deemed to be a poor design decision to effectively create a
dependency on the WAL Writer among other auxiliary processes, as to do
so would perhaps compromise the way in which the postmaster notices
and corrects isolated failures. Maybe I'll revisit that assessment,
but I am not convinced that it's worth the very careful analysis of
the implications of such an unprecedented dependency, without there
being some obvious advantage. If it's a question of their being
deprived of commit_siblings group commit, well, we know from
experience that people didn't tend to touch it a whole lot anyway.

 Group commit is sometimes throttled, which seems appropriate - if a
 backend requests that the WAL Writer flush an LSN deemed too far from
 the known flushed point, that request is rejected and the backend goes
 through another path, where XLogWrite() is called.

 Hmm, if the backend doing the big flush gets the WALWriteLock before a bunch
 of group committers, the group committers will have to wait until the big
 flush is finished, anyway. I presume the idea of the throttling is to avoid
 the situation where a bunch of small commits need to wait for a huge flush
 to finish.

Exactly. Of course, you're never going to see that situation with
pgbench. I don't have much data to inform exactly what the right
trade-off is here, or some generic approximation of it across
platforms and hardware - other 

Re: [HACKERS] xlog location arithmetic

2012-01-17 Thread Peter Geoghegan
On 20 December 2011 10:27, Magnus Hagander mag...@hagander.net wrote:
 Doing it in numeric should be perfectly fine. The only real reason to
 pick int8 over in this context would be performance, but it's not like
 this is something that's going to be called in really performance
 critical paths...

FYI, my group commit patch has a little macro, in the spirit of
XLByteAdvance, to get the delta between two LSNs in bytes as a
uint64.
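Roughly (a sketch against the current two-field XLogRecPtr; not necessarily
the exact macro in the patch):

    /* byte distance from B to A, assuming A is not earlier than B */
    #define XLByteDelta(A, B) \
        (((uint64) (A).xlogid * XLogFileSize + (A).xrecoff) - \
         ((uint64) (B).xlogid * XLogFileSize + (B).xrecoff))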

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



[HACKERS] psql \timing vs failed statements

2012-01-17 Thread Magnus Hagander
Right now, psql \timing only gives output for successful
queries. Is there any actual reason for this, or is it just the way it
happened?

In particular, I just had a very long run of a CREATE UNIQUE INDEX fail
pretty far in - and I would've really liked to have timing output for
that one even though it failed.

Thus - if I were to change psql to output timing on failed queries as
well, will anybody object? ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] logging in high performance systems.

2012-01-17 Thread Marti Raudsepp
On Thu, Nov 24, 2011 at 04:28, Theo Schlossnagle je...@omniti.com wrote:
 So... here's my first whack at solving this with some flexibility.

 The first thing I did was add hook points where immediate statement
 logging happens pre_exec and those that present duration
 post_exec.  These should, with optimization turned on, have only a
 few instructions of impact when no hooks are registered (we could
 hoist the branch outside the function call if that were identified as
 an issue).

Note that the hook mechanism you've built is a departure from how
other hooks are managed in Postgres. Traditionally hooks are just
global function pointers, and each consumer is responsible for storing
the previous value of the hook and chain-calling it in the handler. If
you want to change this pattern, I think you should start another
discussion.

E.g. hook registration looks like:

next_object_access_hook = object_access_hook;
object_access_hook = sepgsql_object_access;

static void
sepgsql_object_access(ObjectAccessType access, Oid classId,
                      Oid objectId, int subId)
{
    if (next_object_access_hook)
        (*next_object_access_hook) (access, classId, objectId, subId);

    /* ... sepgsql's own checks follow ... */
}

Regards,
Marti



Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks

2012-01-17 Thread Kohei KaiGai
2012/1/17 Robert Haas robertmh...@gmail.com:
 On Tue, Jan 10, 2012 at 7:51 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The attached patch adds OAT_DROP object-access-hook around permission
 checks of object deletion.
 Due to the previous drop statement reworks, the number of places to
 put this hook is limited to these six points: RemoveObjects,
 RemoveRelations, ATExecDropColumn, dropdb, DropTableSpace and
 shdepDropOwned().

 In sepgsql side, it checks {drop} permission for each object types,
 and {remove_name} permission to the schema that owning the object
 being removed. I'm still considering whether the drop permission
 should be applied on objects being removed in cascade mode.
 It is not difficult to implement. We can determine the bahavior on
 object deletion with same manner of creation; that saves contextual
 information using ProcessUtility hook.

 At this moment, the current proposed patch does not apply checks on
 cascaded deletion, according to SQL behavior. However, my concern is
 that user can automatically have right to remove all the objects
 underlying a partidular schema being removable, even if individual
 tables or functions are not able to be removed.

 So, my preference is sepgsql references dependency tables to check
 {drop} permissions of involved objects, not only the target object.

 Hmm.  I kind of wonder if we shouldn't just have the OAT_DROP hook get
 invoked for every actual drop, rather than only for the top-level
 object.  It's somewhat appealing to have the hook more-or-less match
 up the permissions checks, as you have it here, but in general it
 seems more useful to have a callback for each thing dropped than to
 have a callback for each thing named to be dropped.  What is your
 opinion?

I think that is the more ideal design.

If we could track object deletion only at the top level, we would have no way
to implement security features based on the object-access-hook, even if the
security model wants to apply checks to all object deletions.
In addition, I believe it could also be used to clean up things set up
by external modules, not only for permission checks.

Shall I modify the patch to place the object-access-hook in deleteOneObject
(probably the best position to track actual deletions)?
One problem is the case of dropping columns with ALTER TABLE:
it just sets the attisdropped flag, without removing the catalog entry.
Ought we to put this hook in ATExecDropColumn as an exception?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-17 Thread Greg Smith

On 01/17/2012 12:06 AM, Tom Lane wrote:

Well, since I see other committers sending in patches the day after the
nominal commitfest deadline, I don't feel too bad about being a bit late
as well.


To clarify the fairness standard here:  submitting a patch before the 
CommitFest deadline, then adding it to the app, means that we will try 
very hard to find a reviewer for the submission during that CF.  It's 
setting a worst-case bound on how long someone who contributes will have 
to wait for feedback.  That delay, how long it would take before someone 
saw community feedback after they sent in a patch, used to be far less 
predictable.


Something like this, sent just after the deadline, won't be assigned a 
reviewer by the CommitFest manager until the next CF.  That doesn't mean 
it won't be reviewed anyway.  Also, submissions that fix a regression, 
like this one, can easily end up on a fast track unrelated to the normal 
schedule.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Peter Eisentraut
On Mon, 2012-01-16 at 17:25 -0500, Greg Smith wrote:
 The most reasonable answer to this is for people to publish a git repo
 URL in addition to the official submission of changes to the list in
 patch form. 

Note that the original complaint was that for the occasional reviewer,
the current system takes at least 5 partially redundant steps in two
different systems.  I doubt that adding a third system and more
partially redundant steps is going to help that.

I don't have anything against the general idea, but it won't address the
original point.





Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Andrew Dunstan



On 01/16/2012 05:40 PM, Alvaro Herrera wrote:

It's expected that we'll get a more reasonable interface to attachments,
one that will allow you to download patches separately.  (Currently,
attachments that have mime types other than text/plain are already
downloadable separately).



Are you really sure about that? My recent JSON patch is at 
http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net. 
I don't see any download link for the patch there, yet my mailer set the 
attachment type to text/x-patch, not text/plain.


cheers

andrew





Re: [HACKERS] logging in high performance systems.

2012-01-17 Thread Alvaro Herrera

Excerpts from Marti Raudsepp's message of mar ene 17 12:12:50 -0300 2012:
 On Thu, Nov 24, 2011 at 04:28, Theo Schlossnagle je...@omniti.com wrote:
  So... here's my first whack at solving this with some flexibility.
 
  The first thing I did was add hook points where immediate statement
  logging happens pre_exec and those that present duration
  post_exec.  These should, with optimization turned on, have only a
  few instructions of impact when no hooks are registered (we could
  hoist the branch outside the function call if that were identified as
  an issue).
 
 Note that the hook mechanism you've built is a departure from how
 other hooks are managed in Postgres. Traditionally hooks are just
 global function pointers, and each consumer is responsible for storing
 the previous value of the hook and chain-calling it in the handler. If
 you want to change this pattern, I think you should start another
 discussion.

Hm.  We already have places doing the other thing, for example
see XactCallback and ExprContextCallback.  Not sure we have an actual
criterion for deciding when to use which.
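For illustration, the two styles look roughly like this from an extension's point of
view.  ExecutorStart_hook, standard_ExecutorStart, RegisterXactCallback and XactEvent
are existing interfaces; the my_* consumer functions are invented for this sketch:

#include "postgres.h"

#include "access/xact.h"
#include "executor/executor.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* Style 1: a bare global function pointer; each consumer saves the previous
 * value and chain-calls it from its own handler. */
static ExecutorStart_hook_type prev_ExecutorStart = NULL;

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	/* per-query instrumentation would go here */
	if (prev_ExecutorStart)
		prev_ExecutorStart(queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);
}

/* Style 2: explicit registration; the core code keeps a list of callbacks
 * and calls them all, as with XactCallback. */
static void
my_xact_callback(XactEvent event, void *arg)
{
	/* cleanup at commit/abort would go here */
}

void
_PG_init(void)
{
	prev_ExecutorStart = ExecutorStart_hook;
	ExecutorStart_hook = my_ExecutorStart;

	RegisterXactCallback(my_xact_callback, NULL);
}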

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-01-17 Thread Peter Eisentraut
It appears that the only way to create a non-inherited CHECK constraint
is using ALTER TABLE.  Is there no support in CREATE TABLE planned?
That looks a bit odd.




Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-17 Thread Peter Eisentraut
On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
 On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut pete...@gmx.net wrote:
  On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
  I don't see how setting indisvalid to false helps with this, because
  IIUC when a session sees indisvalid = false, it is supposed to avoid
  using the index for queries but still make new index entries when a
  write operation happens - but to drop an index, I think you'd need to
  get into a state where no one was using the index for anything at all.
 
  ISTM that one would need to set indisready to false instead.
 
 Maybe we should set both to false?

Well, ready = false and valid = true doesn't make any sense.  There is
only just-created -> ready -> valid.  We might as well convert that to a
single char column, as you had indicated in your earlier email.  But
that's independent of the proposed patch.




Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-01-17 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of mar ene 17 13:59:57 -0300 2012:
 It appears that the only way to create a non-inherited CHECK constraint
 is using ALTER TABLE.  Is there no support in CREATE TABLE planned?
 That looks a bit odd.

There are no plans to do that AFAIR, though maybe you could convince
Nikhil to write the patch to do so.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Matteo Beccati
On 17/01/2012 17:50, Alvaro Herrera wrote:
 
 Excerpts from Matteo Beccati's message of mar ene 17 12:33:27 -0300 2012:
 My proof of concept archive for the hackers ML site is still online, in
 case anyone has trouble downloading the patches or just wants to have
 the full thread handy.
 
 I was going to ping you about this, because I tried it when I wrote this
 message and it got stuck waiting for response.

Hmm, works for me, e.g. the recently cited message:

http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net


 Now that we've migrated the website, it's time to get back to our
 conversations about migrating archives to your stuff too.  How confident
 with Django are you?

I've never written a line of Python in my life, so someone else should
work on porting the web part, I'm afraid...


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Matteo Beccati
On 17/01/2012 18:10, Matteo Beccati wrote:
 On 17/01/2012 17:50, Alvaro Herrera wrote:

 Excerpts from Matteo Beccati's message of mar ene 17 12:33:27 -0300 2012:
 My proof of concept archive for the hackers ML site is still online, in
 case anyone has trouble downloading the patches or just wants to have
 the full thread handy.

 I was going to ping you about this, because I tried it when I wrote this
 message and it got stuck waiting for response.
 
 Hmm, works for me, e.g. the recently cited message:
 
 http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net

Erm... I meant

http://archives.beccati.org/message-id/4f12f9e5.3090...@dunslane.net

which redirects to:

http://archives.beccati.org/pgsql-hackers/message/305925

for me.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/16/2012 05:40 PM, Alvaro Herrera wrote:
 It's expected that we'll get a more reasonable interface to attachments,
 one that will allow you to download patches separately.  (Currently,
 attachments that have mime types other than text/plain are already
 downloadable separately).

 Are you really sure about that? My recent JSON patch is at 
 http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net. 
 I don't see any download link for the patch there, yet my mailer set the 
 attachment type to text/x-patch, not text/plain.

Yeah, AFAICT the archives treat text/x-patch the same as text/plain.
I tend to send stuff that way if I mean it primarily to be read in the
email.  If I'm thinking people will download and apply it, it's better
to gzip the patch and pick a mime type appropriate to that, because that
makes it much easier to pull the patch off the archives at need, at the
cost that you can't just eyeball it in your mail reader.

Anyway, I agree with the general tenor of this thread that it'd be nice
to reduce the impedance mismatches a bit.  Don't have any great ideas
about specific ways to do that.

regards, tom lane



[HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-17 Thread Marti Raudsepp
Hi,

Here's my review for the logging hooks patch queued for the 2012-01
CommitFest by Martin Pihlak.

Submission review

The patch is in context diff format and applies fine. Tests are not
included and don't seem practical for this patch.

More documentation would always be nice, but as other hooks aren't
documented either, it seems that's acceptable. The function prototype
and ErrorData structure are largely self-documenting.

Usability review

 Do we want that?
 Do we already have it?

The patch aims to allow custom processing of log messages. There are
two slightly overlapping features already in Postgres: csvlog and
syslog. Both seem to have their drawbacks; csvlog is written to disk
first, and has to be processed in the background in batches. Syslog is
pretty universal, but the output format is more limited, and details
of a single report are split up into several lines. Also passing extra
data for parsing via log_line_prefix seems hacky and failure prone.

The proposal also allows more flexible filtering/instrumentation of
log messages, which is not available with current methods.

 Does the patch actually implement that?

The hooked EmitErrorReport function is responsible for calling other
log handlers, so all relevant logging and context information is
available.

 Does it follow SQL spec, or the community-agreed behavior?

So far, a few people have stated that this sort of a logging hook is
desirable. Nobody has opposed it so far.

Feature test

I tested the hook using Martin's own pg_logforward extension:
https://github.com/mpihlak/pg_logforward

I verified that the hook works for messages with DEBUG, LOG, NOTICE,
ERROR and FATAL levels, from the backend as well as postmaster.

 Are there corner cases the author has failed to consider?

Whether the hook is called depends on both the 'client_min_messages'
and 'log_min_messages' settings because of this optimization in
errstart:
/* Skip processing effort if non-error message will not be output */
if (elevel < ERROR && !output_to_server && !output_to_client)
return false;

This will certainly be surprising to users. I think making it depend
*only* on output_to_server (and thus log_min_messages) would be more
predictable.

Coding review

 Does it follow the project coding guidelines?

There's a minor whitespace problem. When declaring variables, and the
data type is longer than 12 characters, just use 1 single space
character to delimit the variable name, instead of tab:

extern emit_log_hook_type   emit_log_hook;

 Will it work on Windows/BSD etc?

I see that other hooks are declared with PGDLLIMPORT. I presume that
this is necessary on Windows:
extern PGDLLIMPORT emit_log_hook_type emit_log_hook;

 Are the comments sufficient and accurate?

I think the hook warrants a comment that, whether the messages will be
seen, depends on the log_min_messages setting.



PS: This is my first patch review ever, any feedback would be welcome.

Regards,
Marti



Re: [HACKERS] Group commit, revised

2012-01-17 Thread Heikki Linnakangas

On 17.01.2012 16:35, Peter Geoghegan wrote:

On 16 January 2012 08:11, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

I think it might be simpler if it wasn't the background writer that's
responsible for driving the group commit queue, but the backends
themselves. When a flush request comes in, you join the queue, and if
someone else is already doing the flush, sleep until the driver wakes you
up. If no-one is doing the flush yet (ie. the queue is empty), start doing
it yourself. You'll need a state variable to keep track who's driving the
queue, but otherwise I think it would be simpler as there would be no
dependency on WAL writer.


I think this replaces one problem with another. You've now effectively
elevated a nominated backend to the status of an auxiliary process -
do you intend to have the postmaster look after it, as with any other
auxiliary process?


The GroupCommitWaitForLSN() call happens within a critical section. If 
the process dies, you're screwed no matter what. It's not very different 
from the current situation where if one backend flushes the WAL, another 
backend will notice that once it gets the WALWriteLock, and returns quickly.
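As a self-contained illustration of the queue-driving scheme sketched above (a toy,
not the patch: the names, the fake LSNs and the use of pthreads are purely for the
example), the "first waiter becomes the driver" idea looks something like this:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flush_done = PTHREAD_COND_INITIALIZER;
static bool		someone_driving = false;	/* is anyone flushing right now? */
static long		requested_lsn = 0;			/* highest LSN anyone asked for */
static long		flushed_lsn = 0;			/* how far the fake WAL is on disk */

/* Each committing "backend" calls this with the LSN of its commit record. */
static void
group_commit_wait_for_lsn(long my_lsn)
{
	pthread_mutex_lock(&queue_lock);
	if (my_lsn > requested_lsn)
		requested_lsn = my_lsn;

	if (!someone_driving)
	{
		/* Queue was idle, so this backend becomes the driver. */
		someone_driving = true;
		while (flushed_lsn < requested_lsn)
		{
			long		target = requested_lsn;

			pthread_mutex_unlock(&queue_lock);
			/* the expensive fsync of WAL up to 'target' would happen here */
			pthread_mutex_lock(&queue_lock);
			flushed_lsn = target;
		}
		someone_driving = false;
		pthread_cond_broadcast(&flush_done);	/* wake the sleepers */
	}
	else
	{
		/* Someone else is flushing; sleep until our LSN is covered. */
		while (flushed_lsn < my_lsn)
			pthread_cond_wait(&flush_done, &queue_lock);
	}
	pthread_mutex_unlock(&queue_lock);
}

static void *
backend(void *arg)
{
	group_commit_wait_for_lsn((long) (size_t) arg);
	return NULL;
}

int
main(void)
{
	pthread_t	threads[3];

	for (long i = 0; i < 3; i++)
		pthread_create(&threads[i], NULL, backend,
					   (void *) (size_t) (100 * (i + 1)));
	for (int i = 0; i < 3; i++)
		pthread_join(threads[i], NULL);
	printf("flushed up to LSN %ld\n", flushed_lsn);
	return 0;
}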



wal_writer_delay is still needed for controlling how often asynchronous
commits are flushed to disk.


That had occurred to me of course, but has anyone ever actually
tweaked wal_writer_delay with adjusting the behaviour of asynchronous
commits in mind?


I found it very helpful to reduce wal_writer_delay in pgbench tests, 
when running with synchronous_commit=off. The reason is that hint bits 
don't get set until the commit record is flushed to disk, so making the 
flushes more frequent reduces the contention on the clog. However, Simon 
made async commits nudge WAL writer if the WAL page fills up, so I'm not 
sure how relevant that experience is anymore.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-01-17 Thread Greg Smith
Attached is the pg_test_timing utility portion of this submission, 
broken out into its own patch.  It's a contrib module modeled on 
pg_test_fsync.


The documentation is still a bit rough, I'm not done with that yet.  I 
have included an example of good timing results, switching to a bad 
clock source, and the resulting bad results.  Code review found some 
formatting things to nitpick I've already fixed:  non-standard brace 
locations and not including enough spaces in expressions were the main two.


This is now referenced by the existing cryptic documentation comment 
around EXPLAIN ANALYZE, which says that overhead can be high because 
gettimeofday is slow on some systems.  Since this utility measures that 
directly, I think it's a clear win to include it just for that purpose.  
The fact that there are more places coming where timing overhead matters 
is also true.  But this existing one is already bad enough to justify 
shipping something to help measure/manage it in my mind.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

diff --git a/contrib/Makefile b/contrib/Makefile
index 0c238aa..45b601c 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -35,6 +35,7 @@ SUBDIRS = \
 		pg_standby	\
 		pg_stat_statements \
 		pg_test_fsync	\
+		pg_test_timing	\
 		pg_trgm		\
 		pg_upgrade	\
 		pg_upgrade_support \
diff --git a/contrib/pg_test_timing/Makefile b/contrib/pg_test_timing/Makefile
new file mode 100644
index 000..b8b266a
--- /dev/null
+++ b/contrib/pg_test_timing/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_test_timing/Makefile
+
+PGFILEDESC = "pg_test_timing - test timing overhead"
+PGAPPICON = win32
+
+PROGRAM  = pg_test_timing
+OBJS = pg_test_timing.o
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_test_timing
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_test_timing/pg_test_timing.c b/contrib/pg_test_timing/pg_test_timing.c
new file mode 100644
index 000..bcf3c3a
--- /dev/null
+++ b/contrib/pg_test_timing/pg_test_timing.c
@@ -0,0 +1,157 @@
+/*
+ *	pg_test_timing.c
+ *		tests overhead of timing calls and their monotonicity:  that
+ * 		they always move forward
+ */
+
+#include "postgres_fe.h"
+
+#include "getopt_long.h"
+#include "portability/instr_time.h"
+
+static const char *progname;
+
+static int32 test_duration = 3;
+
+static void handle_args(int argc, char *argv[]);
+static void test_timing(int32);
+
+int
+main(int argc, char *argv[])
+{
+	progname = get_progname(argv[0]);
+
+	handle_args(argc, argv);
+
+	test_timing(test_duration);
+
+	return 0;
+}
+
+static void
+handle_args(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"duration", required_argument, NULL, 'd'},
+		{NULL, 0, NULL, 0}
+	};
+	int option;			/* Command line option */
+	int	optindex = 0;	/* used by getopt_long */
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-h") == 0 ||
+			strcmp(argv[1], "-?") == 0)
+		{
+			printf("Usage: %s [-d DURATION]\n", progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_timing (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((option = getopt_long(argc, argv, "d:",
+								 long_options, &optindex)) != -1)
+	{
+		switch (option)
+		{
+			case 'd':
+				test_duration = atoi(optarg);
+				break;
+
+			default:
+				fprintf(stderr, "Try \"%s --help\" for more information.\n",
+						progname);
+				exit(1);
+				break;
+		}
+	}
+
+	if (argc > optind)
+	{
+		fprintf(stderr,
+				"%s: too many command-line arguments (first is \"%s\")\n",
+				progname, argv[optind]);
+		fprintf(stderr, "Try \"%s --help\" for more information.\n",
+				progname);
+		exit(1);
+	}
+
+	if (test_duration > 0)
+	{
+		printf("Testing timing overhead for %d seconds.\n", test_duration);
+	}
+	else
+	{
+		printf("Testing timing was interrupted.\n");
+	}
+}
+
+static void
+test_timing(int32 duration)
+{
+	uint64 total_time;
+	int64 time_elapsed = 0;
+	uint64 loop_count = 0;
+	uint64 prev, cur;
+	int32 diff, i, bits, found;
+
+	instr_time start_time, end_time, temp;
+
+	static int64 histogram[32];
+
+	total_time = duration > 0 ? duration * 100 : 0;
+
+	INSTR_TIME_SET_CURRENT(start_time);
+	cur = INSTR_TIME_GET_MICROSEC(start_time);
+
+	while (time_elapsed < total_time)
+	{
+		prev = cur;
+		INSTR_TIME_SET_CURRENT(temp);
+		cur = INSTR_TIME_GET_MICROSEC(temp);
+		diff = cur - prev;
+
+		if (diff < 0)
+		{
+			printf("Detected clock going backwards in time.\n");
+			printf("Time warp: %d microseconds\n", diff);
+			exit(1);
+		}
+
+		bits = 0;
+		while (diff)
+		{
+			diff >>= 1;
+			bits++;
+		}
+		histogram[bits]++;
+
+		loop_count++;
+		INSTR_TIME_SUBTRACT(temp, start_time);
+		time_elapsed = INSTR_TIME_GET_MICROSEC(temp);
+	}
+
+	INSTR_TIME_SET_CURRENT(end_time);
+
+	

Re: [HACKERS] Group commit, revised

2012-01-17 Thread Peter Geoghegan
On 17 January 2012 17:37, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I found it very helpful to reduce wal_writer_delay in pgbench tests, when
 running with synchronous_commit=off. The reason is that hint bits don't get
 set until the commit record is flushed to disk, so making the flushes more
 frequent reduces the contention on the clog. However, Simon made async
 commits nudge WAL writer if the WAL page fills up, so I'm not sure how
 relevant that experience is anymore.

It's quite possible that the WAL Writer will spin continuously, if
given enough work to do, with this patch.

Although it's hard to tell from the graph I sent, there is a modest
improvement in TPS for even a single client. See the tables in the
PDF.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Greg Smith

On 01/17/2012 11:50 AM, Peter Eisentraut wrote:

On mån, 2012-01-16 at 17:25 -0500, Greg Smith wrote:

The most reasonable answer to this is for people to publish a git repo
URL in addition to the official submission of changes to the list in
patch form.

Note that the original complaint was that for the occasional reviewer,
the current system takes at least 5 partially redundant steps in two
different systems.  I doubt that adding a third system and more
partially redundant steps it going to help that.


Publishing the submission via git is an extra step for the patch 
submitter.  If that happens, the reviewer can test just by cloning that, 
instead of first cloning the PostgreSQL one and then applying the patch.  It 
removes the "how do I fish the patch out of the archives?" problem from 
the reviewer's side of things.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Alvaro Herrera

Excerpts from Tom Lane's message of mar ene 17 14:24:05 -0300 2012:
 
 Andrew Dunstan and...@dunslane.net writes:
  On 01/16/2012 05:40 PM, Alvaro Herrera wrote:
  It's expected that we'll get a more reasonable interface to attachments,
  one that will allow you to download patches separately.  (Currently,
  attachments that have mime types other than text/plain are already
  downloadable separately).
 
  Are you really sure about that? My recent JSON patch is at 
  http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net. 
  I don't see any download link for the patch there, yet my mailer set the 
  attachment type to text/x-patch, not text/plain.
 
 Yeah, AFAICT the archives treat text/x-patch the same as text/plain.

Right, maybe it's text/* or something like that.

 I tend to send stuff that way if I mean it primarily to be read in the
 email.  If I'm thinking people will download and apply it, it's better
 to gzip the patch and pick a mime type appropriate to that, because that
 makes it much easier to pull the patch off the archives at need, at the
 cost that you can't just eyeball it in your mail reader.

Maybe we could find a way to convince Mhonarc to present links to
download all mime parts separately, not only those that are
undisplayable.

 Anyway, I agree with the general tenor of this thread that it'd be nice
 to reduce the impedance mismatches a bit.  Don't have any great ideas
 about specific ways to do that.

I'm hopeful that the migration to the Archivopteryx stuff by Matteo will
improve things a bit.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Alvaro Herrera

Excerpts from Matteo Beccati's message of mar ene 17 12:33:27 -0300 2012:
 
 On 16/01/2012 23:40, Alvaro Herrera wrote:
  
  Excerpts from Greg Smith's message of lun ene 16 19:25:50 -0300 2012:
  On 01/16/2012 03:48 PM, Josh Berkus wrote:
  
  I'll also point out that the process for *applying* a patch, if you
  don't subscribe to hackers and keep archives around on your personal
  machine for months, is also very cumbersome and error-prone.  Copy and
  paste from a web page?  Really?
 
  The most reasonable answer to this is for people to publish a git repo 
  URL in addition to the official submission of changes to the list in 
  patch form.
  
  It's expected that we'll get a more reasonable interface to attachments,
  one that will allow you to download patches separately.  (Currently,
  attachments that have mime types other than text/plain are already
  downloadable separately).
 
 My proof of concept archive for the hackers ML site is still online, in
 case anyone has trouble downloading the patches or just wants to have
 the full thread handy.

I was going to ping you about this, because I tried it when I wrote this
message and it got stuck waiting for response.

Now that we've migrated the website, it's time to get back to our
conversations about migrating archives to your stuff too.  How confident
with Django are you?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Matteo Beccati
On 16/01/2012 23:40, Alvaro Herrera wrote:
 
 Excerpts from Greg Smith's message of lun ene 16 19:25:50 -0300 2012:
 On 01/16/2012 03:48 PM, Josh Berkus wrote:
 
 I'll also point out that the process for *applying* a patch, if you
 don't subscribe to hackers and keep archives around on your personal
 machine for months, is also very cumbersome and error-prone.  Copy and
 paste from a web page?  Really?

 The most reasonable answer to this is for people to publish a git repo 
 URL in addition to the official submission of changes to the list in 
 patch form.
 
 It's expected that we'll get a more reasonable interface to attachments,
 one that will allow you to download patches separately.  (Currently,
 attachments that have mime types other than text/plain are already
 downloadable separately).

My proof of concept archive for the hackers ML site is still online, in
case anyone has trouble downloading the patches or just wants to have
the full thread handy.

All you need to do is to swap postgresql.org with beccati.org in the
message-id link:

http://archives.postgresql.org/message-id/1320343602-sup-2...@alvh.no-ip.org

->

http://archives.beccati.org/message-id/1320343602-sup-2...@alvh.no-ip.org


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/



Re: [HACKERS] [WIP] Double-write with Fast Checksums

2012-01-17 Thread Dan Scales
We have some numbers for 9.2 runs with and without double writes now.  We
are still using the double-write patch that assumes checksums on data
pages, so checksums must be turned on for double writes.

The first set of runs are 50-warehouse 2-processor DBT2 60-minute run,
with checkpoints every 5 minutes.  Machine memory is 8G, cache size is
5G.  Database size is about 9G.  The disks are enterprise Fibre Channel
disks, so there is good disk write-caching at the array.  All runs are
for virtual machines.  (We expect that the virtual machine numbers would
be representative of performance for non-virtual machines, but we know
that we need to get non-virtual numbers as well.)

                 orig 9.2 |          9.2 + DW patch
                 -------- | ------------------------------------------
                 FPW off  | FPW off   FPW off   FPW on   DW on/FPW off
                          | CK off    CK on     CK on    CK on

one disk:        15574      15308     15135     13337    13052    [5G shared_buffer, 8G RAM]
sep log disk:    18739      18134     18063     15823    16033

(First row is everything on one disk, second row is where the WAL log is
on a separate disk.)

So, in this case where the cache is large and the disks probably have
write-caching, we get about the same performance with full_page_writes on and
double-writes on.  We need to run these numbers more to get a good
average -- in some runs last night, double writes did better, closer to
what we were seeing with 9.0 (score of 17721 instead of 16033).

Note that, for one disk, there is no significant difference between the
original 9.2 code and the patched code with checksums (and double-writes)
turned off.  For two disks, there is a bigger difference (3.3%), but I'm
not sure that is really significant.

The second set of numbers is for a hard disk with write cache turned off,
closer to internal hard disks of servers (people were quite interested in
that result).  These runs are for 50-warehouse 8-processor DBT2 60-minute
run, with checkpoints every 5 minutes.  The RAM size is 8G, and the cache
size is 6G.

                 9.2 + DW patch
                 ------------------------------------
                 FPW off   FPW on   DW on/FPW off
                 CK on     CK on    CK on
one disk:        12084     7849     9766     [6G shared_buffers, 8G RAM]

So, here we see a performance advantage for double writes where the cache
is large and the disks do not have write-caching.  Presumably, the cost
of fsyncing the big writes (with full pages) to the WAL log on a slow
disk is traded against the fsyncs of the double writes.

Third set of numbers is back to the first hardware setup, but with much
smaller shared_buffers.  Again, the runs are 50-warehouse 2-processor DBT2
60-minute run, with checkpoints every 5 minutes.  But shared_buffers is
set to 1G, so there will be a great many more dirty evictions by the
backends.

                 9.2 + DW patch
                 ------------------------------------
                 FPW off   FPW on   DW on/FPW off
                 CK on     CK on    CK on
one disk:        11078     10394    3296     [1G shared_buffers, 8G RAM]
sep log disk:    13605     12015    3412

one disk:         7731      6613    2670     [1G shared_buffers, 2G RAM]
sep log disk:     6752      6129    2722

Here we see that double writes do very badly, because of all the double
writes being done for individual blocks by the backends.  With the small
shared cache, the backends are now writing 3 times as many blocks as the
checkpointer.

Clearly, the double write option would have to be completely optional,
available for use for database configurations which have a well-sized
cache.

It would still be preferable that performance didn't have such a cliff
when dirty evictions become high, so, with that in mind, I am doing some
prototyping of the double-write buffer idea that folks have proposed on
this thread. 

Happy to hear all comments/suggestions.  Thanks,

Dan

- Original Message -
From: Dan Scales sca...@vmware.com
To: Heikki Linnakangas heikki.linnakan...@enterprisedb.com
Cc: PG Hackers pgsql-hackers@postgresql.org, jks...@gmail.com, David 
Fetter da...@fetter.org
Sent: Wednesday, January 11, 2012 1:25:21 PM
Subject: Re: [HACKERS] [WIP] Double-write with Fast Checksums

Thanks for all the comments and suggestions on the double-write patch.  We are 
working on generating performance results for the 9.2 patch, but there is 
enough difference between 9.0 and 9.2 that it will take some time.

One thing in 9.2 that may be causing problems with the current patch is the 
fact that the checkpointer and bgwriter are separated and can run at the same 
time (I think), and therefore will contend on the double-write file.  Is there 
any thought that the bgwriter might be paused while the checkpointer is doing a 
checkpoint, since the checkpointer is doing some of the cleaning that the 
bgwriter wants to do anyways?

The current patch (as mentioned) also may not do well if there are a lot of 
dirty-page 

Re: [HACKERS] Why is CF 2011-11 still listed as In Progress?

2012-01-17 Thread Peter Eisentraut
On mån, 2012-01-16 at 22:00 -0500, Greg Smith wrote:
 Adjusting that expectation is another side to pragmatism based on
 recent history I think needs to be acknowledged, but is unlikely to be
 improved on.  9.0 shipped on September 20.  9.1 shipped on September
 11.  If we say the last CF of each release is unlikely to wrap up
 before early March each year, that's 6 months of settling time
 between major feature freeze and release.  So far that seems to result
 in stable releases to be proud of, on a predictable enough yearly
 schedule.

Well, has it?  I think, it took until version 9.1.2 to have a release
without major issues that you could consider for production.  So do we
need 8 or 10 months of settling time?  Or should we release earlier,
realizing that we won't get proper testing before the final release
anyway?  I don't know.

Another concern is that we are now essentially freezing 9.2 features
with at best about four weeks of production experience and feedback from
9.1.  I expect that this will also contribute to dragging out the
finalization of 9.2 once more.





Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-17 Thread Tomas Vondra
On 20.12.2011 19:59, Tomas Vondra wrote:
 On 20.12.2011 11:20, Magnus Hagander wrote:
 2011/12/20 Tomas Vondra t...@fuzzy.cz:

 I haven't updated the docs yet - let's see if the patch is acceptable at
 all first.

 Again, without having reviewed the code, this looks like a feature
 we'd want, so please add some docs, and then submit it for the next
 commitfest!
 
 I've added the docs (see the attachment) and rebased to current head.
 
 Tomas

Fixed a failing regression test (check of pg_stat_database structure).

Tomas
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a12a9a2..3635c3f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -691,6 +691,26 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
  </row>
 
  <row>
+  <entry><literal><function>pg_stat_get_db_temp_files</function>(<type>oid</type>)</literal></entry>
+  <entry><type>bigint</type></entry>
+  <entry>
+   Number of temporary files written for the database. All temporary files
+   are counted, regardless of why the temporary file was created (sorting or
+   hash join) or file size (log_temp_files does not affect this).
+  </entry>
+ </row>
+
+ <row>
+  <entry><literal><function>pg_stat_get_db_temp_bytes</function>(<type>oid</type>)</literal></entry>
+  <entry><type>bigint</type></entry>
+  <entry>
+   Amount of data written to temporary files for the database. All temporary
+   files are counted, regardless of why the temporary file was created (sorting
+   or hash join) or file size (log_temp_files does not affect this).
+  </entry>
+ </row>
+
+ <row>
   <entry><literal><function>pg_stat_get_numscans</function>(<type>oid</type>)</literal></entry>
   <entry><type>bigint</type></entry>
   <entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 50ba20c..3a0dca3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -574,6 +574,8 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
 pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
 pg_stat_get_db_conflict_all(D.oid) AS conflicts,
+pg_stat_get_db_temp_files(D.oid) AS temp_files,
+pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
 pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
 FROM pg_database D;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 323d42b..6a2ff1d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -286,7 +286,7 @@ static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
-
+static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
 /* 
  * Public functions called from postmaster follow
@@ -1339,6 +1339,29 @@ pgstat_report_recovery_conflict(int reason)
 	pgstat_send(&msg, sizeof(msg));
 }
 
+
+/* 
+ * pgstat_report_tempfile() -
+ *
+ * Tell the collector about a temporary file.
+ * 
+ */
+void
+pgstat_report_tempfile(size_t filesize)
+{
+   PgStat_MsgTempFile msg;
+
+   if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+   return;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TEMPFILE);
+	msg.m_databaseid = MyDatabaseId;
+	msg.m_filesize = filesize;
+	pgstat_send(&msg, sizeof(msg));
+}
+
+;
+
 /* --
  * pgstat_ping() -
  *
@@ -3185,6 +3208,10 @@ PgstatCollectorMain(int argc, char *argv[])

 				pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) &msg, len);
 				break;
 
+			case PGSTAT_MTYPE_TEMPFILE:
+				pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
+				break;
+
+
default:
break;
}
@@ -3266,6 +3293,8 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	result->n_conflict_snapshot = 0;
 	result->n_conflict_bufferpin = 0;
 	result->n_conflict_startup_deadlock = 0;
+	result->n_temp_files = 0;
+	result->n_temp_bytes = 0;
 
 	result->stat_reset_timestamp = GetCurrentTimestamp();
 
@@ -4177,6 +4206,8 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len)
 	dbentry->n_tuples_updated = 0;
 	dbentry->n_tuples_deleted = 0;
 	dbentry->last_autovac_time = 0;
+	dbentry->n_temp_bytes = 0;
+	dbentry->n_temp_files = 0;
 
 	dbentry->stat_reset_timestamp = 

Re: [HACKERS] [WIP] Double-write with Fast Checksums

2012-01-17 Thread Kevin Grittner
Dan Scales sca...@vmware.com wrote:
 
 The second set of numbers is for a hard disk with write cache
 turned off, closer to internal hard disks of servers (people were
 quite interested in that result).  These runs are for 50-warehouse
 8-processor DBT2 60-minute run, with checkpoints every 5 minutes. 
 The RAM size is 8G, and the cache size is 6G.
 
                  9.2 + DW patch
                  ------------------------------------
                  FPW off   FPW on   DW on/FPW off
                  CK on     CK on    CK on
  one disk:       12084     7849     9766     [6G shared_buffers, 8G RAM]
 
 So, here we see a performance advantage for double writes where
 the cache is large and the disks do not have write-caching. 
 Presumably, the cost of fsyncing the big writes (with full pages)
 to the WAL log on a slow disk are traded against the fsyncs of the
 double writes.
 
I'm very curious about what impact DW would have on big servers with
write-back cache that becomes saturated, like in Greg Smith's post
here:
 
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00883.php
 
This is a very different approach from what has been tried so far to
address that issue, but when I look at the dynamics of that
situation, I can't help thinking that DW is the most promising
approach to improving it that I've seen suggested so far.
 
-Kevin



Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-17 Thread Martin Pihlak
On 01/17/2012 07:35 PM, Marti Raudsepp wrote:
 Here's my review for the logging hooks patch queued for the 2012-01
 CommitFest by Martin Pihlak.
 

Thanks for reviewing!

 There's a minor whitespace problem. When declaring variables, and the
 data type is longer than 12 characters, just use 1 single space
 character to delimit the variable name, instead of tab:
 

Fixed both in .h and .c

 I see that other hooks are declared with PGDLLIMPORT. I presume that
 this is necessary on Windows:
 extern PGDLLIMPORT emit_log_hook_type emit_log_hook;

Indeed, fixed now.

 I think the hook warrants a comment that, whether the messages will be
 seen, depends on the log_min_messages setting.
 

Comment added.

regards,
Martin
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
***
*** 136,141  static int	errordata_stack_depth = -1; /* index of topmost active frame */
--- 136,148 
  
  static int	recursion_depth = 0;	/* to detect actual recursion */
  
+ /*
+  * Hook for intercepting log messages. Note that the hook does not get
+  * called for messages that are suppressed by GUC settings such as
+  * log_min_messages.
+  */
+ emit_log_hook_type emit_log_hook = NULL;
+ 
  /* buffers for formatted timestamps that might be used by both
   * log_line_prefix and csv logs.
   */
***
*** 1276,1281  EmitErrorReport(void)
--- 1283,1291 
  	CHECK_STACK_DEPTH();
  	oldcontext = MemoryContextSwitchTo(ErrorContext);
  
+ 	if (emit_log_hook)
+ 		emit_log_hook(edata);
+ 
  	/* Send to server log, if enabled */
  	if (edata->output_to_server)
  		send_message_to_server_log(edata);
*** a/src/include/utils/elog.h
--- b/src/include/utils/elog.h
***
*** 327,332  typedef struct ErrorData
--- 327,334 
  	int			saved_errno;	/* errno at entry */
  } ErrorData;
  
+ typedef void (*emit_log_hook_type)(ErrorData *edata);
+ 
  extern void EmitErrorReport(void);
  extern ErrorData *CopyErrorData(void);
  extern void FreeErrorData(ErrorData *edata);
***
*** 347,352  typedef enum
--- 349,355 
  extern int	Log_error_verbosity;
  extern char *Log_line_prefix;
  extern int	Log_destination;
+ extern PGDLLIMPORT emit_log_hook_type emit_log_hook;
  
  /* Log destination bitmap */
  #define LOG_DESTINATION_STDERR	 1
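
For reviewers trying the patch, a minimal consumer of the hook might look like the
sketch below.  The module and function names here are invented (this is not
pg_logforward's actual code), but emit_log_hook, emit_log_hook_type and ErrorData are
exactly what the patch above exposes:

#include "postgres.h"
#include "fmgr.h"
#include "utils/elog.h"

PG_MODULE_MAGIC;

static emit_log_hook_type prev_emit_log_hook = NULL;

static void
my_emit_log(ErrorData *edata)
{
	/* Keep this cheap and non-throwing: we are inside error reporting.
	 * Filter, count or forward the message here. */
	if (edata->elevel >= ERROR)
	{
		/* e.g. bump a counter or write to a socket */
	}

	/* Chain to any hook installed before us. */
	if (prev_emit_log_hook)
		prev_emit_log_hook(edata);
}

void
_PG_init(void)
{
	prev_emit_log_hook = emit_log_hook;
	emit_log_hook = my_emit_log;
}

void
_PG_fini(void)
{
	emit_log_hook = prev_emit_log_hook;
}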



Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-17 Thread Marti Raudsepp
Hi!

On Tue, Jan 17, 2012 at 23:07, Martin Pihlak martin.pih...@gmail.com wrote:
 I think the hook warrants a comment that, whether the messages will be
 seen, depends on the log_min_messages setting.

 Comment added.

Nice :)

It seems the comment misses that the current implementation is also
affected by client_min_messages. I think that being affected by
client-specific settings is surprising. I would put the
if (emit_log_hook) call inside the existing if (edata->output_to_server)
condition. Unless you have some reason to do it this way?
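
Concretely, the suggested placement would look roughly like this inside
EmitErrorReport() (a sketch of the proposed placement only, not Martin's revised
patch):

	/* Send to server log, if enabled */
	if (edata->output_to_server)
	{
		/* Hook sees exactly the messages that reach the server log. */
		if (emit_log_hook)
			emit_log_hook(edata);

		send_message_to_server_log(edata);
	}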

Regards,
Marti



Re: [HACKERS] Command Triggers

2012-01-17 Thread Dimitri Fontaine
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Robert Haas robertmh...@gmail.com writes:
 But on the flip side, I think we're generally a bit more flexible
 about exposing things via C than through the procedural languages.

 Then as Andres proposed, a new function would be available to get the
 value, we're not changing the trigger procedure function API in case the
 language is C…

I've been updating my github branch with a patch that provides the
parsetree to C coded command trigger functions only, as their 5th
argument, of type INTERNAL (so that only C coded procs apply).

  https://github.com/dimitri/postgres/compare/master...command_triggers

I still have some cleaning to do before to prepare the next patch
version, such as documentation updating and dealing with rewrites of
CHECK and DEFAULT column constraints in CREATE TABLE.  I had to add
support for the T_A_Const parser node, and now I'm about to see about
adding support for the T_A_Expr one, but I can't help wondering how the
rewriter could work without them.

What is the simple thing I'm missing here, if any?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] SKIP LOCKED DATA

2012-01-17 Thread Thomas Munro
On 16 January 2012 08:06, Ilya Kosmodemiansky hydrobi...@gmail.com wrote:
 That is quite a useful feature for implementing things like message queues
 based on a database, and so on.
 For now there is a way to work around the lack of such a feature in Postgres
 using the current advisory lock implementation (pg_try_advisory_xact_lock
 to determine whether somebody has already acquired a lock on a particular row).
 So I'm not sure this is an urgent matter.

Thanks Ilya.  I knew about advisory locks but hadn't though of using
them like this.  I tried some simple examples using SELECT ... FROM
... WHERE pg_try_advisory_xact_lock(id) [FOR UPDATE] LIMIT 1 in
transactions from a couple of different sessions and it achieves the
right effect.  I could imagine that in theory there might be order of
evaluation subtleties in some cases where you have more things in your
WHERE clause though.  I guess you want the pg_try_advisory_xact_lock
to be tested last after all other conditions are satisfied (ie for
minimal lock contention, avoiding false positives).

So I guess the question is whether it's worth implementing an explicit
feature to match other RDMBSs, complement NOWAIT and avoid theoretical
order of evaluation problems, or if this technique is enough.



Re: [HACKERS] Command Triggers

2012-01-17 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 I still have some cleaning to do before to prepare the next patch
 version, such as documentation updating and dealing with rewrites of
 CHECK and DEFAULT column constraints in CREATE TABLE.  I had to add
 support for the T_A_Const parser node, and now I'm about to see about
 adding support for the T_A_Expr one, but I can't help to wonder how the
 rewriter could work without them.

It doesn't, and it shouldn't have to.  If those nodes get to the
rewriter then somebody forgot to apply parse analysis.  What's your test
case?

regards, tom lane



Re: [HACKERS] Why is CF 2011-11 still listed as In Progress?

2012-01-17 Thread Robert Haas
On Tue, Jan 17, 2012 at 3:27 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-01-16 at 22:00 -0500, Greg Smith wrote:
 Adjusting that expectation is another side to pragmatism based on
 recent history I think needs to be acknowledged, but is unlikely to be
 improved on.  9.0 shipped on September 20.  9.1 shipped on September
 11.  If we say the last CF of each release is unlikely to wrap up
 before early March each year, that's 6 months of settling time
 between major feature freeze and release.  So far that seems to result
 in stable releases to be proud of, on a predictable enough yearly
 schedule.

 Well, has it?  I think, it took until version 9.1.2 to have a release
 without major issues that you could consider for production.  So do we
 need 8 or 10 months of settling time?  Or should we release earlier,
 realizing that we won't get proper testing before the final release
 anyway?  I don't know.

There's definitely some things that we're not going to catch until
after final release; after a certain point, settling doesn't help
much.  But the general pattern for the last few releases is that after
the end of the cycle we've done a sweep for open issues and found many
of them.  Early testing also tends to shake out a bunch of bugs in
whatever the new features are.  During the 9.0 cycle, it took until
June to clear that backlog; during the 9.1 cycle, it took until
sometime around June-July.  In both cases, a large percentage of those
issues were bugs introduced during the final CommitFest, because
that's when nearly all the big features hit.  If we froze the tree
today, we likely wouldn't need more than a month to put out a good
beta, but a month from now it'll take four or five months.  People
know when the end of the cycle is coming; just look at the number of
patches in the last CommitFest of any given cycle versus the earlier
ones.  We've been doing a pretty good job fielding it, but it isn't
perfect.

 Another concern is that we are now essentially freezing 9.2 features
 with at best about four weeks of production experience and feedback from
 9.1.  I expect that this will also contribute to dragging out the
 finalization of 9.2 once more.

I don't believe that this really matters all that much; a lot of the
things we're fixing have been an issue for years.  Even for the
patches that are improving on 9.1 features (e.g. recv and apply sync
rep modes), I don't think this particular consideration is going to
hold things up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-01-17 Thread Jim Nasby
On Jan 17, 2012, at 11:07 AM, Alvaro Herrera wrote:
 Excerpts from Peter Eisentraut's message of mar ene 17 13:59:57 -0300 2012:
 It appears that the only way to create a non-inherited CHECK constraint
 is using ALTER TABLE.  Is there no support in CREATE TABLE planned?
 That looks a bit odd.
 
 There are no plans to do that AFAIR, though maybe you could convince
 Nikhil to write the patch to do so.

That certainly doesn't meet the principle of least surprise... CREATE TABLE 
should support this.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-17 Thread Jim Nasby
On Jan 13, 2012, at 4:15 PM, Christopher Browne wrote:
 Have two logical tasks:
 a) A process that manages the list, and
 b) Child processes doing vacuums.
 
 Each time a child completes a table, it asks the parent for another one.

There is also a middle ground, because having the scheduling process sounds 
like a lot more work than what Josh was proposing.

CREATE TEMP SEQUENCE s;
SELECT relname, s mod <number of backends> AS backend_number
  FROM ( SELECT relname
   FROM pg_class
   ORDER BY relpages
);

Of course, having an actual scheduling process is most likely the most 
efficient.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Group commit, revised

2012-01-17 Thread Jim Nasby
On Jan 15, 2012, at 4:42 PM, Peter Geoghegan wrote:
 Attached is a patch that myself and Simon Riggs collaborated on. I
 took the group commit patch that Simon posted to the list back in
 November, and partially rewrote it.

Forgive me if this is a dumb question, but I noticed a few places doing things 
like:

if (...)
  Blah();
else
{
  ...
}

Is that allowed PG formatting? I thought that if braces were required on one 
branch of an if they were supposed to go on both sides.

Also, I didn't see any README changes in the patch... perhaps this is big 
enough to warrant them?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Group commit, revised

2012-01-17 Thread Alvaro Herrera

Excerpts from Jim Nasby's message of mar ene 17 21:21:57 -0300 2012:
 On Jan 15, 2012, at 4:42 PM, Peter Geoghegan wrote:
  Attached is a patch that myself and Simon Riggs collaborated on. I
  took the group commit patch that Simon posted to the list back in
  November, and partially rewrote it.
 
 Forgive me if this is a dumb question, but I noticed a few places doing 
 things like:
 
 if (...)
   Blah();
 else
 {
   ...
 }
 
 Is that allowed PG formatting? I thought that if braces were required on one 
 branch of an if they were supposed to go on both sides.

Yeah, we even used to have pg_indent remove braces around single
statements, so if you had one statement in the if branch and more
around the other one, pg_indent would have left things like that anyway.

(This particular pg_indent behavior got removed because it messed up
PG_TRY blocks.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [WIP] Double-write with Fast Checksums

2012-01-17 Thread Greg Smith

On 01/17/2012 03:47 PM, Kevin Grittner wrote:

I'm very curious about what impact DW would have on big servers with
write-back cache that becomes saturated, like in Greg Smith's post
here...


My guess is that a percentage of the dbt-2 run results posted here are 
hitting that sort of problem.  We just don't know which, because the 
results numbers posted were all the throughput numbers.  I haven't 
figured out a way to look for cache saturation issues other than 
collecting all the latency information for each transaction, then 
graphing that out if the worst-case value is poor.  It's quite possible 
they have that data but just didn't post it, to keep the summary size 
manageable, since dbt-2 collects a lot of information.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Alvaro Herrera

Excerpts from Andrew Dunstan's message of mar ene 17 13:50:20 -0300 2012:
 
 On 01/16/2012 05:40 PM, Alvaro Herrera wrote:
  It's expected that we'll get a more reasonable interface to attachments,
  one that will allow you to download patches separately.  (Currently,
  attachments that have mime types other than text/plain are already
  downloadable separately).
 
 
 Are you really sure about that? My recent JSON patch is at 
 http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net. 
 I don't see any download link for the patch there, yet my mailer set the 
 attachment type to text/x-patch, not text/plain.

I tweaked the Mhonarc config and now this attachment (as well as many
others) is shown as a downloadable link.  Please give it a look.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Group commit, revised

2012-01-17 Thread Robert Haas
On Tue, Jan 17, 2012 at 12:37 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I found it very helpful to reduce wal_writer_delay in pgbench tests, when
 running with synchronous_commit=off. The reason is that hint bits don't get
 set until the commit record is flushed to disk, so making the flushes more
 frequent reduces the contention on the clog. However, Simon made async
 commits nudge WAL writer if the WAL page fills up, so I'm not sure how
 relevant that experience is anymore.

There's still a small but measurable effect there in master.  I think
we might be able to make it fully auto-tuning, but I don't think we're
fully there yet (not sure how much this patch changes that equation).

I suggested a design similar to the one you just proposed to Simon
when he originally suggested this feature.  It seems that if the WAL
writer is the only one doing WAL flushes, then there must be some IPC
overhead - and context switching - involved whenever WAL is flushed.
But clearly we're saving something somewhere else, on the basis of
Peter's results, so maybe it's not worth worrying about.  It does seem
pretty odd to have all the regular backends going through the WAL
writer and the auxiliary processes using a different mechanism,
though.  If we got rid of that, maybe WAL writer wouldn't even require
a lock, if there's only one process that can be doing it at a time.

What happens in standalone mode?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-17 Thread Andrew Dunstan



On 01/17/2012 07:09 PM, Jim Nasby wrote:

On Jan 13, 2012, at 4:15 PM, Christopher Browne wrote:

Have two logical tasks:
a) A process that manages the list, and
b) Child processes doing vacuums.

Each time a child completes a table, it asks the parent for another one.

There is also a middle ground, because having the scheduling process sounds 
like a lot more work than what Josh was proposing.

CREATE TEMP SEQUENCE s;
SELECT relname, s mod <number of backends> AS backend_number
   FROM ( SELECT relname
FROM pg_class
ORDER BY relpages
);

Of course, having an actual scheduling process is most likely the most 
efficient.


We already have a model for this in parallel pg_restore. It would 
probably not be terribly hard to adapt to parallel vacuum.


cheers

andrew




Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-17 Thread Robert Haas
On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
 On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut pete...@gmx.net wrote:
  On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
  I don't see how setting indisvalid to false helps with this, because
  IIUC when a session sees indisvalid = false, it is supposed to avoid
  using the index for queries but still make new index entries when a
  write operation happens - but to drop an index, I think you'd need to
  get into a state where no one was using the index for anything at all.
 
  ISTM that one would need to set indisready to false instead.

 Maybe we should set both to false?

 Well, ready = false and valid = true doesn't make any sense.  There is
 only just-created -> ready -> valid.  We might as well convert that to a
 single char column, as you had indicated in your earlier email.  But
 that's independent of the proposed patch.

Sure, but the point is that I think if you want everyone to stop
touching the index, you ought to mark it both not-valid and not-ready,
which the current patch doesn't do.
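
For reference, the flags being discussed live in pg_index; a quick way to
inspect them for a given index (the index name here is just a placeholder):

  SELECT indexrelid::regclass AS index_name, indisvalid, indisready
    FROM pg_index
   WHERE indexrelid = 'some_index'::regclass;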

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \timing vs failed statements

2012-01-17 Thread Robert Haas
On Tue, Jan 17, 2012 at 10:01 AM, Magnus Hagander mag...@hagander.net wrote:
 Right now, psql's \timing only gives output for successful
 queries. Is there any actual reason for this, or is it just the way it happened?

 In particular, I just had a very long run of a CREATE UNIQUE INDEX fail
 pretty far in - and I would've really liked to have timing output for
 that one even though it failed.

 Thus - if I were to change psql to output timing on failed queries as
 well, will anybody object? ;)

I won't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-17 Thread Fujii Masao
On Wed, Jan 18, 2012 at 2:35 AM, Marti Raudsepp ma...@juffo.org wrote:
 Are there corner cases the author has failed to consider?

The hook can be executed by various processes since it's in
EmitErrorReport(). OTOH, log messages are written to the log
file by a single process, like the syslogger (if you use csvlog or
stderr) or the syslog process (if you use syslog). So ISTM that there
is no guarantee that the order of log messages processed by the
hook is the same as that of the messages written to the log file. For
example, imagine the case where two backends call EmitErrorReport()
at the same time. Is this OK? If not, the hook might need to be
in the syslogger.

EmitErrorReport() can be called before a shared library like Martin's
pg_logforward is preloaded, which means that the hook may miss
some log messages. Is this OK?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-17 Thread Jim Nasby
On Jan 15, 2012, at 8:13 PM, Greg Smith wrote:
 On 01/15/2012 04:17 PM, Heikki Linnakangas wrote:
 I think it makes more sense to use the max read rate as the main knob, 
 rather than write rate. That's because the max read rate is higher than the 
 write rate, when you don't need to dirty pages. Or do you think saturating 
 the I/O system with writes is so much bigger a problem than read I/O that it 
 makes more sense to emphasize the writes?
 
 I haven't had the I/O rate logging available for long enough to have a good 
 feel for which is more important to emphasize.  I'm agnostic on this.  I'd 
 have no problem accepting the argument that exposing the larger of the two 
 rates--which is the read one--makes for a cleaner UI.  Or that it is more in 
 line with the precedents set by other knobs.

Could we expose both?

On our systems writes are extremely cheap... we don't do a ton of them 
(relatively speaking), so they tend to just fit into BBU cache. Reads on the 
other hand are a lot more expensive, at least if they end up actually hitting 
disk. So we actually set page_dirty and page_hit the same.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Alvaro Herrera

Excerpts from Alvaro Herrera's message of Tue Jan 17 22:23:13 -0300 2012:
 Excerpts from Andrew Dunstan's message of Tue Jan 17 13:50:20 -0300 2012:
  
  On 01/16/2012 05:40 PM, Alvaro Herrera wrote:
   It's expected that we'll get a more reasonable interface to attachments,
   one that will allow you to download patches separately.  (Currently,
   attachments that have mime types other than text/plain are already
   downloadable separately).
  
  
  Are you really sure about that? My recent JSON patch is at 
  http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net. 
  I don't see any download link for the patch there, yet my mailer set the 
  attachment type to text/x-patch, not text/plain.
 
 I tweaked the Mhonarc config and now this attachment (as well as many
 others) is shown as a downloadable link.  Please give it a look.

Hm, I notice it works for almost every patch I've checked, except the ones
from Tom such as this one:
http://archives.postgresql.org/message-id/4643.1326776...@sss.pgh.pa.us

The problem is that this one doesn't have the
Content-Disposition: attachment
line in the MIME header.  I don't know what we can do about it.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks

2012-01-17 Thread Robert Haas
On Tue, Jan 17, 2012 at 10:55 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Should I modify the patch to place the object-access-hook in deleteOneObject
 (probably the best position to track actual deletions)?
 One problem is the case of column deletion by ALTER TABLE, which just
 sets the attisdropped flag without removing the catalog entry.
 Ought we to put this hook in ATExecDropColumn as an exception?

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.2] LEAKPROOF attribute of FUNCTION (Re: [HACKERS] [v9.2] Fix Leaky View Problem)

2012-01-17 Thread Robert Haas
On Sun, Jan 8, 2012 at 10:52 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 BTW, can you also resubmit the leakproof stuff as a separate patch for
 the last CF?  Want to make sure we get that into 9.2, if at all
 possible.

 Yes, it shall be attached on the next message.

 The attached patch adds a LEAKPROOF attribute to pg_proc; it
 enables the DBA to mark obviously safe functions so they can be pushed
 down into a sub-query even when it has the security-barrier attribute.
 We expect this LEAKPROOF attribute to be applied to the operator
 functions that are used to upgrade an execution plan from a Seq Scan to
 an Index Scan.

 By default, functions are created without the leakproof attribute,
 and superuser privilege is required to switch it on.

The create_function_3 regression test fails for me with this applied:

*** /Users/rhaas/pgsql/src/test/regress/expected/create_function_3.out
 2012-01-17 22:09:01.0 -0500
--- /Users/rhaas/pgsql/src/test/regress/results/create_function_3.out
 2012-01-17 22:14:48.0 -0500
***
*** 158,165 
   'functext_E_2'::regproc);
 proname| proleakproof
  --+--
-  functext_e_2 | t
   functext_e_1 | t
  (2 rows)

  -- list of built-in leakproof functions
--- 158,165 
   'functext_E_2'::regproc);
 proname| proleakproof
  --+--
   functext_e_1 | t
+  functext_e_2 | t
  (2 rows)

  -- list of built-in leakproof functions
***
*** 476,485 
   'functext_F_4'::regproc);
 proname| proisstrict
  --+-
-  functext_f_1 | f
   functext_f_2 | t
   functext_f_3 | f
   functext_f_4 | t
  (4 rows)

  -- Cleanups
--- 476,485 
   'functext_F_4'::regproc);
 proname| proisstrict
  --+-
   functext_f_2 | t
   functext_f_3 | f
   functext_f_4 | t
+  functext_f_1 | f
  (4 rows)

  -- Cleanups

The new regression tests I just committed need updating as well.
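
For what it's worth, the hunks above differ only in row order; one way to make
such a catalog check independent of row order (a hedged sketch, the actual
test query may differ) is an explicit ORDER BY:

  SELECT proname, proleakproof
    FROM pg_proc
   WHERE oid IN ('functext_E_1'::regproc, 'functext_E_2'::regproc)
   ORDER BY proname;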

Instead of contains_leakable_functions I suggest
contains_leaky_functions or contains_non_leakproof_functions, because
leakable isn't really a word (although I know what you mean).

The design of this function also doesn't seem very future-proof.  If
someone adds a new node type that can contain a function call, and
forgets to add it here, then we've got a subtle security hole.  Is
there some reasonable way to design this so that we assume
everything's dangerous except for those things we know are safe,
rather than the reverse?

I think you need to do a more careful check of which functions you're
marking leakproof - e.g. timestamp_ne_timestamptz isn't, at least
according to my understanding of the term.
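
For reference, at the SQL level the attribute would presumably be spelled
something like this (a purely hypothetical example; the patch's actual
grammar may differ, and under the proposed design only a superuser could
set it):

  CREATE FUNCTION int4_abs_eq(int4, int4) RETURNS bool
      AS 'SELECT abs($1) = abs($2)'
      LANGUAGE sql IMMUTABLE STRICT LEAKPROOF;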

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-01-17 Thread Nikhil Sontakke
  It appears that the only way to create a non-inherited CHECK constraint
  is using ALTER TABLE.  Is there no support in CREATE TABLE planned?
  That looks a bit odd.
 
  There are no plans to do that AFAIR, though maybe you could convince
  Nikhil to write the patch to do so.

 That certainly doesn't meet the principle of least surprise... CREATE
 TABLE should support this.


Well, the above was thought about during the original discussion and
eventually we felt that CREATE TABLE already has other issues as well, so
not having this done as part of creating a table was considered acceptable
then:

http://postgresql.1045698.n5.nabble.com/Check-constraints-on-partition-parents-only-tt464.html#a4647144
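
For reference, the ALTER TABLE route looks roughly like this (a hedged sketch
using the NO INHERIT spelling; the table and constraint names are made up,
and the grammar under discussion at the time may have differed):

  ALTER TABLE measurement
    ADD CONSTRAINT measurement_parent_only_chk
    CHECK (logdate IS NOT NULL) NO INHERIT;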

But, let me have a stab at it when I get some free cycles.

Regards,
Nikhils


Re: [HACKERS] psql \timing vs failed statements

2012-01-17 Thread Noah Misch
On Tue, Jan 17, 2012 at 04:01:23PM +0100, Magnus Hagander wrote:
 Thus - if I were to change psql to output timing on failed queries as
 well, will anybody object? ;)

+1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Client Messages

2012-01-17 Thread Fujii Masao
On Fri, Jan 6, 2012 at 1:38 AM, Jim Mlodgenski jimm...@gmail.com wrote:
 I have a need to send banner messages to a psql client that I can set
 on the server and will be displayed on any psql client that connects
 to the database. This would mostly be used as an additional indicator
 of which database you are connecting to, but could also be used to
 force users to see a security message when connecting to the
 database. The attached patch will allow you to execute

 ALTER DATABASE postgres SET client_message =
     E'\nBEWARE: You are connecting to a production database. If you do anything to\n        bring this server down, you will be destroyed by your supreme overlord.\n\n';

 And then when you connect to psql, you will see:

 [e3@workstation bin]$ ./psql -U user1 postgres
 psql (9.2devel)
 
 BEWARE: You are connecting to a production database. If you do anything to
        bring this server down, you will be destroyed by your supreme overlord.
 

 Type "help" for help.

 postgres=


 Any feedback is welcome.

Adding a new GUC parameter only for the purpose of warning psql users
seems like overkill to me. Basically, we try to reduce the number of GUC
parameters to make configuration easier for users, so I don't think it's
a good idea to add a new GUC for such a small benefit. Instead, how
about using the .psqlrc file and writing a warning message in it with the
\echo command?
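
For example, a ~/.psqlrc along these lines would print such a banner on every
connection (the wording is purely illustrative); note that psql skips the file
when started with -X:

  \echo
  \echo 'BEWARE: You are connecting to a production database.'
  \echo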

Anyway, I found one problem in the patch. The patch defines client_message
as a PGC_USERSET parameter, which means that any psql user can falsify the
warning message, e.g., by setting the environment variable PGOPTIONS
to -c client_message=hoge. This seems to be something to avoid from a
security point of view.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding shutdown checkpoint at failover

2012-01-17 Thread Fujii Masao
On Sun, Nov 13, 2011 at 5:13 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Nov 1, 2011 at 12:11 PM, Simon Riggs si...@2ndquadrant.com wrote:

 When I say skip the shutdown checkpoint, I mean remove it from the
 critical path of required actions at the end of recovery. We can still
 have a normal checkpoint kicked off at that time, but that no longer
 needs to be on the critical path.

 Any problems foreseen? If not, looks like a quick patch.

 Patch attached for discussion/review.

This feature is what I want, and it is very helpful for shortening the failover
time in streaming replication.

Here are my review comments, though I have not yet checked thoroughly whether
this feature works correctly in all recovery patterns.

LocalSetXLogInsertAllowed() must be called before LogEndOfRecovery().
LocalXLogInsertAllowed must be set to -1 after LogEndOfRecovery().

The XLOG_END_OF_RECOVERY record is written to the WAL file with the newly
assigned timeline ID, but it must be written to the WAL file with the old one.
Otherwise, when re-entering recovery after a failover, we cannot find the
XLOG_END_OF_RECOVERY record at all.

Before the XLOG_END_OF_RECOVERY record is written,
RmgrTable[rmid].rm_cleanup() might write WAL records. They also
should be written to the WAL file with the old timeline ID.

When a recovery target is specified, we cannot write new WAL to the file
with the old timeline, because that would overwrite valid WAL records in it
with new WAL. So when a recovery target is specified, ISTM that we cannot
skip the end-of-recovery checkpoint. Or we might need to save all
information about timelines in the database cluster instead of writing the
XLOG_END_OF_RECOVERY record, and use it when re-entering recovery.

LogEndOfRecovery() seems to need to call XLogFlush(). Otherwise,
what happens if the server crashes after the new timeline history file is
created and recovery.conf is removed, but before the XLOG_END_OF_RECOVERY
record has been flushed to disk?

During recovery, when we replay an XLOG_END_OF_RECOVERY record, we
should close the currently-opened WAL file and read the WAL file with
the timeline that the XLOG_END_OF_RECOVERY record indicates.
Otherwise, when re-entering recovery with the old timeline, we cannot
reach the new timeline.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Fix Leaky View Problem

2012-01-17 Thread Robert Haas
On Sun, Jan 8, 2012 at 10:32 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I guess you are concerned that expected/select_views_1.out is
 patched, not expected/select_views.out.
 I'm not sure why the regression test script tries to diff
 results/select_views against expected/select_views_1.out.

 select_views.out and select_views_1.out are alternate expected output
 files.  The regression tests pass if the actual output matches either
 one.  Thus, you have to patch both.

 That was new to me. The attached patch updates both of the expected
 files; however, I'm not certain whether select_views.out is suitable,
 because my results/select_views.out matched
 expected/select_views_1.out.

Committed.  We'll see what the buildfarm thinks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers