Corrected: Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
(2012/03/12 13:04), Etsuro Fujita wrote: (2012/03/09 23:48), Tom Lane wrote: Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes: 2. IMHO RelOptInfo.fdw_private seems confusing. How about renaming it to e.g., RelOptInfo.fdw_state? Why is that better? It seems just as open to confusion with another field (ie, the execution-time fdw_state). I thought about that risk. However, I feel that the naming of RelOptInfo.fdw_state is not so bad because it is used only at query planning time, not along with the execution-time fdw_private. I wrote the execution-time fdw_private by mistake. I meant the execution-time fdw_state. I'm sorry about that. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Collect frequency statistics for arrays
On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: True. If (max count - min count + 1) is small, enumerating the frequencies is both a more compact and a more precise representation. Simultaneously, if (max count - min count + 1) is large, we can run out of statistics_target with such a representation. We can use the same representation of the count distribution as for a scalar column value: MCV and HISTOGRAM, but it would require an additional statkind and statistics slot. Probably you have better ideas? I wasn't thinking of introducing two different representations, but just trimming the histogram length when it's larger than necessary. On reflection my idea above is wrong; for example assume that we have a column with 900 arrays of length 1 and 100 arrays of length 2. Going by what I said, we'd reduce the histogram to {1,2}, which might accurately capture the set of lengths present but fails to show that 1 is much more common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries) would capture the situation perfectly in one-tenth the space that the current logic does. More generally, by limiting the histogram to statistics_target entries, we are already accepting errors of up to 1/(2*statistics_target) in the accuracy of the bin-boundary representation. What the above example shows is that sometimes we could meet the same accuracy requirement with fewer entries. I'm not sure how this could be mechanized but it seems worth thinking about. I can propose the following representation of the histogram. If (max_count - min_count + 1) <= statistics_target then 1) store max_count and min_count in stavalues 2) store frequencies from min_count to max_count in numvalues. If (max_count - min_count + 1) > statistics_target then store the histogram in the current manner. I think in this case it's unlikely to be many repeating values. I can propose a patch which changes the histogram representation to this. Comments?
-- With best regards, Alexander Korotkov.
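Alexander's proposed representation can be sketched in C. This is a hypothetical illustration only (all names are invented, not from the actual analyze code): when every distinct array length fits within statistics_target slots, store exact per-length frequencies; otherwise signal a fallback to the current histogram approach.

```c
/* Hypothetical sketch of the proposed length-statistics representation:
 * REP_FREQUENCIES stores an exact count for each length between min and
 * max; REP_HISTOGRAM means the caller should keep the current bin-based
 * histogram because there are too many distinct lengths. */
typedef enum { REP_FREQUENCIES, REP_HISTOGRAM } LengthRep;

static LengthRep
choose_length_rep(const int *lengths, int n, int stats_target,
                  int *freqs, int *min_count, int *max_count)
{
    int i, lo = lengths[0], hi = lengths[0];

    for (i = 1; i < n; i++)
    {
        if (lengths[i] < lo) lo = lengths[i];
        if (lengths[i] > hi) hi = lengths[i];
    }
    *min_count = lo;
    *max_count = hi;

    if (hi - lo + 1 > stats_target)
        return REP_HISTOGRAM;       /* too many distinct lengths */

    for (i = 0; i < hi - lo + 1; i++)
        freqs[i] = 0;
    for (i = 0; i < n; i++)
        freqs[lengths[i] - lo]++;   /* exact count for each length */
    return REP_FREQUENCIES;
}
```

With Tom's example (900 arrays of length 1, 100 of length 2), this stores just {900, 100} with min_count = 1, capturing the skew exactly in two slots.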
Re: [HACKERS] elegant and effective way for running jobs inside a database
On Sat, Mar 10, 2012 at 2:59 PM, Andrew Dunstan and...@dunslane.net wrote: The devil is in the details, though, pace Mies van der Rohe. In particular, it's the tight integration piece I'm worried about. What is the postmaster supposed to do if the daemon start fails? What if it gets a flood of failures? What access will the daemon have to Postgres internals? What OS privileges will it have, since this would have to run as the OS postgres user? In general I think we don't want arbitrary processes running as the OS postgres user. So why are the answers to those questions different for a daemon than for a C function executed from an external client? What additional exposure is there? I accept that cron might not be the best tool for the jobs, since a) its finest granularity is 1 minute and b) it would need a new connection for each job. But a well written external daemon that runs as a different user and is responsible for making its own connection to the database and re-establishing it if necessary, seems to me at least as clean a design for a job scheduler as one that is stopped and started by the postmaster. As of this thread, you can see that many people don't agree. Bear in mind that nobody is trying to prevent you from writing a program in that way if you believe that. That route will remain available. It's a key aspect of modular software we're talking about. People want to have programs that are intimately connected to the database, so that nobody needs to change the operational instructions when they start or stop the database. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Tom Lane wrote: Shigeru Hanada shigeru.han...@gmail.com writes: Thanks for the review. Agreed to write our own deparser for pgsql_fdw which handles nodes which can be pushed down. Every SQL-based FDW which constructs SQL statements for each local query would need such a module inside. Yeah. That's kind of annoying, and the first thing you think of is that we ought to find a way to share that code somehow. But I think it's folly to try to design a shared implementation until we have some concrete implementations to compare. An Oracle FDW, for instance, would need to emit SQL code with many differences in detail from pgsql_fdw. It's not clear to me whether a shared implementation is even practical, but for sure I don't want to try to build it before we've done some prototype single-purpose implementations. Having written something like that for Oracle, I tend to share that opinion. Anything general-purpose enough to cater for every whim and oddity of the remote system would probably be so unwieldy that it wouldn't be much easier to use it than to write the whole thing from scratch. To illustrate this, a few examples from the Oracle case: - Empty strings have different semantics in Oracle (to wit, they mean NULL). So you can push down all string constants except empty strings. - Oracle can only represent intervals with just year and month or with just day of month and smaller fields. So you can either punt on intervals or translate only the ones that fit the bill. - You can push down "-" for date arithmetic except when both operands on the Oracle side are of type DATE, because that would result in a NUMERIC value (number of days between). Yours, Laurenz Albe
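Laurenz's first example can be sketched as a tiny deparser helper. This is purely illustrative (the function name and shape are invented, not taken from any real FDW): an Oracle-targeted deparser has to refuse to push down empty string constants, because Oracle treats '' as NULL and pushing it down would silently change the query's semantics.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of a remote-specific pushdown check: an Oracle
 * deparser may push down any string constant except the empty string,
 * which Oracle would interpret as NULL. */
static bool
oracle_can_push_down_string_const(const char *value, bool isnull)
{
    if (isnull)
        return true;            /* NULL means NULL on both sides */
    return strlen(value) > 0;   /* '' means NULL in Oracle: keep it local */
}
```

A PostgreSQL-targeted deparser would have no such restriction, which is exactly why a shared deparser would end up as a pile of per-remote special cases.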
Re: [HACKERS] poll: CHECK TRIGGER?
2012/3/11 Petr Jelinek pjmo...@pjmodos.net: On 03/10/2012 04:36 PM, Pavel Stehule wrote: and here is a cleaned version. Reran my tests and adjusted docs a bit; tbh I considered the previous versions way more useful, but even in this form it's nice and useful functionality. Removed two lines of dead code. Regards Pavel Regards Petr Jelinek plpgsql_check_function2-2012-03-12.diff.gz Description: GNU Zip compressed data
[HACKERS] NOT NULL violation error handling in file_fdw
According to the following documentation on IterateForeignScan() in "50.2. Foreign Data Wrapper Callback Routines", I have created a patch to support the error handling in file_fdw. Please find attached a patch.

    Note that PostgreSQL's executor doesn't care whether the rows returned violate the NOT NULL constraints which were defined on the foreign table columns - but the planner does care, and may optimize queries incorrectly if NULL values are present in a column declared not to contain them. If a NULL value is encountered when the user has declared that none should be present, it may be appropriate to raise an error (just as you would need to do in the case of a data type mismatch).

Best regards, Etsuro Fujita

*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 502,507 **** fileIterateForeignScan(ForeignScanState *node)
--- 502,510 ----
  {
  	FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
  	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ 	Relation	rel = node->ss.ss_currentRelation;
+ 	TupleConstr *constr = rel->rd_att->constr;
+ 	bool		has_not_null = (constr != NULL) ? constr->has_not_null : false;
  	bool		found;
  	ErrorContextCallback errcontext;
***************
*** 528,534 **** fileIterateForeignScan(ForeignScanState *node)
--- 531,556 ----
  							   slot->tts_values, slot->tts_isnull, NULL);
  	if (found)
+ 	{
  		ExecStoreVirtualTuple(slot);
+ 		if (has_not_null)
+ 		{
+ 			int			natts = rel->rd_att->natts;
+ 			int			attrChk;
+ 
+ 			for (attrChk = 1; attrChk <= natts; attrChk++)
+ 			{
+ 				if (rel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ 					slot_attisnull(slot, attrChk))
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_NOT_NULL_VIOLATION),
+ 							 errmsg("null value in column \"%s\" violates not-null constraint",
+ 									NameStr(rel->rd_att->attrs[attrChk - 1]->attname)),
+ 							 errdetail("Failing row contains %s.",
+ 									   ExecBuildSlotValueDescription(slot, 64))));
+ 			}
+ 		}
+ 	}
  
  	/* Remove error callback.
  	 */
  	error_context_stack = errcontext.previous;
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***************
*** 86,93 ****
  static void ExecutePlan(EState *estate, PlanState *planstate,
  						DestReceiver *dest);
  static bool ExecCheckRTEPerms(RangeTblEntry *rte);
  static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
- static char *ExecBuildSlotValueDescription(TupleTableSlot *slot,
- 						 int maxfieldlen);
  static void EvalPlanQualStart(EPQState *epqstate, EState *parentestate,
  						Plan *planTree);
  static void OpenIntoRel(QueryDesc *queryDesc);
--- 86,91 ----
***************
*** 1604,1610 **** ExecConstraints(ResultRelInfo *resultRelInfo,
   * here since heap field values could be very long, whereas index entries
   * typically aren't so wide.
   */
! static char *
  ExecBuildSlotValueDescription(TupleTableSlot *slot, int maxfieldlen)
  {
  	StringInfoData buf;
--- 1602,1608 ----
   * here since heap field values could be very long, whereas index entries
   * typically aren't so wide.
   */
! char *
  ExecBuildSlotValueDescription(TupleTableSlot *slot, int maxfieldlen)
  {
  	StringInfoData buf;
*** a/src/include/executor/executor.h
--- b/src/include/executor/executor.h
***************
*** 184,189 **** extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
--- 184,191 ----
  extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids);
  extern void ExecConstraints(ResultRelInfo *resultRelInfo,
  				TupleTableSlot *slot, EState *estate);
+ extern char *ExecBuildSlotValueDescription(TupleTableSlot *slot,
+ 						 int maxfieldlen);
  extern ExecRowMark *ExecFindRowMark(EState *estate, Index rti);
  extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist);
  extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate,
Re: [HACKERS] Bug: walsender and high CPU usage
On 09.03.2012 13:40, Fujii Masao wrote: I found the bug which causes walsender to enter into busy loop when replication connection is terminated. Walsender consumes lots of CPU resource (%sys), and this situation lasts until it has detected the termination of replication connection and exited. The cause of this bug is that the walsender loop doesn't call ResetLatch at all in the above case. Since the latch remains set, the walsender loop cannot sleep on the latch, i.e., WaitLatch always returns immediately. We can fix this bug by adding ResetLatch into the top of the walsender loop. Patch attached. This bug exists in 9.1 but not in 9.2dev. In 9.2dev, this bug has already been fixed by the commit (cff75130b5f63e45423c2ed90d6f2e84c21ef840). This commit refactors and refines the walsender loop logic in addition to adding ResetLatch. So I'm tempted to backport this commit (except the deletion of wal_sender_delay) to 9.1 rather than applying the attached patch. OTOH, attached patch is quite simple, and its impact on 9.1 would be very small, so it's easy to backport that. Thought?

This patch makes the code that follows bogus:

	/*
	 * If we don't have any pending data in the output buffer, try to send
	 * some more.
	 */
	if (!pq_is_send_pending())
	{
		XLogSend(output_message, &caughtup);

		/*
		 * Even if we wrote all the WAL that was available when we started
		 * sending, more might have arrived while we were sending this
		 * batch. We had the latch set while sending, so we have not
		 * received any signals from that time. Let's arm the latch again,
		 * and after that check that we're still up-to-date.
		 */
		if (caughtup && !pq_is_send_pending())
		{
			ResetLatch(&MyWalSnd->latch);
			XLogSend(output_message, &caughtup);
		}
	}

The comment is no longer valid, and the calls to ResetLatch and XLogSend are no longer necessary, once you have the ResetLatch() call at the top of the loop. I also think we should backport commit cff75130b5f63e45423c2ed90d6f2e84c21ef840, except for the removal of wal_sender_delay.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
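The busy loop described above can be shown with a toy model. This is not the real latch API, just a simulation of its set/reset semantics: once the latch has been set, a loop that never resets it gets an immediate return from every wait, spinning at full speed, while a ResetLatch at the top of each iteration clears the pending wakeup first.

```c
#include <stdbool.h>

/* Toy model of latch semantics (invented names, not the real API). */
typedef struct { bool is_set; } FakeLatch;

static void fake_set_latch(FakeLatch *l)   { l->is_set = true; }
static void fake_reset_latch(FakeLatch *l) { l->is_set = false; }

/* Returns true if the wait actually slept (latch was clear), false if it
 * returned immediately because the latch was still set. */
static bool fake_wait_latch(const FakeLatch *l) { return !l->is_set; }

/* Count how many iterations returned immediately from the wait. */
static int
count_busy_iterations(FakeLatch *l, int iterations, bool reset_at_top)
{
    int busy = 0, i;

    for (i = 0; i < iterations; i++)
    {
        if (reset_at_top)
            fake_reset_latch(l);    /* clear any already-pending wakeup */
        if (!fake_wait_latch(l))
            busy++;                 /* immediate return: busy-looping */
    }
    return busy;
}
```

With the reset at the top, a single stale set-latch costs nothing; without it, every iteration spins, which is the %sys burn Fujii observed.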
Re: [HACKERS] Incorrect behaviour when using a GiST index on points
I believe that the attached version of the patch can be backpatched. It fixes this problem without altering the index building procedure. It just makes the checks in internal pages soft enough to compensate for the effect of the gist_box_same implementation. -- With best regards, Alexander Korotkov.

*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
***************
*** 70,76 **** gist_box_consistent(PG_FUNCTION_ARGS)
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
! 
  	/*
  	 * if entry is not leaf, use rtree_internal_consistent, else use
  	 * gist_box_leaf_consistent
--- 70,76 ----
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
! 
  	/*
  	 * if entry is not leaf, use rtree_internal_consistent, else use
  	 * gist_box_leaf_consistent
***************
*** 80,88 **** gist_box_consistent(PG_FUNCTION_ARGS)
  								query, strategy));
  	else
! 		PG_RETURN_BOOL(rtree_internal_consistent(DatumGetBoxP(entry->key),
  												 query, strategy));
  }
  
  static void
--- 80,102 ----
  								query, strategy));
  	else
! 	{
! 		/*
! 		 * Box in internal page can be narrower than box in leaf page not
! 		 * more than EPSILON in each boundary. Do corresponding correction.
! 		 */
! 		BOX			key, *entrykey;
! 
! 		entrykey = DatumGetBoxP(entry->key);
! 		key.low.x = entrykey->low.x - EPSILON;
! 		key.low.y = entrykey->low.y - EPSILON;
! 		key.high.x = entrykey->high.x + EPSILON;
! 		key.high.y = entrykey->high.y + EPSILON;
! 
! 		PG_RETURN_BOOL(rtree_internal_consistent(&key, query, strategy));
+ 	}
  }
  
  static void
***************
*** 847,852 **** gist_box_same(PG_FUNCTION_ARGS)
--- 861,871 ----
  	BOX		   *b2 = PG_GETARG_BOX_P(1);
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
+ 	/*
+ 	 * box_same function allow difference between boxes limited by EPSILON.
+ 	 * Thus box in internal page can be narrower than box in leaf page not
+ 	 * more than EPSILON in each boundary.
+ 	 */
  	if (b1 && b2)
  		*result = DatumGetBool(DirectFunctionCall2(box_same,
  												   PointerGetDatum(b1),
***************
*** 1072,1077 **** gist_poly_consistent(PG_FUNCTION_ARGS)
--- 1091,1097 ----
  	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
  	POLYGON    *query = PG_GETARG_POLYGON_P(1);
  	StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
+ 	BOX			key, *entrykey;
  	/* Oid subtype = PG_GETARG_OID(3); */
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
***************
*** 1083,1094 **** gist_poly_consistent(PG_FUNCTION_ARGS)
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
  
  	/*
  	 * Since the operators require recheck anyway, we can just use
  	 * rtree_internal_consistent even at leaf nodes. (This works in part
  	 * because the index entries are bounding boxes not polygons.)
  	 */
! 	result = rtree_internal_consistent(DatumGetBoxP(entry->key),
  									   &(query->boundbox), strategy);
  
  	/* Avoid memory leak if supplied poly is toasted */
--- 1103,1128 ----
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
  
+ 	entrykey = DatumGetBoxP(entry->key);
+ 	if (!GIST_LEAF(entry))
+ 	{
+ 		/*
+ 		 * Box in internal page can be narrower than box in leaf page not
+ 		 * more than EPSILON in each boundary. Do corresponding correction.
+ 		 */
+ 		key.low.x = entrykey->low.x - EPSILON;
+ 		key.low.y = entrykey->low.y - EPSILON;
+ 		key.high.x = entrykey->high.x + EPSILON;
+ 		key.high.y = entrykey->high.y + EPSILON;
+ 		entrykey = &key;
+ 	}
+ 
  	/*
  	 * Since the operators require recheck anyway, we can just use
  	 * rtree_internal_consistent even at leaf nodes. (This works in part
  	 * because the index entries are bounding boxes not polygons.)
  	 */
! 	result = rtree_internal_consistent(entrykey, &(query->boundbox), strategy);
  
  	/* Avoid memory leak if supplied poly is toasted */
***************
*** 1152,1158 **** gist_circle_consistent(PG_FUNCTION_ARGS)
  	/* Oid subtype = PG_GETARG_OID(3); */
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
! 	BOX			bbox;
  	bool		result;
  
  	/* All cases served by this function are inexact */
--- 1186,1192 ----
  	/* Oid subtype = PG_GETARG_OID(3); */
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
! 	BOX			bbox, *entrykey, key;
  	bool		result;
  
  	/* All cases served by this function are inexact */
***************
*** 1170,1177 **** gist_circle_consistent(PG_FUNCTION_ARGS)
  	bbox.high.x = query->center.x + query->radius;
  	bbox.low.x = query->center.x - query->radius;
  	bbox.high.y = query->center.y + query->radius;
  	bbox.low.y = query->center.y - query->radius;
! 	result = rtree_internal_consistent(DatumGetBoxP(entry->key), &bbox, strategy);
  
  	PG_RETURN_BOOL(result);
--- 1204,1225 ----
  	bbox.high.x = query->center.x + query->radius;
  	bbox.low.x = query->center.x - query->radius;
  	bbox.high.y = query->center.y + query->radius;
  	bbox.low.y =
Re: [HACKERS] Bug: walsender and high CPU usage
On Mon, Mar 12, 2012 at 7:58 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 09.03.2012 13:40, Fujii Masao wrote: I found the bug which causes walsender to enter into busy loop when replication connection is terminated. Walsender consumes lots of CPU resource (%sys), and this situation lasts until it has detected the termination of replication connection and exited. The cause of this bug is that the walsender loop doesn't call ResetLatch at all in the above case. Since the latch remains set, the walsender loop cannot sleep on the latch, i.e., WaitLatch always returns immediately. We can fix this bug by adding ResetLatch into the top of the walsender loop. Patch attached. This bug exists in 9.1 but not in 9.2dev. In 9.2dev, this bug has already been fixed by the commit (cff75130b5f63e45423c2ed90d6f2e84c21ef840). This commit refactors and refines the walsender loop logic in addition to adding ResetLatch. So I'm tempted to backport this commit (except the deletion of wal_sender_delay) to 9.1 rather than applying the attached patch. OTOH, attached patch is quite simple, and its impact on 9.1 would be very small, so it's easy to backport that. Thought?

This patch makes the code that follows bogus:

	/*
	 * If we don't have any pending data in the output buffer, try to send
	 * some more.
	 */
	if (!pq_is_send_pending())
	{
		XLogSend(output_message, &caughtup);

		/*
		 * Even if we wrote all the WAL that was available when we started
		 * sending, more might have arrived while we were sending this
		 * batch. We had the latch set while sending, so we have not
		 * received any signals from that time. Let's arm the latch again,
		 * and after that check that we're still up-to-date.
		 */
		if (caughtup && !pq_is_send_pending())
		{
			ResetLatch(&MyWalSnd->latch);
			XLogSend(output_message, &caughtup);
		}
	}

The comment is no longer valid, and the calls to ResetLatch and XLogSend are no longer necessary, once you have the ResetLatch() call at the top of the loop.

Right.
I also think we should backport commit cff75130b5f63e45423c2ed90d6f2e84c21ef840, except for the removal of wal_sender_delay. Agreed. The attached patch is the same as cff75130b5f63e45423c2ed90d6f2e84c21ef840, except for the removal of wal_sender_delay. Could you apply this? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Bug: walsender and high CPU usage
On Mon, Mar 12, 2012 at 10:27 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Mar 12, 2012 at 7:58 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 09.03.2012 13:40, Fujii Masao wrote: I found the bug which causes walsender to enter into busy loop when replication connection is terminated. Walsender consumes lots of CPU resource (%sys), and this situation lasts until it has detected the termination of replication connection and exited. The cause of this bug is that the walsender loop doesn't call ResetLatch at all in the above case. Since the latch remains set, the walsender loop cannot sleep on the latch, i.e., WaitLatch always returns immediately. We can fix this bug by adding ResetLatch into the top of the walsender loop. Patch attached. This bug exists in 9.1 but not in 9.2dev. In 9.2dev, this bug has already been fixed by the commit (cff75130b5f63e45423c2ed90d6f2e84c21ef840). This commit refactors and refines the walsender loop logic in addition to adding ResetLatch. So I'm tempted to backport this commit (except the deletion of wal_sender_delay) to 9.1 rather than applying the attached patch. OTOH, attached patch is quite simple, and its impact on 9.1 would be very small, so it's easy to backport that. Thought?

This patch makes the code that follows bogus:

	/*
	 * If we don't have any pending data in the output buffer, try to send
	 * some more.
	 */
	if (!pq_is_send_pending())
	{
		XLogSend(output_message, &caughtup);

		/*
		 * Even if we wrote all the WAL that was available when we started
		 * sending, more might have arrived while we were sending this
		 * batch. We had the latch set while sending, so we have not
		 * received any signals from that time. Let's arm the latch again,
		 * and after that check that we're still up-to-date.
		 */
		if (caughtup && !pq_is_send_pending())
		{
			ResetLatch(&MyWalSnd->latch);
			XLogSend(output_message, &caughtup);
		}
	}

The comment is no longer valid, and the calls to ResetLatch and XLogSend are no longer necessary, once you have the ResetLatch() call at the top of the loop. Right. I also think we should backport commit cff75130b5f63e45423c2ed90d6f2e84c21ef840, except for the removal of wal_sender_delay. Agreed. The attached patch is the same as cff75130b5f63e45423c2ed90d6f2e84c21ef840, except for the removal of wal_sender_delay. Could you apply this? Oh, I forgot to attach the patch. Patch attached, really. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 476,482 **** ProcessRepliesIfAny(void)
  {
  	unsigned char firstchar;
  	int			r;
! 	int			received = false;
  
  	for (;;)
  	{
--- 476,482 ----
  {
  	unsigned char firstchar;
  	int			r;
! 	bool		received = false;
  
  	for (;;)
  	{
***************
*** 700,705 **** WalSndLoop(void)
--- 700,708 ----
  	/* Loop forever, unless we get an error */
  	for (;;)
  	{
+ 		/* Clear any already-pending wakeups */
+ 		ResetLatch(&MyWalSnd->latch);
+ 
  		/*
  		 * Emergency bailout if postmaster has died. This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
***************
*** 718,777 **** WalSndLoop(void)
  
  		/* Normal exit from the walsender is here */
  		if (walsender_shutdown_requested)
  		{
! 			/* Inform the standby that XLOG streaming was done */
  			pq_puttextmessage('C', "COPY 0");
  			pq_flush();
  			proc_exit(0);
  		}
  
  		/*
  		 * If we don't have any pending data in the output buffer, try to send
! 		 * some more.
  		 */
  		if (!pq_is_send_pending())
- 		{
  			XLogSend(output_message, &caughtup);
  
  			/*
! 			 * Even if we wrote all the WAL that was available when we started
! 			 * sending, more might have arrived while we were sending this
! 			 * batch. We had the latch set while sending, so we have not
! 			 * received any signals from that time. Let's arm the latch again,
! 			 * and after that check that we're still up-to-date.
  			 */
! 			if (caughtup && !pq_is_send_pending())
  			{
! 				ResetLatch(&MyWalSnd->latch);
  				XLogSend(output_message, &caughtup);
  			}
  		}
- 
- 		/* Flush pending output to the client */
- 		if (pq_flush_if_writable() != 0)
- 			break;
- 
  		/*
! 		 * When SIGUSR2 arrives, we send any outstanding logs up to the
! 		 * shutdown checkpoint record (i.e., the latest record) and exit.
  		 */
! 		if (walsender_ready_to_stop && !pq_is_send_pending())
! 		{
! 			XLogSend(output_message,
[HACKERS] elegant and effective way for running jobs inside a database
Dear Developers, I am looking for an elegant and effective way of running jobs inside a database or cluster - for now I cannot find such a solution. OK, if you say "use cron or pgAgent", I say I know those solutions, but they are not effective and elegant. Compilation of pgAgent is a pain (especially the wxWidgets usage on a system with no X) - it can run jobs with a minimum period of 60s, but what about when someone needs to run a job faster, e.g. with a 5s period? Of course using cron I can do that, but it is not an effective and elegant solution. Why can PostgreSQL not have as elegant a solution as the Oracle database? I have been working with Oracle databases for many years, but I like the PostgreSQL project much more - except this one thing I cannot understand: the lack of jobs inside the database... Best regards, Artur
[HACKERS] query planner does not canonicalize infix operators
I created an index on an hstore function, fetchval(hstore, text), however when I use the -> infix operator, which resolves to the very same function, this index is not used. It should be used. I have included an example:

Table with hstore index:

de10keipt01939=> \d log_data
             Table "public.log_data"
 Column |           Type           |                       Modifiers
--------+--------------------------+-------------------------------------------------------
 id     | bigint                   | not null default nextval('log_data_id_seq'::regclass)
 time   | timestamp with time zone |
 data   | hstore                   |
Indexes:
    "index_log_data_by_time" btree ("time")
    "index_participant_id" btree (fetchval(data, 'participant_id'::text))

query with infix notation:

de10keipt01939=> explain ANALYZE select * from log_data where (data->'participant_id')='2851' order by id desc;
 Sort  (cost=16432.56..16433.36 rows=1583 width=315) (actual time=198.643..198.777 rows=183 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 119kB
   ->  Seq Scan on log_data  (cost=0.00..16415.74 rows=1583 width=315) (actual time=6.926..198.297 rows=183 loops=1)
         Filter: ((data -> 'participant_id'::text) = '2851'::text)
 Total runtime: 198.922 ms
(6 rows)

query with function notation:

de10keipt01939=> explain ANALYZE select * from log_data where fetchval(data,'participant_id')='2851' order by id desc;
 Sort  (cost=341.14..341.23 rows=179 width=315) (actual time=0.724..0.841 rows=183 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 119kB
   ->  Bitmap Heap Scan on log_data  (cost=2.35..339.80 rows=179 width=315) (actual time=0.091..0.489 rows=183 loops=1)
         Recheck Cond: (fetchval(data, 'participant_id'::text) = '2851'::text)
         ->  Bitmap Index Scan on index_participant_id  (cost=0.00..2.34 rows=179 width=0) (actual time=0.060..0.060 rows=183 loops=1)
               Index Cond: (fetchval(data, 'participant_id'::text) = '2851'::text)
 Total runtime: 1.010 ms
(8 rows)

—Will
[HACKERS] Requesting Ideas for project proposal.
I am looking for some project suggestions to make proposals for an OS course at Columbia University (4-6 weeks, single person), working with Scribe (kernel-based application record and replay), focusing on DBMS (I would like to focus on Postgres). I have some ideas along the lines of exploring its usage for debugging and profiling scenarios, but wanted to ask if someone can propose a better suggestion? Thanks, -Ali
Re: [HACKERS] Requesting Ideas for project proposal.
Sorry, here is the link to the Scribe project in reference; it's a kernel-based record and replay system. https://github.com/nviennot/linux-2.6-scribe On Wed, Mar 7, 2012 at 5:25 PM, Ali Ahmed ahmal...@gmail.com wrote: I am looking for some project suggestions to make proposals for an OS course at Columbia University (4-6 weeks, single person), working with Scribe (kernel-based application record and replay), focusing on DBMS (I would like to focus on Postgres). I have some ideas along the lines of exploring its usage for debugging and profiling scenarios, but wanted to ask if someone can propose a better suggestion? Thanks, -Ali -- -Ali
Re: [HACKERS] wal_buffers, redux
On Sun, Mar 11, 2012 at 11:51 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas robertmh...@gmail.com wrote: I've finally been able to run some more tests of the effect of adjusting wal_buffers to values higher than 16MB. I ran the test on the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual configuration settings: shared_buffers = 8GB maintenance_work_mem = 1GB synchronous_commit = off checkpoint_segments = 300 checkpoint_timeout = 15min checkpoint_completion_target = 0.9 wal_writer_delay = 20ms I ran three 30-minute tests at scale factor 300 with wal_buffers set at various values from 16MB up to 160MB, in multiples of 16MB, using pgbench with 32 clients and 32 threads in each case. The short version is that 32MB seems to be significantly better than 16MB, by about 1000 tps, and after that it gets murky; full results are below. Currently the max of wal_buffers is 16MB (i.e., the size of one WAL file) when it's set to -1. Thanks to your result, we should increase the max to 32MB? I think that might be a good idea, although I'm not entirely convinced that we understand why increasing wal_buffers is helping as much as it is. I stuck an elog() into AdvanceXLInsertBuffer() to complain in the case that we were writing buffers while holding the insert lock. Then, I reran 30-minute tests 32 clients, one with wal_buffers=16MB and the other wal_buffers=32MB. On the 16MB test, the elog() fired 15 times in a single second shortly after the start of the test, and then 9 more times over the rest of the test. On the 32MB test, the elog() fired a total 6 times over the course of the test. The first test got 14320 tps, while the second got 15026 tps. I find that quite surprising, because although WAL buffer wraparound is certainly bad (on this test, it probably brings the system completely to a halt until fsync() finishes) it really shouldn't lock up the system for multiple seconds at a time. 
And yet that's what it would need to be doing to account for the tps discrepancy on this test, considering how rarely it occurs. Studying AdvanceXLInsertBuffer() a bit more, I'm wondering if the problem isn't so much - or isn't only - that it's expensive to write buffers while also holding WALInsertLock. Maybe it's too expensive even to acquire WALWriteLock in the first place - that is, the real problem isn't so much the wraparound condition itself, but the expense of testing whether a possible wraparound has actually occurred. A quick test suggests that we acquire WALWriteLock here much more often than we actually write anything while holding it, and that we get a big burst of WALWriteLock acquisitions here immediately after a checkpoint. I don't have any proof that this is what's causing the tps drop with smaller wal_buffers, but I think there has to be something other than an actual wraparound condition causing problems here, because that just doesn't seem to happen frequently enough to be an issue. Anyway, maybe none of that matters at the moment. Perhaps it's enough to know that wal_buffers > 16MB can help, and just bump up the maximum auto-tuned value a bit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
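The wraparound condition under discussion can be reduced to a toy model (invented names; the real check inside AdvanceXLInsertBuffer is considerably more involved): with a circular buffer of n_buffers WAL pages, inserting into a new page collides with a not-yet-written page whenever the insert position runs a full buffer's length ahead of the write position. Adding pages (larger wal_buffers) makes the condition rarer, which is consistent with the tps improvement above even though it doesn't explain its full size.

```c
#include <stdbool.h>

/* Toy model of WAL-buffer wraparound: page numbers advance monotonically;
 * each maps to slot (page % n_buffers) in the circular buffer. Inserting
 * into `next_page` would overwrite an unwritten page when the insert
 * position is at least a whole buffer ahead of the write position. */
static bool
would_wrap(long next_page, long written_up_to, int n_buffers)
{
    return next_page - written_up_to >= n_buffers;
}
```

In this model, doubling n_buffers halves nothing by itself; it just widens the window the WAL writer has to catch up before an inserter is forced to write (and fsync) WAL while holding WALInsertLock.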
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
2012/3/9 Robert Haas robertmh...@gmail.com: On Tue, Mar 6, 2012 at 9:14 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: [ new patch ] Are we absolutely certain that we want the semantics of sepgsql_setcon() to be transactional? Because if we made them non-transactional, this would be a whole lot simpler, and it would still meet the originally proposed use case, which is to allow the security context of a connection to be changed on a one-time basis before handing it off to a client application. I hesitate to implement sepgsql_setcon() as not-transaction-aware, because all the other functionality is transaction aware, such as the SECURITY LABEL statement. Even though the original use-case from Joshua does not require it being transaction-aware, I'd like to keep consistency of behavior with the other features. Is it really unacceptable complexity? It seems to me that it would make more sense to view the set of security labels in effect as a stack. When we enter a trusted procedure, it pushes a new label on the stack; when we exit a trusted procedure, it pops the top label off the stack. sepgsql_setcon() changes the top label on the stack. If we want to retain transactional semantics, then you can also have a toplevel label that doesn't participate in the stack; a commit copies the sole item on the stack into the toplevel label, and transaction start copies the toplevel label into an empty stack. In the current coding, I think client_label_peer is redundant with client_label_committed - once the latter is set, the former is unused, IIUC - and what I'm proposing is that client_label_func shouldn't be separate, but rather should mutate the stack of labels maintained by client_label_pending. I almost agree with your opinion. The semantics of stacking security labels would be more straightforward. One reason for the redundant client_label_peer is to support resetting the client label using NULL input to sepgsql_setcon(). We need to save the original value somewhere.
Indeed, in the case where sepgsql_setcon() is invoked inside a trusted procedure, the existing behavior is confusing, as you pointed out. If we changed the semantics to use a stack structure, an invocation of a trusted procedure would take a label transition from A to B; it could then invoke sepgsql_setcon() to take a label transition from B to C, and the label B would be popped from the stack at the end of the trusted procedure. Eventually, the label C would be applied on top of the stack. Do we share the same image of the whole idea? The docs need updating, both to reflect the version bump in sepgsql-regtest.te and to describe the actual feature. OK, please wait for a few days. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
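The stack semantics being debated in these messages (a label pushed on trusted-procedure entry and popped on exit, sepgsql_setcon() changing the top entry, plus a transactional toplevel label) can be modeled in a few lines. This is a hypothetical illustration of the proposal, not sepgsql code; the class and method names are invented for this sketch:

```python
# Toy model of the proposed security-label stack.  All names are
# invented; only the semantics described in the thread are modeled.

class LabelState:
    def __init__(self, initial_label):
        self.toplevel = initial_label        # committed, transaction-durable label
        self.stack = [initial_label]         # active labels; top entry is current

    def current(self):                       # what sepgsql_get_client_label() reports
        return self.stack[-1]

    def enter_trusted_proc(self, tp_label):  # trusted-procedure entry pushes its label
        self.stack.append(tp_label)

    def exit_trusted_proc(self):             # trusted-procedure exit pops it again
        self.stack.pop()

    def setcon(self, new_label):             # sepgsql_setcon() replaces the top entry
        self.stack[-1] = new_label

    def commit(self):                        # commit copies the sole stack item up
        assert len(self.stack) == 1
        self.toplevel = self.stack[0]

    def start_transaction(self):             # transaction start reloads the stack
        self.stack = [self.toplevel]

s = LabelState("connection_pooler_t")
s.setcon("foo_t")              # top-level label change...
s.commit()                     # ...survives commit
s.start_transaction()
s.enter_trusted_proc("tp_t")
s.setcon("bar_t")              # change made inside the trusted procedure
assert s.current() == "bar_t"
s.exit_trusted_proc()
assert s.current() == "foo_t"  # TP-scoped change ends with the TP, like a GUC
```

Under these replace-the-top semantics, a label change made inside a trusted procedure is scoped to that procedure, which is the behavior Robert argues is consistent with how GUCs behave.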
Re: [HACKERS] query planner does not canonicalize infix operators
Will Leinweber w...@heroku.com writes: I created an index on an hstore function, fetchval(hstore, text), however when I use the -> infix operator, which resolves to the very same function, this index is not used. It should be used. Don't hold your breath. Create an index on the expression you intend to use, not random respellings of it. regards, tom lane 
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
2012/3/11 Yeb Havinga yebhavi...@gmail.com: On 2012-03-10 10:39, I wrote: I can probably write some docs tomorrow. Attached is v5 of the patch, which is exactly equal to v4 but with added documentation. Thanks for your dedicated volunteer work. I'm checking the updates to the documentation. Some other notes. - Robert asked why sepgsql_setcon with NULL to reset the value to the initial client label was supported. Maybe this could be a leftover from the initial implementation as a GUC variable? There is a practical reason. In the case where httpd opens the connection to PG and sets a suitable security label according to the given credential prior to launching the user application, then keeps this connection for upcoming requests, it is worthwhile to be able to reset the security label of the client. - earlier I suggested preventing setting a new client label from a trusted procedure, however I just read in the original post that this was how the current use case of Joshua is set up. Suggestion withdrawn. Joshua's security policy does not allow httpd to issue SQL commands except for the trusted procedure that calls sepgsql_setcon() according to the given credential. (Thus, we have no way to set an arbitrary security label without credentials.) The security label being switched to is allowed to issue SQL commands with restricted privileges, and is also allowed to transition back into httpd's domain. If we did not support sepgsql_setcon() in a trusted procedure, we would have to allow httpd to transition to an arbitrary label, and it would also become impossible to keep the connection, because the client label should not be reverted to httpd's (that would mean restricted users could switch to anyone at all!) Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp 
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On Sat, Mar 10, 2012 at 4:35 PM, Yeb Havinga yebhavi...@gmail.com wrote: The semantics are muddled because the client labels are mixed with labels from trusted procedures. The stack you described upthread may also exhibit surprising behaviour. If a trusted procedure is called, its label is pushed onto the stack. What I was proposing is that it would *replace* the label on top of the stack. Suppose it pushes another label by calling sepgsql_setcon(). In the stack case, this is now the top of the stack and the result of sepgsql_get_client_label(). The procedure exits. What should now be the result of sepgsql_get_client_label()? Since labels are managed by a stack, we can only pop what's on top and need to pop twice, The above avoids the need to pop twice. so we've lost the label pushed onto the stack by the trusted function, which means that trusted procedures cannot be used to change client labels beyond their own scope, but other functions would. That's true, but I'm not convinced it's bad. I mean, if you instruct the system to change security labels for the duration of a trusted procedure, then it shouldn't be surprising that you end up with the same security label after it exits that you had before entering it. At the very least, it has the virtue of being consistent with other things, like how GUCs behave. The current behavior is that you change the context and you don't see the results of your own context change, which seems far worse. Maybe this semantics muddling of trusted procedures and setting the client label can be avoided by denying changing the client label with sepgsql_setcon() from a trusted procedure, which would also make some sense: ERROR: cannot override the client label of a trusted procedure during execution. That also seems possibly reasonable. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company 
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On Mon, Mar 12, 2012 at 10:58 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: It is a practical reason. In case when httpd open the connection to PG and set a suitable security label according to the given credential prior to launch of user application, then keep this connection for upcoming request, it is worthwhile to reset security label of the client. But wait a minute - how is that any good? That allows the client to pretty trivially circumvent the security restriction that we were trying to impose by doing sepgsql_setcon() in the first place. It doesn't matter how convenient it is if it's flagrantly insecure. Am I missing something here? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company 
Re: [HACKERS] psql COPY vs. ON_ERROR_ROLLBACK, multi-command strings
Excerpts from Noah Misch's message of Thu Mar 08 12:11:37 -0300 2012: On Wed, Mar 07, 2012 at 04:57:12PM -0500, Robert Haas wrote: As a side note, the documentation for PQexec() is misleading about what will happen if COPY is present in a multi-command string. It says: Note however that the returned PGresult structure describes only the result of the last command executed from the string. Should one of the commands fail, processing of the string stops with it and the returned PGresult describes the error condition. It does not explain that it also stops if it hits a COPY. I had to read the source code for libpq to understand why this psql logic was coded the way it is. Agreed; I went through a similar process. Awhile after reading the code, I found the behavior documented in section Functions Associated with the COPY Command: If a COPY command is issued via PQexec in a string that could contain additional commands, the application must continue fetching results via PQgetResult after completing the COPY sequence. Only when PQgetResult returns NULL is it certain that the PQexec command string is done and it is safe to issue more commands. I'm not quite sure what revision would help most here -- a cross reference, moving that content, duplicating that content. Offhand, I'm inclined to move it to the PQexec() documentation with some kind of reference back from its original location. Thoughts? I would vote for moving it and adding a reference in the COPY functions section. That way, the PQexec doc is complete by itself without having to duplicate anything. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support 
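The libpq behavior under discussion can be illustrated with a toy model: PQexec() on a multi-command string stops when it reaches a COPY result, and the application must then keep calling PQgetResult() until it returns NULL before the command string is truly finished. The mock below sketches that protocol logic only; the class and names are invented for illustration and are not libpq's actual API:

```python
# Toy model of the documented libpq behavior for a multi-command string
# containing COPY.  Result-status strings mirror libpq's constants.

TUPLES_OK, COPY_IN, COMMAND_OK = "PGRES_TUPLES_OK", "PGRES_COPY_IN", "PGRES_COMMAND_OK"

class MockConn:
    def __init__(self, results):
        self._pending = list(results)

    def get_result(self):               # models PQgetResult(): None when the string is done
        return self._pending.pop(0) if self._pending else None

    def exec_(self, command_string):    # models PQexec() on a multi-command string
        last = None
        while True:
            r = self.get_result()
            if r is None:
                return last             # normally: the last command's result
            last = r
            if r == COPY_IN:            # ...but PQexec stops here and hands control back
                return last

conn = MockConn([TUPLES_OK, COPY_IN, COMMAND_OK])
r = conn.exec_("SELECT 1; COPY t FROM stdin; CREATE TABLE u ();")
assert r == COPY_IN                     # not the last command's result!

# After sending the COPY data, the application must drain the remaining
# results until None, exactly as the COPY-functions section says:
drained = []
while True:
    r = conn.get_result()
    if r is None:
        break
    drained.append(r)
assert drained == [COMMAND_OK]          # only now is the command string done
```

This is why the psql logic referenced above has to special-case COPY rather than relying on PQexec() alone.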
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
2012/3/12 Robert Haas robertmh...@gmail.com: On Mon, Mar 12, 2012 at 10:58 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: It is a practical reason. In case when httpd open the connection to PG and set a suitable security label according to the given credential prior to launch of user application, then keep this connection for upcoming request, it is worthwhile to reset security label of the client. But wait a minute - how is that any good? That allows the client to pretty trivially circumvent the security restriction that we were trying to impose by doing sepgsql_setcon() in the first place. It doesn't matter how convenient it is if it's flagrantly insecure. Am I missing something here? There is a practical reason. If we did not support the reset feature, the application would have to know its own security label, as the target label to revert to. However, I'm not certain about the status of the script-language bindings of the libselinux feature to obtain one's own label, although it is supported on Perl, Ruby and PHP (with an extension by myself) at least. It seems to me a reasonable cost to track the original label, to eliminate a restriction on the application side when it tries to revert a security label once switched. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp 
[HACKERS] Partitioning triggers doc patch
Attached is a doc patch based on a suggestion by Rural Hunter to address something he found confusing while setting up partitioning. Original thread is here: http://archives.postgresql.org/pgsql-admin/2012-03/msg00066.php -Kevin

*** a/doc/src/sgml/ddl.sgml
--- b/doc/src/sgml/ddl.sgml
***************
*** 2411,2416 ****
--- 2411,2420 ----
  VALUES ('New York', NULL, NULL, 'NY');

       constraints on this table, unless you intend them to be applied
       equally to all partitions.  There is no point in defining any
       indexes or unique constraints on it, either.
+      Because triggers on this table will not be fired for operations
+      on the related partitions, there is usually no reason to create
+      triggers on it other than the optional trigger described below
+      to redirect operations to the appropriate partitions.
      </para>
     </listitem>
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On Mon, Mar 12, 2012 at 11:13 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/3/12 Robert Haas robertmh...@gmail.com: On Mon, Mar 12, 2012 at 10:58 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: It is a practical reason. In case when httpd open the connection to PG and set a suitable security label according to the given credential prior to launch of user application, then keep this connection for upcoming request, it is worthwhile to reset security label of the client. But wait a minute - how is that any good? That allows the client to pretty trivially circumvent the security restriction that we were trying to impose by doing sepgsql_setcon() in the first place. It doesn't matter how convenient it is if it's flagrantly insecure. Am I missing something here? It is a practical reason. If we would not support the reset feature, the application has to know the security label of itself, as a target label to be reverted. However, I'm not certain the status of script- language binding of libselinux feature to obtain the self label, although it is supported on Perl, Ruby and PHP (with extension by myself) at least. You're still missing my point. The issue isn't the particular choice of mechanism for reverting to the original security label; it's the fact that such a thing would be possible at all. Suppose that the connection starts out in context connection_pooler_t. Based on the identity of the user, we transition to foo_t, bar_t, or baz_t. If it's possible, by any method, for one of those contexts to get back to connection_pooler_t, then we've got a problem. We give a connection to user foo which is in foo_t; he transitions it back to connection_pooler_t, then to bar_t, and usurps user bar's privileges. Unless there's some way to prevent that, the only way to make this secure is to make the transition to foo_t irreversible. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company 
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
2012/3/12 Robert Haas robertmh...@gmail.com: On Mon, Mar 12, 2012 at 11:13 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/3/12 Robert Haas robertmh...@gmail.com: On Mon, Mar 12, 2012 at 10:58 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: It is a practical reason. In case when httpd open the connection to PG and set a suitable security label according to the given credential prior to launch of user application, then keep this connection for upcoming request, it is worthwhile to reset security label of the client. But wait a minute - how is that any good? That allows the client to pretty trivially circumvent the security restriction that we were trying to impose by doing sepgsql_setcon() in the first place. It doesn't matter how convenient it is if it's flagrantly insecure. Am I missing something here? It is a practical reason. If we would not support the reset feature, the application has to know the security label of itself, as a target label to be reverted. However, I'm not certain the status of script- language binding of libselinux feature to obtain the self label, although it is supported on Perl, Ruby and PHP (with extension by myself) at least. You're still missing my point. The issue isn't the particular choice of mechanism for reverting to the original security label; it's the fact that such a thing would be possible at all. Suppose that the connection starts out in context connection_pooler_t. Based on the identity of the user, we transition to foo_t, bar_t, or baz_t. If it's possible, by any method, for one of those contexts to get back to connection_pooler_t, then we've got a problem. We give a connection to user foo which is in foo_t; he transitions it back to connection_pooler_t, then to bar_t, and usurps user bar's privileges. Unless there's some way to prevent that, the only way to make this secure is to make the transition to foo_t irreversible. It is the reason why I advocate the idea to allow sepgsql_setcon() inside of trusted-procedures. 
The original use case of Joshua does not allow connection_pooler_t to execute any SQL commands except for invocation of a particular trusted procedure; that procedure takes a secret credential as an argument, then switches the client label to foo_t, bar_t or baz_t according to the supplied credential. These labels are allowed to switch back to the original connection_pooler_t, but it is impossible to switch to an arbitrary label without a suitable credential. Please also note that the reset of the security label is handled as a switch from the current label to the original one, and undergoes a permission check in the normal manner. So, in an environment without connection pooling, it is an option to prevent resetting the client label to the original one, since the original label is allowed to switch to arbitrary labels. Do we still have a problematic scenario here? Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp 
Re: [HACKERS] wal_buffers, redux
On Sat, Mar 10, 2012 at 7:55 PM, Robert Haas robertmh...@gmail.com wrote: I've finally been able to run some more tests of the effect of adjusting wal_buffers to values higher than 16MB. I ran the test on the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual configuration settings:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms

I ran three 30-minute tests at scale factor 300 with wal_buffers set at various values from 16MB up to 160MB, in multiples of 16MB, using pgbench with 32 clients and 32 threads in each case. The short version is that 32MB seems to be significantly better than 16MB, by about 1000 tps, and after that it gets murky; full results are below. On Nate Boley's machine, the difference was ~100% increase rather than ~10%. Do you think the difference is in the CPU architecture, or the IO subsystem? Also, do you have the latency numbers? Cheers, Jeff 
Re: [HACKERS] [COMMITTERS] pgfiledump - pg_filedump: Updates for latest Postgres 9.2 sources.
Excerpts from tgl's message of Mon Mar 12 12:08:33 -0400 2012: Log Message: --- Updates for latest Postgres 9.2 sources. We no longer need pg_crc.c, and hence not a source tree, yay. FWIW I had some fun last week compiling this to 9.1 sources because I was using a VPATH build. I had to add another -I to LDFLAGS or something like that. Also, what do you think about adding the ability to dump pg_filenode.map files? Do you think it belongs in pg_filedump, or should we look at doing that elsewhere? (Hmm, I just realized we might also conceivably want to dump VM and FSM forks, too) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support 
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Fri, Mar 9, 2012 at 2:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Thanks! BTW, I haven't forgotten about the recovery bugs Jeff found earlier. I'm planning to do a longer run with his test script - I only run it for about 1000 iterations - to see if I can reproduce the PANIC with both the earlier patch version he tested, and this new one. Hi Heikki, I've run the v12 patch for 17,489 rounds of crash and recovery, and detected no assertion failures or other problems. Cheers, Jeff 
Re: [HACKERS] query planner does not canonicalize infix operators
On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Will Leinweber w...@heroku.com writes: I created an index on an hstore function, fetchval(hstore, text), however when I use the -> infix operator which resolves to the very same function, this index is not used. It should be used. Don't hold your breath. Create an index on the expression you intend to use, not random respellings of it. Is this saying there's no need for that, or no one is working on it (and I certainly don't intend to), or definitely not in the next version, or something else entirely? I'm in disagreement with the former, and in total understanding of the latter two. -- fdr 
[HACKERS] pg_filedump improvements
Alvaro Herrera alvhe...@commandprompt.com writes: Also, what do you think about adding the ability to dump pg_filenode.map files? Do you think it belongs in pg_filedump, or should we look at doing that elsewhere? Not sure. It does already contain the ability to dump pg_control, but that seems like rather a wart (not to mention redundant with pg_controldata) because you have to explicitly say -c to get it to realize that's what you want. Map files would have to be another special switch I think. (Hmm, I just realized we might also conceivably want to dump VM and FSM forks, too) Yeah, that's been on the to-do list for awhile. There's not a way that pg_filedump could automatically recognize such files, is there? regards, tom lane 
Re: [HACKERS] foreign key locks, 2nd attempt
On Sun, Feb 26, 2012 at 9:47 PM, Robert Haas robertmh...@gmail.com wrote: Regarding performance, the good thing about this patch is that if you have an operation that used to block, it might now not block. So maybe multixact-related operation is a bit slower than before, but if it allows you to continue operating rather than sit waiting until some other transaction releases you, it's much better. That's probably true, although there is some deferred cost that is hard to account for. You might not block immediately, but then later somebody might block either because the mxact SLRU now needs fsyncs or because they've got to decode an mxid long after the relevant segment has been evicted from the SLRU buffers. In general, it's hard to bound that latter cost, because you only avoid blocking once (when the initial update happens) but you might pay the extra cost of decoding the mxid as many times as the row is read, which could be arbitrarily many. How much of a problem that is in practice, I'm not completely sure, but it has worried me before and it still does. In the worst case scenario, a handful of frequently-accessed rows with MXIDs all of whose members are dead except for the UPDATE they contain could result in continual SLRU cache-thrashing. From a performance standpoint, we really need to think not only about the cases where the patch wins, but also, and maybe more importantly, the cases where it loses. There are some cases where the current mechanism, use SHARE locks for foreign keys, is adequate. In particular, it's adequate whenever the parent table is not updated at all, or only very lightly. I believe that those people will pay somewhat more with this patch, and especially in any case where backends end up waiting for fsyncs in order to create new mxids, but also just because I think this patch will have the effect of increasing the space consumed by each individual mxid, which imposes a distributed cost of its own. 
I spent some time thinking about this over the weekend, and I have an observation, and an idea. Here's the observation: I believe that locking a tuple whose xmin is uncommitted is always a noop, because if it's ever possible for a transaction to wait for an XID that is part of its own transaction (exact same XID, or sub-XIDs of the same top XID), then a transaction could deadlock against itself. I believe that this is not possible: if a transaction were to wait for an XID assigned to that same backend, then the lock manager would observe that an ExclusiveLock on the xid is already held, so the request for a ShareLock would be granted immediately. I also don't believe there's any situation in which the existence of an uncommitted tuple fails to block another backend, but a lock on that same uncommitted tuple would have caused another backend to block. If any of that sounds wrong, you can stop reading here (but please tell me why it sounds wrong). If it's right, then here's the idea: what if we stored mxids using xmin rather than xmax? This would mean that, instead of making mxids contain the tuple's original xmax, they'd need to instead contain the tuple's original xmin. This might seem like rearranging the deck chairs on the titanic, but I think it actually works out substantially better, because if we can assume that the xmin is committed, then we only need to know its exact value until it becomes older than RecentGlobalXmin. This means that a tuple can be both updated and locked at the same time without the MultiXact SLRU needing to be crash-safe, because if we crash and restart, any mxids that are still around from before the crash are known to contain only xmins that are now all-visible. We therefore don't need their exact values, so it doesn't matter if that data actually made it to disk. 
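The lookup-avoidance part of this idea can be sketched as a toy model: once the xmin stored in the mxact falls behind RecentGlobalXmin, a hint bit cuts off any further SLRU probes for that tuple. All names below are invented for illustration; this models only the logic described in the paragraph above, not actual PostgreSQL code:

```python
# Toy model: a multixact remembers the tuple's original xmin, and readers
# probe the (mock) SLRU only until that xmin becomes all-visible.

MXACT_STORE = {1: 100}      # mxid 1 recorded original xmin 100 at lock time
slru_lookups = 0            # counts probes, to show when they stop

def slru_fetch_xmin(mxid):
    global slru_lookups
    slru_lookups += 1
    return MXACT_STORE[mxid]

class Tuple:
    def __init__(self, mxid):
        self.mxid = mxid
        self.xmin_all_visible = False      # the proposed hint bit

def read_tuple(tup, recent_global_xmin):
    if tup.xmin_all_visible:
        return "visible"                   # hint set: no SLRU probe needed
    xmin = slru_fetch_xmin(tup.mxid)       # probe while the exact xmin may matter
    if xmin < recent_global_xmin:
        tup.xmin_all_visible = True        # future readers skip the SLRU
        return "visible"
    return "check-xmin"                    # still need the exact value

t = Tuple(mxid=1)
read_tuple(t, recent_global_xmin=50)       # xmin 100 not yet all-visible: probe
read_tuple(t, recent_global_xmin=50)       # still probing on each read
assert slru_lookups == 2
read_tuple(t, recent_global_xmin=200)      # now all-visible: final probe, set hint
read_tuple(t, recent_global_xmin=200)      # hint bit set: no probe
assert slru_lookups == 3
```

The crash-safety claim follows the same line: after a restart, any surviving mxid's xmin is known to be all-visible, so losing the exact stored value does not matter.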
Furthermore, in the case where a previously-locked tuple is read repeatedly, we only need to keep doing SLRU lookups until the xmin becomes all-visible; after that, we can set a hint bit indicating that the tuple's xmin is all-visible, and any future readers (or writers) can use that to skip the SLRU lookup. In the degenerate (and probably common) case where a tuple is already all-visible at the time it's locked, we don't really need to record the original xmin at all; we can still do so if convenient, but we can set the xmin-all-visible hint right away, so nobody needs to probe the SLRU just to get xmin. In other words, we'd entirely avoid needing to make mxacts crash-safe, and we'd avoid most of the extra SLRU lookups that the current implementation requires; they'd only be needed when (and for as long as) the locked tuple was not yet all-visible. This also seems like it would make the anti-wraparound issues simpler to handle - once an mxid is old enough that any xmin it contains must be all-visible, we can simply overwrite the tuple's xmin with FrozenXID, which is pretty much what we're already doing anyway. It's already the case that a
Re: [HACKERS] Partitioning triggers doc patch
On Mon, Mar 12, 2012 at 10:41 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Attached is a doc patch based on a suggestion by Rural Hunter to address something he found confusing while setting up partitioning. Original thread is here: http://archives.postgresql.org/pgsql-admin/2012-03/msg00066.php You can create a trigger that executes before the trigger that redirects to the partition, for example to stamp the time or the user... so it should state something like what is said about constraints: Do not define any check constraints on this table, unless you intend them to be applied equally to all partitions. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación 
Re: [HACKERS] Partitioning triggers doc patch
Jaime Casanova ja...@2ndquadrant.com wrote: On Mon, Mar 12, 2012 at 10:41 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Attached is a doc patch based on a suggestion by Rural Hunter to address something he found confusing while setting up partitioning. Original thread is here: http://archives.postgresql.org/pgsql-admin/2012-03/msg00066.php you can create trigger that execute before the trigger that redirect to the partition, for example to stamp the time or the user... so it should state something like i said about constraints Do not define any check constraints on this table, unless you intend them to be applied equally to all partitions. Yeah, I thought about that issue, but the problem is that the rule or trigger that routes the operation to another partition might prevent the trigger from firing at all. I figured that if I went as far as what you describe, I would need to get into those timing issues, too; and I wasn't sure whether that might be too much information for that particular spot. But if you think we should go there, I could draft something which pointed out that an AFTER trigger is never useful and that a BEFORE trigger to enforce things globally must sort ahead of the routing trigger, and should not be used at all if a RULE is used to route operations to the partitions. Do we really want to get into all that in the middle of step-by-step instructions for the simple case? -Kevin 
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On Mon, Mar 12, 2012 at 12:30 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Suppose that the connection starts out in context connection_pooler_t. Based on the identity of the user, we transition to foo_t, bar_t, or baz_t. If it's possible, by any method, for one of those contexts to get back to connection_pooler_t, then we've got a problem. We give a connection to user foo which is in foo_t; he transitions it back to connection_pooler_t, then to bar_t, and usurps user bar's privileges. Unless there's some way to prevent that, the only way to make this secure is to make the transition to foo_t irreversible. It is the reason why I advocate the idea to allow sepgsql_setcon() inside of trusted-procedures. The original use-case of Joshua does not allow connection_pooler_t to execute any SQL commands except for invocation of a particular trusted-procedures; that takes a secret credential as an argument, then it switches the client label to foo_t, bar_t or baz_t according to the supplied credential. These labels are allowed to switch back to the original connection_pooler_t, but it is unavailable to switch arbitrary label without suitable credential. Oh, I get it. Given that that's the intended use case, the current design does make sense, but it seems awfully special-purpose. Not knowing that this is what you had in mind, I never would have guessed the reason for all this complexity. I worry that this is too much of a purpose-built mechanism, and that nobody will ever be able to use it for much of anything beyond the extremely specific use case that you've laid out here. I think that, at the very least, the comments and documentation need to make it clear that this is very deliberately intended to modify only the toplevel security context of the session, which may be different from the currently active context if a TP is in use; and also that the change will apply to future transactions only if the current transaction commits. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company 
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas robertmh...@gmail.com wrote: In other words, we'd entirely avoid needing to make mxacts crash-safe, and we'd avoid most of the extra SLRU lookups that the current implementation requires; they'd only be needed when (and for as long as) the locked tuple was not yet all-visible. The current implementation only requires additional lookups in the update/check case, which is the case that doesn't do anything other than block right now. Since we're replacing lock contention with physical access contention, even the worst case situation is still better than what we have now. Please feel free to point out worst case situations and show that isn't true. I've also pointed out how to avoid the overhead of making mxacts crash safe when the new facilities are not in use, so I don't see problems with the proposed mechanism, given that I am still myself reviewing the actual code. So those things are not something we need to avoid. My feeling is that overwriting xmin is a clever idea, but arrives too late to require sensible analysis in this stage of the CF. It's not solving a problem, it's just an alternate mechanism and at best an optimisation of the mechanism. Were we to explore it now, it seems certain that another person would observe that design was taking place and so the patch should be rejected, which would be unnecessary and wasteful. I also think it would alter our ability to diagnose problems, not least the normal test that xmax matches xmin across an update. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services 
Re: [HACKERS] wal_buffers, redux
On Mon, Mar 12, 2012 at 12:32 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Nate Boley's machine, the difference was ~100% increase rather than ~10%. Oh, right. I had forgotten how dramatic the changes were in those test runs. I guess I should be happy that the absolute numbers on this machine were as high as they were. This machine seems to be beating that one on every metric. Do you think the difference is in the CPU architecture, or the IO subsystem? That is an excellent question. I tried looking at vmstat output, but a funny thing kept happening: periodically, the iowait column would show a gigantic negative number instead of a number between 0 and 100. This makes me a little chary of believing any of it. Even if I did, I'm not sure that would fully answer the question. So I guess the short answer is that I don't know, and I'm not even sure how I might go about figuring it out. Any ideas? Also, do you have the latency numbers? Not at the moment, but I'll generate them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Partitioning triggers doc patch
On Mon, Mar 12, 2012 at 12:39 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Do we really want to get into all that in the middle of step-by-step instructions for the simple case? Is there any other place where we can add a more detailed explanation? Then, in your text, after saying "usually there is no reason for", you could add a parenthesis saying "more info in link". -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: 24x7 support and training
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 1:50 PM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas robertmh...@gmail.com wrote: In other words, we'd entirely avoid needing to make mxacts crash-safe, and we'd avoid most of the extra SLRU lookups that the current implementation requires; they'd only be needed when (and for as long as) the locked tuple was not yet all-visible. The current implementation only requires additional lookups in the update/check case, which is the case that doesn't do anything other than block right now. Since we're replacing lock contention with physical access contention even the worst case situation is still better than what we have now. Please feel free to point out worst case situations and show that isn't true. I think I already have: http://archives.postgresql.org/pgsql-hackers/2012-02/msg01258.php The case I'm worried about is where we are allocating mxids quickly, and we end up having to wait for fsyncs on mxact segments. That might be very slow, but you could argue that it could *possibly* be still worthwhile if it avoids blocking. That doesn't strike me as a slam-dunk, though, because we've already seen and fixed cases where too many fsyncs causes the performance of the entire system to go down the tubes (cf. commit 7f242d880b5b5d9642675517466d31373961cf98). But it's really bad if there are no updates on the parent table - then, whatever extra overhead there is will be all for naught, since the more fine-grained locking doesn't help anyway. I've also pointed out how to avoid overhead of making mxacts crash safe when the new facilities are not in use, so I don't see problems with the proposed mechanism. Given that I am still myself reviewing the actual code. The closest thing I can find to a proposal from you in that regard is this comment: # I was really thinking we could skip the fsync of a page if we've not # persisted anything important on that page, since that was one of # Robert's performance points. 
It might be possible to do something with that idea, but at the moment I'm not seeing how to make it work. So those things are not something we need to avoid. My feeling is that overwriting xmin is a clever idea, but arrives too late to permit sensible analysis in this stage of the CF. It's not solving a problem; it's just an alternate mechanism, and at best an optimisation of the mechanism. Were we to explore it now, it seems certain that another person would observe that design was taking place and so the patch should be rejected, which would be unnecessary and wasteful. Considering that nobody's done any work to resolve the uncertainty about whether the worst-case performance characteristics of this patch are acceptable, and considering further that it was undergoing massive code churn for more than a month after the final CommitFest, I think it's not that unreasonable to think it might not be ready for prime time at this point. In any event, your argument is exactly backwards: we need to first decide whether the patch needs a redesign and then, if it does, postpone it. Deciding that we don't want to postpone it first, and therefore that we're not going to redesign it even if that is what's really needed, makes no sense. I also think it would alter our ability to diagnose problems, not least the normal test that xmax matches xmin across an update. There's nothing stopping the new tuple from being frozen before the old one, even today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Partitioning triggers doc patch
Jaime Casanova ja...@2ndquadrant.com wrote: Is there any other place where we can add a more detailed explanation? Then, in your text, after saying "usually there is no reason for", you could add a parenthesis saying "more info in link". Sure, I'll look around for a good spot. -Kevin
Re: [HACKERS] query planner does not canonicalize infix operators
Daniel Farina dan...@heroku.com writes: On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Will Leinweber w...@heroku.com writes: I created an index on an hstore function, fetchval(hstore, text), however when I use the -> infix operator which resolves to the very same function, this index is not used. It should be used. Don't hold your breath. Create an index on the expression you intend to use, not random respellings of it. Is this saying there's no need for that, or no one is working on it (and I certainly don't intend to), or definitely not in the next version, or something else entirely? The reason this is difficult is that the bulk of the planner's optimization knowledge is attached to operators, not functions. It'd be easy enough to smash operators down to their underlying functions during expression preprocessing, sure. But then we would fail to recognize index applicability at all --- for instance an index on an integer column can get matched to "indexcol = 42", but not to "int4eq(indexcol, 42)". And we'd have little clue about the selectivity of the expression, either, since selectivity estimators are attached to operators not functions. Arguably the Berkeley guys got this wrong 25 years ago, and they should have defined indexes and selectivity in terms of functions not operators. However I really don't foresee us reinventing all the operator class infrastructure to make that happen; the amount of work required is far out of proportion to the benefit, even without considering the number of external projects that would get impacted. (Inventing selectivity estimators for functions is a less daunting task, and we might do that sometime ... but I think it would likely live alongside operator selectivity rather than replace it.) More generally, I'm not prepared to buy into the idea that the planner should be expected to recognize alternate spellings of the same expression.
There are too many variants of that idea that are infeasible either because the planner doesn't have the necessary knowledge, or it does but trying to recognize the equivalence would cost an impractical number of planning cycles. Neither of those objections apply to "replace an operator by its underlying function"; but they do apply to other comparable requests we've gotten, such as "recognize that x + 0 is the same as x" or "x + 1 is the same as 1 + x". regards, tom lane
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 6:14 PM, Robert Haas robertmh...@gmail.com wrote: Considering that nobody's done any work to resolve the uncertainty about whether the worst-case performance characteristics of this patch are acceptable, and considering further that it was undergoing massive code churn for more than a month after the final CommitFest, I think it's not that unreasonable to think it might not be ready for prime time at this point. Thank you for cutting to the chase. The uncertainty of which you speak is a theoretical point you raised. It has been explained, but nobody has yet shown the performance numbers to illustrate the point, but only because they seemed so clear. I would point out that you haven't demonstrated the existence of a problem either, so redesigning something without any proof of a problem seems strange. Let me explain again what this patch does and why it has such a major performance benefit. This feature gives us a step change in lock reductions from FKs. A real-world best case might be to examine the benefit this patch has on a large batch load that inserts many new orders for existing customers. In my example case the orders table has a FK to the customer table. At the same time as the data load, we attempt to update a customer's additional details, address or current balance etc. The large load takes locks on the customer table and keeps them for the whole transaction. So the customer updates are locked out for multiple seconds, minutes or maybe hours, depending upon how far you want to stretch the example. With this patch the customer updates don't cause lock conflicts, but they require mxact lookups in *some* cases, so they might take 1-10ms extra, rather than 1-10 minutes more: 1000x faster. The only case that causes the additional lookups is the case that otherwise would have been locked. So producing best case results is trivial, and the benefit can be as enormous as you like. I agree with you that some worst case performance tests should be done.
Could you please say what you think the worst cases would be, so those can be tested? That would avoid wasting time or getting anything backwards. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On 09.03.2012 12:04, Heikki Linnakangas wrote: I've been doing some performance testing with this, using a simple C function that just inserts a dummy WAL record of given size. I'm not totally satisfied. Although the patch helps with scalability at 3-4 concurrent backends doing WAL insertions, it seems to slow down the single-client case with small WAL records by about 5-10%. This is what Robert also saw with an earlier version of the patch (http://archives.postgresql.org/pgsql-hackers/2011-12/msg01223.php). I tested this with the data directory on a RAM drive; unfortunately I don't have a server with a hard drive that can sustain the high insertion rate. I'll post more detailed results, once I've refined the tests a bit. So, here are more detailed test results, using Greg Smith's excellent pgbench-tools test suite: http://community.enterprisedb.com/xloginsert-scale-tests/ The workload in all of these tests was a simple C function that writes a lot of very small WAL records, with 16 bytes of payload each. I ran the tests with the data directory on a regular hard drive, on an SSD, and on a RAM drive (/dev/shm). With HDD, I also tried fsync=off and synchronous_commit=off. For each of those, I ran the tests with 1-16 concurrent backends. Summary: The patch hurts single-backend performance by about 10%, except for the synchronous_commit=off test. Between 2-6 clients, it either helps, doesn't make any difference, or hurts. With > 6 clients, it hurts. So, that's quite disappointing. The patch has two problems: the 10% slowdown in the single-client case, and the slowdown with > 6 clients. I don't know where exactly the single-client slowdown comes from, although I'm not surprised that the bookkeeping with slots etc. has some overhead. Hopefully that overhead can be made smaller, if not eliminated completely. The slowdown with > 6 clients seems to be spinlock contention.
I ran perf record for a short duration during one of the ramdrive tests, and saw the spinlock acquisition in ReserveXLogInsertLocation() consuming about 80% of all CPU time. I then hacked the patch a little bit, removing the check in XLogInsert for fullPageWrites and forcePageWrites, as well as the check for "did a checkpoint just happen" (see http://community.enterprisedb.com/xloginsert-scale-tests/disable-fpwcheck.patch). My hunch was that accessing those fields causes cache line stealing, making the cache line containing the spinlock even more busy. That hunch seems to be correct; when I reran the tests with that patch, the performance with a high # of clients became much better. See the results with xloginsert-scale-13.patch. With that change, the single-client case is still about 10% slower than current code, but the performance with 8 clients is almost as good as with current code. Between 2-6 clients, the patch is a win. The hack that restored the > 6 clients performance to the current level is not safe, of course, so I'll have to figure out a safe way to get that effect. Also, even when the performance is as good as current code, it's not good to spend all the CPU time spinning on the spinlock. I didn't measure the CPU usage with current code, but I would expect it to be sleeping, not spinning, when not doing useful work. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
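[Editorial aside: the cache-line-stealing effect Heikki describes can be sketched in a few lines of C. The struct and field names below are hypothetical, loosely echoing the fields mentioned in the mail, and the 64-byte line size is an assumption (POWER7 actually uses 128-byte lines); the point is only the layout, not the real XLogCtl structure.]

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64      /* assumed line size; POWER7 uses 128 */

/*
 * Unpadded layout: the insertion spinlock shares a cache line with
 * read-mostly flags.  Every lock acquisition invalidates the line, so
 * backends that merely *read* the flags pull the line away from the
 * lock holder, and vice versa.
 */
typedef struct
{
    volatile int insert_lck;     /* hot: taken for every WAL insertion */
    int          fullPageWrites; /* read-mostly */
    int          forcePageWrites;/* read-mostly */
} UnpaddedCtl;

/*
 * Padded layout: the spinlock gets a cache line to itself, so reading
 * the flags no longer contends with lock traffic.
 */
typedef struct
{
    volatile int insert_lck;
    char         pad[CACHE_LINE_SIZE - sizeof(int)];
    int          fullPageWrites;
    int          forcePageWrites;
} PaddedCtl;
```

The tradeoff, of course, is memory: padding every contended lock this way costs most of a cache line per lock.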
Re: [HACKERS] query planner does not canonicalize infix operators
On Mon, Mar 12, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Mon, Mar 12, 2012 at 7:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Will Leinweber w...@heroku.com writes: I created an index on an hstore function, fetchval(hstore, text), however when I use the -> infix operator which resolves to the very same function, this index is not used. It should be used. Don't hold your breath. Create an index on the expression you intend to use, not random respellings of it. Is this saying there's no need for that, or no one is working on it (and I certainly don't intend to), or definitely not in the next version, or something else entirely? The reason this is difficult is that the bulk of the planner's optimization knowledge is attached to operators, not functions. It'd be easy enough to smash operators down to their underlying functions during expression preprocessing, sure. But then we would fail to recognize index applicability at all --- for instance an index on an integer column can get matched to "indexcol = 42", but not to "int4eq(indexcol, 42)". And we'd have little clue about the selectivity of the expression, either, since selectivity estimators are attached to operators not functions. Argh. That is very sad. Given that an operator definitely resolves to a function (right?) in the function catalog, it seems a little insane. Arguably the Berkeley guys got this wrong 25 years ago, and they should have defined indexes and selectivity in terms of functions not operators. However I really don't foresee us reinventing all the operator class infrastructure to make that happen; the amount of work required is far out of proportion to the benefit, even without considering the number of external projects that would get impacted. (Inventing selectivity estimators for functions is a less daunting task, and we might do that sometime ... but I think it would likely live alongside operator selectivity rather than replace it.) I see. That is ugly.
More generally, I'm not prepared to buy into the idea that the planner should be expected to recognize alternate spellings of the same expression. There are too many variants of that idea that are infeasible either because the planner doesn't have the necessary knowledge, or it does but trying to recognize the equivalence would cost an impractical number of planning cycles. Neither of those objections apply to "replace an operator by its underlying function"; but they do apply to other comparable requests we've gotten, such as "recognize that x + 0 is the same as x" or "x + 1 is the same as 1 + x". I think that the case of infix-operator-to-function stands out in that it doesn't really rely on knowing any algebraic properties of the underlying function, such as commutativity, associativity, the notion of a nil-value in the algebra, et al. It's a bit concerning because ORMs are starting to pick up on the extra data types in Postgres, and they'll all have to use the preferred spelling of the functionality to properly compose. If SQLAlchemy generates fetchval and ActiveRecord 4 uses ->, tough times are ahead. Thanks for your detailed response. As a small point of criticism, I think your response to Will came off as a bit strong, even though there are good reasons why this deceptively small request turns into a snarl that may only be of interest to optimizer hackers. On the more constructive side, if I were to till the fields to change this aspect of the optimizer, is there any interest in rectifying the operator-function confusion? -- fdr
Re: [HACKERS] wal_buffers, redux
On Mon, Mar 12, 2012 at 10:55 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Mar 12, 2012 at 12:32 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Nate Boley's machine, the difference was ~100% increase rather than ~10%. Oh, right. I had forgotten how dramatic the changes were in those test runs. I guess I should be happy that the absolute numbers on this machine were as high as they were. This machine seems to be beating that one on every metric. Do you think the difference is in the CPU architecture, or the IO subsystem? That is an excellent question. I tried looking at vmstat output, but a funny thing kept happening: periodically, the iowait column would show a gigantic negative number instead of a number between 0 and 100. On which machine was that happening? This makes me a little chary of believing any of it. Even if I did, I'm not sure that would fully answer the question. So I guess the short answer is that I don't know, and I'm not even sure how I might go about figuring it out. Any ideas? Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both machines) with fsync=off (as well as synchronous_commit=off still) might help clarify things. If it increases the TPS of Nate@16MB, but doesn't change the other 3 situations much, then that suggests the IO system is driving it. Basically, moving up to 32MB is partially inoculating against slow fsyncs upon log switch on that machine. Does the POWER7 have a nonvolatile cache? What happened with synchronous_commit=on? Also, since all data fits in shared_buffers, making checkpoint_segments and checkpoint_timeout be larger than the benchmark period should remove the only other source of writing from the system. With no checkpoints, no evictions, and no fsyncs, it is unlikely for the remaining IO to be the bottleneck. Cheers, Jeff
[HACKERS] GitHub mirror is not updating
Hi list, I don't know who's maintaining the PostgreSQL GitHub mirror, but it hasn't been updated for 6 days now: https://github.com/postgres/postgres Just letting you know. Regards, Marti
Re: [HACKERS] GitHub mirror is not updating
On Mon, Mar 12, 2012 at 21:53, Marti Raudsepp ma...@juffo.org wrote: Hi list, I don't know who's maintaining the PostgreSQL GitHub mirror, but it hasn't been updated for 6 days now: https://github.com/postgres/postgres Just letting you know. Thanks for letting us know. I think it's because we missed approving an SSH key after the GitHub security hole a week ago. I've approved it now; hopefully the mirror should kick back to life shortly. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_upgrade --logfile option documentation
On Thu, Mar 08, 2012 at 08:33:48PM -0500, Bruce Momjian wrote: OK, it seems people do care about keeping log files from multiple runs so I went with Tom's idea and have:

------------------------------------------
pg_upgrade run on Thu Mar 8 19:30:12 2012
------------------------------------------
Performing Consistency Checks
------------------------------------------

Updated patch attached. FYI, in retain mode, these are the files left in the current directory:

	delete_old_cluster.sh
	pg_upgrade_dump_all.sql
	pg_upgrade_dump_db.sql
	pg_upgrade_dump_globals.sql
	pg_upgrade_internal.log
	pg_upgrade_restore.log
	pg_upgrade_server.log
	pg_upgrade_utility.log

I will address the idea of using /tmp in another email. OK, attached pg_upgrade patch applied, with this list of improvements in logging:

	add ability to control permissions of created files
	have psql echo its queries for easier debugging
	output four separate log files, and delete them on success
	add -r/--retain option to keep log files after success
	make log files append-only
	remove -g/-G/-l logging options
	suggest tailing appropriate log file on failure
	enhance -v/--verbose behavior

Thanks for all the feedback. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index a5f63eb..cf43384
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** issue_warnings(char *sequence_script_fil
*** 165,176 ****
  	if (sequence_script_file_name)
  	{
  		prep_status("Adjusting sequences");
! 		exec_prog(true,
! 				  SYSTEMQUOTE "\"%s/psql\" --set ON_ERROR_STOP=on --no-psqlrc --port %d --username \"%s\" "
! 				  "-f \"%s\" --dbname template1 >> \"%s\"" SYSTEMQUOTE,
  				  new_cluster.bindir, new_cluster.port, os_info.user,
! 				  sequence_script_file_name, log_opts.filename2);
  		unlink(sequence_script_file_name);
  		check_ok();
  	}
--- 165,177 ----
  	if (sequence_script_file_name)
  	{
  		prep_status("Adjusting sequences");
! 		exec_prog(true, true, UTILITY_LOG_FILE,
! 				  SYSTEMQUOTE "\"%s/psql\" --echo-queries "
! 				  "--set ON_ERROR_STOP=on --no-psqlrc --port %d --username \"%s\" "
! 				  "-f \"%s\" --dbname template1 >> \"%s\" 2>&1" SYSTEMQUOTE,
  				  new_cluster.bindir, new_cluster.port, os_info.user,
! 				  sequence_script_file_name, UTILITY_LOG_FILE);
  		unlink(sequence_script_file_name);
  		check_ok();
  	}
*** create_script_for_old_cluster_deletion(c
*** 393,402 ****
  	prep_status("Creating script to delete old cluster");
! 	snprintf(*deletion_script_file_name, MAXPGPATH, "%s/delete_old_cluster.%s",
! 			 os_info.cwd, SCRIPT_EXT);
! 	if ((script = fopen(*deletion_script_file_name, "w")) == NULL)
  		pg_log(PG_FATAL, "Could not open file \"%s\": %s\n",
  			   *deletion_script_file_name, getErrorText(errno));
--- 394,403 ----
  	prep_status("Creating script to delete old cluster");
! 	snprintf(*deletion_script_file_name, MAXPGPATH, "delete_old_cluster.%s",
! 			 SCRIPT_EXT);
! 	if ((script = fopen_priv(*deletion_script_file_name, "w")) == NULL)
  		pg_log(PG_FATAL, "Could not open file \"%s\": %s\n",
  			   *deletion_script_file_name, getErrorText(errno));
*** check_for_isn_and_int8_passing_mismatch(
*** 541,548 ****
  		return;
  	}
! 	snprintf(output_path, sizeof(output_path), "%s/contrib_isn_and_int8_pass_by_value.txt",
! 			 os_info.cwd);
  	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
  	{
--- 542,549 ----
  		return;
  	}
! 	snprintf(output_path, sizeof(output_path),
! 			 "contrib_isn_and_int8_pass_by_value.txt");
  	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
  	{
*** 569,575 ****
  		for (rowno = 0; rowno < ntups; rowno++)
  		{
  			found = true;
! 			if (script == NULL && (script = fopen(output_path, "w")) == NULL)
  				pg_log(PG_FATAL, "Could not open file \"%s\": %s\n",
  					   output_path, getErrorText(errno));
  			if (!db_used)
--- 570,576 ----
  		for (rowno = 0; rowno < ntups; rowno++)
  		{
  			found = true;
! 			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
  				pg_log(PG_FATAL, "Could not open file \"%s\": %s\n",
  					   output_path, getErrorText(errno));
  			if (!db_used)
*** check_for_reg_data_type_usage(ClusterInf
*** 628,635 ****
  	prep_status("Checking for reg* system OID user data types");
! 	snprintf(output_path, sizeof(output_path), "%s/tables_using_reg.txt",
! 			 os_info.cwd);
  	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
  	{
--- 629,635 ----
  	prep_status("Checking for reg* system OID user data types");
Re: [HACKERS] pg_upgrade and umask
On Fri, Mar 09, 2012 at 11:33:36AM -0500, Bruce Momjian wrote: On Fri, Mar 09, 2012 at 10:41:53AM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: The problem is that these files are being created often by shell redirects, e.g. pg_dump -f out 2> log_file. There is no clean way to control the file creation permissions in this case --- only umask gives us a process-level setting. Actually, one crafty idea would be to do the umask only when I exec something, and when I create the initial files with the new banner you suggested. Let me look into that. You could create empty log files with the desired permissions, and then do the execs with >> log_file, and thereby not have to globally change umask. Yes, that is what I have done, with the attached patch. I basically wrapped the fopen call with umask calls, and have the system() call wrapped too. That takes care of all the files pg_upgrade creates. Frankly, the permissions are already being modified by the default umask, e.g. 0022. Do we want a zero umask? I'm not so worried about the default umask; nobody's complained yet about wrong permissions on pg_upgrade output files. But umask 077 would be likely to do things like get rid of group access to postgresql.conf, which some people intentionally set. Yes, that was my conclusion too, but I wanted to ask. FYI, this doesn't affect the install itself, just what pg_upgrade changes, and it doesn't touch postgresql.conf, but, like you, I am worried there might be long-term problems with an aggressive umask that covered the entire executable. I ended up creating fopen_priv to centralize the umask calls in a single function, and added an is_priv boolean to exec_prog for the same purpose. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
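[Editorial aside: a minimal sketch of the fopen_priv()-style wrapper Bruce describes might look like the following. The body is an assumption reconstructed from the description above, not the actual pg_upgrade code.]

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Sketch of fopen_priv(): temporarily force a restrictive umask (077)
 * around the fopen() call so the file is created owner-only, then
 * restore the caller's umask so nothing else is affected.
 * Illustrative only; the real pg_upgrade helper may differ.
 */
FILE *
fopen_priv(const char *path, const char *mode)
{
    mode_t  old_mask = umask(S_IRWXG | S_IRWXO);    /* i.e. umask 077 */
    FILE   *fp = fopen(path, mode);

    umask(old_mask);            /* restore the caller's umask */
    return fp;
}
```

With the usual 0666 creation mode, a file opened with "w" through this wrapper ends up mode 0600, while files opened elsewhere in the program still honor the user's own umask.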
Re: [HACKERS] xlog location arithmetic
On Fri, Mar 09, 2012 at 03:04:23PM -0500, Tom Lane wrote: The main actual simplification would be in getting rid of the hole at the end of each 4GB worth of WAL, cf this bit in xlog_internal.h:

/*
 * We break each logical log file (xlogid value) into segment files of the
 * size indicated by XLOG_SEG_SIZE.  One possible segment at the end of each
 * log file is wasted, to ensure that we don't have problems representing
 * last-byte-position-plus-1.
 */
#define XLogSegSize		((uint32) XLOG_SEG_SIZE)
#define XLogSegsPerFile	(((uint32) 0xffffffff) / XLogSegSize)
#define XLogFileSize	(XLogSegsPerFile * XLogSegSize)

If we can't get rid of that and have a continuous 64-bit WAL address space then it's unlikely we can actually simplify any logic. Now, doing that doesn't break the naming convention exactly; what it changes is that there will be WAL files numbered xxx (for some number of trailing-1-bits I'm too lazy to work out at the moment) where before there were not. So the question really is how much external code there is that is aware of that specific noncontiguous numbering behavior and would be broken if things stopped being that way. Our current WAL naming is hopelessly arcane, and we would certainly be benefitting users by simplifying it. Is this a TODO? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
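[Editorial aside: the arithmetic behind the "hole" in the quoted excerpt can be checked with a small C sketch. The macros mirror the xlog_internal.h definitions above, assuming the default 16 MB XLOG_SEG_SIZE; the helper functions are purely illustrative.]

```c
#include <stdint.h>

/* Definitions from xlog_internal.h, with the default 16 MB segment size. */
#define XLOG_SEG_SIZE   (16 * 1024 * 1024)
#define XLogSegSize     ((uint32_t) XLOG_SEG_SIZE)
#define XLogSegsPerFile (((uint32_t) 0xffffffff) / XLogSegSize)
#define XLogFileSize    (XLogSegsPerFile * XLogSegSize)

/*
 * Segments per 4 GB logical file: 0xffffffff / 16MB truncates to 255,
 * not 256, because 0xffffffff is one byte short of a full 4 GB.
 */
uint32_t
segs_per_logical_file(void)
{
    return XLogSegsPerFile;
}

/*
 * Bytes of WAL address space never used at the end of each logical
 * file ("the hole"): the difference between a full 2^32 bytes and
 * XLogFileSize, which works out to exactly one segment's worth.
 */
uint64_t
wasted_bytes_per_logical_file(void)
{
    return (UINT64_C(1) << 32) - (uint64_t) XLogFileSize;
}
```

So each 4 GB logical file contains segments 0 through 254, and the address range that would have been segment 255 is the wasted hole Tom refers to.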
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 01:28:11PM -0400, Robert Haas wrote: I spent some time thinking about this over the weekend, and I have an observation, and an idea. Here's the observation: I believe that locking a tuple whose xmin is uncommitted is always a noop, because if it's ever possible for a transaction to wait for an XID that is part of its own transaction (exact same XID, or sub-XIDs of the same top XID), then a transaction could deadlock against itself. I believe that this is not possible: if a transaction were to wait for an XID assigned to that same backend, then the lock manager would observe that an ExclusiveLock on the xid is already held, so the request for a ShareLock would be granted immediately. I also don't believe there's any situation in which the existence of an uncommitted tuple fails to block another backend, but a lock on that same uncommitted tuple would have caused another backend to block. If any of that sounds wrong, you can stop reading here (but please tell me why it sounds wrong). When we lock an update-in-progress row, we walk the t_ctid chain and lock all descendant tuples. They may all have uncommitted xmins. This is essential to ensure that the final outcome of the updating transaction does not affect whether the locking transaction has its KEY SHARE lock. Similarly, when we update a previously-locked tuple, we copy any locks (always KEY SHARE locks) to the new version. That new tuple is both uncommitted and has locks, and we cannot easily sacrifice either property. Do you see a way to extend your scheme to cover these needs?
[HACKERS] pg_upgrade and statistics
As noted in the manual, pg_statistic is not ported in any way when performing pg_upgrade. I have been investigating what it would take to (even via just a connected SQL superuser client running UPDATE or INSERT against pg_statistic) get at least some baseline statistics into the database as quickly as possible, since in practice the underlying implementation of the statistics and cost estimation does not change so dramatically between releases as to make the old statistics useless (AFAIK). I eventually used a few contortions to be able to update the anyarray elements in pg_statistic: UPDATE pg_statistic SET stavalues1=array_in(anyarray_out('{thearrayliteral}'::concrete_type[]), 'concrete_type'::regtype, atttypmod) WHERE staattnum = attnum and starelid = therelation; Notably, the type analysis phase is a bit too smart for me to simply cast to anyarray from a concrete type, so I run it through a deparse/reparse phase instead to fool it. Now I'm stuck trying to ensure that autoanalyze will run at least once after we have committed the old statistics to the new catalogs, regardless of how much activity has taken place on the table, regardless of how cold it is (thus, tuning the GUC thresholds is not attractive, because at what point should I tune them back to normal settings?). One idea I had was to jigger pg_stat to indicate that a lot of tuples have changed since the last analyze (which will be automatically fixed after autoanalyze on a relation completes), but because this is not a regular table it doesn't look too easy unless I break out a new C extension. You probably are going to ask: why not just run ANALYZE and be done with it?
The reasons are:

* ANALYZE can take a sufficiently long time on large databases that the downtime of switching versions is not attractive
* If we don't run ANALYZE and have no old statistics, then the plans can be disastrously bad for the user
* If we do run the ANALYZE statement on a user's behalf as part of the upgrade, any compatibility fixups that require an exclusive lock (such as some ALTER TABLE statements) would have to block on this relatively long ANALYZE. autoanalyze/autovacuum, by comparison, backs off frequently, so disaster is averted.

If anyone has any insightful comments as to how to meet these requirements, I'd appreciate them, otherwise I can consider it an interesting area for improvement and will eat the ANALYZE and salt the documentation with caveats. -- fdr
Re: [HACKERS] wal_buffers, redux
On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote: Do you think the difference is in the CPU architecture, or the IO subsystem? That is an excellent question. I tried looking at vmstat output, but a funny thing kept happening: periodically, the iowait column would show a gigantic negative number instead of a number between 0 and 100. On which machine was that happening? The IBM server. This makes me a little chary of believing any of it. Even if I did, I'm not sure that would fully answer the question. So I guess the short answer is that I don't know, and I'm not even sure how I might go about figuring it out. Any ideas? Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both machines) with fsync=off (as well as synchronous_commit=off still) might help clarify things. If it increases the TPS of Nate@16MB, but doesn't change the other 3 situations much, then that suggests the IO system is driving it. Basically moving up to 32MB is partially inoculating against slow fsyncs upon log switch on that machine. Mmm, yeah. Although, I think it might have been 64MB rather than 32MB that I tested on that machine. Does the POWER7 have a nonvolatile cache? What happened with synchronous_commit=on? Haven't tried that yet. Also, since all data fits in shared_buffers, making checkpoint_segments and checkpoint_timeout be larger than the benchmark period should remove the only other source of writing from the system. With no checkpoints, no evictions, and no fsyncs, it is unlikely for the remaining IO to be the bottleneck. Another thing to test. Meanwhile, here are some TPS graphs at 16MB and 32MB on the IBM POWER7 machine. 32 clients, 1800 seconds, scale factor 300, synchronous commit off.
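Jeff's proposed follow-up run boils down to a handful of GUC changes. A sketch of the settings (the specific values below are illustrative assumptions, not from the thread except where quoted):

```ini
# postgresql.conf sketch for the proposed follow-up benchmark run
wal_buffers = 32MB            # the setting under test (16MB for the baseline run)
synchronous_commit = off      # as in the runs already posted
fsync = off                   # Jeff's suggestion: does the 16MB/32MB gap persist?
checkpoint_segments = 1000    # large enough that the 1800 s run consumes no checkpoint
checkpoint_timeout = 1h       # so no timed checkpoint fires during the benchmark
```

If the 16MB configuration catches up once fsync is off, that points at fsync stalls on log switch rather than the CPU architecture.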
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company attachment: tps-16MB.png attachment: tps-32MB.png
Re: [HACKERS] xlog location arithmetic
On Sat, Mar 10, 2012 at 3:23 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Mar 9, 2012 at 12:38 PM, Magnus Hagander mag...@hagander.net wrote: On Fri, Mar 9, 2012 at 18:18, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Fri, Mar 9, 2012 at 15:37, Tom Lane t...@sss.pgh.pa.us wrote: Why would it be useful to use pg_size_pretty on xlog locations? -1 because of the large expense of bigint-numeric-whatever conversion that would be added to existing uses. Given the expense, perhaps we need two different (overloaded) functions instead? Agreed. Attached patch introduces the overloaded function pg_size_pretty(numeric). That would be a workable solution, but I continue to not believe that this is useful enough to be worth the trouble. There's certainly some use to being able to prettify it. Wouldn't a pg_size_pretty(numeric) also be useful if you want to pg_size_pretty() a sum() of something? Used on files it doesn't make too much sense, given how big those files have to be, but it can be used on other things as well... I can see a usecase for having a pg_size_pretty(numeric) as an option. Not necessarily a very big one, but a >0 one. +1. +1, too. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** 14989,14995 ****
  postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
      </row>
      <row>
       <entry>
!       <literal><function>pg_size_pretty(<type>bigint</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>Converts a size in bytes into a human-readable format with size units</entry>
--- 14989,14995 ----
      </row>
      <row>
       <entry>
!       <literal><function>pg_size_pretty(<type>bigint</type> or <type>numeric</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>Converts a size in bytes into a human-readable format with size units</entry>

*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
*** 24,29 ****
--- 24,30 ----
  #include storage/fd.h
  #include utils/acl.h
  #include utils/builtins.h
+ #include utils/numeric.h
  #include utils/rel.h
  #include utils/relmapper.h
  #include utils/syscache.h
*** 550,555 **** pg_size_pretty(PG_FUNCTION_ARGS)
--- 551,652 ----
  	PG_RETURN_TEXT_P(cstring_to_text(buf));
  }
+
+ Datum
+ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
+ {
+ 	Numeric		size = PG_GETARG_NUMERIC(0);
+ 	Numeric		limit, limit2;
+ 	char	   *buf, *result;
+
+ 	limit = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) (10 * 1024))));
+ 	limit2 = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) (10 * 1024 * 2 - 1))));
+
+ 	if (DatumGetBool(DirectFunctionCall2(numeric_lt, NumericGetDatum(size), NumericGetDatum(limit))))
+ 	{
+ 		buf = DatumGetCString(DirectFunctionCall1(numeric_out, NumericGetDatum(size)));
+ 		result = palloc(strlen(buf) + 7);
+ 		strcpy(result, buf);
+ 		strcat(result, " bytes");
+ 	}
+ 	else
+ 	{
+ 		Numeric		arg2;
+
+ 		/* keep one extra bit for rounding */
+ 		/* size >>= 9 */
+ 		arg2 = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) pow(2, 9))));
+ 		size = DatumGetNumeric(DirectFunctionCall2(numeric_div_trunc, NumericGetDatum(size), NumericGetDatum(arg2)));
+
+ 		if (DatumGetBool(DirectFunctionCall2(numeric_lt, NumericGetDatum(size), NumericGetDatum(limit2))))
+ 		{
+ 			/* size = (size + 1) / 2 */
+ 			size = DatumGetNumeric(DirectFunctionCall2(numeric_add, NumericGetDatum(size),
+ 							DirectFunctionCall1(int8_numeric, Int64GetDatum(1))));
+ 			size = DatumGetNumeric(DirectFunctionCall2(numeric_div_trunc, NumericGetDatum(size),
+ 							DirectFunctionCall1(int8_numeric, Int64GetDatum(2))));
+ 			buf = DatumGetCString(DirectFunctionCall1(numeric_out, NumericGetDatum(size)));
+ 			result = palloc(strlen(buf) + 4);
+ 			strcpy(result, buf);
+ 			strcat(result, " kB");
+ 		}
+ 		else
+ 		{
+ 			Numeric		arg3;
+
+ 			/* size >>= 10 */
+ 			arg3 = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) pow(2, 10))));
+ 			size = DatumGetNumeric(DirectFunctionCall2(numeric_div_trunc, NumericGetDatum(size), NumericGetDatum(arg3)));
+
+ 			if (DatumGetBool(DirectFunctionCall2(numeric_lt, NumericGetDatum(size), NumericGetDatum(limit2))))
+ 			{
+ 				/* size = (size + 1) / 2 */
+ 				size = DatumGetNumeric(DirectFunctionCall2(numeric_add, NumericGetDatum(size),
+ 								DirectFunctionCall1(int8_numeric, Int64GetDatum(1))));
+ 				size = DatumGetNumeric(DirectFunctionCall2(numeric_div_trunc, NumericGetDatum(size),
+ 								DirectFunctionCall1(int8_numeric, Int64GetDatum(2))));
+ 				buf = DatumGetCString(DirectFunctionCall1(numeric_out, NumericGetDatum(size)));
+ 				result = palloc(strlen(buf) + 4);
+ 				strcpy(result, buf);
+ 				strcat(result, " MB");
+ 			}
Re: [HACKERS] Measuring relation free space
On Fri, Mar 09, 2012 at 02:18:02AM -0500, Jaime Casanova wrote: On Wed, Feb 22, 2012 at 12:27 AM, Noah Misch n...@leadboat.com wrote: On Tue, Feb 14, 2012 at 02:04:26AM -0500, Jaime Casanova wrote: 1) pgstattuple-gin_spgist.patch This first patch adds gin and spgist support to pgstattuple, also makes pgstattuple use a ring buffer when reading tables or indexes. The buffer access strategy usage bits look fine to commit. ok. i extracted that part. which basically makes pgstattuple usable in production (i mean, by not bloating shared buffers when using the function) I created a CF entry for this and marked it Ready for Committer. You left the bstrategy variable non-static, but that didn't seem important enough to justify another round trip.
[HACKERS] SPGiST versus hot standby - question about conflict resolution rules
There is one more (known) stop-ship problem in SPGiST, which I'd kind of like to get out of the way now before I let my knowledge of that code get swapped out again. This is that SPGiST is unsafe for use by hot standby slaves. The problem comes from redirect tuples, which are short-lifespan objects that replace a tuple that's been moved to another page. A redirect tuple can be recycled as soon as no active indexscan could be in flight from the parent index page to the moved tuple. SPGiST implements this by marking each redirect tuple with the XID of the creating transaction, and assuming that the tuple can be recycled once that XID is below the OldestXmin horizon (implying that all active transactions started after it ended). This is fine as far as transactions on the master are concerned, but there is no guarantee that the recycling WAL record couldn't be replayed on a hot standby slave while there are still HS transactions that saw the old state of the parent index tuple. Now, btree has a very similar problem with deciding when it's safe to recycle a deleted index page: it has to wait out transactions that could be in flight to the page, and it does that by marking deleted pages with XIDs. I see that the problem has been patched for btree by emitting a special WAL record just before a page is recycled. However, I'm a bit nervous about copying that solution, because the details are a bit different. In particular, I see that btree marks deleted pages with ReadNewTransactionId() --- that is, the next-to-be-assigned XID --- rather than the XID of the originating transaction, and then it subtracts one from the XID before sending it to the WAL stream. The comments about this are not clear enough for me, and so I'm wondering whether it's okay to use the originating transaction XID in a similar way, or if we need to modify SPGiST's rule for how to mark redirection tuples. 
I think that the use of ReadNewTransactionId is because btree page deletion happens in VACUUM, which does not have its own XID; this is unlike the situation for SPGiST where creation of redirects is caused by index tuple insertion, so there is a surrounding transaction with a real XID. But it's not clear to me how GetConflictingVirtualXIDs makes use of the limitXmin and whether a live XID is okay to pass to it, or whether we actually need next XID - 1. Info appreciated. regards, tom lane
Re: [HACKERS] xlog location arithmetic
On Sat, Mar 10, 2012 at 5:04 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Mar 9, 2012 at 2:34 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Mar 9, 2012 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Hm. I think thousands is an overestimate, but yeah the logic could be greatly simplified. However, I'm not sure we could avoid breaking the existing naming convention for WAL files. How much do we care about that? Probably not very much, since WAL files aren't portable across major versions anyway. But I don't see why you couldn't keep the naming convention - there's nothing to prevent you from converting a 64-bit integer back into two 32-bit integers if and where needed. On further reflection, this seems likely to break quite a few third-party tools. Maybe it'd be worth it anyway, but it definitely seems like it would be worth going to at least some minor trouble to avoid it. The main actual simplification would be in getting rid of the hole at the end of each 4GB worth of WAL, cf this bit in xlog_internal.h:

/*
 * We break each logical log file (xlogid value) into segment files of the
 * size indicated by XLOG_SEG_SIZE.  One possible segment at the end of each
 * log file is wasted, to ensure that we don't have problems representing
 * last-byte-position-plus-1.
 */
#define XLogSegSize		((uint32) XLOG_SEG_SIZE)
#define XLogSegsPerFile	(((uint32) 0xffffffff) / XLogSegSize)
#define XLogFileSize	(XLogSegsPerFile * XLogSegSize)

If we can't get rid of that and have a continuous 64-bit WAL address space then it's unlikely we can actually simplify any logic. Now, doing that doesn't break the naming convention exactly; what it changes is that there will be WAL files numbered xxx (for some number of trailing-1-bits I'm too lazy to work out at the moment) where before there were not.
So the question really is how much external code there is that is aware of that specific noncontiguous numbering behavior and would be broken if things stopped being that way. A page header contains WAL location, so getting rid of hole seems to break pg_upgrade. No? Unless pg_upgrade converts noncontinuous location to continuous one, we still need to handle noncontinuous one after upgrade. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] xlog location arithmetic
Fujii Masao masao.fu...@gmail.com writes: On Sat, Mar 10, 2012 at 5:04 AM, Tom Lane t...@sss.pgh.pa.us wrote: The main actual simplification would be in getting rid of the hole at the end of each 4GB worth of WAL, cf this bit in xlog_internal.h: If we can't get rid of that and have a continuous 64-bit WAL address space then it's unlikely we can actually simplify any logic. ... Now, doing that doesn't break the naming convention exactly; what it changes is that there will be WAL files numbered xxx (for some number of trailing-1-bits I'm too lazy to work out at the moment) where before there were not. So the question really is how much external code there is that is aware of that specific noncontiguous numbering behavior and would be broken if things stopped being that way. A page header contains WAL location, so getting rid of hole seems to break pg_upgrade. No? No, why would it do that? The meaning and ordering of WAL addresses is the same as before. The only difference is that after the upgrade, the system will stop skipping over 16MB of potentially usable WAL addresses at the end of each subsequently-used 4GB of space. The holes before the switchover point are still holes, but that doesn't matter. regards, tom lane
Re: [HACKERS] pg_upgrade and statistics
On Mon, Mar 12, 2012 at 06:38:30PM -0700, Daniel Farina wrote: You probably are going to ask: why not just run ANALYZE and be done with it? The reasons are:

* ANALYZE can take a sufficiently long time on large databases that the downtime of switching versions is not attractive
* If we don't run ANALYZE and have no old statistics, then the plans can be disastrously bad for the user
* If we do run the ANALYZE statement on a user's behalf as part of the upgrade, any compatibility fixups that require an exclusive lock (such as some ALTER TABLE statements) would have to block on this relatively long ANALYZE. autoanalyze/autovacuum, by comparison, backs off frequently, so disaster is averted.

If anyone has any insightful comments as to how to meet these requirements, I'd appreciate them, otherwise I can consider it an interesting area for improvement and will eat the ANALYZE and salt the documentation with caveats.

Copying the statistics from the old server is on the pg_upgrade TODO list. I have avoided it because it will add an additional requirement that will make pg_upgrade more fragile in case of major version changes. Does anyone have a sense of how often we change the statistics data between major versions? Ideally, pg_dump/pg_dumpall would add the ability to dump statistics, and pg_upgrade could use that. To answer your specific question, I think clearing the last-analyzed fields should cause autovacuum to run ANALYZE on those tables. What I don't know is whether not clearing the last-vacuum datetime will cause the table not to be analyzed. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Measuring relation free space
On Mon, Mar 12, 2012 at 9:41 PM, Noah Misch n...@leadboat.com wrote: I created a CF entry for this and marked it Ready for Committer. i wasn't sure if create an entry this late was a good idea or not... but now i feel better because is less probable that it will fall out on the cracks, thanks You left the bstrategy variable non-static, but that didn't seem important enough to justify another round trip. ah! i forgot that... -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación
Re: [HACKERS] query planner does not canonicalize infix operators
On Mon, 2012-03-12 at 13:04 -0700, Daniel Farina wrote: On the more constructive side, if I were to till the fields to change this aspect of the optimizer, is there any interest in rectifying the operator-function confusion? I once proposed to do that [1], but there was not much enthusiasm elsewhere. I still think it'd be worthwhile, for the reasons you mentioned. But it would be a lot of detail work. [1] http://archives.postgresql.org/message-id/1309640525.26660.22.ca...@vanquo.pezone.net
Re: [HACKERS] xlog location arithmetic
On Tue, Mar 13, 2012 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Sat, Mar 10, 2012 at 5:04 AM, Tom Lane t...@sss.pgh.pa.us wrote: The main actual simplification would be in getting rid of the hole at the end of each 4GB worth of WAL, cf this bit in xlog_internal.h: If we can't get rid of that and have a continuous 64-bit WAL address space then it's unlikely we can actually simplify any logic. ... Now, doing that doesn't break the naming convention exactly; what it changes is that there will be WAL files numbered xxx (for some number of trailing-1-bits I'm too lazy to work out at the moment) where before there were not. So the question really is how much external code there is that is aware of that specific noncontiguous numbering behavior and would be broken if things stopped being that way. A page header contains WAL location, so getting rid of hole seems to break pg_upgrade. No? No, why would it do that? The meaning and ordering of WAL addresses is the same as before. The only difference is that after the upgrade, the system will stop skipping over 16MB of potentially usable WAL addresses at the end of each subsequently-used 4GB of space. The holes before the switchover point are still holes, but that doesn't matter. Oh, I see. You're right. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] initdb and fsync
On Sun, 2012-02-05 at 17:56 -0500, Noah Misch wrote: I meant primarily to illustrate the need to be comprehensive, not comment on which executable should fsync a particular file. Bootstrap-mode backends do not sync anything during an initdb run on my system. With your patch, we'll fsync a small handful of files and leave nearly everything else vulnerable. Thank you for pointing that out. With that in mind, I have a new version of the patch which just recursively fsync's the whole directory (attached). I also introduced a new option --nosync (-N) to disable this behavior. The bad news is that it introduces a lot more time to initdb -- it goes from about 1s to about 10s on my machine. I tried fsync'ing the whole directory twice just to make sure that the second was a no-op, and indeed it didn't make much difference (still about 10s). That's pretty inefficient considering that "initdb -D data --nosync && sync" only takes a couple seconds. Clearly batching the operation is a big help. Maybe there's some more efficient way to fsync a lot of files/directories? Or maybe I can mitigate it by avoiding files that don't really need to be fsync'd? Regards, Jeff Davis

attachment: initdb-fsync-20120312.patch.gz
Re: [HACKERS] pg_upgrade and statistics
Bruce Momjian br...@momjian.us writes: Copying the statistics from the old server is on the pg_upgrade TODO list. I have avoided it because it will add an additional requirement that will make pg_upgrade more fragile in case of major version changes. Does anyone have a sense of how often we change the statistics data between major versions? I don't think pg_statistic is inherently any more stable than any other system catalog. We've whacked it around significantly just last week, which might color my perception a bit, but there are other changes on the to-do list. (For one example, see nearby complaints about estimating TOAST-related costs, which we could not fix without adding more stats data.) regards, tom lane
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Tue, Mar 13, 2012 at 1:59 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Mar 9, 2012 at 2:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Thanks! BTW, I haven't forgotten about the recovery bugs Jeff found earlier. I'm planning to do a longer run with his test script - I only run it for about 1000 iterations - to see if I can reproduce the PANIC with both the earlier patch version he tested, and this new one. Hi Heikki, I've run the v12 patch for 17,489 rounds of crash and recovery, and detected no assertion failures or other problems. With the v12 patch, I could no longer reproduce the assertion failure either. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center