Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-09 Thread Michael Paquier
On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 The additional process at promotion sounds like a good idea, I'll try to
 get a patch done tomorrow. This would result as well in removing the
 XLogArchiveForceDone stuff. Either way, now that I have been able to
 reproduce the problem manually, things can be clearly solved.

Please find attached two patches aimed at fixing this issue and improving the
situation:
- 0001 prevents the appearance of those phantom WAL segment files by
ensuring that when a node is in recovery it removes them whatever their
status in archive_status. This patch is the real fix, and should be applied
down to 9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the segment
files present in pg_xlog to have their status set to .done, marking them for
removal. When looking at the code, I concluded that Fujii-san's point, about
marking as .done in all cases the segment files that have been fully
streamed, actually makes sense. This patch would not be mandatory for
back-patching, but it makes the process more robust IMO.

I imagine that it would be nice to get those things fixed before the next
minor release.
Regards,
-- 
Michael
From 18c2d47b1d5dd3c0439f990ee4da6b305d477ca4 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 9 Oct 2014 15:04:26 +0900
Subject: [PATCH 1/2] Fix appearance of .ready archive status files on
 streaming standbys

Commit 1bd42cd removed a check based on the recovery status of a node
when removing old WAL segment files in pg_xlog, causing the appearance of
.ready files that prevented the removal of some WAL segment files, which
remained stuck in the archive folder. Note that this does not prevent
the absence of some .done files, as it may still be possible that some
segments cannot be marked correctly as complete after their stream is done,
for example in the case of an abrupt disconnection between a standby and
its root node; but it ensures that when a node is in recovery, old WAL
segment files are removed whatever their status in the archive_status
folder.

Per report from Jehan-Guillaume de Rorthais.
---
 src/backend/access/transam/xlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..39701a3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3771,7 +3771,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 			strspn(xlde->d_name, "0123456789ABCDEF") == 24 &&
 			strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
 		{
-			if (XLogArchiveCheckDone(xlde->d_name))
+			if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
 			{
 				snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
 
-- 
2.1.2

From 493a9d05e9386403fc1e6d78df19dd8a59c53cae Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 9 Oct 2014 15:09:44 +0900
Subject: [PATCH 2/2] Enforce all WAL segment files to be marked as .done at
 node promotion

This is a safety mechanism to ensure that there are no files left that are not
considered as .done, as some segments may have been missed, particularly in
the case of partially written files or files being written when a disconnection
occurred between a streaming standby and its root node. This makes the node
reaching promotion have a state consistent with what is expected, under
the assumption that all the WAL segment files that are done being streamed
should always be considered as archived by the node.
---
 src/backend/access/transam/xlog.c        | 10 ++++++++++
 src/backend/access/transam/xlogarchive.c | 39 ++++++++++++++++++++++++++++++++++++++-
 src/include/access/xlog_internal.h       |  1 +
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 39701a3..af5f548 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6862,6 +6862,16 @@ StartupXLOG(void)
 	}
 
 	/*
+	 * Create a .done entry for each WAL file present in pg_xlog that has
+	 * not yet been marked as such. In some cases, for example when a streaming
+	 * replica has had its connection to a remote node cut abruptly, such WAL
+	 * files may have been only partially written or even not flagged correctly
+	 * with .done.
+	 */
+	if (InRecovery)
+		XLogArchiveForceDoneAll();
+
+	/*
 	 * Kill WAL receiver, if it's still running, before we continue to write
 	 * the startup checkpoint record. It will trump over the checkpoint and
 	 * subsequent records if it's still alive when we start writing WAL.
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 047efa2..931106f 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -554,6 +554,40 @@ XLogArchiveNotifySeg(XLogSegNo segno)
 }
 
 /*
+ * XLogArchiveForceDoneAll
+ *
+ * 

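The patch body is truncated at this point in the digest. Purely as a hedged
illustration of what a function like XLogArchiveForceDoneAll() might do (this
is not the patch's actual code), the idea described above amounts to renaming
every .ready status file to .done:

/*
 * Hedged sketch only -- NOT the truncated patch's actual code.
 * Scan pg_xlog/archive_status and rename any .ready file to .done,
 * so that all streamed segments are treated as archived.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define ARCHIVE_STATUS_DIR "pg_xlog/archive_status"

static void
archive_force_done_all(void)
{
	DIR		   *dir;
	struct dirent *de;

	dir = opendir(ARCHIVE_STATUS_DIR);
	if (dir == NULL)
		return;					/* real code would ereport() here */

	while ((de = readdir(dir)) != NULL)
	{
		size_t		len = strlen(de->d_name);
		char		oldpath[1024];
		char		newpath[1024];

		/* only touch files whose names end in ".ready" */
		if (len < 6 || strcmp(de->d_name + len - 6, ".ready") != 0)
			continue;

		snprintf(oldpath, sizeof(oldpath), "%s/%s",
				 ARCHIVE_STATUS_DIR, de->d_name);
		snprintf(newpath, sizeof(newpath), "%s/%.*s.done",
				 ARCHIVE_STATUS_DIR, (int) (len - 6), de->d_name);
		rename(oldpath, newpath);	/* mark the segment as archived */
	}
	closedir(dir);
}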
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 10:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Do you really expect me to do major work on some aspect of the syntax
 like that, given, as you say, that nobody explicitly agreed with me
 (and only you disagreed with me)? The only remark I heard on that was
 from you (you'd prefer to use NEW.* and OLD.*).
 But you spent much
 more time talking about getting something MERGE-like, which
 NEW.*/OLD.* clearly isn't.

 Yes, it is. Look at the AS clause.

You can alias each of the two tables being joined. But I only have one
table, and no join. When you referred to NEW.* and OLD.*, you clearly
were making a comparison with trigger WHEN clauses, and not MERGE
(which is a comparison I made myself, although for more technical
reasons). It hardly matters, though.

 CONFLICTING() is very close (identical?) to MySQL's use of ON
 DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that,
 but it's news to me that people take particular issue with it.

 3 people have asked you questions or commented about the use of
 CONFLICTING() while I've been watching.

Lots of people have asked me lots of questions. Again, as I said, I
wasn't aware that CONFLICTING() was a particular point of contention.
Please be more specific.

  It's clearly a non-standard
  mechanism and not in line with other Postgres usage.

What would be in line with other Postgres usage? I don't think you've
been clear on what you think is a better alternative.

I felt a function-like expression was appropriate because the user
refers to different tuples of the target table. It isn't like a join.
Plus it's similar to the MySQL thing, but doesn't misuse VALUES() as a
function-like thing.
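
To keep the discussion concrete, the syntax under debate looks roughly like
this (a hedged sketch assembled from examples elsewhere in this thread; the
upsert table is hypothetical):

-- CONFLICTING() refers to the value from the tuple proposed for
-- insertion; table and columns are hypothetical.
INSERT INTO upsert (key, val)
VALUES (1, 'foo')
ON CONFLICT UPDATE
SET val = CONFLICTING(val);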

 If there is disagreement, publishing the summary of changes you plan
 to make in your next version will help highlight that.

I think I've done a pretty good job of collecting and collating the
opinions of others, fwiw.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-09 Thread Heikki Linnakangas

On 10/09/2014 07:47 AM, furu...@pm.nttdata.co.jp wrote:

If we remove --fsync-interval, response time to the user will not be delayed.
Although, fsync will be executed multiple times in a short period.
And if there is no way to solve the problem without --fsync-interval, what
should we do about it?

I'm sorry, I didn't understand that.


Here is an example.
When WAL is sent at 100ms intervals, fsync() is executed 10 times per second.
If --fsync-interval is set to 1 second, we have to wait for the SQL response 
(kind of making a WAL record) for 1 second, though fsync() won't be executed 
several times in 1 second.
I think --fsync-interval meets the demands of people who want to restrain 
fsync() from happening several times in a short period, but what do you think?
Is it OK to delete --fsync-interval ?


I still don't see the problem.

In synchronous mode, pg_receivexlog should have similar logic as 
walreceiver does. It should read as much WAL from the socket as it can 
without blocking, and fsync() and send a reply after that. And also fsync 
whenever switching to a new segment. If the master sends WAL every 100ms, 
then pg_receivexlog will indeed fsync() 10 times per second. There's 
nothing wrong with that.
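
A hedged sketch of that drain-then-flush shape, in plain POSIX terms rather
than pg_receivexlog's actual code (send_feedback() is a hypothetical stand-in
for the status reply message):

/*
 * Hedged sketch, not pg_receivexlog's actual code: drain a non-blocking
 * replication socket into the current WAL file, then fsync() once and
 * send one reply, instead of fsync()ing per 100ms message.
 */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

extern void send_feedback(int sock, off_t flushed_up_to);	/* hypothetical */

static void
consume_available_wal(int sock, int walfile, off_t *flushed_up_to)
{
	char		buf[8192];
	ssize_t		n;
	int			got_data = 0;

	/* Read as much WAL as the socket currently has, without blocking. */
	while ((n = read(sock, buf, sizeof(buf))) > 0)
	{
		ssize_t		off = 0;

		while (off < n)
		{
			ssize_t		w = write(walfile, buf + off, n - off);

			if (w < 0)
				return;			/* error handling elided */
			off += w;
		}
		*flushed_up_to += n;
		got_data = 1;
	}

	/* EAGAIN/EWOULDBLOCK: socket drained; flush once, reply once. */
	if (got_data && n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
	{
		if (fsync(walfile) == 0)
			send_feedback(sock, *flushed_up_to);
	}
}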


In asynchronous mode, only fsync whenever switching to a new segment.


Yeah. Or rather, add a new message type, to indicate the
synchronous/asynchronous status.


What kind of 'message type' do we have to add?

Do we need to separate 'w' into two types? Synchronous and asynchronous?

OR

Add a new message type, kind of 'notify synchronous',
and inform pg_receivexlog of its synchronous status when it connects to the server.


Better to add a new notify message type. And pg_receivexlog should be 
prepared to receive it at any time. The status might change on the fly, 
if the server's configuration is reloaded.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What exactly is our CRC algorithm?

2014-10-09 Thread Heikki Linnakangas

On 10/09/2014 01:23 AM, Gavin Flower wrote:

On 09/10/14 10:13, Andres Freund wrote:

If we're switching to a saner computation, we should imo also switch to
a better polynomial - CRC-32C has better error detection capabilities than
CRC32 and is available in hardware. As we're paying the price of
breaking compat anyway...

Arguably we could also say that given that there have been few evident
problems with the borked computation we could also switch to a much
faster hash instead of continuing to use crc...


Could a 64 bit variant of some kind be useful as an option - in addition
to a correct 32 bit?


More bits allow you to detect more errors. That's the only advantage. 
I've never heard of that being a problem, so no, I don't think that's a 
good idea.



As most people have 64 bit processors and storage is less constrained
nowadays, and we tend to store much larger chunks of data.


That's irrelevant to the CRC in the WAL. Each WAL record is CRC'd 
separately, and they tend to be very small (less than 8k typically) 
regardless of how large the chunks of data you store in the database are.
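
For context, the hardware support mentioned above is the SSE4.2 CRC32
instruction on x86; a hedged, standalone illustration (not PostgreSQL source)
of CRC-32C over a record-sized buffer:

/*
 * Hedged illustration only: CRC-32C via the SSE4.2 CRC32 instruction.
 * Build with -msse4.2 on x86-64.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>

uint32_t
crc32c(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	crc = ~crc;
	while (len >= 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, p, 8);	/* avoid unaligned access */
		crc = (uint32_t) _mm_crc32_u64(crc, chunk);
		p += 8;
		len -= 8;
	}
	while (len-- > 0)
		crc = _mm_crc32_u8(crc, *p++);
	return ~crc;
}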


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Simon Riggs
On 9 October 2014 07:27, Peter Geoghegan p...@heroku.com wrote:

 Please be more specific.

Do not use CONFLICTING() which looks like it is a function.

Instead, use a row qualifier, such as NEW, OLD etc to reference values
from the incoming data
e.g. CONFLICTING.value rather than CONFLICTING(value)

Do not use the word CONFLICTING since it isn't clear whether you are
referring to the row in the table or the value in the incoming data. I
suggest the use of two separately named row qualifiers to allow us to
use either of those when desired. I don't have suggestions as to what
you should call those qualifiers, though Postgres already uses NEW and
OLD in similar circumstances in triggers. (This has nothing at all to
do with the MERGE command in the SQL standard, so please don't mention
that here.)

You may also wish to support the AS keyword, as MERGE does to make the
above even more clear.

e.g. SET col = EXISTING.col + NEW.col
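
Spelled out in a full statement, that would presumably read something like
the following (a hedged sketch of the suggestion with a hypothetical table;
this is not agreed syntax):

INSERT INTO counters (id, hits)
VALUES (42, 1)
ON CONFLICT UPDATE
SET hits = EXISTING.hits + NEW.hits;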

Thank you.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-09 Thread Michael Paquier
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 1. Where do the FF files come from? In 9.2, FF-segments are not supposed to
 be created, ever.

 I think we should add a check in walreceiver, to throw an error if the
 master sends an invalid WAL pointer, pointing to an FF segment.
Attached is a patch for that. This would be needed for PG < 9.3 if applied.
Regards,
-- 
Michael
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 753316e..4f4d019 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -789,6 +789,24 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
 	SpinLockRelease(&walrcv->mutex);
 
+	/*
+	 * Check that the WAL position received from the sender is valid and
+	 * does not refer to a segment file whose name finishes with FF.
+	 */
+	if ((walEnd.xlogid & 0xff) == 0xff)
+	{
+		char		xlogfilename[MAXFNAMELEN];
+		uint32		xlogseg;
+		uint32		xlogid;
+
+		XLByteToPrevSeg(walEnd, xlogid, xlogseg);
+		XLogFileName(xlogfilename, ThisTimeLineID, xlogid, xlogseg);
+		ereport(ERROR,
+				(errcode(ERRCODE_PROTOCOL_VIOLATION),
+				 errmsg_internal("invalid WAL position %X/%X referring to segment %s",
+								 walEnd.xlogid, walEnd.xrecoff, xlogfilename)));
+	}
+
 	if (log_min_messages <= DEBUG2)
 	{
 		char	   *sendtime;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 12:38 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Do not use CONFLICTING() which looks like it is a function.

So is ROW(). Or COALESCE().

 Instead, use a row qualifier, such as NEW, OLD etc to reference values
 from the incoming data
 e.g. CONFLICTING.value rather than CONFLICTING(value)

 Do not use the word CONFLICTING since it isn't clear whether you are
 referring to the row in the table or the value in the incoming data.

If you don't have a word that you think would more clearly indicate
the intent of the expression, I'm happy to hear suggestions from
others.

 You may also wish to support the AS keyword, as MERGE does to make the
 above even more clear.

 e.g. SET col = EXISTING.col + NEW.col

That's less clear, IMV. EXISTING.col is col - the very same Var. So
why qualify that it's the existing value in one place but not the
other? In fact, you can't do that now with updates in general:

postgres=# update upsert u set u.val = 'foo';
ERROR:  42703: column "u" of relation "upsert" does not exist
LINE 1: update upsert u set u.val = 'foo';
                            ^
LOCATION:  transformUpdateStmt, analyze.c:2068

This does work, which is kind of what you outline:

postgres=# update upsert u set val = u.val;
UPDATE 3

But MERGE accepts the former in other systems (in general, and for
MERGE), where Postgres won't (for UPDATEs in general). Parse analysis
of UPDATE targetlists just rejects this outright.

FWIW, is either of the two tuples referenced here NEW, in any sense?
Informally, I'd say the new value is the resulting row - the final row
value after the UPDATE. We want to refer to the existing row, and the
row proposed for insertion (with all before trigger effects carried
forward).

Having the column reference go through an alias like this might be tricky.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Marti Raudsepp
On Thu, Oct 9, 2014 at 11:11 AM, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Oct 9, 2014 at 12:38 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Do not use CONFLICTING() which looks like it is a function.

 So is ROW(). Or COALESCE().

ROW and COALESCE behave almost like functions: they operate on any
expression or value you pass to them.

db=# select coalesce('bar');
 coalesce
----------
 bar

Not so with CONFLICTING(): it only accepts a column name -- not a
value -- and has knowledge of the surrounding statement that ordinary
function-like constructs don't.

db=# INSERT into evt_type (name) values ('foo') on conflict UPDATE set
name=conflicting('bar');
ERROR:  syntax error at or near "'bar'"
LINE 1: ...lues ('foo') on conflict UPDATE set name=conflicting('bar');

 If you don't have a word that you think would more clearly indicate
 the intent of the expression, I'm happy to hear suggestions from
 others.

I also like NEW due to similarity with triggers, but I see your
concern about it not actually being new.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Simon Riggs
On 9 October 2014 09:11, Peter Geoghegan p...@heroku.com wrote:

 You may also wish to support the AS keyword, as MERGE does to make the
 above even more clear.

 e.g. SET col = EXISTING.col + NEW.col

 That's less clear, IMV. EXISTING.col is col - the very same Var. So
 why qualify that it's the existing value in one place but not the
 other? In fact, you can't do that now with updates in general:

 postgres=# update upsert u set u.val = 'foo';
 ERROR:  42703: column u of relation upsert does not exist
 LINE 1: update upsert u set u.val = 'foo';
 ^
 LOCATION:  transformUpdateStmt, analyze.c:2068

YES, which is exactly why I did not say this; I said something different.

 This does work, which is kind of what you outline:

 postgres=# update upsert u set val = u.val;
 UPDATE 3

YES, which is why I said it.

 But MERGE accepts the former in other systems (in general, and for
 MERGE), where Postgres won't (for UPDATEs in general). Parse analysis
 of UPDATE targetlists just rejects this outright.

 FWIW, is either of the two tuples referenced here NEW, in any sense?
 Informally, I'd say the new value is the resulting row - the final row
 value after the UPDATE. We want to refer to the existing row, and the
 row proposed for insertion (with all before trigger effects carried
 forward).

YES, which is why I specifically requested the ability to reference
the incoming data.

Common sense interpretations make for quicker and easier discussions.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 1:33 AM, Marti Raudsepp ma...@juffo.org wrote:
 ROW and COALESCE behave almost like functions: they operate on any
 expression or value you pass to them.

Okay; then CONFLICTING() is like many of the XML expressions.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 1:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 YES, which is why I specifically requested the ability to reference
 the incoming data.

My point is that people are not really inclined to use an alias in
UPDATEs in general when referring to the target. The thing that seems
special (and worthy of special qualification) is the reference to what
you call "the incoming data", and what I've called "tuples proposed
for insertion" (after being affected by any before row triggers).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-09 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 1:56 AM, Peter Geoghegan p...@heroku.com wrote:
 My point is that people are not really inclined to use an alias in
 UPDATEs in general when referring to the target. The thing that seems
 special (and worthy of special qualification) is the reference to what
 you call "the incoming data", and what I've called "tuples proposed
 for insertion" (after being affected by any before row triggers).

For simple cases, you might not even bother with CONFLICTING() - you
might find it easier to just repeat the constant in the INSERT and
UPDATE parts of the query.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-09 Thread Andres Freund
On 2014-10-08 13:52:14 -0400, Robert Haas wrote:
 On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Meh.  Even SELECT 1 is going to be doing *far* more pallocs than that to
  get through raw parsing, parse analysis, planning, and execution startup.
  If you can find a few hundred pallocs we can avoid in trivial queries,
  it would get interesting; but I'll be astonished if saving 4 is measurable.
 
 I got nerd-sniped by this problem today, probably after staring at the
 profiling data very similar to what led Andres to ask the question in
 the first place.  The gains are indeed not measurable on a
 macrobenchmark, but I had to write the patch to figure that out, so
 here it is.
 
 Although the gain isn't a measurable percentage of total runtime, it
 does appear to be a significant percentage of the palloc traffic on
 trivial queries. I did 30-second SELECT-only pgbench runs at scale
 factor 10, using prepared queries.  According to perf, on unpatched
 master, StartTransactionCommand accounts for 11.46% of the calls to
 AllocSetAlloc.  (I imagine this is by runtime, not by call count, but
 it probably doesn't matter much either way.)  But with this patch,
 StartTransactionCommand's share drops to 4.43%.  Most of that is
 coming from AtStart_Inval(), which wouldn't be hard to fix either.

Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in an ugly fashion hacked
AtStart_Inval() out of readonly queries on top of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHERE pkey = xxx' testcase.

 I'm inclined to think that tamping down palloc traffic is worthwhile
 even if we can't directly measure the savings on a macrobenchmark.
 AllocSetAlloc is often at or near the top of profiling results, but
 there's rarely a single caller responsible for enough of those calls
 to make it worth optimizing. But that means that if we refuse to
 take patches that save just a few pallocs, we're basically giving up
 on ever improving anything in this area, and that doesn't seem like a
 good idea either; it's only going to get slowly worse over time as we
 add more features.

I think it depends a bit on the callsites. If it's somewhere that nobody
will ever care about, because it's a slowpath, then we shouldn't care
either. But that's not the case here, so I do think that makes sense.
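
As a hedged sketch of the general pattern being discussed (names are
illustrative, not the actual xact.c/inval.c code), the idea is to defer the
per-transaction allocation until first use, so trivial read-only queries
never pay for it:

/*
 * Hedged sketch only; InvalXactState and CreateInvalXactState() are
 * invented names, not the actual PostgreSQL code being patched.
 */
typedef struct InvalXactState InvalXactState;

extern InvalXactState *CreateInvalXactState(void);	/* hypothetical */

static InvalXactState *inval_state = NULL;

static InvalXactState *
GetInvalXactState(void)
{
	/* Allocate on first use, not in every StartTransactionCommand(). */
	if (inval_state == NULL)
		inval_state = CreateInvalXactState();
	return inval_state;
}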

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-09 Thread furuyao
  If we remove --fsync-interval, response time to the user will not be
  delayed. Although, fsync will be executed multiple times in a short
  period. And if there is no way to solve the problem without
  --fsync-interval, what should we do about it?
 
  I'm sorry, I didn't understand that.
 
  Here is an example.
  When WAL is sent at 100ms intervals, fsync() is executed 10 times per
  second.
  If --fsync-interval is set to 1 second, we have to wait for the SQL
  response (kind of making a WAL record) for 1 second, though fsync() won't
  be executed several times in 1 second.
  I think --fsync-interval meets the demands of people who want to
  restrain fsync() from happening several times in a short period, but what
  do you think?
  Is it OK to delete --fsync-interval ?
 
 I still don't see the problem.
 
 In synchronous mode, pg_receivexlog should have similar logic as
 walreceiver does. It should read as much WAL from the socket as it can
 without blocking, and fsync() and send a reply after that. And also fsync
 whenever switching to a new segment. If the master sends WAL every 100ms,
 then pg_receivexlog will indeed fsync() 10 times per second. There's
 nothing wrong with that.
 
 In asynchronous mode, only fsync whenever switching to new segment.
 
  Yeah. Or rather, add a new message type, to indicate the
  synchronous/asynchronous status.
 
  What kind of 'message type' do we have to add?
 
  Do we need to separate 'w' into two types? Synchronous and
 asynchronous?
 
  OR
 
  Add a new message type, kind of 'notify synchronous', and inform
  pg_receivexlog of its synchronous status when it connects to the server.
 
 Better to add a new notify message type. And pg_receivexlog should be
 prepared to receive it at any time. The status might change on the fly,
 if the server's configuration is reloaded.

Thanks for the reply.

 In synchronous mode, pg_receivexlog should have similar logic as walreceiver 
 does.

OK. I understand that removing --fsync-interval is not a problem.

 Better to add a new notify message type. And pg_receivexlog should be 
 prepared to receive it at any time. The status might change on the fly, if 
 the server's configuration is reloaded.

OK. I'll consider it.

Regards,

--
Furuya Osamu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alexey Bashtanov

Hello!

The autovacuum daemon performs a vacuum when the number of rows 
updated/deleted (n_dead_tuples) reaches some threshold.
Similarly, it performs an analyze when the number of rows changed in any way 
(incl. inserted) reaches a threshold.
When a table is mostly insert-only, its visibility map is not updated, as 
the vacuum threshold is almost never reached, and analyze does not update 
the visibility map.


Why could it be a bad idea to run vacuum after some number of any 
changes including inserts, like analyze?
Or at least make it tunable by the user (add a second bunch of parameters to 
control a second vacuum threshold, disabled by default)?
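
For reference, the trigger condition being described is, per the PostgreSQL
documentation: a vacuum is launched when n_dead_tuples exceeds
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.
A hedged sketch of the kind of knob proposed here (the parameter names are
invented for illustration; no such options existed at the time):

-- Hypothetical insert-driven vacuum threshold, sketching the proposal.
ALTER TABLE mostly_insert_only_tbl SET (
    autovacuum_vacuum_insert_threshold = 100000,
    autovacuum_vacuum_insert_scale_factor = 0.1
);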


Best regards,
  Alexey Bashtanov


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-09 Thread Andres Freund
On 2014-10-09 14:06:35 +0900, Kyotaro HORIGUCHI wrote:
 Hello, simply inhibiting the setting of the retry flag when ProcDiePending
 in my_sock_write seems enough.
 
 But it returns with SSL_ERROR_SYSCALL, not SSL_ERROR_WANT_WRITE, so
 I modified the patch 4 as the attached patch.

Why is that necessary? It seems really rather wrong to make
BIO_set_retry_write() dependent on ProcDiePending. Especially as, at
least in my testing, it's not even required because be_tls_write()
can just check the error properly?

 diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
 index 6fc6903..2288fe2 100644
 --- a/src/backend/libpq/be-secure-openssl.c
 +++ b/src/backend/libpq/be-secure-openssl.c
 @@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
  	BIO_clear_retry_flags(h);
  	if (res <= 0)
  	{
 -		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
 +		if (!ProcDiePending &&
 +			(errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
  		{
  			BIO_set_retry_write(h);
  		}


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-09 Thread Robert Haas
On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote:
  Interesting - in my local profile AtStart_Inval() is more pronounced
  than AfterTriggerBeginQuery(). I've quickly and in an ugly fashion hacked
  AtStart_Inval() out of readonly queries on top of your patch. Together
  that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
  tbl WHERE pkey = xxx' testcase.

Whoa.  Now that's clearly significant.  You didn't attach the patch;
was that inadvertent, or was it too ugly for that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-09 Thread Andres Freund
On 2014-10-09 08:18:18 -0400, Robert Haas wrote:
 On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote:
   Interesting - in my local profile AtStart_Inval() is more pronounced
   than AfterTriggerBeginQuery(). I've quickly and in an ugly fashion hacked
   AtStart_Inval() out of readonly queries on top of your patch. Together
   that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
   tbl WHERE pkey = xxx' testcase.
 
 Whoa.  Now that's clearly significant.

Well, my guess is it'll be far less noticeable in less trivial
workloads. But it does seem worthwhile.

 You didn't attach the patch; was that inadvertent, or was it too ugly
 for that?

Far, far too ugly ;). I just removed the AtStart() call from xact.c and
sprinkled it around relevant places instead ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Robert Haas
On Wed, Oct 8, 2014 at 1:52 PM, Michael Banck michael.ba...@credativ.de wrote:
 Looking at it from a DBA perspective, this would indeed be better, yes.

 However, I see a few issues with that:

 1. If you are using an init script (or another wrapper around pg_ctl),
 you don't get any of its output it seems.

 2. Having taken a quick look at pg_ctl, it seems to just kill the
 postmaster on shutdown and wait for its PID file to disappear.  I don't
 see how it should figure out that PostgreSQL is waiting for a checkpoint
 to be finished?

 Or do both. I suspect elog( INFO, ... ) might do that.

 That would imply that pg_ctl receives and writes out log messages
 directed at clients, which I don't think is true?  Even if it was,
 client_min_messages does not include an INFO level, and LOG is not being
 logged to clients by default. So the first common level above the
 default of both client_min_messages and log_min_messages would be
 WARNING, which seems excessive to me.

 As I said, I only took a quick look at pg_ctl though, so I might well be
 missing something.

I think you're spot-on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Andres Freund
On 2014-10-02 15:21:48 +0200, Michael Banck wrote:
 Hi,
 
 we have seen repeatedly that users can be confused about why PostgreSQL
 is not shutting down even though they requested it.  Usually, this is
 because `log_checkpoints' is not enabled and the final checkpoint is
 being written, delaying shutdown. As no message besides "shutting down"
 is written to the server log in this case, we even had users believing
 the server was hanging and pondering killing it manually.
 
 In order to alert those users that a checkpoint is being written, I
 propose to add a log message "waiting for checkpoint ..." on shutdown,
 even if log_checkpoints is disabled, as this particular checkpoint might
 be important information.
 
 I've attached a trivial patch for this, should it be added to the next
 commitfest?

How about flipping the default for log_checkpoints instead? There really
isn't a good reason for having it disabled by default.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-09 Thread MauMau

From: Simon Riggs si...@2ndquadrant.com

I hope we can get pgAudit in as a module for 9.5. I also hope that it
will stimulate the requirements/funding of further work in this area,
rather than squash it. My feeling is we have more examples of feature
sets that grow over time (replication, view handling, hstore/JSONB
etc) than we have examples of things languishing in need of attention
(partitioning).


I'm hoping PostgreSQL will have an audit trail feature.  I'd like to support 
your work by reviewing and testing, although I'm not sure I can fully 
understand the code soon.



Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-10-02 15:21:48 +0200, Michael Banck wrote:
  I've attached a trivial patch for this, should it be added to the next
  commitfest?
 
 How about flipping the default for log_checkpoints instead? There really
 isn't a good reason for having it disabled by default.

Yeah, I agree with this - it's extremely useful information and it's
really not that verbose in general.

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-09 Thread MauMau

Hello,

One user reported a hang problem with 9.4 beta2 on Windows.  The PostgreSQL 
build is the 64-bit version.  I couldn't find the cause, but want to solve the 
problem.  Could you help with this?


I heard that the user had run 16 concurrent psql sessions which executed 
INSERT and UPDATE statements, which is a write-intensive stress test.  He 
encountered the hang phenomenon twice, one of which occurred several hours 
after the start of the test, and the other occurred about an hour after the 
test launch.


The user gave me the stack traces, which I attached at the end of this mail. 
The problem appears to be related to the xlog insert scaling.  But I can't 
figure out where the root cause lies --- WAL slot handling, spinlocks on 
Windows, or PGSemaphoreLock/Unlock on Windows?


The place I suspect is S_UNLOCK().  It doesn't use any memory barrier.  Is 
this correct on Intel64 processors?


#define S_UNLOCK(lock)  (*((volatile slock_t *) (lock)) = 0)
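
For what it's worth, on x86-64 the hardware does not reorder a store with
earlier loads or stores, so the main hazard with a plain-store unlock like
the above is the compiler sinking protected-section accesses below it. A
hedged sketch of a release-style definition (illustrative only, not a
proposed fix):

/*
 * Hedged sketch: a compiler barrier before the releasing store.  On
 * x86-64 (TSO) that is sufficient; weaker architectures would need a
 * real release fence instead.
 */
#define S_UNLOCK(lock) \
	do { \
		__asm__ __volatile__("" ::: "memory");	/* compiler barrier */ \
		*((volatile slock_t *) (lock)) = 0; \
	} while (0)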


The rest of this mail is the stack trace:

`0043e0a8 7ff8`213d12ee : `0002 `0002 
`0001 ` : ntdll!ZwWaitForMultipleObjects+0xa
`0043e0b0 0001`401de68e : ` 7ff5`e000 
` `04fb6b40 : 
KERNELBASE!WaitForMultipleObjectsEx+0xe1
`0043e390 0001`4023cf11 : `02a55500 `1c117410 
80605042`36ad2501 0001`405546e0 : postgres!PGSemaphoreLock+0x6e 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\port\win32_sema.c @ 
145]
`0043e3e0 0001`4006203b : `f9017d56 `0022 
` `0400 : postgres!LWLockAcquireCommon+0x121 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\storage\lmgr\lwlock.c 
@ 625]
`0043e430 0001`4002c182 : `0005 ` 
`004e2f00 ` : postgres!XLogInsert+0x62b 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\access\transam\xlog.c 
@ 1110]
`0043e700 0001`400323b6 : ` ` 
`0a63 `0289de10 : postgres!log_heap_clean+0x102 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 
6561]
`0043e7e0 0001`400320e8 : `040ec5c0 `0a63 
`0043f340 `040ec5c0 : postgres!heap_page_prune+0x2a6 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\pruneheap.c 
@ 261]
`0043f2f0 0001`4002dc40 : `0057ea30 ` 
` `028d1810 : postgres!heap_page_prune_opt+0x148 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\pruneheap.c 
@ 150]
`0043f340 0001`4002e7da : `028d1800 `0d26 
`0005 `0057ea30 : postgres!heapgetpage+0xa0 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 
355]
`0043f3e0 0001`4002802c : ` ` 
` ` : postgres!heapgettup_pagemode+0x40a 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 
944]
`0043f460 0001`40126507 : ` `001d 
` `001d : postgres!heap_getnext+0x1c 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 
1478]
`0043f490 0001`401137f5 : `028d05b0 `028d06c0 
` `028a3d30 : postgres!SeqNext+0x27 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\nodeseqscan.c 
@ 76]
`0043f4c0 0001`4010c7b2 : `0058dba0 `028d05b0 
` ` : postgres!ExecScan+0xd5 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execscan.c @ 
167]
`0043f520 0001`4012448d : `028d02e0 `028d02d8 
`028d02e0 `00585a00 : postgres!ExecProcNode+0xd2 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execprocnode.c 
@ 400]
`0043f550 0001`4010c772 : `00587bc0 `028d0110 
` `028d0258 : postgres!ExecModifyTable+0x10d 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\nodemodifytable.c 
@ 926]
`0043f610 0001`4010bb6d : `028d0110 `00587bc0 
` `0056c740 : postgres!ExecProcNode+0x92 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execprocnode.c 
@ 377]
`0043f640 0001`401099d8 : `00570ff0 `0051e400 
`028d0110 `005831f0 : postgres!ExecutePlan+0x5d 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execmain.c @ 
1481]
`0043f680 0001`4024f813 : `00570ff0 `0051e468 
`0051c530 `005831f0 : postgres!standard_ExecutorRun+0xa8 
[d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execmain.c @ 
319]
`0043f6f0 0001`4024ff5a : 

Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Andres Freund (and...@2ndquadrant.com) wrote:
 How about flipping the default for log_checkpoints instead? There really
 isn't a good reason for having it disabled by default.

 Yeah, I agree with this- it's extremely useful information and it's
 really not that verbose in general..

-1.  Every time we've turned on default logging of routine events,
there's been pushback and it was eventually turned off again as log spam.
I guarantee you that logging checkpoints will be seen as log spam, except
by people who are actually having trouble with that behavior, which is
a small minority.

I'm not really convinced that there's a problem here that needs fixing
at all, but certainly putting log_checkpoints on by default is not an
acceptable fix.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-09 Thread Andres Freund
On 2014-10-09 18:17:09 +0530, Amit Kapila wrote:
 On Fri, Sep 26, 2014 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:
 
  On another point, I think it would be a good idea to rebase the
  bgreclaimer patch over what I committed, so that we have a
  clean patch against master to test with.
 
 Please find the rebased patch attached with this mail.  I have taken
 some performance data as well and done some analysis based on
 the same.
 
 Performance Data
 
 IBM POWER-8 24 cores, 192 hardware threads
 RAM = 492GB
 max_connections =300
 Database Locale =C
 checkpoint_segments=256
 checkpoint_timeout=15min
 shared_buffers=8GB
 scale factor = 5000
 Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
 Duration of each individual run = 5mins

I don't think OLTP really is the best test case for this. Especially not
pgbench with relatively small rows *and* a uniform distribution of
access.

Try parallel COPY TO. Batch write loads are where I've seen this hurt
badly.

 patch_ver/client_count      1       8      32      64     128     256
 HEAD                    18884  118628  251093  216294  186625  177505
 PATCH                   18743  122578  247243  205521  179712  175031

So, pretty much no benefits on any scale, right?


 Here we can see that the performance dips at higher client
 count (>=32), which was quite surprising for me, as I was expecting
 it to improve, because bgreclaimer reduces the contention by making
 buffers available on free list.  So I tried to analyze the situation by
 using perf and found that in above configuration, there is a contention
 around freelist spinlock with HEAD and the same is removed by Patch,
 but still the performance goes down with Patch.  On further analysis, I
 observed that actually after Patch there is an increase in contention
 around ProcArrayLock (shared LWlock) via GetSnapshotData which
 sounds bit odd, but that's what I can see in profiles.  Based on analysis,
 few ideas which I would like to further investigate are:
 a.  As there is an increase in spinlock contention, I would like to check
 with Andres's latest patch which reduces contention around shared
 lwlocks.
 b.  Reduce some instructions added by patch in StrategyGetBuffer(),
 like instead of awakening bgreclaimer at low threshold, awaken when
 it tries to do clock sweep.
 

Are you sure you didn't mix up the profiles here? The head vs. patched
look more like profiles from different client counts than different
versions of the code.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-09 Thread Amit Kapila
On Thu, Oct 9, 2014 at 7:31 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-10-09 18:17:09 +0530, Amit Kapila wrote:
  On Fri, Sep 26, 2014 at 7:04 PM, Robert Haas robertmh...@gmail.com
wrote:
  
   On another point, I think it would be a good idea to rebase the
   bgreclaimer patch over what I committed, so that we have a
   clean patch against master to test with.
 
  Please find the rebased patch attached with this mail.  I have taken
  some performance data as well and done some analysis based on
  the same.
 
  Performance Data
  
  IBM POWER-8 24 cores, 192 hardware threads
  RAM = 492GB
  max_connections =300
  Database Locale =C
  checkpoint_segments=256
  checkpoint_timeout=15min
  shared_buffers=8GB
  scale factor = 5000
  Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
  Duration of each individual run = 5mins

  I don't think OLTP really is the best test case for this. Especially not
  pgbench with relatively small rows *and* a uniform distribution of
  access.

 Try parallel COPY TO. Batch write loads is where I've seen this hurt
 badly.

  patch_ver/client_count      1       8      32      64     128     256
  HEAD                    18884  118628  251093  216294  186625  177505
  PATCH                   18743  122578  247243  205521  179712  175031

 So, pretty much no benefits on any scale, right?

Almost right; there seems to be a slight benefit at client count
8, however that can be due to variation as well.

  Here we can see that the performance dips at higher client
  count (>=32), which was quite surprising for me, as I was expecting
  it to improve, because bgreclaimer reduces the contention by making
  buffers available on free list.  So I tried to analyze the situation by
  using perf and found that in above configuration, there is a contention
  around freelist spinlock with HEAD and the same is removed by Patch,
  but still the performance goes down with Patch.  On further analysis, I
  observed that actually after Patch there is an increase in contention
  around ProcArrayLock (shared LWlock) via GetSnapshotData which
  sounds bit odd, but that's what I can see in profiles.  Based on
analysis,
  few ideas which I would like to further investigate are:
  a.  As there is an increase in spinlock contention, I would like to
check
  with Andres's latest patch which reduces contention around shared
  lwlocks.
  b.  Reduce some instructions added by patch in StrategyGetBuffer(),
  like instead of awakening bgreclaimer at low threshold, awaken when
  it tries to do clock sweep.
 

 Are you sure you didn't mix up the profiles here?

I have tried this 2 times; basically I am quite confident from my side,
but human errors can't be ruled out.  I have used the below statements:

Steps used for profiling:
during configure, use CFLAGS=-fno-omit-frame-pointer
Terminal 1:
Start server
Terminal 2:
./pgbench -c 64 -j 64 -T 300 -S -M prepared postgres
Terminal 3:
perf record -a -g sleep 60    -- this command is run after a minute
                              -- or so of test start

After the test is finished: perf report -g graph,0.5,callee

Do you see any problem in the way I am collecting the perf reports?

In any case, I can try once more if you still doubt the profiles.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-09 Thread Chris Bandy
On Tue, Oct 7, 2014 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:

typedef struct
{
!	char		token[TOKMAXLEN + 1];	/* now always null-terminated */
	char		type;
!	int32		value;
} datetkn;


Being entirely new to this code, "now" makes me think of the current
timestamp. I think this word can be removed to reduce ambiguity.


+ /* use strncmp so that we match truncated tokens */
result = strncmp(key, position->token, TOKMAXLEN);


In your proposal you wanted to remove crufty code that deals with
non-null-terminated token strings. Is this some of that crufty code? Can
it be removed?


-- Chris


Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-10-09 Thread Kevin Grittner
Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 10/02/2014 03:20 AM, Kevin Grittner wrote:
 My only concern from the benchmarks is that it seemed like there
 was a statistically significant increase in planning time:

 unpatched plan time average: 0.450 ms
 patched plan time average:  0.536 ms

 That *might* just be noise, but it seems likely to be real.  For
 the improvement in run time, I'd put up with an extra 86us in
 planning time per hash join; but if there's any way to shave some
 of that off, all the better.

 The patch doesn't modify the planner at all, so it would be rather
 surprising if it increased planning time. I'm willing to just write that
 off as noise.

Fair enough.  I have seen much larger variations caused by how
the executable code happened to line up relative to cacheline
boundaries; and we've never made any effort to manage that.

I've tried various other tests using \timing rather than EXPLAIN,
and the patched version looks even better in those cases.  I have
seen up to 4x the performance for a query using the patched
version, higher variability in run time without the patch, and have
yet to devise a benchmark where the patched version came out slower
(although I admit to not being as good at creating such cases as
some here).

When given a generous work_mem setting the patched version often
uses more of what it is allowed than the unpatched version (which
is probably one of the reasons it tends to do better).  If someone
has set a high work_mem and has gotten by only because the
configured amount is not all being used when it would benefit
performance, they may need to adjust work_mem down to avoid memory
problems.  That doesn't seem to me to be a reason to reject the
patch.

This is in "Waiting on Author" status only because I never got an
answer about why the debug code used printf() rather than elog() at
a DEBUG level.  Other than that, I would say this patch is "Ready
for Committer".  Tomas?  You there?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-09 Thread Craig Ringer
On 10/09/2014 09:47 PM, MauMau wrote:
 
 I heard that the user had run 16 concurrent psql sessions which executed
 INSERT and UPDATE statements, which is a write-intensive stress test. 
 He encountered the hang phenomenon twice, one of which occurred several
 hours after the start of the test, and the other occurred about an hour
 after the test launch.

It'd be interesting and useful to run this test on a debug build of
PostgreSQL, i.e. one compiled against the debug version of the C library
and with full debuginfo, not just a minimal .pdb.

How were the stacks captured - what tool?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-09 Thread Tom Lane
Chris Bandy bandy.ch...@gmail.com writes:
 On Tue, Oct 7, 2014 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  + /* use strncmp so that we match truncated tokens */
  result = strncmp(key, position->token, TOKMAXLEN);

 In your proposal you wanted to remove crufty code that deals with
 non-null-terminated token strings. Is this some of that crufty code? Can
 it be removed?

Yeah, I had hoped to simplify these things to just strcmp, but on closer
inspection that didn't work, because some of the keywords in the table are
truncated at TOKMAXLEN (10 characters).  If we just made these strcmp then
the code would stop recognizing e.g. "milliseconds".

I thought briefly about widening TOKMAXLEN so that there were no truncated
keywords in the table, but that seems risky from a backwards-compatibility
standpoint: as it stands, the code will accept "milliseconds",
"millisecond", or "millisecon", and there might possibly be applications
out there that depend on that.  In any case I'd much rather keep the array
stride at 16 bytes for speed reasons; and who's to say we might not put in
some even-longer keywords in the future?
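
The effect of the truncation is easy to see in isolation; a hedged,
standalone demonstration (not datetime.c itself):

/*
 * With TOKMAXLEN = 10 the table stores the truncated key "millisecon";
 * comparing only TOKMAXLEN bytes lets "milliseconds" and "millisecond"
 * still match it.
 */
#include <stdio.h>
#include <string.h>

#define TOKMAXLEN 10

int
main(void)
{
	const char *table_entry = "millisecon";	/* truncated at TOKMAXLEN */

	printf("%d\n", strncmp("milliseconds", table_entry, TOKMAXLEN) == 0);	/* 1 */
	printf("%d\n", strncmp("millisecond", table_entry, TOKMAXLEN) == 0);	/* 1 */
	printf("%d\n", strncmp("seconds", table_entry, TOKMAXLEN) == 0);		/* 0 */
	return 0;
}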

Another alternative we should maybe consider is leaving the definition
of the token field alone (ie, still not guaranteed null terminated)
which'd leave us with one free byte per datetkn entry.  I can't think
of a likely reason to need another 1-byte field though, and the existing
definition is not without risk.  That comment that was there about don't
change this to strlcpy was there because somebody broke it awhile back,
or at least submitted a patch that would've broken it if it'd been
accepted.  People are too used to null-terminated strings in C; a field
definition that violates that norm is just trouble waiting to happen.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-09 Thread Andres Freund
On 2014-10-09 22:47:48 +0900, MauMau wrote:
 Hello,
 
 One user reported a hang problem with 9.4 beta2 on Windows.  The PostgreSQL
 is 64-bit version.  I couldn't find the cause, but want to solve the
 problem.  Could you help with this?
 
  I heard that the user had run 16 concurrent psql sessions which executed
  INSERT and UPDATE statements, which is a write-intensive stress test.  He
  encountered the hang phenomenon twice, one of which occurred several hours
  after the start of the test, and the other occurred about an hour after the
  test launch.
 
 The user gave me the stack traces, which I attached at the end of this mail.
 The problem appears to be related to the xlog insert scaling.  But I can't
 figure out where the root cause lies --- WAL slot handling, spinlock on
 Windows, or PGSemaphoreLock/UnLock on Windows?
 
 The place I suspect is S_UNLOCK().  It doesn't use any memory barrier.  Is
 this correct on Intel64 processors?

What precisely do you mean by Intel64? 64-bit x86 or Itanium?

Also, what's the precise workload? Can you reproduce the problem?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-10-09 Thread Rahila Syed
Hello,

Thank you for review.

1) I don't think it's a good idea to put the full page write compression 
   into struct XLogRecord.

Information about full-page-write compression can be stored in the varlena
struct of the compressed blocks, as done for toast data in the pluggable
compression support patch. If I understand correctly, it can be done in a
manner similar to how a compressed Datum is modified to contain information
about the compression algorithm in the pluggable compression support patch.
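
As a hedged sketch of what such a varlena-style layout could look like
(field names invented here for illustration; not the patch's actual struct):

/*
 * Hedged sketch only: a header for a blob of compressed backup blocks,
 * loosely modeled on the toast compression header.
 */
#include <stdint.h>

typedef struct CompressedBkpBlocks
{
	int32_t		vl_len_;	/* total size including header (varlena word) */
	int32_t		rawsize;	/* uncompressed size of all backup blocks;
							 * doubles as a sanity check at replay */
	uint8_t		algorithm;	/* compression algorithm identifier */
	/* compressed data follows */
} CompressedBkpBlocks;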

2) You've essentially removed a lot of checks about the validity of bkp 
   blocks in xlogreader. I don't think that's acceptable

To ensure this, the raw size stored in the first four bytes of the compressed
datum can be used to perform error checking for backup blocks.
Currently, the error checking for the size of backup blocks happens
individually for each block.
If backup blocks are compressed together, it can happen once for the entire
set of backup blocks in a WAL record: the total raw size of the compressed
blocks can be checked against the total size stored in the WAL record header.

3) You have both FullPageWritesStr() and full_page_writes_str().

full_page_writes_str() is the true/false version of the FullPageWritesStr
macro. It is implemented for backward compatibility with pg_xlogdump.


4) I don't like FullPageWritesIsNeeded(). For one, it, at least to me, 
   sounds grammatically wrong. More importantly, when reading it I'm 
   thinking of it being about the LSN check. How about instead directly 
   checking whatever != FULL_PAGE_WRITES_OFF? 

I will modify this.

5) CompressBackupBlockPagesAlloc is declared static but not defined as 
   such. 
7) Unless I miss something CompressBackupBlock should be plural, right? 
   ATM it compresses all the blocks? 
I will correct these.

6)You call CompressBackupBlockPagesAlloc() from two places. Neither is 
  IIRC within a critical section. So you imo should remove the outOfMem 
   handling and revert to palloc() instead of using malloc directly. 

Yes, neither is in a critical section. The outOfMem handling is done in order
to proceed without compression of FPWs in case sufficient memory is not
available for compression.


Thank you,
Rahila Syed



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Compression-of-full-page-writes-tp5769039p5822391.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Corporate and Individual Contributor License Agreements (CLAs)

2014-10-09 Thread Joshua D. Drake


Arcadiy,

You may want to refer them to the license itself, which will make it very 
easy for them to understand why a Contributor License Agreement is not 
required:


PostgreSQL is released under the PostgreSQL License, a liberal Open 
Source license, similar to the BSD or MIT licenses.


PostgreSQL Database Management System
(formerly known as Postgres, then as Postgres95)

Portions Copyright (c) 1996-2014, The PostgreSQL Global Development Group

Portions Copyright (c) 1994, The Regents of the University of California

Permission to use, copy, modify, and distribute this software and its 
documentation for any purpose, without fee, and without a written 
agreement is hereby granted, provided that the above copyright notice 
and this paragraph and the following two paragraphs appear in all copies.


IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY 
FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, 
INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS 
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF 
THE POSSIBILITY OF SUCH DAMAGE.


THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, 
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY 
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS 
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS 
TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans."




Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Jeff Janes
On Wed, Oct 8, 2014 at 10:52 AM, Michael Banck michael.ba...@credativ.de
wrote:

 Hi,

 Am Samstag, den 04.10.2014, 15:05 -0500 schrieb Jim Nasby:
  On 10/4/14, 1:21 PM, Jeff Janes wrote:
   On Thu, Oct 2, 2014 at 6:21 AM, Michael Banck wrote:
   we have seen repeatedly that users can be confused about why PostgreSQL
   is not shutting down even though they requested it. Usually, this is
   because `log_checkpoints' is not enabled and the final checkpoint is
   being written, delaying shutdown. As no message besides "shutting down"
   is written to the server log in this case, we even had users believing
   the server was hanging and pondering killing it manually.
  
  
  Wouldn't a better place to write this message be the terminal from
  which pg_ctl stop was invoked, rather than the server log file?

 Looking at it from a DBA perspective, this would indeed be better, yes.

 However, I see a few issues with that:

 1. If you are using an init script (or another wrapper around pg_ctl),
 you don't get any of its output, it seems.

 2. Having taken a quick look at pg_ctl, it seems to just kill the
 postmaster on shutdown and wait for its PID file to disappear.  I don't
 see how it could figure out that PostgreSQL is waiting for a checkpoint
 to be finished?


It could just print out a reminder that a checkpoint will occur, depending
on what mode of shutdown was requested.  I don't think this reminder has to
be validated by the server itself; the intention should be enough.

Most people who don't know that a clean shutdown inherently involves a
checkpoint probably don't monitor the server log closely, either.  Of
course if they use packager scripts to do the shutdown and those scripts
don't pass along the message, I guess that still doesn't help.

Cheers,

Jeff


[HACKERS] Expose options to explain? (track_io_timing)

2014-10-09 Thread Joshua D. Drake


Salut!

Fellow volunteers, I request assistance in understanding the following:

When I explain a query I can get the following information:


   |   I/O Read Time: 0.000,
   |   I/O Write Time: 0.000

I know why it is 0. My question is this: can we expose it to EXPLAIN
only? Specifically EXPLAIN (ANALYZE, BUFFERS). Is there a technical
reason we must turn on track_io_timing for the whole system? If there
isn't, is this something the community would want?
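
As far as I can tell, the closest thing that works today is a superuser
flipping it per session, with a hypothetical table:

SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM some_table;
RESET track_io_timing;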


JD
--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans."




Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Marti Raudsepp
On Thu, Oct 2, 2014 at 4:21 PM, Michael Banck michael.ba...@credativ.de wrote:
 we have seen repeatedly that users can be confused about why PostgreSQL
 is not shutting down even though they requested it.  Usually, this is
 because `log_checkpoints' is not enabled and the final checkpoint is
 being written, delaying shutdown.

Are you convinced that that's the *actual* reason? Maybe you're
mis-attributing it.

In my experience shutdown delays are often caused by using the default
"smart" shutdown mode, which is another way of saying "completely
stupid". It waits until all connections to the server are closed --
meaning never in common situations with connection pooling or when an
admin has forgotten their psql shell open. To add insult to injury,
you can't open a new connection to invoke pg_terminate_backend() to
kill them, either.

In the case of a restart, it can cause much longer downtime than a
fast/immediate restart would.

Sorry for the rant, but am I the only one hating that default?

Regards,
Marti




Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Andres Freund
On 2014-10-09 09:44:09 -0400, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Andres Freund (and...@2ndquadrant.com) wrote:
  How about flipping the default for log_checkpoints instead? There really
  isn't a good reason for having it disabled by default.
 
  Yeah, I agree with this- it's extremely useful information and it's
  really not that verbose in general..
 
 -1.  Every time we've turned on default logging of routine events,
 there's been pushback and it was eventually turned off again as log spam.
 I guarantee you that logging checkpoints will be seen as log spam, except
 by people who are actually having trouble with that behavior, which is
 a small minority.

We're talking about 2 log messages per checkpoint_timeout interval
here. That's pretty darn far away from log spam. Was there really any
case of such a low-frequency message causing ire?

And if it's more frequent you can be happy that you see the log message
- because your config isn't appropriate for your load. The number of
times I've seen people being baffled at the bad performance because of an
inadequate checkpoint configuration, whose effect they couldn't see, is
baffling.
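
(Even without log_checkpoints you can at least see the frequency from the
stats counters, e.g.:

SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

If checkpoints_req dominates, the checkpoint configuration is likely too
small for the write load.)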

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-10-09 09:44:09 -0400, Tom Lane wrote:
  Stephen Frost sfr...@snowman.net writes:
   Yeah, I agree with this- it's extremely useful information and it's
   really not that verbose in general..
  
  -1.  Every time we've turned on default logging of routine events,
  there's been pushback and it was eventually turned off again as log spam.
  I guarantee you that logging checkpoints will be seen as log spam, except
  by people who are actually having trouble with that behavior, which is
  a small minority.
 
 We're talking about 2 log messages per checkpoint_timeout interval
 here. That's pretty darn far away from log spam. Was there really any
 case of such a low-frequency message causing ire?

For embedded devices and similar small-scale systems, I can see Tom's
point.  At the same time, I would expect those to require sufficient
configuration that also setting log_checkpoints to 'off' wouldn't be a
huge deal.

 And if it's more frequent you can be happy that you see the log message
 - because your config isn't appropriate for your load. The number of
 times I've seen people being baffled at the bad performance because of an
 inadequate checkpoint configuration, whose effect they couldn't see, is
 baffling.

I certainly agree with this, but then, look at our default
log_line_prefix and other default log settings..  In general, our
defaults are horrible and fixing this one is really just the tip of the
iceberg..

Thanks,

Stephen




Re: [HACKERS] alter user set local_preload_libraries.

2014-10-09 Thread Fujii Masao
On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote:
 The attached patch simply changes the context for local_... to
 PGC_USERSET and edits the doc.

 I had this ready to commit, but then

 Invent PGC_SU_BACKEND and mark log_connections/log_disconnections
 that way.

 was committed in the meantime.

 Does this affect what we should do with this change?

 I guess one thing to look into would be whether we could leave
 local_preload_libraries as PGC_BACKEND and change
 session_preload_libraries to PGC_SU_BACKEND, and then investigate
 whether we could allow settings made with ALTER ROLE / SET to change
 PGC_BACKEND settings.

 Yeah, I was wondering about that while I was making the other commit.
 I did not touch those variables at the time, but it would make sense
 to restrict them as you suggest.

+1

Also I think that it's useful to allow ALTER ROLE/DATABASE SET to
set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what
about applying the attached patch? This patch allows that and
changes the context of session_preload_libraries to PGC_SU_BACKEND.
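
For illustration, the sort of command the patch is meant to accept, with
placeholder role and library names:

ALTER ROLE app_user SET local_preload_libraries = 'auto_explain';

(Today this is rejected because local_preload_libraries is PGC_BACKEND.)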

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 6260,6266  SET XML OPTION { DOCUMENT | CONTENT };
listitem
 para
  This variable specifies one or more shared libraries that are to be
! preloaded at connection start.  Only superusers can change this setting.
  The parameter value only takes effect at the start of the connection.
  Subsequent changes have no effect.  If a specified library is not
  found, the connection attempt will fail.
--- 6260,6267 
listitem
 para
  This variable specifies one or more shared libraries that are to be
! preloaded at connection start.  Only superusers can change this setting
! at session start.
  The parameter value only takes effect at the start of the connection.
  Subsequent changes have no effect.  If a specified library is not
  found, the connection attempt will fail.
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***
*** 2892,2898  static struct config_string ConfigureNamesString[] =
  	},
  
  	{
! 		{"session_preload_libraries", PGC_SUSET, CLIENT_CONN_PRELOAD,
  			gettext_noop("Lists shared libraries to preload into each backend."),
  			NULL,
  			GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_SUPERUSER_ONLY
--- 2892,2898 
  	},
  
  	{
! 		{"session_preload_libraries", PGC_SU_BACKEND, CLIENT_CONN_PRELOAD,
  			gettext_noop("Lists shared libraries to preload into each backend."),
  			NULL,
  			GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_SUPERUSER_ONLY
***
*** 5756,5762  set_config_option(const char *name, const char *value,
  			else if (context != PGC_POSTMASTER &&
  	 context != PGC_BACKEND &&
  	 context != PGC_SU_BACKEND &&
! 	 source != PGC_S_CLIENT)
  			{
  ereport(elevel,
  		(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
--- 5756,5766 
  			else if (context != PGC_POSTMASTER &&
  	 context != PGC_BACKEND &&
  	 context != PGC_SU_BACKEND &&
! 	 source != PGC_S_CLIENT &&
! 	 source != PGC_S_DATABASE_USER &&
! 	 source != PGC_S_USER &&
! 	 source != PGC_S_DATABASE &&
! 	 source != PGC_S_GLOBAL)
  			{
  ereport(elevel,
  		(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
***
*** 8820,8827  validate_option_array_item(const char *name, const char *value,
  	 * There are three cases to consider:
  	 *
  	 * name is a known GUC variable.  Check the value normally, check
! 	 * permissions normally (ie, allow if variable is USERSET, or if it's
! 	 * SUSET and user is superuser).
  	 *
  	 * name is not known, but exists or can be created as a placeholder (i.e.,
  	 * it has a prefixed name).  We allow this case if you're a superuser,
--- 8824,8831 
  	 * There are three cases to consider:
  	 *
  	 * name is a known GUC variable.  Check the value normally, check
! 	 * permissions normally (ie, allow if variable is USERSET/BACKEND,
! 	 * or if it's SUSET/SU_BACKEND and user is superuser).
  	 *
  	 * name is not known, but exists or can be created as a placeholder (i.e.,
  	 * it has a prefixed name).  We allow this case if you're a superuser,
***
*** 8861,8877  validate_option_array_item(const char *name, const char *value,
  	}
  
  	/* manual permissions check so we can avoid an error being thrown */
! 	if (gconf->context == PGC_USERSET)
  		 /* ok */ ;
! 	else if (gconf->context == PGC_SUSET && superuser())
  		 /* ok */ ;
  	else if (skipIfNoPermissions)
  		return false;
  	/* if a permissions error should be thrown, let set_config_option do it */
  
! 	/* test for permissions and valid option value */
  	(void) set_config_option(name, value,
! 			 superuser() ? PGC_SUSET : PGC_USERSET,
  			 PGC_S_TEST, 

Re: [HACKERS] replicating DROP commands across servers

2014-10-09 Thread Jim Nasby

On 10/6/14, 11:24 PM, Robert Haas wrote:

Offlist.


FWIW, I've run into situations more than once in userspace where I need a
way to properly separate schema and object name. Generally I can make do
using reg* casts and then hitting catalog tables, but it'd be nice if there
was an easier way.


Sure, although I think that's a bit of a separate problem.  It's hard
to iterate through a string a character at a time from the SQL level
so that you can handle stuff like the quote_literal() rules.  If we
want people to be able to do that easily we need to provide tools to
handle it.  But C is actually quite well-suited to such tasks.


Yeah, I wouldn't want to attempt this in SQL; I was saying that a built-in 
function to do this would be broadly useful, not just for replicating DROPs.
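
For concreteness, the reg*-cast-plus-catalog dance I mean looks something
like this (table name hypothetical):

SELECT n.nspname AS schema, c.relname AS name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.oid = 'public.my_table'::regclass;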
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-10-09 09:44:09 -0400, Tom Lane wrote:
 -1.  Every time we've turned on default logging of routine events,
 there's been pushback and it was eventually turned off again as log spam.

 We're talking about 2 log message per checkpoint_timeout interval
 here. That's pretty darn far away from log spam. Was there really any
 case of such low frequency message causing ire?

 For embedded devices and similar small-scale systems, I can see Tom's
 point.  At the same time, I would expect those to require sufficient
 configuration that also setting log_checkpoints to 'off' wouldn't be a
 huge deal.

Here's the problem as I see it: DBAs will be annoyed by the spam and will
turn it off.  Then they'll still be confused when a shutdown takes a long
time.  So this is no fix at all for the original complaint.

I'm also not entirely convinced that checkpoints have anything to do with
the complaint.  Once we get a shutdown request, we're going to have to
perform a checkpoint, which we do at full speed, no delays (or at least
did so last I checked).  Whether a checkpoint was already in progress is
more or less irrelevant.  It's always been like that and I can't recall
anybody complaining about it.  I suspect Marti is correct that the real
problem is elsewhere.

regards, tom lane




Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-09 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  For embedded devices and similar small-scale systems, I can see Tom's
  point.  At the same time, I would expect those to require sufficient
  configuration that also setting log_checkpoints to 'off' wouldn't be a
  huge deal.
 
 Here's the problem as I see it: DBAs will be annoyed by the spam and will
 turn it off.  Then they'll still be confused when a shutdown takes a long
 time.  So this is no fix at all for the original complaint.

I've not run into very many folks working with embedded devices, so take
this with a grain of salt, but I have *never* run into a DBA who is
running a production system who doesn't want log_checkpoints,
log_connections, log_disconnections, and a much more verbose
log_line_prefix (and more, really), so I don't buy into this argument at
all.  Our default logging is nowhere near what logging on a production
system should be and I'd be interested to meet the DBA who disagrees
with that, because they've got some requirements that I've not dealt
with before.

Basically, I believe every DBA who is using PG for more than a toy setup
(or strictly development) would be pleasantly surprised to have
checkpoints logged; far too many of them don't even know the option
exists.
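
For reference, the sort of baseline I mean, with illustrative values:

ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_line_prefix = '%m [%p] %u@%d ';
SELECT pg_reload_conf();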

 I'm also not entirely convinced that checkpoints have anything to do with
 the complaint.  Once we get a shutdown request, we're going to have to
 perform a checkpoint, which we do at full speed, no delays (or at least
 did so last I checked).  Whether a checkpoint was already in progress is
 more or less irrelevant.  It's always been like that and I can't recall
 anybody complaining about it.  I suspect Marti is correct that the real
 problem is elsewhere.

This is certainly an interesting question and was asked about up-thread
also, I believe.  I agree that if it wasn't slow to shut down due to a
checkpoint then logging checkpoints isn't going to help.  If the issue
is that it's a 'smart' shutdown request with folks logged in, then
perhaps we should consider logging *that* fact: "waiting to shut down
due to user connections" or some such.

Thanks,

Stephen




Re: [HACKERS] replicating DROP commands across servers

2014-10-09 Thread Alvaro Herrera
Jim Nasby wrote:
 On 10/6/14, 11:24 PM, Robert Haas wrote:
 
 Offlist.
 
 FWIW, I've run into situations more than once in userspace where I need a
 way to properly separate schema and object name. Generally I can make do
 using reg* casts and then hitting catalog tables, but it'd be nice if there
 was an easier way.
 
 Sure, although I think that's a bit of a separate problem.  It's hard
 to iterate through a string a character at a time from the SQL level
 so that you can handle stuff like the quote_literal() rules.  If we
 want people to be able to do that easily we need to provide tools to
 handle it.  But C is actually quite well-suited to such tasks.
 
 Yeah, I wouldn't want to attempt this in SQL; I was saying that a
 built-in function to do this would be broadly useful, not just for
 replicating DROPs.

Well, most of what you need is served by pg_identify_object, I think:
just grab the OID from the appropriate catalog, and the OID of the
catalog itself; that function will give you schema and name, which is
what you need.  Probably the most difficult part is figuring out which
reg* cast you want .. and of course there are also cases where there is
no such datatype to cast to in the first place.
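
A sketch of the easy case, where a reg* type does exist (table name
hypothetical):

SELECT schema, name
FROM pg_identify_object('pg_class'::regclass, 'public.my_table'::regclass, 0);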

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Expose options to explain? (track_io_timing)

2014-10-09 Thread Jeff Janes
On Thu, Oct 9, 2014 at 10:17 AM, Joshua D. Drake j...@commandprompt.com
wrote:


 Salut!

 Fellow volunteers, I request assistance in understanding the following:

 When I explain a query I can get the following information:


|   I/O Read Time: 0.000,
|   I/O Write Time: 0.000

 I know why it is 0. My question is this: can we expose it to EXPLAIN only?
 Specifically EXPLAIN (ANALYZE, BUFFERS). Is there a technical reason we
 must turn on track_io_timing for the whole system? If there isn't, is this
 something the community would want?



I think the theory for track_io_timing being PGC_SUSET is that if the
superuser turned it on, no one should be able to turn it off.

But I don't see an argument for the other way around, that no one should be
able to turn it on locally if the superuser left it at the default of off.

So I think the real behavior we would want is that anyone can turn it on in
their session, and can also turn it off provided it was turned on by them
in the first place.  But there is no machinery in the GUC code to do that,
which is probably why it wasn't done.  I meant to work on that for this dev
cycle, but I never dug into how to implement the "provided it was turned on
by them in the first place" part of the requirement.  And how would this be
expressed generically? Some notion that the default value can be a floor or
ceiling which the user can alter in one direction, and reverse that
alteration. PGC_SUSET_FLOOR and PGC_SUSET_CEILING?

Anyway, if we are going to solve this for track_io_timing, I think it would
be better to solve it in a way that it can also be used by
pg_stat_statements, as well as EXPLAIN.  But maybe that doesn't make sense,
as you need to be a superuser to call pg_stat_statements_reset() anyway.

Cheers,

Jeff


Re: [HACKERS] Last Commitfest patches waiting review

2014-10-09 Thread Peter Geoghegan
On Mon, Oct 6, 2014 at 11:53 AM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Oct 6, 2014 at 11:27 AM, Robert Haas robertmh...@gmail.com wrote:
 Well, really, I was just suggesting that I can spend more time on the
 patch, but not immediately.

 We haven't really talked about the idea of the HyperLogLog-based abort
 mechanism - the actual cost model - even though I thought we'd have
 discussed that extensively by now.

My concern is that if we only get it committed in the last commitfest,
we may run out of time to make sortsupport work for B-Tree index
builds. That's where the sortsupport for text stuff will be really
useful.

-- 
Peter Geoghegan




Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-09 Thread Robert Haas
On Thu, Oct 9, 2014 at 8:20 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-10-09 08:18:18 -0400, Robert Haas wrote:
 On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote:
  Interesting - in my local profile AtStart_Inval() is more pronounced
  than AfterTriggerBeginQuery(). I've quickly and in an ugly fashion hacked
  AtStart_Inval() out of readonly queries on top of your patch. Together
  that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
  tbl WHERE pkey = xxx' testcase.

 Whoa.  Now that's clearly significant.

 Well, my guess it'll be far less noticeable in less trivial
 workloads. But it does seem worthwile.

 You didn't attach the patch; was that inadvertent, or was it too ugly
 for that?

 Far, far too ugly ;). I just removed the AtStart() call from xact.c and
 sprinkled it around relevant places instead ;)

OK, here's an attempt at a real patch for that.  I haven't perf-tested this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5b5d31b..651a5c4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1838,7 +1838,6 @@ StartTransaction(void)
 	 * initialize other subsystems for new transaction
 	 */
 	AtStart_GUC();
-	AtStart_Inval();
 	AtStart_Cache();
 	AfterTriggerBeginXact();
 
@@ -4151,7 +4150,6 @@ StartSubTransaction(void)
 	 */
 	AtSubStart_Memory();
 	AtSubStart_ResourceOwner();
-	AtSubStart_Inval();
 	AtSubStart_Notify();
 	AfterTriggerBeginSubXact();
 
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index a7a768e..6b6c88e 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -693,19 +693,32 @@ AcceptInvalidationMessages(void)
 }
 
 /*
- * AtStart_Inval
- *		Initialize inval lists at start of a main transaction.
+ * PrepareInvalidationState
+ *		Initialize inval lists for the current (sub)transaction.
  */
-void
-AtStart_Inval(void)
+static void
+PrepareInvalidationState(void)
 {
-	Assert(transInvalInfo == NULL);
-	transInvalInfo = (TransInvalidationInfo *)
+	TransInvalidationInfo *myInfo;
+
+	if (transInvalInfo != NULL &&
+		transInvalInfo->my_level == GetCurrentTransactionNestLevel())
+		return;
+
+	myInfo = (TransInvalidationInfo *)
 		MemoryContextAllocZero(TopTransactionContext,
 			   sizeof(TransInvalidationInfo));
-	transInvalInfo->my_level = GetCurrentTransactionNestLevel();
-	SharedInvalidMessagesArray = NULL;
-	numSharedInvalidMessagesArray = 0;
+	myInfo->parent = transInvalInfo;
+	myInfo->my_level = GetCurrentTransactionNestLevel();
+
+	/*
+	 * If there's any previous entry, this one should be for a deeper
+	 * nesting level.
+	 */
+	Assert(transInvalInfo == NULL ||
+		myInfo->my_level > transInvalInfo->my_level);
+
+	transInvalInfo = myInfo;
 }
 
 /*
@@ -727,24 +740,6 @@ PostPrepare_Inval(void)
 }
 
 /*
- * AtSubStart_Inval
- *		Initialize inval lists at start of a subtransaction.
- */
-void
-AtSubStart_Inval(void)
-{
-	TransInvalidationInfo *myInfo;
-
-	Assert(transInvalInfo != NULL);
-	myInfo = (TransInvalidationInfo *)
-		MemoryContextAllocZero(TopTransactionContext,
-			   sizeof(TransInvalidationInfo));
-	myInfo->parent = transInvalInfo;
-	myInfo->my_level = GetCurrentTransactionNestLevel();
-	transInvalInfo = myInfo;
-}
-
-/*
  * Collect invalidation messages into SharedInvalidMessagesArray array.
  */
 static void
@@ -803,8 +798,16 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
 {
 	MemoryContext oldcontext;
 
+	/* Quick exit if we haven't done anything with invalidation messages. */
+	if (transInvalInfo == NULL)
+	{
+		*RelcacheInitFileInval = false;
+		*msgs = NULL;
+		return 0;
+	}
+
 	/* Must be at top of stack */
-	Assert(transInvalInfo != NULL && transInvalInfo->parent == NULL);
+	Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);
 
 	/*
 	 * Relcache init file invalidation requires processing both before and
@@ -904,11 +907,15 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
 void
 AtEOXact_Inval(bool isCommit)
 {
+	/* Quick exit if no messages */
+	if (transInvalInfo == NULL)
+		return;
+
+	/* Must be at top of stack */
+	Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);
+
 	if (isCommit)
 	{
-		/* Must be at top of stack */
-		Assert(transInvalInfo != NULL && transInvalInfo->parent == NULL);
-
 		/*
 		 * Relcache init file invalidation requires processing both before and
 		 * after we send the SI messages.  However, we need not do anything
@@ -926,17 +933,16 @@ AtEOXact_Inval(bool isCommit)
 		if (transInvalInfo->RelcacheInitFileInval)
 			RelationCacheInitFilePostInvalidate();
 	}
-	else if (transInvalInfo != NULL)
+	else
 	{
-		/* Must be at top of stack */
-		Assert(transInvalInfo->parent == NULL);
-
 		ProcessInvalidationMessages(transInvalInfo->PriorCmdInvalidMsgs,
 	

Re: [HACKERS] Last Commitfest patches waiting review

2014-10-09 Thread Heikki Linnakangas

On 10/09/2014 09:59 PM, Peter Geoghegan wrote:

My concern is that if we only get it committed in the last commitfest,
we may run out of time to make sortsupport work for B-Tree index
builds. That's where the sortsupport for text stuff will be really
useful.


B-tree index build uses tuplesort.c. What's missing?

- Heikki




Re: [HACKERS] pgbench throttling latency limit

2014-10-09 Thread Heikki Linnakangas

On 10/05/2014 10:43 AM, Fabien COELHO wrote:


Hello Heikki,


Here are new patches, again the first one is just refactoring, and the second
one contains this feature. I'm planning to commit the first one shortly, and
the second one later after people have had a chance to look at it.


I looked at it. It looks ok, but for a few spurious spacing changes here
and there. No big deal.

I tested it, everything I tested behaved as expected, so it is ok for me.


Thanks!

I committed the refactoring patch earlier, and just went through the 
second patch again. I wordsmithed the documentation and comments, and 
fixed the documentation on the log format. I also fixed the logging of 
skipped transactions so that the schedule lag is reported correctly for 
them.


One thing bothers me with the log format. Here's an example:


 0 81 4621 0 1412881037 912698 3005
 0 82 6173 0 1412881037 914578 4304
 0 83 skipped 0 1412881037 914578 5217
 0 83 skipped 0 1412881037 914578 5099
 0 83 4722 0 1412881037 916203 3108
 0 84 4142 0 1412881037 918023 2333
 0 85 2465 0 1412881037 919759 740


Note how the transaction counter is not incremented for skipped 
transactions. That's understandable, since we're not including skipped 
transactions in the number of transactions executed, but it means that 
the skipped transactions don't have a unique ID. That's annoying.


Here's a new version of the patch. I'll sleep over it before committing, 
but I think it's fine now, except maybe for the unique ID thing.


- Heikki

From 79dc4fba57e4c8a52a074ab302aee72eb78ce6fe Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 9 Oct 2014 22:04:27 +0300
Subject: [PATCH 1/1] Add --latency-limit option to pgbench.

This allows transactions that take longer than the specified limit to be
counted separately. With --rate, transactions that are already late by the
time we get to execute them are skipped altogether. Using --latency-limit
with --rate allows you to catch up more quickly if there's a hiccup in the
server causing a lot of transactions to stall momentarily.

Fabien COELHO, reviewed by Rukh Meski and heavily refactored by me.
---
 contrib/pgbench/pgbench.c | 231 +++---
 doc/src/sgml/pgbench.sgml |  70 +++---
 2 files changed, 233 insertions(+), 68 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e9431ee..5506346 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -141,6 +141,14 @@ double		sample_rate = 0.0;
 int64		throttle_delay = 0;
 
 /*
+ * Transactions which take longer than this limit (in usec) are counted as
+ * late, and reported as such, although they are completed anyway. When
+ * throttling is enabled, execution time slots that are more than this late
+ * are skipped altogether, and counted separately.
+ */
+int64		latency_limit = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -238,6 +246,8 @@ typedef struct
 	int64		throttle_trigger;		/* previous/next throttling (us) */
 	int64		throttle_lag;	/* total transaction lag behind throttling */
 	int64		throttle_lag_max;		/* max transaction lag */
+	int64		throttle_latency_skipped; /* lagging transactions skipped */
+	int64		latency_late;	/* late transactions */
 } TState;
 
 #define INVALID_THREAD		((pthread_t) 0)
@@ -250,6 +260,8 @@ typedef struct
 	int64		sqlats;
 	int64		throttle_lag;
 	int64		throttle_lag_max;
+	int64		throttle_latency_skipped;
+	int64		latency_late;
 } TResult;
 
 /*
@@ -284,6 +296,8 @@ typedef struct
 
 	long		start_time;		/* when does the interval start */
 	int			cnt;			/* number of transactions */
+	int			skipped;		/* number of transactions skipped under
+								 * --rate and --latency-limit */
 
 	double		min_latency;	/* min/max latencies */
 	double		max_latency;
@@ -348,7 +362,7 @@ static void setalarm(int seconds);
 static void *threadRun(void *arg);
 
 static void doLog(TState *thread, CState *st, FILE *logfile, instr_time *now,
-	  AggVals *agg);
+	  AggVals *agg, bool skipped);
 
 static void
 usage(void)
@@ -375,6 +389,8 @@ usage(void)
  		 "  -f, --file=FILENAME      read transaction script from FILENAME\n"
  		 "  -j, --jobs=NUM           number of threads (default: 1)\n"
  		 "  -l, --log                write transaction times to log file\n"
+		 "  -L, --latency-limit=NUM  count transactions lasting more than NUM ms\n"
+		 "                           as late.\n"
  		 "  -M, --protocol=simple|extended|prepared\n"
  		 "                           protocol for submitting queries (default: simple)\n"
  		 "  -n, --no-vacuum          do not run VACUUM before tests\n"
@@ -994,7 +1010,9 @@ void
 agg_vals_init(AggVals *aggs, instr_time start)
 {
 	/* basic counters */
-	aggs->cnt = 0;				/* number of transactions */
+	aggs->cnt = 0;				/* number of transactions (includes skipped) */
+	aggs->skipped = 0;			/* xacts skipped under --rate --latency-limit */
+
 	aggs->sum_latency = 0;		/* SUM(latency) */
 	

Re: [HACKERS] Last Commitfest patches waiting review

2014-10-09 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 12:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 B-tree index build uses tuplesort.c. What's missing?

I don't think that all that much is missing. Tuplesort expects to work
with an index scankey when sorting B-Tree tuples. There needs to be
something like a reverse lookup of the sortsupport function. It looks
like a historical oversight that would take time to fix, but wouldn't
be particularly challenging. You'd need to pick out the operators from
the scankey, so you'd have something like what tuplesort_begin_heap()
starts off with, but in tuplesort_begin_index_btree().

copytup_index() would then later need to be modified to make
abbreviation occur there too, but that's no big deal.

-- 
Peter Geoghegan




Re: [HACKERS] Scaling shared buffer eviction

2014-10-09 Thread Andres Freund
On 2014-10-09 16:01:55 +0200, Andres Freund wrote:
 On 2014-10-09 18:17:09 +0530, Amit Kapila wrote:
  On Fri, Sep 26, 2014 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:
  
   On another point, I think it would be a good idea to rebase the
   bgreclaimer patch over what I committed, so that we have a
   clean patch against master to test with.
  
  Please find the rebased patch attached with this mail.  I have taken
  some performance data as well and done some analysis based on
  the same.
  
  Performance Data
  
  IBM POWER-8 24 cores, 192 hardware threads
  RAM = 492GB
  max_connections =300
  Database Locale =C
  checkpoint_segments=256
  checkpoint_timeout=15min
  shared_buffers=8GB
  scale factor = 5000
  Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
  Duration of each individual run = 5mins
 
 I don't think OLTP really is the best test case for this. Especially not
  pgbench with relatively small rows *and* a uniform distribution of
 access.
 
 Try parallel COPY TO. Batch write loads is where I've seen this hurt
 badly.

As an example, the attached scripts go from:
progress: 5.3 s, 20.9 tps, lat 368.917 ms stddev 49.655
progress: 10.1 s, 21.0 tps, lat 380.326 ms stddev 64.525
progress: 15.1 s, 14.1 tps, lat 568.108 ms stddev 226.040
progress: 20.4 s, 12.0 tps, lat 634.557 ms stddev 300.519
progress: 25.2 s, 17.5 tps, lat 461.738 ms stddev 136.257
progress: 30.2 s, 9.8 tps, lat 850.766 ms stddev 305.454
progress: 35.3 s, 12.2 tps, lat 670.473 ms stddev 271.075
progress: 40.2 s, 7.9 tps, lat 972.617 ms stddev 313.152
progress: 45.3 s, 14.9 tps, lat 546.056 ms stddev 211.987
progress: 50.2 s, 13.2 tps, lat 610.608 ms stddev 271.780
progress: 55.5 s, 16.9 tps, lat 468.757 ms stddev 156.516
progress: 60.5 s, 14.3 tps, lat 548.913 ms stddev 190.414
progress: 65.7 s, 9.3 tps, lat 821.293 ms stddev 353.665
progress: 70.1 s, 16.0 tps, lat 524.240 ms stddev 174.903
progress: 75.2 s, 17.0 tps, lat 485.692 ms stddev 194.273
progress: 80.2 s, 19.9 tps, lat 396.295 ms stddev 78.891
progress: 85.3 s, 18.3 tps, lat 423.744 ms stddev 105.798
progress: 90.1 s, 14.5 tps, lat 577.373 ms stddev 270.914
progress: 95.3 s, 12.0 tps, lat 649.434 ms stddev 247.001
progress: 100.3 s, 14.6 tps, lat 563.693 ms stddev 275.236
tps = 14.81 (including connections establishing)

to:
progress: 5.1 s, 18.9 tps, lat 409.766 ms stddev 75.032
progress: 10.3 s, 20.2 tps, lat 396.781 ms stddev 67.593
progress: 15.1 s, 19.1 tps, lat 418.545 ms stddev 109.431
progress: 20.3 s, 20.6 tps, lat 388.606 ms stddev 74.259
progress: 25.1 s, 19.5 tps, lat 406.591 ms stddev 109.050
progress: 30.0 s, 19.1 tps, lat 420.199 ms stddev 157.005
progress: 35.0 s, 18.4 tps, lat 421.102 ms stddev 124.019
progress: 40.3 s, 12.3 tps, lat 640.640 ms stddev 88.409
progress: 45.2 s, 12.8 tps, lat 586.471 ms stddev 145.543
progress: 50.5 s, 6.9 tps, lat 1116.603 ms stddev 285.479
progress: 56.2 s, 6.3 tps, lat 1349.055 ms stddev 381.095
progress: 60.6 s, 7.9 tps, lat 1083.745 ms stddev 452.386
progress: 65.0 s, 9.6 tps, lat 805.981 ms stddev 273.845
progress: 71.1 s, 9.6 tps, lat 798.273 ms stddev 184.108
progress: 75.2 s, 9.3 tps, lat 950.131 ms stddev 150.870
progress: 80.8 s, 8.6 tps, lat 899.389 ms stddev 135.090
progress: 85.3 s, 8.8 tps, lat 928.183 ms stddev 152.056
progress: 90.9 s, 8.0 tps, lat 929.737 ms stddev 71.155
progress: 95.7 s, 9.0 tps, lat 968.070 ms stddev 127.824
progress: 100.3 s, 8.7 tps, lat 911.767 ms stddev 130.697

just by switching shared_buffers from 1 to 8GB. I haven't tried, but I
hope that with an approach like yours this might become better.

psql -f /tmp/prepare.sql
pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
CREATE OR REPLACE FUNCTION exec(text) returns text language plpgsql volatile
  AS $f$
BEGIN
  EXECUTE $1;
  RETURN $1;
END;
$f$;
\o /dev/null
SELECT exec('drop table if exists largedata_'||g.i||'; create unlogged table largedata_'||g.i||'(data bytea, id serial primary key);')
FROM generate_series(0, 64) g(i);
\o
COPY largedata_:client_id(data) FROM '/tmp/large' BINARY;



Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-09 Thread Andres Freund
On 2014-10-09 15:01:19 -0400, Robert Haas wrote:
 On Thu, Oct 9, 2014 at 8:20 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-10-09 08:18:18 -0400, Robert Haas wrote:
  On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   Interesting - in my local profile AtStart_Inval() is more pronounced
   than AfterTriggerBeginQuery(). I've quickly and in an ugly fashion hacked
   AtStart_Inval() out of readonly queries on top of your patch. Together
   that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
   tbl WHERE pkey = xxx' testcase.
 
  Whoa.  Now that's clearly significant.
 
  Well, my guess it'll be far less noticeable in less trivial
  workloads. But it does seem worthwile.
 
  You didn't attach the patch; was that inadvertent, or was it too ugly
  for that?
 
  Far, far too ugly ;). I just removed the AtStart() call from xact.c and
  sprinkled it around relevant places instead ;)
 
 OK, here's an attempt at a real patch for that.  I haven't perf-tested this.

Neato. With a really trivial SELECT:

before:
tps = 28150.794776 (excluding connections establishing)
after:
tps = 29978.767703 (excluding connections establishing)

slightly more meaningful:

before:
tps = 21272.400039 (including connections establishing)
after:
tps = 22290.703482 (excluding connections establishing)

So that's a noticeable win. Obviously it's going to be less for more
complicated stuff, but still...

I've not really looked at the patches though.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-09 Thread MauMau

From: Craig Ringer cr...@2ndquadrant.com

It'd be interesting and useful to run this test on a debug build of
PostgreSQL, i.e. one compiled against the debug version of the C library
and with full debuginfo not just minimal .pdb.


Although I'm not sure the user can do this now, I'll ask him anyway.


How were the stacks captured - what tool?


According to his mail, Windbg or userdump.exe.  I'll ask him about this.

Regards
MauMau





Re: [HACKERS] Last Commitfest patches waiting review

2014-10-09 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 12:51 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Oh, I didn't realize we don't do that already! I'm surprised, I would've
 expected index build to have been the first thing we'd use the SortSupport
 stuff in.

The thing is that the most compelling numbers for sortsupport (plus
the related improvements to tuplesort itself) were from the onlyKey
optimization - the numbers are only so-so when you look at
multi-attribute sorts, because we specialize qsort() for both of those
two cases (i.e. one specialization, qsort_ssup(), only looks at
datum1, while qsort_tuple() looks at everything else, through which
comparetup_heap() and comparetup_datum() are called). But with the
B-Tree comparator, we're only ever going to be able to use
qsort_tuple() which is roughly equivalent to the so-so multi-attribute
case for heap tuples (because we need to detect duplicate violations,
and a few other things - no choice there).

It kind of makes sense that we didn't push ourselves to get B-Tree
support until now. But with sortsupport for text accelerating sorts by
perhaps as much as 10 times in sympathetic (though realistic) cases,
it would be crazy to have that without B-Tree support (abbreviated
keys always force us to use the qsort_tuple() specialization, because
the comparator tie-breaker logic must live in places like
comparetup_heap()).

 Yeah, that seems worth doing, independently of the this patch.

As I mentioned, that might be less true than you'd think.

 Can you write
 a separate patch to use SortSupport for B-tree index builds, please?
 Eliminating the FunctionCallInfoData overhead should shave off some
 cycles from every index build.

I'll look into it. Hopefully an effort to actually implement it will
show that I was right, and there isn't much to it.

-- 
Peter Geoghegan




Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-09 Thread MauMau

From: Andres Freund and...@2ndquadrant.com

What precisely do you mean with Intel64? 64bit x86 or Itanium?


64-bit x86, i.e. x86-64.



Also, what's the precise workload? Can you reproduce the problem?


IIUC, each client inserts 1000 records into one table, then repeats updating 
all those records.  I'll ask him again.


Regards
MauMau





Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-10-09 Thread Kevin Grittner
Tomas Vondra t...@fuzzy.cz wrote:
 On 9.10.2014 16:55, Kevin Grittner wrote:

 I've tried various other tests using \timing rather than EXPLAIN, and
 the patched version looks even better in those cases. I have seen up
 to 4x the performance for a query using the patched version, higher
 variability in run time without the patch, and have yet to devise a
 benchmark where the patched version came out slower (although I admit
 to not being as good at creating such cases as some here).

 Nice. Thanks for the testing!

 The only case I've been able to come up with is when the hash table fits
 into work_mem only thanks to not counting the buckets. The new code will
 start batching in this case.

Hmm.  If you look at the timings in my posting from 2014-10-01 I
had cases where the patched version started with one batch and went
to two, while the unpatched used just one batch, and the patched
version was still more than twice as fast.  I'm sure the on disk
batch was fully cached, however; it might work out differently if
disk speed actually came into the picture.

 That is mostly luck, however, because it depends on the cardinality
 estimates, and the best way to fix it is increasing work_mem (which is
 safer thanks to reducing the overhead).

 Also, Robert proposed a way to mitigate this, if we realize we'd have to
 do batching during the initial sizing, we can peek whether reducing the
 number of buckets (to 1/2 or maybe 1/4) would help. I believe this is a
 good approach, and will look into that after pgconf.eu (i.e. early
 November), unless someone else is interested.

Sure, but it would be good to confirm that it's actually needed first.

 When given a generous work_mem setting the patched version often uses
 more of what it is allowed than the unpatched version (which is
 probably one of the reasons it tends to do better). If someone has
 set a high work_mem and has gotten by only because the configured
 amount is not all being used when it would benefit performance, they
 may need to adjust work_mem down to avoid memory problems. That
 doesn't seem to me to be a reason to reject the patch.

 I'm not entirely sure I understand this paragraph. What do you mean by
 "configured amount is not all being used when it would benefit
 performance"? Can you give an example?

Again, the earlier post showed that while the unpatched used 3516kB
whether it had work_mem set to 4MB or 1GB, the patched version used
3073kB when work_mem was set to 4MB and 4540kB when work_mem was
set to 1GB.  The extra memory allowed the patched version to stay
at 1 batch, improving performance over the other setting.
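
(Those numbers come straight from the Hash node in EXPLAIN ANALYZE; a
sketch, with hypothetical tables:

SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM fact f JOIN dim d ON d.id = f.dim_id;

where the Hash node reports something like "Buckets: 131072  Batches: 2
Memory Usage: 3073kB".)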

 The only thing I can think of is the batching behavior described above.

 This is in "Waiting on Author" status only because I never got an
 answer about why the debug code used printf() rather than elog() at
 a DEBUG level.  Other than that, I would say this patch is "Ready
 for Committer".  Tomas?  You there?

 I think I responded to that on October 2, quoting:

 ===
 On 2.10.2014 09:50, Tomas Vondra wrote:
 On 2.10.2014, 2:20, Kevin Grittner wrote:

 The patch applied and built without problem, and pass `make
 check-world`. The only thing that caught my eye was the addition of
 debug code using printf() instead of logging at a DEBUG level. Is
 there any reason for that?

 Not really. IIRC the main reason it that the other code in
 nodeHash.c uses the same approach.
===

Ah, I never received that email.  That tends to happen every now
and then.  :-(

 I believe it's safe to switch the logging to elog(). IMHO the printf
 logging is there from some very early version of the code, before elog
 was introduced. Or something like that.

I guess it might be best to remain consistent in this patch and
change that in a separate patch.  I just wanted to make sure you
didn't see any reason not to do so.

With that addressed I will move this to "Ready for Committer".  Since
both Heikki and Robert spent time on this patch earlier, I'll give
either of them a shot at committing it if they want; otherwise I'll
do it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_upgrade, locale and encoding

2014-10-09 Thread Bruce Momjian
On Tue, Oct  7, 2014 at 03:52:24PM +0300, Heikki Linnakangas wrote:
 While looking at bug #11431, I noticed that pg_upgrade still seems
 to think that encoding and locale are cluster-wide properties. We
 got per-database locale support in 8.4, and encoding has been
 per-database much longer than that.
 
 pg_upgrade checks the encoding and locale of template0 in both
 clusters, and throws an error if they don't match. But it doesn't
 check the locale or encoding of postgres or template1 databases.
 That leads to problems if e.g. the postgres database was dropped and
 recreated with a different encoding or locale in the old cluster. We
 will merrily upgrade it, but strings in the database will be
 incorrectly encoded.

Wow, I never thought someone would do that, but they certainly could ---
good catch.
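
For concreteness, the sequence is as simple as this, with an illustrative
locale, run while connected to some other database such as template1:

DROP DATABASE postgres;
CREATE DATABASE postgres TEMPLATE template0
    ENCODING 'LATIN1' LC_COLLATE 'de_DE' LC_CTYPE 'de_DE';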

 I propose the attached patch, for git master. It's more complicated
 in back-branches, as they still support upgrading from pre-8.4
 clusters. We haven't heard any complaints from the field on this, so
 I don't think it's worth trying to back-patch this.

Agreed.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Bruce Momjian
On Thu, Oct  9, 2014 at 02:34:17PM +0400, Alexey Bashtanov wrote:
 Hello!
 
 Autovacuum daemon performs vacuum when the number of rows
 updated/deleted (n_dead_tuples) reaches some threshold.
 Similarly it performs analyze when the number of rows changed in any
 way (incl. inserted).
 When a table is mostly insert-only, its visibility map is not
 updated, as the vacuum threshold is almost never reached, but analyze
 does not update the visibility map.
 
 Why could it be a bad idea to run vacuum after some number of any
 changes including inserts, like analyze?
 Or at least make it tunable by the user (add a second set of parameters
 to control a second vacuum threshold, disabled by default)?

I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and insert-only
workloads

  http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us


-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-09 Thread Robert Haas
On Thu, Oct 9, 2014 at 3:53 PM, Andres Freund and...@2ndquadrant.com wrote:
 OK, here's an attempt at a real patch for that.  I haven't perf-tested this.

 Neato. With a really trivial SELECT:

 before:
 tps = 28150.794776 (excluding connections establishing)
 after:
 tps = 29978.767703 (excluding connections establishing)

 slightly more meaningful:

 before:
 tps = 21272.400039 (including connections establishing)
 after:
 tps = 22290.703482 (excluding connections establishing)

 So that's a noticeable win. Obviously it's going to be less for more
 complicated stuff, but still...

 I've not really looked at the patches though.

Yeah, not bad at all for the amount of work involved.  Want to
disclose the actual queries?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alvaro Herrera
Bruce Momjian wrote:

 I agree this is a serious problem.  We have discussed various options,
 but have not decided on anything.  The TODO list has:
 
   https://wiki.postgresql.org/wiki/Todo
 
   Improve setting of visibility map bits for read-only and insert-only
   workloads
   
 http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze.  In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.  (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Deferring some AtStart* allocations?

2014-10-09 Thread Andres Freund
On 2014-10-09 17:02:02 -0400, Robert Haas wrote:
 On Thu, Oct 9, 2014 at 3:53 PM, Andres Freund and...@2ndquadrant.com wrote:
  OK, here's an attempt at a real patch for that.  I haven't perf-tested 
  this.
 
  Neato. With a really trivial SELECT:
 
  before:
  tps = 28150.794776 (excluding connections establishing)
  after:
  tps = 29978.767703 (excluding connections establishing)
 
  slightly more meaningful:
 
  before:
  tps = 21272.400039 (including connections establishing)
  after:
  tps = 22290.703482 (excluding connections establishing)
 
  So that's a noticeable win. Obviously it's going to be less for more
  complicated stuff, but still...
 
  I've not really looked at the patches though.
 
 Yeah, not bad at all for the amount of work involved.  Want to
 disclose the actual queries?

SELECT 1;
SELECT * FROM smalltable WHERE id = xxx;

The latter is something quite frequent in the real world, so it's not
all academic...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Andres Freund
On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  I agree this is a serious problem.  We have discussed various options,
  but have not decided on anything.  The TODO list has:
  
  https://wiki.postgresql.org/wiki/Todo
  
  Improve setting of visibility map bits for read-only and insert-only
  workloads
  
http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us
 
 I hate to repeat myself, but I think autovacuum could be modified to run
 actions other than vacuum and analyze.  In this specific case we could
 be running a table scan that checks only pages that don't have the
 all-visible bit set, and see if it can be set.

Isn't that *precisely* what a plain vacuum run does?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Bruce Momjian wrote:

 I agree this is a serious problem.  We have discussed various options,
 but have not decided on anything.  The TODO list has:

 https://wiki.postgresql.org/wiki/Todo

 Improve setting of visibility map bits for read-only and insert-only
 workloads

   http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

 I hate to repeat myself, but I think autovacuum could be modified to run
 actions other than vacuum and analyze.  In this specific case we could
 be running a table scan that checks only pages that don't have the
 all-visible bit set, and see if it can be set.  (Of course, this idea
 needs refinement to avoid running over and over when the bit cannot be
 set on some pages for whatever reason.)

Wouldn't we get substantially the same thing just by counting tuple
inserts toward the autovacuum vacuum threshold?  I mean, unless
the table is due for a wraparound-prevention autovacuum, it will only
visit pages that don't have the all-visible bit set, right?  And
how much work would that do beyond what you're describing if none
of the tuples are dead?
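
To make the arithmetic concrete, here is a minimal C sketch of the
trigger test, along the lines of relation_needs_vacanalyze() in
src/backend/postmaster/autovacuum.c.  The "instuples" term is
hypothetical (today only dead tuples are counted) and merely
illustrates what counting inserts toward the threshold would mean:

#include <stdbool.h>

/* Sketch of the autovacuum trigger test; not the actual server code. */
static bool
table_needs_vacuum(double reltuples, double vactuples, double instuples,
                   double vac_base_thresh, double vac_scale_factor,
                   bool force_vacuum)
{
    double vacthresh = vac_base_thresh + vac_scale_factor * reltuples;

    /* letting inserted tuples count would make insert-only tables
     * qualify, so their all-visible bits eventually get set */
    return force_vacuum || (vactuples + instuples > vacthresh);
}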

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
  Bruce Momjian wrote:
  
   I agree this is a serious problem.  We have discussed various options,
   but have not decided on anything.  The TODO list has:
   
 https://wiki.postgresql.org/wiki/Todo
   
 Improve setting of visibility map bits for read-only and insert-only
 workloads
 
   http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us
  
  I hate to repeat myself, but I think autovacuum could be modified to run
  actions other than vacuum and analyze.  In this specific case we could
  be running a table scan that checks only pages that don't have the
  all-visible bit set, and see if it can be set.
 
 Isn't that *precisely* what a plain vacuum run does?

Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.  I'm thinking in something
very lightweight.  Otherwise, why don't we just reduce the
vacuum_scale_factor default to something very small, so that vacuum is
triggered more often?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alvaro Herrera
Kevin Grittner wrote:

 Wouldn't we get substantially the same thing just by counting tuple
 inserts toward the autovacuum vacuum threshold?  I mean, unless
 the table is due for a wraparound-prevention autovacuum, it will only
 visit pages that don't have the all-visible bit set, right?  And
 how much work would that do beyond what you're describing if none
 of the tuples are dead?

The problem is precisely what happens if there are some dead tuples, but
not enough to reach the 20% threshold: this vacuum now has to scan the
table twice, and also has to clean up indexes.
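
For clarity, a compressed C sketch of that flow, loosely following
lazy_scan_heap() in src/backend/commands/vacuumlazy.c; the names below
are illustrative stand-ins, not the real API:

static void
lazy_vacuum_sketch(long num_dead_tuples, int nindexes)
{
    int     i;

    /* pass 1 happens before this point: scan the heap, collect dead TIDs */
    if (num_dead_tuples > 0)
    {
        for (i = 0; i < nindexes; i++)
        {
            /* delete index entries pointing at the dead TIDs */
        }
        /* pass 2: revisit the affected heap pages, reclaim line pointers */
    }
    /* with zero dead tuples, both steps are skipped entirely */
}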

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] schema-only -n option in pg_restore fails

2014-10-09 Thread Josh Berkus
All,

Crossing this over to -hackers because it's stopped being a bug and is
now a TODO item.  See below.

For those not on pgsql-bugs, I've quoted the full bug report below my
proposal.

On 10/09/2014 12:36 PM, Josh Berkus wrote:
 Summary: pg_restore -n attempts to restore objects to pg_catalog schema
 Versions Tested: 9.3.5, 9.3.0, 9.2.4

Explored this some with Andrew offlist.  Turns out this is going to be a
PITA to fix, so it should go on the big pile of TODOs for when we
overhaul search_path.

Here's what's happening under the hood: pg_restore generates this SQL text:

SET search_path = schem_a, pg_catalog;
CREATE TABLE tab_a (
 test text
);

Since schem_a doesn't exist, it's skipped over and pg_restore attempts
to create the objects in pg_catalog.  So this is Yet Another Issue
caused by the ten-meter-tall tar baby that is search_path.

So, my proposal for a resolution:

1) In current versions, patch the docs to explicitly say that -n does
not create the schema, and that if the user doesn't create the schema
pg_restore will fail.

2) Patch 9.5's pg_restore to do CREATE SCHEMA IF NOT EXISTS when -n is
used.  This will be 100% backwards-compatible with current behavior.

Discuss?

Original bug report follows.


On 10/09/2014 12:36 PM, Josh Berkus wrote:
 Summary: pg_restore -n attempts to restore objects to pg_catalog schema
 Versions Tested: 9.3.5, 9.3.0, 9.2.4
 Severity: Failure
 Description:

 The -n option (or --schema) for pg_restore is supposed to allow you to
 restore a single schema from a custom-format pg_dump file.  Instead, it
 attempts to restore that schema's objects to the pg_catalog schema.
 See the test case below.

 What's happening here is that the user is apparently expected to create
 the schema manually before doing a -n pg_restore.  However, that's not
 what the documentation says, and additionally doesn't make any sense if
 we're not giving the user the ability to restore to an alternate schema
 name (and so far we aren't).  If the schema does not already exist,
 pg_restore attempts to restore to the pg_catalog schema instead, which
 fails.

 In other words, pg_restore -n is just broken.  Clearly few people use
 it, or we'd have had a bug report on it before now.

 What should happen is that pg_restore -n should create the schema if it
 doesn't already exist.  If for some reason you think that pg_restore
 shouldn't create the schema (which would be user-hostile, but at least
 consistent), then this should fail cleanly with a "schema does not
 exist" error message instead of trying to restore to pg_catalog.

 Test Case:

 1. createdb schtest;
 2. createdb schrestore;
 3. psql schtest

 4. create schema schem_a;
 create table schem_a.tab_a ( test text );
 create schema schem_b;
 create table schem_b.tab_b ( test text );
 create schema schem_c;
 create table schem_c.tab_c ( test text );

 5. pg_dump -Fc -f /tmp/schmtest.dump schtest
 6. pg_restore -Fc -n schem_a -d schrestore /tmp/schmtest.dump
 7.

 pg_restore: [archiver (db)] Error while PROCESSING TOC:
 pg_restore: [archiver (db)] Error from TOC entry 171; 1259 1191591 TABLE
 tab_a josh
 pg_restore: [archiver (db)] could not execute query: ERROR:  permission
 denied to create "pg_catalog.tab_a"
 DETAIL:  System catalog modifications are currently disallowed.
 Command was: CREATE TABLE tab_a (
 test text
 );

 pg_restore: [archiver (db)] could not execute query: ERROR:  schema
 "schem_a" does not exist
 Command was: ALTER TABLE schem_a.tab_a OWNER TO josh;

 pg_restore: [archiver (db)] Error from TOC entry 2194; 0 1191591 TABLE
 DATA tab_a josh
 pg_restore: [archiver (db)] could not execute query: ERROR:  relation
 "tab_a" does not exist
 Command was: COPY tab_a (test) FROM stdin;





-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Andres Freund
On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
   Bruce Momjian wrote:
   
I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and 
insert-only
workloads

  
http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us
   
   I hate to repeat myself, but I think autovacuum could be modified to run
   actions other than vacuum and analyze.  In this specific case we could
   be running a table scan that checks only pages that don't have the
   all-visible bit set, and see if it can be set.
  
  Isn't that *precisely* what a plain vacuum run does?
 
 Well, it also scans for dead tuples, removes them, and needs to go
 through indexes to remove their references.

IIRC it doesn't do most of that if there's no need, and there's no need
for an insert-only table without rollbacks. I *do* think there are some
optimizations we could make in general.

 I'm thinking in something
 very lightweight.  Otherwise, why don't we just reduce the
 vacuum_scale_factor default to something very small, so that vacuum is
 triggered more often?

The problem here is that that doesn't trigger for inserts, just for
updates/deletes or rollbacks.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Jim Nasby

On 10/8/14, 8:35 AM, Andres Freund wrote:

+#define EXCLUSIVE_LOCK (((uint32) 1) << (31 - 1))
+
+/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
+#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)


There should at least be a comment where we define MAX_BACKENDS about the
relationship here... or better yet, validate that MAX_BACKENDS <
SHARED_LOCK_MASK during postmaster startup. (For those who think that's too
pedantic, I'll argue that it's no worse than the patch verifying that MyProc !=
NULL in LWLockQueueSelf()).



+/*
+ * Internal function that tries to atomically acquire the lwlock in the passed
+ * in mode.
+ *
+ * This function will not block waiting for a lock to become free - that's the
+ * caller's job.
+ *
+ * Returns true if the lock isn't free and we need to wait.
+ *
+ * When acquiring shared locks it's possible that we disturb an exclusive
+ * waiter. If that's a problem for the specific user, pass in a valid pointer
+ * for 'potentially_spurious'. Its value will be set to true if we possibly
+ * did so. The caller then has to handle that scenario.
+ */
+static bool
+LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious)


We should invert the return of this function. Current code returns true if the 
lock is actually acquired (see below), and I think that's true of other locking 
code as well. IMHO it makes more sense that way, plus consistency is good.

(From 9.3)
 * LWLockConditionalAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, return FALSE with no side-effects.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Andres Freund
On 2014-10-09 16:52:46 -0500, Jim Nasby wrote:
 On 10/8/14, 8:35 AM, Andres Freund wrote:
 +#define EXCLUSIVE_LOCK (((uint32) 1) << (31 - 1))
 +
 +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
 +#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)
 
 There should at least be a comment where we define MAX_BACKENDS about the 
 relationship here... or better yet, validate that MAX_BACKENDS <
 SHARED_LOCK_MASK during postmaster startup. (For those who think that's too
 pedantic, I'll argue that it's no worse than the patch verifying that MyProc 
 != NULL in LWLockQueueSelf()).

If you modify either, you better grep for them... I don't think that's
going to happen anyway. Requiring it during startup would mean exposing
SHARED_LOCK_MASK outside of lwlock.c which'd be ugly. We could possibly
stick a StaticAssert() someplace in lwlock.c.
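
For what it's worth, a sketch of what that could look like.
StaticAssertStmt() expands to a statement, so it would have to live
inside some function in lwlock.c; the wrapper function here is a
hypothetical placement, not part of the patch:

static void
check_shared_lock_mask(void)
{
    /* MAX_BACKENDS is 2^23 - 1 and must fit in the shared-lock counter */
    StaticAssertStmt(MAX_BACKENDS < SHARED_LOCK_MASK,
                     "MAX_BACKENDS too large for SHARED_LOCK_MASK");
}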

And no, it's not comparable at all to MyProc != NULL - the lwlock code
initially *does* run when MyProc isn't setup. We just better not
conflict against any other lockers at that stage.

 +/*
 + * Internal function that tries to atomically acquire the lwlock in the 
 passed
 + * in mode.
 + *
 + * This function will not block waiting for a lock to become free - that's 
 the
 + * caller's job.
 + *
 + * Returns true if the lock isn't free and we need to wait.
 + *
 + * When acquiring shared locks it's possible that we disturb an exclusive
 + * waiter. If that's a problem for the specific user, pass in a valid 
 pointer
 + * for 'potentially_spurious'. Its value will be set to true if we possibly
 + * did so. The caller then has to handle that scenario.
 + */
 +static bool
 +LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious)
 
 We should invert the return of this function. Current code returns
 true if the lock is actually acquired (see below), and I think that's
 true of other locking code as well. IMHO it makes more sense that way,
 plus consistency is good.

I don't think so. I've wondered about it as well, but the way the
function is used it's more consistent imo if it returns whether we must
wait. Note that it's not an exported function.
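
To illustrate, a condensed sketch of the call pattern that makes the
returns-whether-we-must-wait convention read naturally (not the patch
verbatim, and assuming NULL is acceptable for 'potentially_spurious'
when the caller doesn't care, per the comment above):

for (;;)
{
    /* true means the lock isn't free and we have to wait */
    if (!LWLockAttemptLock(lock, mode, NULL))
        break;                  /* got the lock, done */

    /* otherwise: queue ourselves, sleep until woken, then retry */
}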

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Build (definition?) errors - in bootstrap

2014-10-09 Thread Lou Picciano
Having just git pulled from origin/master:

$ ./configure
$ make

Mileage:

...
make -C bootstrap all
make[3]: Entering directory `/path/to/postgresql/src/backend/bootstrap'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -I. 
-I. -I../../../src/include   -c -o bootparse.o bootparse.c
bootparse.y: In function ‘boot_yyparse’:
bootparse.y:311:9: warning: passing argument 1 of ‘DefineIndex’ makes integer 
from pointer without a cast [enabled by default]
In file included from bootparse.y:37:0:
../../../src/include/commands/defrem.h:24:12: note: expected ‘Oid’ but argument 
is of type ‘struct IndexStmt *’
bootparse.y:311:9: warning: passing argument 2 of ‘DefineIndex’ makes pointer 
from integer without a cast [enabled by default]
In file included from bootparse.y:37:0:
../../../src/include/commands/defrem.h:24:12: note: expected ‘struct IndexStmt 
*’ but argument is of type ‘Oid’
bootparse.y:311:9: error: too few arguments to function ‘DefineIndex’
In file included from bootparse.y:37:0:
../../../src/include/commands/defrem.h:24:12: note: declared here
bootparse.y:346:9: warning: passing argument 1 of ‘DefineIndex’ makes integer 
from pointer without a cast [enabled by default]
In file included from bootparse.y:37:0:
../../../src/include/commands/defrem.h:24:12: note: expected ‘Oid’ but argument 
is of type ‘struct IndexStmt *’
bootparse.y:346:9: warning: passing argument 2 of ‘DefineIndex’ makes pointer 
from integer without a cast [enabled by default]
In file included from bootparse.y:37:0:
../../../src/include/commands/defrem.h:24:12: note: expected ‘struct IndexStmt 
*’ but argument is of type ‘Oid’
bootparse.y:346:9: error: too few arguments to function ‘DefineIndex’
In file included from bootparse.y:37:0:
../../../src/include/commands/defrem.h:24:12: note: declared here
make[3]: *** [bootparse.o] Error 1
make[3]: Leaving directory `/path/to/postgresql/src/backend/bootstrap'
make[2]: *** [bootstrap-recursive] Error 2
make[2]: Leaving directory `/path/to/postgresql/src/backend'
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory `/path/to/postgresql/src'
make: *** [all-src-recurse] Error 2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c

2014-10-09 Thread Peter Geoghegan
I found a reference made obsolete by commit 9e85183b, which is from
way back in 2000.

comparetup_index_btree() says:

/*
* This is similar to _bt_tuplecompare(), but we have already done the
* index_getattr calls for the first column, and we need to keep track of
* whether any null fields are present.  Also see the special treatment
* for equal keys at the end.
*/

I think that this comment should simply indicate that the routine is
similar to comparetup_heap(), except that it takes care of the special
tie-break stuff for B-Tree sorts, as well as enforcing uniqueness
during unique index builds. It should not reference _bt_compare() at
all (which is arguably the successor to _bt_tuplecompare()), since
_bt_compare() is concerned with a bunch of stuff highly specific to
the B-Tree implementation (e.g. having a hard-wired return value for
comparisons involving the first data item on an internal page).
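
Something along these lines, perhaps (a draft, not a committed comment):

/*
 * This is similar to comparetup_heap(), but in addition to the usual
 * sort-key comparisons it applies the B-Tree tie-break rules for equal
 * keys, and complains about duplicates when building a unique index.
 */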

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Build (definition?) errors - in bootstrap

2014-10-09 Thread Tom Lane
Lou Picciano loupicci...@comcast.net writes:
 Having just git pulled from origin/master:
 $ ./configure
 $ make
 [ fails ]

The buildfarm doesn't seem unhappy, so I doubt there's anything wrong
with the code as such.  Try "make clean" or even "make distclean" and
rebuild.  Also, if your computer's clock is or was badly off, you may
have file timestamp skews breaking things ... in which case you might
need "make maintainer-clean".  If it's still broken after that, you'd
be best advised to fix the clock and do a complete fresh git clone.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Jim Nasby

On 10/9/14, 4:57 PM, Andres Freund wrote:

If you modify either, you better grep for them... I don't think that's
going to happen anyway. Requiring it during startup would mean exposing
SHARED_LOCK_MASK outside of lwlock.c which'd be ugly. We could possibly
stick a StaticAssert() someplace in lwlock.c.


Ahh, yeah, exposing it would be ugly.

I just get the heebie-jeebies when I see assumptions like this, though. I fear
there's a bunch of cases where changing something will break a completely 
unrelated part of the system with no warning.

Maybe add an assert() to check it?


And no, it's not comparable at all to MyProc != NULL - the lwlock code
initially *does* run when MyProc isn't setup. We just better not
conflict against any other lockers at that stage.


Ahh, can you maybe add that detail to the comment? That wasn't clear to me.


 +/*
 + * Internal function that tries to atomically acquire the lwlock in the 
passed
 + * in mode.
 + *
 + * This function will not block waiting for a lock to become free - that's 
the
 + * caller's job.
 + *
 + * Returns true if the lock isn't free and we need to wait.
 + *
 + * When acquiring shared locks it's possible that we disturb an exclusive
 + * waiter. If that's a problem for the specific user, pass in a valid 
pointer
 + * for 'potentially_spurious'. Its value will be set to true if we possibly
 + * did so. The caller then has to handle that scenario.
 + */
 +static bool
 +LWLockAttemptLock(LWLock* lock, LWLockMode mode, bool *potentially_spurious)


We should invert the return of this function. Current code returns
true if the lock is actually acquired (see below), and I think that's
true of other locking code as well. IMHO it makes more sense that way,
plus consistency is good.

I don't think so. I've wondered about it as well, but the way the
function is used its more consistent imo if it returns whether we must
wait. Note that it's not an exported function.


ISTM that a function attempting a lock would return success, not failure. Even 
though it's internal now it could certainly be made external at some point in 
the future. But I suppose it's ultimately a matter of preference...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Jim Nasby

On 10/9/14, 4:03 PM, Alvaro Herrera wrote:

Bruce Momjian wrote:


I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and insert-only
workloads

  http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us


I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze.  In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.  (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)


If we go down that road we should also think about having it proactively set 
hint bits...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Build (definition?) errors - in bootstrap

2014-10-09 Thread Michael Paquier
On Fri, Oct 10, 2014 at 8:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Lou Picciano loupicci...@comcast.net writes:
 Having just git pulled from orgin/master:
 $ ./configure
 $ make
 [ fails ]

 The buildfarm doesn't seem unhappy, so I doubt there's anything wrong
 with the code as such.  Try make clean or even make distclean and
 rebuild.  Also, if your computer's clock is or was badly off, you may
 have file timestamp skews breaking things ... in which case you might
 need make maintainer-clean.  If it's still broken after that, you'd
 be best advised to fix the clock and do a complete fresh git clone.
Something more violent can be done as well:
git clean -dxf
This ensures that there are no files other than the ones tracked by your
git repository, bringing it back to a fresh state.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-09 Thread Jim Nasby

On 10/8/14, 5:51 PM, Peter Geoghegan wrote:

 On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.


FWIW, if each row was handled in a subtransaction then an insert that turned 
out to be an update could be rolled back... but the performance impact of going 
that route might be pretty horrid. :( There's also the potential to get stuck 
in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a 
BEFORE UPDATE trigger turns it into an INSERT.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-09 Thread Gavin Flower

On 10/10/14 12:38, Jim Nasby wrote:

On 10/8/14, 5:51 PM, Peter Geoghegan wrote:
 On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com
 wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.


FWIW, if each row was handled in a subtransaction then an insert that 
turned out to be an update could be rolled back... but the performance 
impact of going that route might be pretty horrid. :( There's also the 
potential to get stuck in a loop where a BEFORE INSERT trigger turns 
the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into an 
INSERT.

Perhaps you need an UPSERT trigger?



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last Commitfest patches waiting review

2014-10-09 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 1:13 PM, Peter Geoghegan p...@heroku.com wrote:
 Can you write
 a separate patch to use SortSupport for B-tree index builds, please?
 Eliminating the FunctionCallInfoData overhead should shave off some some
 cycles from every index build.

 I'll look into it. Hopefully an effort to actually implement it will
 show that I was right, and there isn't much to it.

I was able to get this working pretty quickly. All regression tests
pass. I'm not going to post a patch just yet, because I still need to
make the cluster case work, and I'll probably want to refactor my
approach to performing catalog lookups a bit, but the important point
is that my original suspicion that this isn't very difficult or
invasive has been confirmed.

I should have something to post before too long.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] schema-only -n option in pg_restore fails

2014-10-09 Thread Erik Rijkers
On Thu, October 9, 2014 23:19, Josh Berkus wrote:
 All,

[dump/restore -n bug]


Perhaps this (from five years ago) can be fixed too (esp. if only a doc-fix):

  
http://www.postgresql.org/message-id/4833.156.83.1.81.1240955642.squir...@webmail.xs4all.nl

It's not the same problem but also a failure in pick-and-choose restoring.

This stuff has been broken for a long time - I got used to it...


thanks,

Erik Rijkers




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-09 Thread Amit Kapila
On Wed, Oct 8, 2014 at 7:05 PM, Andres Freund and...@2ndquadrant.com
wrote:

 Hi,

 Attached you can find the next version of my LW_SHARED patchset. Now
 that atomics are committed, it seems like a good idea to also add their
 raison d'être.

 Since the last public version I have:
 * Addressed lots of Amit's comments. Thanks!
  * Performed a fair amount of testing.
 * Rebased the code. The volatile removal made that not entirely
   trivial...
 * Significantly cleaned up and simplified the code.
 * Updated comments and such
 * Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case)

 The feature currently consists of two patches:
 1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and
verbose. This also does:
 * changes the logic in LWLockRelease() to release all shared lockers
   when waking up any. This can yield some significant performance
   improvements - and the fairness isn't really much worse than before,
   as we always allowed new shared lockers to jump the queue.

 * adds a memory pg_write_barrier() in the wakeup paths between
   dequeuing and unsetting ->lwWaiting (see the sketch after this
   list). That was always required on weakly ordered machines, but
   f4077cda2 made it more urgent. I can reproduce crashes without it.
 2) Implement the wait free LW_SHARED algorithm.
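
For reference, the ordering that point 1) describes in the wakeup path
looks roughly like this (a sketch based on the description above; the
exact patch code may differ):

/* remove the waiter from the wait list */
dlist_delete(&waiter->lwWaitLink);

/* publish the list manipulation before clearing the flag the woken
 * backend spins on; without this, weakly ordered machines can crash */
pg_write_barrier();
waiter->lwWaiting = false;

/* finally wake the backend */
PGSemaphoreUnlock(&waiter->sem);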


I have done a few performance tests for the above patches; the results
are as below:

Performance Data
--
IBM POWER-7 16 cores, 64 hardware threads
RAM = 64GB
max_connections = 210
Database Locale = C
checkpoint_segments=256
checkpoint_timeout=35min
shared_buffers=8GB
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5mins
Test type - read only pgbench with -M prepared
Other Related information about test
a. This is the median of 3 runs; the detailed data for each individual
run is attached to this mail.
b. I have applied both the patches to take performance data.

Scale Factor - 100

Patch_ver/Client_count      1        8       16       32       64      128
HEAD                    13344   106921   196629   295123   377846   333928
PATCH                   13662   106179   203960   298955   452638   465671

Scale Factor - 3000

Patch_ver/Client_count      8       16       32       64      128      160
HEAD                    86920   152417   231668   280827   257093   255122
PATCH                   87552   160313   230677   276186   248609   244372


Observations
--
a. The patch performs really well (an increase of up to ~40%) in case all
the data fits in shared buffers (scale factor 100).
b. In case the data doesn't fit in shared buffers but fits in RAM
(scale factor 3000), there is a performance increase up to a client count
of 16; after that it starts dipping (in the above config, by up to ~4.4%).

The above data shows that the patch improves performance in cases
where there is shared LWLock contention; however, there is a slight
performance dip in the case of exclusive LWLocks (at scale factor 3000,
exclusive LWLocks are needed for the buffer mapping tables).  Now I am
not sure if this is the worst-case dip, or whether the dip can be higher
under similar configurations, because the trend shows that the dip
increases with higher client counts.

Brief Analysis of code w.r.t. performance dip
-
Extra instructions w.r.t. HEAD in the exclusive-lock acquire path:
a. Attempt lock twice
b. atomic operations for nwaiters in LWLockQueueSelf() and
LWLockAcquireCommon()
c. Now we need to take the spinlock twice, once for self-queuing and then
again for setting releaseOK.
d. a few function calls and some extra checks

Similarly there seem to be a few additional instructions in the
LWLockRelease() path.

Now these probably shouldn't matter much in case a backend needs to
wait for another exclusive locker, but I am not sure what else could be
the reason for the dip in case we need exclusive LWLocks.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


perf_lwlock_contention_data_v1.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-09 Thread Craig Ringer
On 10/10/2014 04:16 AM, MauMau wrote:
 From: Craig Ringer cr...@2ndquadrant.com
 It'd be interesting and useful to run this test on a debug build of
 PostgreSQL, i.e. one compiled against the debug version of the C library
 and with full debuginfo not just minimal .pdb.
 
 Although I'm not sure the user can do this now, I'll ask him anyway.

It sounds like they've produced a test case, so they should be able to
with a bit of luck.

Or even better, send you the test case.

 How were the stacks captured - what tool?
 
 According to his mail, Windbg or userdump.exe.  I'll ask him about this.

Thanks. The stack trace looks fairly sane, i.e. there's nothing
obviously out of whack at a glance, but I tend to get more informative
traces from Visual Studio debugging sessions.

Your next step here really needs to be to make this reproducible against
a debug build. Then see if reverting the xlog scalability work actually
changes the behaviour, given that you hypothesised that it could be
involved.

As I said off-list, if you can narrow the test case down to something
that can be reproduced more quickly, you could also git-bisect to seek
the commit at fault. Even if the test case takes an hour, that's still
viable:

$ git bisect start
$ git bisect bad
$ git bisect good REL9_3_RC1
Bisecting: a merge base must be tested
[e472b921406407794bab911c64655b8b82375196] Avoid deadlocks during
insertion into SP-GiST indexes.
$ git bisect good
Bisecting: 1026 revisions left to test after this (roughly 10 steps)
...


Thanks to the magic of binary search.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers