Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> The additional process at promotion sounds like a good idea, I'll try to get a patch done tomorrow. This would result as well in removing the XLogArchiveForceDone stuff.

Either way, now that I have been able to reproduce the problem manually, things can be clearly solved. Please find attached two patches aimed to fix this issue and to improve the situation:

- 0001 prevents the appearance of those phantom WAL segment files by ensuring that when a node is in recovery it will remove them regardless of their status in archive_status. This patch is the real fix, and should be applied down to 9.2.

- 0002 is a patch implementing Heikki's idea of enforcing all the segment files present in pg_xlog to have their status set to .done, marking them for removal. When looking at the code, I finally concluded that Fujii-san's point, about marking in all cases as .done segment files that have been fully streamed, actually makes more sense, though it need not be backpatched. This patch would actually not be mandatory for back-patching, but it makes the process more robust IMO.

I imagine that it would be nice to get those things fixed before the next minor release.

Regards,
--
Michael

From 18c2d47b1d5dd3c0439f990ee4da6b305d477ca4 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 9 Oct 2014 15:04:26 +0900
Subject: [PATCH 1/2] Fix appearance of .ready archive status files on
 streaming standbys

Commit 1bd42cd removed a check on the recovery status of a node when
removing old WAL segment files in pg_xlog, causing the appearance of
.ready files that prevented the removal of some WAL segment files that
remained stuck in the archive folder. Note that this does not prevent
the absence of some .done files, as it may still be possible that some
segments cannot be marked correctly as complete after their stream is
done, for example after an abrupt disconnection between a standby and
its root node. It does ensure that when a node is in recovery, old WAL
segment files are removed regardless of their status in the folder
archive_status.

Per report from Jehan-Guillaume de Rorthais.
---
 src/backend/access/transam/xlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..39701a3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3771,7 +3771,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 			strspn(xlde->d_name, "0123456789ABCDEF") == 24 &&
 			strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
 		{
-			if (XLogArchiveCheckDone(xlde->d_name))
+			if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
 			{
 				snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
--
2.1.2

From 493a9d05e9386403fc1e6d78df19dd8a59c53cae Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 9 Oct 2014 15:09:44 +0900
Subject: [PATCH 2/2] Enforce all WAL segment files to be marked as .done at
 node promotion

This is a safety mechanism to ensure that no files are left
unconsidered as .done, as some segments may have been missed,
particularly in the case of partially written files or files being
written when a disconnection occurred between a streaming standby and
its root node. This gives the node reaching promotion a state
consistent with what is expected, under the assumption that all the WAL
segment files that are done being streamed should always be considered
as archived by the node.
---
 src/backend/access/transam/xlog.c        | 10 ++++++++++
 src/backend/access/transam/xlogarchive.c | 39 ++++++++++++++++++++++-
 src/include/access/xlog_internal.h       |  1 +
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 39701a3..af5f548 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6862,6 +6862,16 @@ StartupXLOG(void)
 	}
 
 	/*
+	 * Create a .done entry for each WAL file present in pg_xlog that has
+	 * not yet been marked as such. In some cases, where for example a
+	 * streaming replica has had a connection to a remote node cut abruptly,
+	 * such WAL files may have been only partially written or even not
+	 * flagged correctly with .done.
+	 */
+	if (InRecovery)
+		XLogArchiveForceDoneAll();
+
+	/*
 	 * Kill WAL receiver, if it's still running, before we continue to write
 	 * the startup checkpoint record. It will trump over the checkpoint and
 	 * subsequent records if it's still alive when we start writing WAL.
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 047efa2..931106f 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -554,6 +554,40 @@ XLogArchiveNotifySeg(XLogSegNo segno)
 }
 
 /*
+ * XLogArchiveForceDoneAll
+ *
+ *
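The attached patch text is cut off above. As a rough standalone illustration of the idea it implements -- walk a pg_xlog-style directory and drop an empty .done marker into archive_status for every segment file -- here is a minimal sketch; the paths, the 24-character-hex filename test, and the helper itself are assumptions for illustration, not the actual patch:

/*
 * Sketch only, not the attached patch: create an archive_status
 * ".done" marker for every 24-character hex WAL segment name found
 * in a pg_xlog-style directory.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static void
force_done_all(const char *xlogdir)
{
	DIR		   *dir = opendir(xlogdir);
	struct dirent *de;
	char		path[1024];
	FILE	   *f;

	if (dir == NULL)
		return;
	while ((de = readdir(dir)) != NULL)
	{
		/* consider only plain WAL segment files: 24 hex characters */
		if (strlen(de->d_name) != 24 ||
			strspn(de->d_name, "0123456789ABCDEF") != 24)
			continue;
		snprintf(path, sizeof(path), "%s/archive_status/%s.done",
				 xlogdir, de->d_name);
		if ((f = fopen(path, "w")) != NULL)		/* empty marker file */
			fclose(f);
	}
	closedir(dir);
}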
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 10:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> Do you really expect me to do major work on some aspect of the syntax like that, given, as you say, that nobody explicitly agreed with me (and only you disagreed with me)? The only remark I heard on that was from you (you'd prefer to use NEW.* and OLD.*). But you spent much more time talking about getting something MERGE-like, which NEW.*/OLD.* clearly isn't.
>
> Yes, it is. Look at the AS clause. You can alias each of the two tables being joined.

But I only have one table, and no join. When you referred to NEW.* and OLD.*, you clearly were making a comparison with trigger WHEN clauses, and not MERGE (which is a comparison I made myself, although for more technical reasons). It hardly matters, though.

>> CONFLICTING() is very close (identical?) to MySQL's use of ON DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that, but it's news to me that people take particular issue with it.
>
> 3 people have asked you questions or commented about the use of CONFLICTING() while I've been watching.

Lots of people have asked me lots of questions. Again, as I said, I wasn't aware that CONFLICTING() was a particular point of contention. Please be more specific.

> It's clearly a non-standard mechanism and not in line with other Postgres usage.

What would be in line with other Postgres usage? I don't think you've been clear on what you think is a better alternative. I felt a function-like expression was appropriate because the user refers to different tuples of the target table. It isn't like a join. Plus it's similar to the MySQL thing, but doesn't misuse VALUES() as a function-like thing.

> If there is disagreement, publishing the summary of changes you plan to make in your next version will help highlight that.

I think I've done a pretty good job of collecting and collating the opinions of others, FWIW.

--
Peter Geoghegan
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 10/09/2014 07:47 AM, furu...@pm.nttdata.co.jp wrote:
>>> If we remove --fsync-interval, response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval, so what should we do about it?
>>
>> I'm sorry, I didn't understand that.
>
> Here is an example. When WAL is sent at 100ms intervals, fsync() is executed 10 times per second. If --fsync-interval is set to 1 second, we have to wait up to 1 second for the SQL response (that is, for the WAL record to become durable), but fsync() won't be executed several times within 1 second. I think --fsync-interval meets the demands of people who want to restrain fsync() from happening several times in a short period, but what do you think? Is it OK to delete --fsync-interval?

I still don't see the problem. In synchronous mode, pg_receivexlog should have similar logic as walreceiver does. It should read as much WAL from the socket as it can without blocking, and fsync() and send a reply after that. And also fsync whenever switching to a new segment. If the master sends WAL every 100ms, then pg_receivexlog will indeed fsync() 10 times per second. There's nothing wrong with that.

>>> In asynchronous mode, only fsync whenever switching to a new segment.
>>
>> Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status.
>
> What kind of 'message type' do we have to add? Do we need to separate 'w' into two types, synchronous and asynchronous? Or add a new message type, kind of 'notify synchronous', and inform pg_receivexlog of the synchronous status when it connects to the server?

Better to add a new notify message type. And pg_receivexlog should be prepared to receive it at any time. The status might change on the fly, if the server's configuration is reloaded.

- Heikki
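To make the batching idea concrete, here is a minimal self-contained sketch (my illustration, not pg_receivexlog's code): several pending writes are flushed with a single fsync() per batch, which is what keeps the fsync rate bounded by the message arrival pattern rather than multiplying it.

/*
 * Illustration of batched flushing: write whatever has accumulated,
 * then fsync() once per batch rather than once per message.  In
 * pg_receivexlog the "batch" would be everything readable from the
 * socket without blocking, followed by one status reply.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	int			fd = open("segment.tmp", O_WRONLY | O_CREAT | O_APPEND, 0600);
	char		msg[8192];

	memset(msg, 'x', sizeof(msg));
	for (int batch = 0; batch < 10; batch++)
	{
		/* drain "the socket": several messages may be pending at once */
		for (int i = 0; i < 16; i++)
			write(fd, msg, sizeof(msg));
		fsync(fd);				/* one flush (and one reply) per batch */
	}
	close(fd);
	return 0;
}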
Re: [HACKERS] What exactly is our CRC algorithm?
On 10/09/2014 01:23 AM, Gavin Flower wrote:
> On 09/10/14 10:13, Andres Freund wrote:
>> If we're switching to a saner computation, we should IMO also switch to a better polynomial - CRC-32C has better error detection capabilities than CRC-32 and is available in hardware. As we're paying the price of breaking compat anyway... Arguably we could also say that, given that there have been few evident problems with the borked computation, we could also switch to a much faster hash instead of continuing to use CRC...
>
> Could a 64 bit variant of some kind be useful as an option - in addition to a correct 32 bit?

More bits allow you to detect more errors. That's the only advantage. I've never heard of that being a problem, so no, I don't think that's a good idea.

> As most people have 64 bit processors and storage is less constrained nowadays, and we tend to store much larger chunks of data.

That's irrelevant to the CRC in the WAL. Each WAL record is CRC'd separately, and they tend to be very small (less than 8k typically) regardless of how large the chunks of data you store in the database are.

- Heikki
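For reference, the hardware support Andres mentions is the SSE4.2 CRC32 instruction, which computes exactly the CRC-32C (Castagnoli) polynomial. A small self-contained example follows (compile with -msse4.2); it is an illustration of the instruction, not PostgreSQL's implementation:

/* CRC-32C via the SSE4.2 intrinsics; illustration only. */
#include <nmmintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t
crc32c(const unsigned char *buf, size_t len)
{
	uint32_t	crc = 0xFFFFFFFF;

	/* consume 8 bytes per instruction where possible */
	while (len >= 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, buf, 8);
		crc = (uint32_t) _mm_crc32_u64(crc, chunk);
		buf += 8;
		len -= 8;
	}
	while (len-- > 0)
		crc = _mm_crc32_u8(crc, *buf++);
	return crc ^ 0xFFFFFFFF;
}

int
main(void)
{
	const char *rec = "example WAL record payload";

	printf("crc32c = %08X\n",
		   crc32c((const unsigned char *) rec, strlen(rec)));
	return 0;
}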
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 9 October 2014 07:27, Peter Geoghegan p...@heroku.com wrote:
> Please be more specific.

* Do not use CONFLICTING(), which looks like it is a function.

* Instead, use a row qualifier, such as NEW, OLD etc., to reference values from the incoming data, e.g. CONFLICTING.value rather than CONFLICTING(value).

* Do not use the word CONFLICTING, since it isn't clear whether you are referring to the row in the table or the value in the incoming data.

I suggest the use of two separately named row qualifiers to allow us to use either of those when desired. I don't have suggestions as to what you should call those qualifiers, though Postgres already uses NEW and OLD in similar circumstances in triggers. (This has nothing at all to do with the MERGE command in the SQL standard, so please don't mention that here.)

You may also wish to support the AS keyword, as MERGE does, to make the above even more clear, e.g.

  SET col = EXISTING.col + NEW.col

Thank you.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> 1. Where do the FF files come from? In 9.2, FF-segments are not supposed to be created, ever. I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

Attached is a patch for that. This would be needed for PG < 9.3 if applied.

Regards,
--
Michael

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 753316e..4f4d019 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -789,6 +789,24 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
 	SpinLockRelease(&walrcv->mutex);
 
+	/*
+	 * Check that the WAL position received from the sender is valid and
+	 * does not refer to a segment file whose name finishes with FF.
+	 */
+	if ((walEnd.xlogid & 0xff) == 0xff)
+	{
+		char		xlogfilename[MAXFNAMELEN];
+		uint32		xlogseg;
+		uint32		xlogid;
+
+		XLByteToPrevSeg(walEnd, xlogid, xlogseg);
+		XLogFileName(xlogfilename, ThisTimeLineID, xlogid, xlogseg);
+		ereport(ERROR,
+				(errcode(ERRCODE_PROTOCOL_VIOLATION),
+				 errmsg_internal("invalid WAL position %X/%X referring to segment %s",
+								 walEnd.xlogid, walEnd.xrecoff, xlogfilename)));
+	}
+
 	if (log_min_messages <= DEBUG2)
 	{
 		char	   *sendtime;
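For background, the check above relies on a pre-9.3 naming rule: segment names are timeline, logical xlog id, and segment number, each as 8 hex digits, and the last segment of each logical file (name ending in FF) is skipped, so it should never exist on disk. A standalone sketch of the naming (the timeline and xlogid values are made up for illustration):

/* Pre-9.3 WAL segment names: TLI + xlogid + segment, 8 hex digits
 * each; segment 0xFF of each logical file is skipped. */
#include <stdio.h>

int
main(void)
{
	unsigned	tli = 1;
	unsigned	xlogid = 0x42;	/* made-up values for illustration */

	for (unsigned seg = 0xFD; seg <= 0xFF; seg++)
		printf("%08X%08X%08X%s\n", tli, xlogid, seg,
			   seg == 0xFF ? "   <- should never exist" : "");
	return 0;
}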
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Oct 9, 2014 at 12:38 AM, Simon Riggs si...@2ndquadrant.com wrote:
> Do not use CONFLICTING() which looks like it is a function.

So is ROW(). Or COALESCE().

> Instead, use a row qualifier, such as NEW, OLD etc to reference values from the incoming data e.g. CONFLICTING.value rather than CONFLICTING(value). Do not use the word CONFLICTING since it isn't clear whether you are referring to the row in the table or the value in the incoming data.

If you don't have a word that you think would more clearly indicate the intent of the expression, I'm happy to hear suggestions from others.

> You may also wish to support the AS keyword, as MERGE does to make the above even more clear. e.g. SET col = EXISTING.col + NEW.col

That's less clear, IMV. EXISTING.col is col - the very same Var. So why qualify that it's the existing value in one place but not the other? In fact, you can't do that now with updates in general:

postgres=# update upsert u set u.val = 'foo';
ERROR:  42703: column "u" of relation "upsert" does not exist
LINE 1: update upsert u set u.val = 'foo';
                            ^
LOCATION:  transformUpdateStmt, analyze.c:2068

This does work, which is kind of what you outline:

postgres=# update upsert u set val = u.val;
UPDATE 3

But MERGE accepts the former in other systems (in general, and for MERGE), where Postgres won't (for UPDATEs in general). Parse analysis of UPDATE targetlists just rejects this outright. FWIW, is either of the two tuples referenced here NEW, in any sense? Informally, I'd say the new value is the resulting row - the final row value after the UPDATE. We want to refer to the existing row, and the row proposed for insertion (with all before-trigger effects carried forward). Having the column reference go through an alias like this might be tricky.

--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Oct 9, 2014 at 11:11 AM, Peter Geoghegan p...@heroku.com wrote:
> On Thu, Oct 9, 2014 at 12:38 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> Do not use CONFLICTING() which looks like it is a function.
>
> So is ROW(). Or COALESCE().

ROW and COALESCE behave almost like functions: they operate on any expression or value you pass to them.

db=# select coalesce('bar');
 coalesce
----------
 bar

Not so with CONFLICTING(): it only accepts a column name -- not a value -- and has knowledge of the surrounding statement that ordinary function-like constructs don't.

db=# INSERT into evt_type (name) values ('foo') on conflict UPDATE set name=conflicting('bar');
ERROR:  syntax error at or near "'bar'"
LINE 1: ...lues ('foo') on conflict UPDATE set name=conflicting('bar');

> If you don't have a word that you think would more clearly indicate the intent of the expression, I'm happy to hear suggestions from others.

I also like NEW due to the similarity with triggers, but I see your concern about it not actually being new.

Regards,
Marti
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 9 October 2014 09:11, Peter Geoghegan p...@heroku.com wrote:
>> You may also wish to support the AS keyword, as MERGE does to make the above even more clear. e.g. SET col = EXISTING.col + NEW.col
>
> That's less clear, IMV. EXISTING.col is col - the very same Var. So why qualify that it's the existing value in one place but not the other? In fact, you can't do that now with updates in general:
>
> postgres=# update upsert u set u.val = 'foo';
> ERROR:  42703: column "u" of relation "upsert" does not exist

YES, which is exactly why I did not say this; I said something different.

> This does work, which is kind of what you outline:
>
> postgres=# update upsert u set val = u.val;
> UPDATE 3

YES, which is why I said it.

> But MERGE accepts the former in other systems (in general, and for MERGE), where Postgres won't (for UPDATEs in general). Parse analysis of UPDATE targetlists just rejects this outright. FWIW, is either of the two tuples referenced here NEW, in any sense? Informally, I'd say the new value is the resulting row - the final row value after the UPDATE. We want to refer to the existing row, and the row proposed for insertion (with all before-trigger effects carried forward).

YES, which is why I specifically requested the ability to reference the incoming data. Common sense interpretations make for quicker and easier discussions.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Oct 9, 2014 at 1:33 AM, Marti Raudsepp ma...@juffo.org wrote:
> ROW and COALESCE behave almost like functions: they operate on any expression or value you pass to them.

Okay, then CONFLICTING() is like many of the XML expressions.

--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Oct 9, 2014 at 1:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
> YES, which is why I specifically requested the ability to reference the incoming data.

My point is that people are not really inclined to use an alias in UPDATEs in general when referring to the target. The thing that seems special (and worthy of special qualification) is the reference to what you call the incoming data, and what I've called tuples proposed for insertion (after being affected by any before row triggers).

--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Oct 9, 2014 at 1:56 AM, Peter Geoghegan p...@heroku.com wrote:
> My point is that people are not really inclined to use an alias in UPDATEs in general when referring to the target. The thing that seems special (and worthy of special qualification) is the reference to what you call the incoming data, and what I've called tuples proposed for insertion (after being affected by any before row triggers).

For simple cases, you might not even bother with CONFLICTING() - you might find it easier to just repeat the constant in the INSERT and UPDATE parts of the query.

--
Peter Geoghegan
Re: [HACKERS] Deferring some AtStart* allocations?
On 2014-10-08 13:52:14 -0400, Robert Haas wrote:
> On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Meh. Even SELECT 1 is going to be doing *far* more pallocs than that to get through raw parsing, parse analysis, planning, and execution startup. If you can find a few hundred pallocs we can avoid in trivial queries, it would get interesting; but I'll be astonished if saving 4 is measurable.
>
> I got nerd-sniped by this problem today, probably after staring at profiling data very similar to what led Andres to ask the question in the first place. The gains are indeed not measurable on a macrobenchmark, but I had to write the patch to figure that out, so here it is. Although the gain isn't a measurable percentage of total runtime, it does appear to be a significant percentage of the palloc traffic on trivial queries. I did 30-second SELECT-only pgbench runs at scale factor 10, using prepared queries. According to perf, on unpatched master, StartTransactionCommand accounts for 11.46% of the calls to AllocSetAlloc. (I imagine this is by runtime, not by call count, but it probably doesn't matter much either way.) But with this patch, StartTransactionCommand's share drops to 4.43%. Most of that is coming from AtStart_Inval(), which wouldn't be hard to fix either.

Interesting - in my local profile AtStart_Inval() is more pronounced than AfterTriggerBeginQuery(). I've quickly, and in an ugly fashion, hacked AtStart_Inval() out of read-only queries on top of your patch. Together that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM tbl WHERE pkey = xxx' testcase.

> I'm inclined to think that tamping down palloc traffic is worthwhile even if we can't directly measure the savings on a macrobenchmark. AllocSetAlloc is often at or near the top of profiling results, but there's rarely a single caller responsible for enough of those calls to make it worth optimizing. But that means that if we refuse to take patches that save just a few pallocs, we're basically giving up on ever improving anything in this area, and that doesn't seem like a good idea either; it's only going to get slowly worse over time as we add more features.

I think it depends a bit on the callsites. If it's somewhere that nobody will ever care about, because it's a slowpath, then we shouldn't care either. But that's not the case here, so I do think that makes sense.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
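The deferral being discussed follows a plain lazy-initialization pattern; a minimal standalone sketch (names and structure are illustrative, not the actual patch) looks like this:

/*
 * Sketch of the deferral pattern: allocate per-transaction
 * bookkeeping on first use instead of in every transaction start,
 * so trivial read-only statements skip the allocations entirely.
 */
#include <stdlib.h>

typedef struct InvalQueue
{
	int			nitems;
	int			capacity;
	int		   *items;
} InvalQueue;

static InvalQueue *pending_invals = NULL;	/* nothing allocated at start */

static void
add_inval(int msg)
{
	if (pending_invals == NULL)			/* first use in this transaction */
	{
		pending_invals = malloc(sizeof(InvalQueue));
		pending_invals->nitems = 0;
		pending_invals->capacity = 8;
		pending_invals->items = malloc(8 * sizeof(int));
	}
	if (pending_invals->nitems == pending_invals->capacity)
	{
		pending_invals->capacity *= 2;
		pending_invals->items = realloc(pending_invals->items,
										pending_invals->capacity * sizeof(int));
	}
	pending_invals->items[pending_invals->nitems++] = msg;
}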
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
>>>> If we remove --fsync-interval, response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval, so what should we do about it?
>>>
>>> I'm sorry, I didn't understand that.
>>
>> Here is an example. When WAL is sent at 100ms intervals, fsync() is executed 10 times per second. If --fsync-interval is set to 1 second, we have to wait up to 1 second for the SQL response (that is, for the WAL record to become durable), but fsync() won't be executed several times within 1 second. I think --fsync-interval meets the demands of people who want to restrain fsync() from happening several times in a short period, but what do you think? Is it OK to delete --fsync-interval?
>
> I still don't see the problem. In synchronous mode, pg_receivexlog should have similar logic as walreceiver does. It should read as much WAL from the socket as it can without blocking, and fsync() and send a reply after that. And also fsync whenever switching to a new segment. If the master sends WAL every 100ms, then pg_receivexlog will indeed fsync() 10 times per second. There's nothing wrong with that.
>
>>>> In asynchronous mode, only fsync whenever switching to a new segment.
>>>
>>> Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status.
>>
>> What kind of 'message type' do we have to add? Do we need to separate 'w' into two types, synchronous and asynchronous? Or add a new message type, kind of 'notify synchronous', and inform pg_receivexlog of the synchronous status when it connects to the server?
>
> Better to add a new notify message type. And pg_receivexlog should be prepared to receive it at any time. The status might change on the fly, if the server's configuration is reloaded.

Thanks for the reply.

> In synchronous mode, pg_receivexlog should have similar logic as walreceiver does.

OK. I understand that removing --fsync-interval is no problem.

> Better to add a new notify message type. And pg_receivexlog should be prepared to receive it at any time. The status might change on the fly, if the server's configuration is reloaded.

OK. I'll consider it.

Regards,
--
Furuya Osamu
[HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Hello!

The autovacuum daemon performs a vacuum when the number of rows updated or deleted (n_dead_tuples) reaches some threshold. Similarly, it performs an analyze when the number of rows changed in any way (including inserted) reaches a threshold. When a table is mostly insert-only, its visibility map is not updated, as the vacuum threshold is almost never reached, and analyze does not update the visibility map.

Why could it be a bad idea to run vacuum after some number of any changes, including inserts, like analyze? Or at least make it tunable by the user (add a second set of parameters to control a second vacuum threshold, disabled by default)?

Best regards,
Alexey Bashtanov
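For context, the documented trigger conditions look roughly like this; a standalone illustration using the default base threshold (50) and scale factors (0.2 for vacuum, 0.1 for analyze) -- the variable names are mine, not autovacuum.c's:

/* Why a mostly insert-only table gets analyzed but never vacuumed. */
#include <stdbool.h>
#include <stdio.h>

int
main(void)
{
	double		reltuples = 10000000;			/* mostly insert-only table */
	double		n_dead_tuples = 1000;			/* few updates/deletes */
	double		n_mod_since_analyze = 2000000;	/* includes inserts */

	/* documented defaults: base threshold 50, scale factors 0.2 / 0.1 */
	bool		dovacuum = n_dead_tuples > 50 + 0.2 * reltuples;
	bool		doanalyze = n_mod_since_analyze > 50 + 0.1 * reltuples;

	/* prints "vacuum: 0, analyze: 1": the visibility map stays stale */
	printf("vacuum: %d, analyze: %d\n", dovacuum, doanalyze);
	return 0;
}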
Re: [HACKERS] Escaping from blocked send() reprised.
On 2014-10-09 14:06:35 +0900, Kyotaro HORIGUCHI wrote:
> Hello, simply inhibiting setting the retry flag when ProcDiePending in my_sock_write seems enough. But it returns with SSL_ERROR_SYSCALL, not SSL_ERROR_WANT_WRITE, so I modified patch 4 as in the attached patch.

Why is that necessary? It seems really rather wrong to make BIO_set_retry_write() dependent on ProcDiePending. Especially as, at least in my testing, it's not even required, because be_tls_write() can just check the error properly.

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..2288fe2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
-		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
+		if (!ProcDiePending &&
+			(errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
 		{
 			BIO_set_retry_write(h);
 		}

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Deferring some AtStart* allocations?
On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote:
> Interesting - in my local profile AtStart_Inval() is more pronounced than AfterTriggerBeginQuery(). I've quickly, and in an ugly fashion, hacked AtStart_Inval() out of read-only queries on top of your patch. Together that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM tbl WHERE pkey = xxx' testcase.

Whoa. Now that's clearly significant. You didn't attach the patch; was that inadvertent, or was it too ugly for that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Deferring some AtStart* allocations?
On 2014-10-09 08:18:18 -0400, Robert Haas wrote:
> On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote:
>> Interesting - in my local profile AtStart_Inval() is more pronounced than AfterTriggerBeginQuery(). I've quickly, and in an ugly fashion, hacked AtStart_Inval() out of read-only queries on top of your patch. Together that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM tbl WHERE pkey = xxx' testcase.
>
> Whoa. Now that's clearly significant.

Well, my guess is it'll be far less noticeable in less trivial workloads. But it does seem worthwhile.

> You didn't attach the patch; was that inadvertent, or was it too ugly for that?

Far, far too ugly ;). I just removed the AtStart() call from xact.c and sprinkled it around relevant places instead ;)

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
On Wed, Oct 8, 2014 at 1:52 PM, Michael Banck michael.ba...@credativ.de wrote:
> Looking at it from a DBA perspective, this would indeed be better, yes. However, I see a few issues with that:
>
> 1. If you are using an init script (or another wrapper around pg_ctl), you don't get any of its output, it seems.
>
> 2. Having taken a quick look at pg_ctl, it seems to just kill the postmaster on shutdown and wait for its PID file to disappear. I don't see how it should figure out that PostgreSQL is waiting for a checkpoint to be finished?
>
>> Or do both. I suspect elog(INFO, ...) might do that.
>
> That would imply that pg_ctl receives and writes out log messages directed at clients, which I don't think is true? Even if it were, client_min_messages does not include an INFO level, and LOG is not logged to clients by default. So the first common level above the defaults of both client_min_messages and log_min_messages would be WARNING, which seems excessive to me. As I said, I only took a quick look at pg_ctl though, so I might well be missing something.

I think you're spot-on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
On 2014-10-02 15:21:48 +0200, Michael Banck wrote:
> Hi,
>
> we have seen repeatedly that users can be confused about why PostgreSQL is not shutting down even though they requested it. Usually, this is because `log_checkpoints' is not enabled and the final checkpoint is being written, delaying shutdown. As no message besides "shutting down" is written to the server log in this case, we even had users believing the server was hanging and pondering killing it manually. In order to alert those users that a checkpoint is being written, I propose to add a log message "waiting for checkpoint ..." on shutdown, even if log_checkpoints is disabled, as this particular checkpoint might be important information. I've attached a trivial patch for this; should it be added to the next commitfest?

How about flipping the default for log_checkpoints instead? There really isn't a good reason for having it disabled by default.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
From: Simon Riggs si...@2ndquadrant.com
> I hope we can get pgAudit in as a module for 9.5. I also hope that it will stimulate the requirements/funding of further work in this area, rather than squash it. My feeling is we have more examples of feature sets that grow over time (replication, view handling, hstore/JSONB etc.) than we have examples of things languishing in need of attention (partitioning).

I'm hoping PostgreSQL will have an audit trail feature. I'd like to support your work by reviewing and testing, although I'm not sure I can fully understand the code soon.

Regards,
MauMau
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-10-02 15:21:48 +0200, Michael Banck wrote:
>> I've attached a trivial patch for this; should it be added to the next commitfest?
>
> How about flipping the default for log_checkpoints instead? There really isn't a good reason for having it disabled by default.

Yeah, I agree with this - it's extremely useful information and it's really not that verbose in general.

Thanks,

Stephen
[HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
Hello,

One user reported a hang problem with 9.4 beta2 on Windows. The PostgreSQL build is the 64-bit version. I couldn't find the cause, but want to solve the problem. Could you help with this?

I heard that the user had run 16 concurrent psql sessions executing INSERT and UPDATE statements, a write-intensive stress test. He encountered the hang phenomenon twice, one of which occurred several hours after the start of the test, and the other about an hour after the test launch.

The user gave me the stack traces, which I attached at the end of this mail. The problem appears to be related to the xlog insert scaling. But I can't figure out where the root cause lies --- WAL slot handling, spinlock on Windows, or PGSemaphoreLock/UnLock on Windows?

The place I suspect is S_UNLOCK(). It doesn't use any memory barrier. Is this correct on Intel64 processors?

#define S_UNLOCK(lock)		(*((volatile slock_t *) (lock)) = 0)

The rest of this mail is the stack trace (function frames only):

ntdll!ZwWaitForMultipleObjects+0xa
KERNELBASE!WaitForMultipleObjectsEx+0xe1
postgres!PGSemaphoreLock+0x6e [d:\pginstaller.auto\postgres.windows-x64\src\backend\port\win32_sema.c @ 145]
postgres!LWLockAcquireCommon+0x121 [d:\pginstaller.auto\postgres.windows-x64\src\backend\storage\lmgr\lwlock.c @ 625]
postgres!XLogInsert+0x62b [d:\pginstaller.auto\postgres.windows-x64\src\backend\access\transam\xlog.c @ 1110]
postgres!log_heap_clean+0x102 [d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 6561]
postgres!heap_page_prune+0x2a6 [d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\pruneheap.c @ 261]
postgres!heap_page_prune_opt+0x148 [d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\pruneheap.c @ 150]
postgres!heapgetpage+0xa0 [d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 355]
postgres!heapgettup_pagemode+0x40a [d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 944]
postgres!heap_getnext+0x1c [d:\pginstaller.auto\postgres.windows-x64\src\backend\access\heap\heapam.c @ 1478]
postgres!SeqNext+0x27 [d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\nodeseqscan.c @ 76]
postgres!ExecScan+0xd5 [d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execscan.c @ 167]
postgres!ExecProcNode+0xd2 [d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execprocnode.c @ 400]
postgres!ExecModifyTable+0x10d [d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\nodemodifytable.c @ 926]
postgres!ExecProcNode+0x92 [d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execprocnode.c @ 377]
postgres!ExecutePlan+0x5d [d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execmain.c @ 1481]
postgres!standard_ExecutorRun+0xa8 [d:\pginstaller.auto\postgres.windows-x64\src\backend\executor\execmain.c @ 319]
[remainder of trace truncated]
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
Stephen Frost sfr...@snowman.net writes:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> How about flipping the default for log_checkpoints instead? There really isn't a good reason for having it disabled by default.
>
> Yeah, I agree with this - it's extremely useful information and it's really not that verbose in general.

-1. Every time we've turned on default logging of routine events, there's been pushback and it was eventually turned off again as log spam. I guarantee you that logging checkpoints will be seen as log spam, except by people who are actually having trouble with that behavior, which is a small minority. I'm not really convinced that there's a problem here that needs fixing at all, but certainly putting log_checkpoints on by default is not an acceptable fix.

			regards, tom lane
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-09 18:17:09 +0530, Amit Kapila wrote:
> On Fri, Sep 26, 2014 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:
>> On another point, I think it would be a good idea to rebase the bgreclaimer patch over what I committed, so that we have a clean patch against master to test with.
>
> Please find the rebased patch attached with this mail. I have taken some performance data as well and done some analysis based on the same.
>
> Performance Data
> ----------------
> IBM POWER-8, 24 cores, 192 hardware threads
> RAM = 492GB
> max_connections = 300
> Database Locale = C
> checkpoint_segments = 256
> checkpoint_timeout = 15min
> shared_buffers = 8GB
> scale factor = 5000
> Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
> Duration of each individual run = 5 mins

I don't think OLTP really is the best test case for this. Especially not pgbench with relatively small rows *and* a uniform distribution of access. Try parallel COPY TO. Batch write loads is where I've seen this hurt badly.

> patch_ver/client_count      1       8      32      64     128     256
> HEAD                    18884  118628  251093  216294  186625  177505
> PATCH                   18743  122578  247243  205521  179712  175031

So, pretty much no benefits on any scale, right?

> Here we can see that the performance dips at higher client counts (>= 32), which was quite surprising for me, as I was expecting it to improve, because bgreclaimer reduces contention by making buffers available on the free list. So I tried to analyze the situation using perf and found that in the above configuration there is contention around the freelist spinlock with HEAD, and the same is removed by the patch, but still the performance goes down with the patch. On further analysis, I observed that after the patch there is actually an increase in contention around ProcArrayLock (a shared LWLock) via GetSnapshotData, which sounds a bit odd, but that's what I can see in profiles. Based on this analysis, a few ideas which I would like to investigate further are:
>
> a. As there is an increase in spinlock contention, I would like to check with Andres's latest patch which reduces contention around shared lwlocks.
>
> b. Reduce some instructions added by the patch in StrategyGetBuffer(), e.g. instead of awakening bgreclaimer at the low threshold, awaken it when StrategyGetBuffer() tries to do a clock sweep.

Are you sure you didn't mix up the profiles here? The head vs. patched profiles look more like profiles from different client counts than from different versions of the code.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Oct 9, 2014 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-10-09 18:17:09 +0530, Amit Kapila wrote:
>> Please find the rebased patch attached with this mail. I have taken some performance data as well and done some analysis based on the same.
>>
>> Performance Data
>> ----------------
>> IBM POWER-8, 24 cores, 192 hardware threads
>> RAM = 492GB
>> max_connections = 300
>> Database Locale = C
>> checkpoint_segments = 256
>> checkpoint_timeout = 15min
>> shared_buffers = 8GB
>> scale factor = 5000
>> Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
>> Duration of each individual run = 5 mins
>
> I don't think OLTP really is the best test case for this. Especially not pgbench with relatively small rows *and* a uniform distribution of access. Try parallel COPY TO. Batch write loads is where I've seen this hurt badly.
>
>> patch_ver/client_count      1       8      32      64     128     256
>> HEAD                    18884  118628  251093  216294  186625  177505
>> PATCH                   18743  122578  247243  205521  179712  175031
>
> So, pretty much no benefits on any scale, right?

Almost right; there seems to be a slight benefit at client count 8, however that can be due to variation as well.

>> Here we can see that the performance dips at higher client counts (>= 32), which was quite surprising for me, as I was expecting it to improve, because bgreclaimer reduces contention by making buffers available on the free list. So I tried to analyze the situation using perf and found that in the above configuration there is contention around the freelist spinlock with HEAD, and the same is removed by the patch, but still the performance goes down with the patch. On further analysis, I observed that after the patch there is actually an increase in contention around ProcArrayLock (a shared LWLock) via GetSnapshotData, which sounds a bit odd, but that's what I can see in profiles.
>
> Are you sure you didn't mix up the profiles here?

I have tried this 2 times; basically I am quite confident from my side, but human error can't be ruled out. I have used the below steps:

Steps used for profiling
------------------------
During configure, use CFLAGS=-fno-omit-frame-pointer

Terminal-1: start the server
Terminal-2: ./pgbench -c 64 -j 64 -T 300 -S -M prepared postgres
Terminal-3: perf record -a -g sleep 60
            (this command is run after a minute or so of test start)

After the test is finished:
perf report -g graph,0.5,callee

Do you see any problem in the way I am collecting perf reports? In any case, I can try once more if you still doubt the profiles.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations
On Tue, Oct 7, 2014 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:

>  typedef struct
>  {
> !	char		token[TOKMAXLEN + 1];	/* now always null-terminated */
>  	char		type;
> !	int32		value;
>  } datetkn;

Being entirely new to this code, "now" makes me think of the current timestamp. I think this word can be removed to reduce ambiguity.

> +	/* use strncmp so that we match truncated tokens */
> 	result = strncmp(key, position->token, TOKMAXLEN);

In your proposal you wanted to remove crufty code that deals with non-null-terminated token strings. Is this some of that crufty code? Can it be removed?

--
Chris
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
Heikki Linnakangas hlinnakan...@vmware.com wrote:
> On 10/02/2014 03:20 AM, Kevin Grittner wrote:
>> My only concern from the benchmarks is that it seemed like there was a statistically significant increase in planning time: unpatched plan time average 0.450 ms, patched plan time average 0.536 ms. That *might* just be noise, but it seems likely to be real. For the improvement in run time, I'd put up with an extra 86us in planning time per hash join; but if there's any way to shave some of that off, all the better.
>
> The patch doesn't modify the planner at all, so it would be rather surprising if it increased planning time. I'm willing to just write that off as noise.

Fair enough. I have seen much larger variations caused by how the executable code happened to line up relative to cacheline boundaries, and we've never made any effort to manage that. I've tried various other tests using \timing rather than EXPLAIN, and the patched version looks even better in those cases. I have seen up to 4x the performance for a query using the patched version, higher variability in run time without the patch, and have yet to devise a benchmark where the patched version came out slower (although I admit to not being as good at creating such cases as some here).

When given a generous work_mem setting, the patched version often uses more of what it is allowed than the unpatched version (which is probably one of the reasons it tends to do better). If someone has set a high work_mem and has gotten by only because the configured amount is not all being used when it would benefit performance, they may need to adjust work_mem down to avoid memory problems. That doesn't seem to me to be a reason to reject the patch.

This is in "Waiting on Author" status only because I never got an answer about why the debug code used printf() rather than elog() at a DEBUG level. Other than that, I would say this patch is "Ready for Committer". Tomas? You there?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
On 10/09/2014 09:47 PM, MauMau wrote:
> I heard that the user had run 16 concurrent psql sessions executing INSERT and UPDATE statements, a write-intensive stress test. He encountered the hang phenomenon twice, one of which occurred several hours after the start of the test, and the other about an hour after the test launch.

It'd be interesting and useful to run this test on a debug build of PostgreSQL, i.e. one compiled against the debug version of the C library and with full debuginfo, not just a minimal .pdb.

How were the stacks captured - what tool?

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations
Chris Bandy bandy.ch...@gmail.com writes:
> On Tue, Oct 7, 2014 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> +	/* use strncmp so that we match truncated tokens */
>> 	result = strncmp(key, position->token, TOKMAXLEN);
>
> In your proposal you wanted to remove crufty code that deals with non-null-terminated token strings. Is this some of that crufty code? Can it be removed?

Yeah, I had hoped to simplify these things to just strcmp, but on closer inspection that didn't work, because some of the keywords in the table are truncated at TOKMAXLEN (10 characters). If we just made these strcmp, then the code would stop recognizing e.g. "milliseconds".

I thought briefly about widening TOKMAXLEN so that there were no truncated keywords in the table, but that seems risky from a backwards-compatibility standpoint: as it stands, the code will accept "milliseconds", "millisecond", or "millisecon", and there might possibly be applications out there that depend on that. In any case I'd much rather keep the array stride at 16 bytes for speed reasons; and who's to say we might not put in some even-longer keywords in the future?

Another alternative we should maybe consider is leaving the definition of the token field alone (ie, still not guaranteed null-terminated), which'd leave us with one free byte per datetkn entry. I can't think of a likely reason to need another 1-byte field though, and the existing definition is not without risk. That comment that was there about "don't change this to strlcpy" was there because somebody broke it awhile back, or at least submitted a patch that would've broken it if it'd been accepted. People are too used to null-terminated strings in C; a field definition that violates that norm is just trouble waiting to happen.

			regards, tom lane
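A small standalone demonstration of why the strncmp keeps accepting the truncated spellings (an illustration; datetime.c's table stores the truncated keyword):

/* "milliseconds" agrees with the stored, truncated "millisecon" on
 * its first TOKMAXLEN characters, so strncmp reports a match where
 * strcmp would not. */
#include <stdio.h>
#include <string.h>

#define TOKMAXLEN 10

int
main(void)
{
	const char *stored = "millisecon";	/* keyword truncated at TOKMAXLEN */

	printf("%d\n", strncmp("milliseconds", stored, TOKMAXLEN) == 0); /* 1 */
	printf("%d\n", strncmp("millisecond", stored, TOKMAXLEN) == 0);  /* 1 */
	printf("%d\n", strcmp("milliseconds", stored) == 0);             /* 0 */
	return 0;
}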
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
On 2014-10-09 22:47:48 +0900, MauMau wrote:
> Hello,
>
> One user reported a hang problem with 9.4 beta2 on Windows. The PostgreSQL build is the 64-bit version. I couldn't find the cause, but want to solve the problem. Could you help with this?
>
> I heard that the user had run 16 concurrent psql sessions executing INSERT and UPDATE statements, a write-intensive stress test. He encountered the hang phenomenon twice, one of which occurred several hours after the start of the test, and the other about an hour after the test launch.
>
> The user gave me the stack traces, which I attached at the end of this mail. The problem appears to be related to the xlog insert scaling. But I can't figure out where the root cause lies --- WAL slot handling, spinlock on Windows, or PGSemaphoreLock/UnLock on Windows?
>
> The place I suspect is S_UNLOCK(). It doesn't use any memory barrier. Is this correct on Intel64 processors?

What precisely do you mean by "Intel64"? 64-bit x86, or Itanium?

Also, what's the precise workload? Can you reproduce the problem?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Thank you for the review.

> 1) I don't think it's a good idea to put the full page write compression into struct XLogRecord.

Full-page-write compression information can be stored in the varlena struct of the compressed blocks, as is done for toast data in the pluggable compression support patch. If I understand correctly, it can be done in a manner similar to how a compressed Datum is modified to contain information about the compression algorithm in that patch.

> 2) You've essentially removed a lot of checks about the validity of bkp blocks in xlogreader. I don't think that's acceptable.

To ensure this, the raw size stored in the first four bytes of the compressed datum can be used to perform error checking for backup blocks. Currently, the error checking for the size of backup blocks happens individually for each block. If backup blocks are compressed together, it can happen once for the entire set of backup blocks in a WAL record. The total raw size of the compressed blocks can be checked against the total size stored in the WAL record header.

> 3) You have both FullPageWritesStr() and full_page_writes_str().

full_page_writes_str() is a true/false version of the FullPageWritesStr macro. It is implemented for backward compatibility with pg_xlogdump.

> 4) I don't like FullPageWritesIsNeeded(). For one it, at least to me, sounds grammatically wrong. More importantly when reading it I'm thinking of it being about the LSN check. How about instead directly checking whatever != FULL_PAGE_WRITES_OFF?

I will modify this.

> 5) CompressBackupBlockPagesAlloc is declared static but not defined as such.
> 7) Unless I miss something, CompressBackupBlock should be plural, right? ATM it compresses all the blocks.

I will correct these.

> 6) You call CompressBackupBlockPagesAlloc() from two places. Neither is IIRC within a critical section. So you imo should remove the outOfMem handling and revert to palloc() instead of using malloc directly.

Yes, neither is in a critical section. The outOfMem handling is done in order to proceed without compression of FPWs in case sufficient memory is not available for compression.

Thank you,

Rahila Syed

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Compression-of-full-page-writes-tp5769039p5822391.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
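A rough sketch of the validation described in point 2 -- the function name, parameter names, and datum layout are my assumptions for illustration, not the patch:

/*
 * Sketch only: the first four bytes of the compressed datum carry the
 * total raw size of the backup blocks, which can be compared with the
 * total recorded in the WAL record header.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool
backup_blocks_size_ok(const char *compressed_datum,
					  uint32_t total_len_from_record_header)
{
	uint32_t	stored_raw_size;

	memcpy(&stored_raw_size, compressed_datum, sizeof(uint32_t));
	return stored_raw_size == total_len_from_record_header;
}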
Re: [HACKERS] Corporate and Individual Contributor License Agreements (CLAs)
Arcadiy,

You may want to refer them to the license itself, which will make it very easy for them to understand why the Contributor License Agreement is not required. PostgreSQL is released under the PostgreSQL License, a liberal Open Source license, similar to the BSD or MIT licenses:

  PostgreSQL Database Management System
  (formerly known as Postgres, then as Postgres95)

  Portions Copyright (c) 1996-2014, The PostgreSQL Global Development Group
  Portions Copyright (c) 1994, The Regents of the University of California

  Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

  IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

  THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should not be surprised when they come back as Romans."
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
On Wed, Oct 8, 2014 at 10:52 AM, Michael Banck michael.ba...@credativ.de wrote:
> Hi,
>
> On Saturday, 2014-10-04 at 15:05 -0500, Jim Nasby wrote:
>> On 10/4/14, 1:21 PM, Jeff Janes wrote:
>>> On Thu, Oct 2, 2014 at 6:21 AM, Michael Banck wrote:
>>>> we have seen repeatedly that users can be confused about why PostgreSQL is not shutting down even though they requested it. Usually, this is because `log_checkpoints' is not enabled and the final checkpoint is being written, delaying shutdown.
>>>
>>> Wouldn't a better place to write this message be the terminal from which pg_ctl stop was invoked, rather than the server log file?
>
> Looking at it from a DBA perspective, this would indeed be better, yes. However, I see a few issues with that:
>
> 1. If you are using an init script (or another wrapper around pg_ctl), you don't get any of its output, it seems.
>
> 2. Having taken a quick look at pg_ctl, it seems to just kill the postmaster on shutdown and wait for its PID file to disappear. I don't see how it should figure out that PostgreSQL is waiting for a checkpoint to be finished?

It could just print out a reminder that a checkpoint will occur, depending on what mode of shutdown was requested. I don't think this reminder has to be validated by the server itself; the intention should be enough. Most people who don't know that a clean shutdown inherently involves a checkpoint probably don't monitor the server log closely, either. Of course, if they use packager scripts to do the shutdown and those scripts don't pass along the message, I guess that still doesn't help.

Cheers,

Jeff
[HACKERS] Expose options to explain? (track_io_timing)
Salut! Fellow volunteers, I request assistance in understanding the following. When I explain a query I can get the following information:

| I/O Read Time: 0.000,
| I/O Write Time: 0.000

I know why it is 0. My question is this: can we expose it to explain only? Specifically explain (analyze, buffers). Is there a technical reason we must turn on track_io_timing for the whole system? If there isn't, is this something the community would want?

JD

-- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc "If we send our children to Caesar for their education, we should not be surprised when they come back as Romans."
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
On Thu, Oct 2, 2014 at 4:21 PM, Michael Banck michael.ba...@credativ.de wrote: we have seen repeatedly that users can be confused about why PostgreSQL is not shutting down even though they requested it. Usually, this is because `log_checkpoints' is not enabled and the final checkpoint is being written, delaying shutdown.

Are you convinced that that's the *actual* reason? Maybe you're mis-attributing it. In my experience shutdown delays are often caused by using the default "smart" shutdown mode, which is another way of saying "completely stupid". It waits until all connections to the server are closed -- meaning never, in common situations with connection pooling or when an admin has left their psql shell open. To add insult to injury, you can't open a new connection to invoke pg_terminate_backend() to kill them, either. In the case of a restart, it can cause much longer downtime than a fast/immediate restart would.

Sorry for the rant, but am I the only one hating that default?

Regards, Marti
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
On 2014-10-09 09:44:09 -0400, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Andres Freund (and...@2ndquadrant.com) wrote: How about flipping the default for log_checkpoints instead? There really isn't a good reason for having it disabled by default. Yeah, I agree with this- it's extremely useful information and it's really not that verbose in general.. -1. Every time we've turned on default logging of routine events, there's been pushback and it was eventually turned off again as log spam. I guarantee you that logging checkpoints will be seen as log spam, except by people who are actually having trouble with that behavior, which is a small minority.

We're talking about 2 log messages per checkpoint_timeout interval here. That's pretty darn far away from log spam. Was there really any case of such a low-frequency message causing ire? And if it's more frequent, you can be happy that you see the log message - because your config isn't appropriate for your load. The number of times I've seen people baffled by bad performance because of an inadequate checkpoint configuration, whose effect they couldn't see, is staggering.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
* Andres Freund (and...@2ndquadrant.com) wrote: On 2014-10-09 09:44:09 -0400, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: Yeah, I agree with this- it's extremely useful information and it's really not that verbose in general.. -1. Every time we've turned on default logging of routine events, there's been pushback and it was eventually turned off again as log spam. I guarantee you that logging checkpoints will be seen as log spam, except by people who are actually having trouble with that behavior, which is a small minority. We're talking about 2 log messages per checkpoint_timeout interval here. That's pretty darn far away from log spam. Was there really any case of such a low-frequency message causing ire?

For embedded devices and similar small-scale systems, I can see Tom's point. At the same time, I would expect those to require sufficient configuration that also setting log_checkpoints to 'off' wouldn't be a huge deal.

And if it's more frequent you can be happy that you see the log message - because your config isn't appropriate for your load. The number of times I've seen people baffled by bad performance because of an inadequate checkpoint configuration, whose effect they couldn't see, is staggering.

I certainly agree with this, but then, look at our default log_line_prefix and other default log settings.. In general, our defaults are horrible and fixing this one is really just the tip of the iceberg..

Thanks, Stephen
Re: [HACKERS] alter user set local_preload_libraries.
On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote: The attached patch simply changes the context for local_... to PGC_USERSET and edits the doc. I had this ready to commit, but then "Invent PGC_SU_BACKEND and mark log_connections/log_disconnections that way." was committed in the meantime. Does this affect what we should do with this change? I guess one thing to look into would be whether we could leave local_preload_libraries as PGC_BACKEND and change session_preload_libraries to PGC_SU_BACKEND, and then investigate whether we could allow settings made with ALTER ROLE / SET to change PGC_BACKEND settings. Yeah, I was wondering about that while I was making the other commit. I did not touch those variables at the time, but it would make sense to restrict them as you suggest.

+1. Also, I think it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND.

Regards, -- Fujii Masao

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 6260,6266 **** SET XML OPTION { DOCUMENT | CONTENT };
      <listitem>
       <para>
        This variable specifies one or more shared libraries that are to be
!       preloaded at connection start. Only superusers can change this setting.
        The parameter value only takes effect at the start of the connection.
        Subsequent changes have no effect. If a specified library is not
        found, the connection attempt will fail.
--- 6260,6267 ----
      <listitem>
       <para>
        This variable specifies one or more shared libraries that are to be
!       preloaded at connection start. Only superusers can change this setting
!       at session start.
        The parameter value only takes effect at the start of the connection.
        Subsequent changes have no effect. If a specified library is not
        found, the connection attempt will fail.
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 2892,2898 **** static struct config_string ConfigureNamesString[] =
  	},
  	{
! 		{"session_preload_libraries", PGC_SUSET, CLIENT_CONN_PRELOAD,
  			gettext_noop("Lists shared libraries to preload into each backend."),
  			NULL,
  			GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_SUPERUSER_ONLY
--- 2892,2898 ----
  	},
  	{
! 		{"session_preload_libraries", PGC_SU_BACKEND, CLIENT_CONN_PRELOAD,
  			gettext_noop("Lists shared libraries to preload into each backend."),
  			NULL,
  			GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_SUPERUSER_ONLY
***************
*** 5756,5762 **** set_config_option(const char *name, const char *value,
  	else if (context != PGC_POSTMASTER &&
  			 context != PGC_BACKEND &&
  			 context != PGC_SU_BACKEND &&
! 			 source != PGC_S_CLIENT)
  	{
  		ereport(elevel,
  				(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
--- 5756,5766 ----
  	else if (context != PGC_POSTMASTER &&
  			 context != PGC_BACKEND &&
  			 context != PGC_SU_BACKEND &&
! 			 source != PGC_S_CLIENT &&
! 			 source != PGC_S_DATABASE_USER &&
! 			 source != PGC_S_USER &&
! 			 source != PGC_S_DATABASE &&
! 			 source != PGC_S_GLOBAL)
  	{
  		ereport(elevel,
  				(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
***************
*** 8820,8827 **** validate_option_array_item(const char *name, const char *value,
  	 * There are three cases to consider:
  	 *
  	 * name is a known GUC variable.  Check the value normally, check
! 	 * permissions normally (ie, allow if variable is USERSET, or if it's
! 	 * SUSET and user is superuser).
  	 *
  	 * name is not known, but exists or can be created as a placeholder (i.e.,
  	 * it has a prefixed name).  We allow this case if you're a superuser,
--- 8824,8831 ----
  	 * There are three cases to consider:
  	 *
  	 * name is a known GUC variable.  Check the value normally, check
! 	 * permissions normally (ie, allow if variable is USERSET/BACKEND,
! 	 * or if it's SUSET/SU_BACKEND and user is superuser).
  	 *
  	 * name is not known, but exists or can be created as a placeholder (i.e.,
  	 * it has a prefixed name).  We allow this case if you're a superuser,
***************
*** 8861,8877 ****
  	}

  	/* manual permissions check so we can avoid an error being thrown */
! 	if (gconf->context == PGC_USERSET)
  		 /* ok */ ;
! 	else if (gconf->context == PGC_SUSET && superuser())
  		 /* ok */ ;
  	else if (skipIfNoPermissions)
  		return false;
  	/* if a permissions error should be thrown, let set_config_option do it */

! 	/* test for permissions and valid option value */
  	(void) set_config_option(name, value,
! 							 superuser() ? PGC_SUSET : PGC_USERSET,
  							 PGC_S_TEST,
Re: [HACKERS] replicating DROP commands across servers
On 10/6/14, 11:24 PM, Robert Haas wrote: Offlist. FWIW, I've run into situations more than once in userspace where I need a way to properly separate schema and object name. Generally I can make do using reg* casts and then hitting catalog tables, but it'd be nice if there was an easier way. Sure, although I think that's a bit of a separate problem. It's hard to iterate through a string a character at a time from the SQL level so that you can handle stuff like the quote_literal() rules. If we want people to be able to do that easily we need to provide tools to handle it. But C is actually quite well-suited to such tasks. Yeah, I wouldn't want to attempt this in SQL; I was saying that a built-in function to do this would be broadly useful, not just for replicating DROPs. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
Stephen Frost sfr...@snowman.net writes: * Andres Freund (and...@2ndquadrant.com) wrote: On 2014-10-09 09:44:09 -0400, Tom Lane wrote: -1. Every time we've turned on default logging of routine events, there's been pushback and it was eventually turned off again as log spam. We're talking about 2 log message per checkpoint_timeout interval here. That's pretty darn far away from log spam. Was there really any case of such low frequency message causing ire? For embedded devices and similar small-scale systems, I can see Tom's point. At the same time, I would expect those to require sufficient configuration that also setting log_checkpoints to 'off' wouldn't be a huge deal. Here's the problem as I see it: DBAs will be annoyed by the spam and will turn it off. Then they'll still be confused when a shutdown takes a long time. So this is no fix at all for the original complaint. I'm also not entirely convinced that checkpoints have anything to do with the complaint. Once we get a shutdown request, we're going to have to perform a checkpoint, which we do at full speed, no delays (or at least did so last I checked). Whether a checkpoint was already in progress is more or less irrelevant. It's always been like that and I can't recall anybody complaining about it. I suspect Marti is correct that the real problem is elsewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
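To make Tom's point concrete: a shutdown request triggers a checkpoint carrying the CHECKPOINT_IMMEDIATE flag, which bypasses the usual completion-target throttling. A minimal sketch of that decision -- the flag value matches xlog.h, but the helper itself is illustrative, not actual PostgreSQL code:

#include <stdbool.h>

#define CHECKPOINT_IMMEDIATE 0x0004	/* as in src/include/access/xlog.h */

/*
 * Sketch of the write-throttling decision: an immediate checkpoint (the
 * shutdown case) never sleeps between writes, while a normal checkpoint
 * naps whenever it is still on schedule, to spread the I/O out.
 */
static bool
checkpoint_should_sleep(int flags, bool on_schedule)
{
	if (flags & CHECKPOINT_IMMEDIATE)
		return false;		/* full speed, no delays */
	return on_schedule;
}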
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: For embedded devices and similar small-scale systems, I can see Tom's point. At the same time, I would expect those to require sufficient configuration that also setting log_checkpoints to 'off' wouldn't be a huge deal. Here's the problem as I see it: DBAs will be annoyed by the spam and will turn it off. Then they'll still be confused when a shutdown takes a long time. So this is no fix at all for the original complaint.

I've not run into very many folks working with embedded devices, so take this with a grain of salt, but I have *never* run into a DBA who is running a production system who doesn't want log_checkpoints, log_connections, log_disconnections, and a much more verbose log_line_prefix (and more, really), so I don't buy into this argument at all. Our default logging is nowhere near what logging on a production system should be, and I'd be interested to meet the DBA who disagrees with that, because they've got some requirements that I've not dealt with before. Basically, I believe every DBA who is using PG for more than a toy setup (or strictly development) would be pleasantly surprised to have checkpoints logged; far too many of them don't even know the option exists.

I'm also not entirely convinced that checkpoints have anything to do with the complaint. Once we get a shutdown request, we're going to have to perform a checkpoint, which we do at full speed, no delays (or at least did so last I checked). Whether a checkpoint was already in progress is more or less irrelevant. It's always been like that and I can't recall anybody complaining about it. I suspect Marti is correct that the real problem is elsewhere.

This is certainly an interesting question and was asked about up-thread also, I believe. I agree that if it wasn't slow to shut down due to a checkpoint, then logging checkpoints isn't going to help. If the issue is that it's a 'smart' shutdown request with folks logged in, then perhaps we should consider logging *that* fact: "waiting to shut down due to user connections" or some such.

Thanks, Stephen
Re: [HACKERS] replicating DROP commands across servers
Jim Nasby wrote: On 10/6/14, 11:24 PM, Robert Haas wrote: Offlist. FWIW, I've run into situations more than once in userspace where I need a way to properly separate schema and object name. Generally I can make do using reg* casts and then hitting catalog tables, but it'd be nice if there was an easier way. Sure, although I think that's a bit of a separate problem. It's hard to iterate through a string a character at a time from the SQL level so that you can handle stuff like the quote_literal() rules. If we want people to be able to do that easily we need to provide tools to handle it. But C is actually quite well-suited to such tasks. Yeah, I wouldn't want to attempt this in SQL; I was saying that a built-in function to do this would be broadly useful, not just for replicating DROPs. Well, most of what you need is served by pg_identify_object, I think: just grab the OID from the appropriate catalog, and the OID of the catalog itself; that function will give you schema and name, which is what you need. Probably the most difficult part is figuring out which reg* cast you want .. and of course there are also cases where there is no such datatype to cast to in the first place. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
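To illustrate Robert's point that this sort of tokenizing is much more natural in C than in SQL, here is a toy routine (not PostgreSQL code) that finds the dot separating schema from object name while honoring double-quoted identifiers:

#include <stdbool.h>

/*
 * Return a pointer to the dot separating schema from object name,
 * skipping dots inside double-quoted identifiers ("my.schema".tab),
 * or NULL if there is no separator.
 */
static const char *
find_separator(const char *qualname)
{
	bool		in_quotes = false;
	const char *p;

	for (p = qualname; *p != '\0'; p++)
	{
		if (*p == '"')
			in_quotes = !in_quotes;	/* a doubled "" flips twice: harmless */
		else if (*p == '.' && !in_quotes)
			return p;
	}
	return NULL;
}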
Re: [HACKERS] Expose options to explain? (track_io_timing)
On Thu, Oct 9, 2014 at 10:17 AM, Joshua D. Drake j...@commandprompt.com wrote: Salut! Fellow volunteers, I request assistance in understanding the following: When I explain a query I can get the following information: | I/O Read Time: 0.000, | I/O Write Time: 0.000 I know why it is 0. My question is this, can we expose it to explain only? Specifically explain (analyze,buffers). Is there a technical reason we must turn on track_io_timing for the whole system? If there isn't, is this something the community would want?

I think the theory for track_io_timing being PGC_SUSET is that if the superuser turned it on, no one should be able to turn it off. But I don't see an argument for the other way around, that no one should be able to turn it on locally if the superuser left it at the default of off. So I think the real behavior we would want is that anyone can turn it on in their session, and can also turn it off provided it was turned on by them in the first place. But there is no machinery in the GUC code to do that, which is probably why it wasn't done. I meant to work on that for this dev cycle, but I never dug into how to implement the "provided it was turned on by them in the first place" part of the requirement. And how would this be expressed generically? Some notion that the default value can be a floor or ceiling which the user can alter in one direction, and reverse that alteration. PGC_SUSET_FLOOR and PGC_SUSET_CEILING? Anyway, if we are going to solve this for track_io_timing, I think it would be better to solve it in a way that it can also be used by pg_stat_statements, as well as EXPLAIN. But maybe that doesn't make sense, as you need to be a superuser to call pg_stat_statements_reset() anyway. Cheers, Jeff
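One way Jeff's "floor" notion could be expressed with today's machinery is a check hook on the variable. The following is only a sketch of that policy, using the existing bool check-hook signature; the exact rule and its placement are hypothetical, not an existing PostgreSQL feature, and the "only if they set it themselves" refinement would need extra state not shown here:

#include "postgres.h"
#include "miscadmin.h"
#include "utils/guc.h"

/*
 * Hypothetical check hook: anyone may enable track_io_timing in their
 * session, but a non-superuser may not turn it off when the inherited
 * setting (the "floor") already has it on.
 */
static bool
check_track_io_timing(bool *newval, void **extra, GucSource source)
{
	if (!*newval && track_io_timing &&
		source >= PGC_S_SESSION && !superuser())
	{
		GUC_check_errmsg("cannot disable track_io_timing set by an administrator");
		return false;
	}
	return true;
}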
Re: [HACKERS] Last Commitfest patches waiting review
On Mon, Oct 6, 2014 at 11:53 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Oct 6, 2014 at 11:27 AM, Robert Haas robertmh...@gmail.com wrote: Well, really, I was just suggesting that I can spend more time on the patch, but not immediately. We haven't really talked about the idea of the HyperLogLog-based abort mechanism - the actual cost model - even though I thought we'd have discussed that extensively by now. My concern is that if we only get it committed in the last commitfest, we may run out of time to make sortsupport work for B-Tree index builds. That's where the sortsupport for text stuff will be really useful. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
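For readers following the thread, the HyperLogLog-based abort mechanism Peter mentions comes down to a cheap cardinality test made while tuples are copied in: if the abbreviated keys are mostly duplicates, the cheap prefix comparisons will not resolve many ties and abbreviation should be abandoned. A toy sketch of the shape of that decision (both numbers here are made up, not the patch's actual cost model):

#include <stdbool.h>

/*
 * Toy abort decision: once enough tuples have been seen, abandon
 * abbreviated keys when the estimated number of distinct abbreviations
 * is a small fraction of the input.
 */
static bool
abort_abbreviation(double distinct_abbrevs, double tuples_seen)
{
	if (tuples_seen < 10000)
		return false;		/* too early to judge */
	return (distinct_abbrevs / tuples_seen) < 0.05;
}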
Re: [HACKERS] Deferring some AtStart* allocations?
On Thu, Oct 9, 2014 at 8:20 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-09 08:18:18 -0400, Robert Haas wrote: On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote: Interesting - in my local profile AtStart_Inval() is more pronounced than AfterTriggerBeginQuery(). I've quickly and in an ugly fashion hacked AtStart_Inval() out of read-only queries on top of your patch. Together that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM tbl WHERE pkey = xxx' testcase. Whoa. Now that's clearly significant. Well, my guess is it'll be far less noticeable in less trivial workloads. But it does seem worthwhile. You didn't attach the patch; was that inadvertent, or was it too ugly for that? Far, far too ugly ;). I just removed the AtStart() call from xact.c and sprinkled it around relevant places instead ;)

OK, here's an attempt at a real patch for that. I haven't perf-tested this.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5b5d31b..651a5c4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1838,7 +1838,6 @@ StartTransaction(void)
 	 * initialize other subsystems for new transaction
 	 */
 	AtStart_GUC();
-	AtStart_Inval();
 	AtStart_Cache();
 	AfterTriggerBeginXact();

@@ -4151,7 +4150,6 @@ StartSubTransaction(void)
 	 */
 	AtSubStart_Memory();
 	AtSubStart_ResourceOwner();
-	AtSubStart_Inval();
 	AtSubStart_Notify();
 	AfterTriggerBeginSubXact();

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index a7a768e..6b6c88e 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -693,19 +693,32 @@ AcceptInvalidationMessages(void)
 }

 /*
- * AtStart_Inval
- *		Initialize inval lists at start of a main transaction.
+ * PrepareInvalidationState
+ *		Initialize inval lists for the current (sub)transaction.
  */
-void
-AtStart_Inval(void)
+static void
+PrepareInvalidationState(void)
 {
-	Assert(transInvalInfo == NULL);
-	transInvalInfo = (TransInvalidationInfo *)
+	TransInvalidationInfo *myInfo;
+
+	if (transInvalInfo != NULL &&
+		transInvalInfo->my_level == GetCurrentTransactionNestLevel())
+		return;
+
+	myInfo = (TransInvalidationInfo *)
 		MemoryContextAllocZero(TopTransactionContext,
 							   sizeof(TransInvalidationInfo));
-	transInvalInfo->my_level = GetCurrentTransactionNestLevel();
-	SharedInvalidMessagesArray = NULL;
-	numSharedInvalidMessagesArray = 0;
+	myInfo->parent = transInvalInfo;
+	myInfo->my_level = GetCurrentTransactionNestLevel();
+
+	/*
+	 * If there's any previous entry, this one should be for a deeper
+	 * nesting level.
+	 */
+	Assert(transInvalInfo == NULL ||
+		   myInfo->my_level > transInvalInfo->my_level);
+
+	transInvalInfo = myInfo;
 }

 /*
@@ -727,24 +740,6 @@ PostPrepare_Inval(void)
 }

 /*
- * AtSubStart_Inval
- *		Initialize inval lists at start of a subtransaction.
- */
-void
-AtSubStart_Inval(void)
-{
-	TransInvalidationInfo *myInfo;
-
-	Assert(transInvalInfo != NULL);
-	myInfo = (TransInvalidationInfo *)
-		MemoryContextAllocZero(TopTransactionContext,
-							   sizeof(TransInvalidationInfo));
-	myInfo->parent = transInvalInfo;
-	myInfo->my_level = GetCurrentTransactionNestLevel();
-	transInvalInfo = myInfo;
-}
-
-/*
  * Collect invalidation messages into SharedInvalidMessagesArray array.
  */
 static void
@@ -803,8 +798,16 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
 {
 	MemoryContext oldcontext;

+	/* Quick exit if we haven't done anything with invalidation messages. */
+	if (transInvalInfo == NULL)
+	{
+		*RelcacheInitFileInval = false;
+		*msgs = NULL;
+		return 0;
+	}
+
 	/* Must be at top of stack */
-	Assert(transInvalInfo != NULL && transInvalInfo->parent == NULL);
+	Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);

 	/*
 	 * Relcache init file invalidation requires processing both before and
@@ -904,11 +907,15 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
 void
 AtEOXact_Inval(bool isCommit)
 {
+	/* Quick exit if no messages */
+	if (transInvalInfo == NULL)
+		return;
+
+	/* Must be at top of stack */
+	Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);
+
 	if (isCommit)
 	{
-		/* Must be at top of stack */
-		Assert(transInvalInfo != NULL && transInvalInfo->parent == NULL);
-
 		/*
 		 * Relcache init file invalidation requires processing both before and
 		 * after we send the SI messages.  However, we need not do anything
@@ -926,17 +933,16 @@ AtEOXact_Inval(bool isCommit)
 		if (transInvalInfo->RelcacheInitFileInval)
 			RelationCacheInitFilePostInvalidate();
 	}
-	else if (transInvalInfo != NULL)
+	else
 	{
-		/* Must be at top of stack */
-		Assert(transInvalInfo->parent == NULL);
-
 		ProcessInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
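Stripped of the invalidation specifics, the core of the patch above is replacing the unconditional AtStart_Inval()/AtSubStart_Inval() calls with on-demand setup on first use. A standalone sketch of that pattern, with hypothetical names (in the real patch the nesting level comes from GetCurrentTransactionNestLevel() and the allocation from MemoryContextAllocZero):

#include <assert.h>
#include <stdlib.h>

typedef struct TxnState
{
	struct TxnState *parent;
	int			nest_level;
} TxnState;

static TxnState *top = NULL;
static int	current_level = 1;	/* stand-in for the transaction nest level */

/*
 * Lazily set up per-(sub)transaction state: cheap no-op if this level is
 * already initialized, so callers can invoke it on every use.
 */
static void
prepare_state(void)
{
	TxnState   *s;

	if (top != NULL && top->nest_level == current_level)
		return;

	s = calloc(1, sizeof(TxnState));
	s->parent = top;
	s->nest_level = current_level;

	/* any pre-existing entry must belong to a shallower nesting level */
	assert(top == NULL || s->nest_level > top->nest_level);
	top = s;
}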
Re: [HACKERS] Last Commitfest patches waiting review
On 10/09/2014 09:59 PM, Peter Geoghegan wrote: My concern is that if we only get it committed in the last commitfest, we may run out of time to make sortsupport work for B-Tree index builds. That's where the sortsupport for text stuff will be really useful. B-tree index build uses tuplesort.c. What's missing? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
On 10/05/2014 10:43 AM, Fabien COELHO wrote: Hello Heikki, Here are new patches, again the first one is just refactoring, and the second one contains this feature. I'm planning to commit the first one shortly, and the second one later after people have had a chance to look at it. I looked at it. It looks ok, but for a few spurious spacing changes here and there. No big deal. I tested it, everything I tested behaved as expected, so it is ok for me.

Thanks! I committed the refactoring patch earlier, and just went through the second patch again. I wordsmithed the documentation and comments, and fixed the documentation on the log format. I also fixed the logging of skipped transactions so that the schedule lag is reported correctly for them. One thing bothers me with the log format. Here's an example:

0 81 4621 0 1412881037 912698 3005
0 82 6173 0 1412881037 914578 4304
0 83 skipped 0 1412881037 914578 5217
0 83 skipped 0 1412881037 914578 5099
0 83 4722 0 1412881037 916203 3108
0 84 4142 0 1412881037 918023 2333
0 85 2465 0 1412881037 919759 740

Note how the transaction counter is not incremented for skipped transactions. That's understandable, since we're not including skipped transactions in the number of transactions executed, but it means that the skipped transactions don't have a unique ID. That's annoying. Here's a new version of the patch. I'll sleep over it before committing, but I think it's fine now, except maybe for the unique ID thing.

- Heikki

From 79dc4fba57e4c8a52a074ab302aee72eb78ce6fe Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 9 Oct 2014 22:04:27 +0300
Subject: [PATCH 1/1] Add --latency-limit option to pgbench.

This allows transactions that take longer than the specified limit to be
counted separately. With --rate, transactions that are already late by the
time we get to execute them are skipped altogether. Using --latency-limit
with --rate allows you to catch up more quickly if there's a hiccup in the
server causing a lot of transactions to stall momentarily.

Fabien COELHO, reviewed by Rukh Meski and heavily refactored by me.
---
 contrib/pgbench/pgbench.c | 231 +++---
 doc/src/sgml/pgbench.sgml |  70 +++---
 2 files changed, 233 insertions(+), 68 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e9431ee..5506346 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -141,6 +141,14 @@ double sample_rate = 0.0;
 int64 throttle_delay = 0;

 /*
+ * Transactions which take longer than this limit (in usec) are counted as
+ * late, and reported as such, although they are completed anyway. When
+ * throttling is enabled, execution time slots that are more than this late
+ * are skipped altogether, and counted separately.
+ */
+int64 latency_limit = 0;
+
+/*
  * tablespace selection
  */
 char *tablespace = NULL;
@@ -238,6 +246,8 @@ typedef struct
 	int64		throttle_trigger;	/* previous/next throttling (us) */
 	int64		throttle_lag;		/* total transaction lag behind throttling */
 	int64		throttle_lag_max;	/* max transaction lag */
+	int64		throttle_latency_skipped;	/* lagging transactions skipped */
+	int64		latency_late;		/* late transactions */
 } TState;

 #define INVALID_THREAD ((pthread_t) 0)
@@ -250,6 +260,8 @@ typedef struct
 	int64		sqlats;
 	int64		throttle_lag;
 	int64		throttle_lag_max;
+	int64		throttle_latency_skipped;
+	int64		latency_late;
 } TResult;

 /*
@@ -284,6 +296,8 @@ typedef struct
 	long		start_time;		/* when does the interval start */
 	int			cnt;			/* number of transactions */
+	int			skipped;		/* number of transactions skipped under
+								 * --rate and --latency-limit */

 	double		min_latency;	/* min/max latencies */
 	double		max_latency;
@@ -348,7 +362,7 @@ static void setalarm(int seconds);
 static void *threadRun(void *arg);

 static void doLog(TState *thread, CState *st, FILE *logfile, instr_time *now,
-	  AggVals *agg);
+	  AggVals *agg, bool skipped);

 static void
 usage(void)
@@ -375,6 +389,8 @@ usage(void)
 	   "  -f, --file=FILENAME      read transaction script from FILENAME\n"
 	   "  -j, --jobs=NUM           number of threads (default: 1)\n"
 	   "  -l, --log                write transaction times to log file\n"
+	   "  -L, --latency-limit=NUM  count transactions lasting more than NUM ms\n"
+	   "                           as late.\n"
 	   "  -M, --protocol=simple|extended|prepared\n"
 	   "                           protocol for submitting queries (default: simple)\n"
 	   "  -n, --no-vacuum          do not run VACUUM before tests\n"
@@ -994,7 +1010,9 @@ void
 agg_vals_init(AggVals *aggs, instr_time start)
 {
 	/* basic counters */
-	aggs->cnt = 0;				/* number of transactions */
+	aggs->cnt = 0;				/* number of transactions (includes skipped) */
+	aggs->skipped = 0;			/* xacts skipped under --rate --latency-limit */

 	aggs->sum_latency = 0;		/* SUM(latency) */
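On the --latency-limit plus --rate interaction: the skip decision reduces to a single comparison against the transaction's scheduled start time. Roughly (a sketch, not the patch's exact code):

#include <stdbool.h>
#include <stdint.h>

/*
 * Under throttling every transaction has a scheduled start.  If we only
 * get to it more than latency_limit late, executing it is pointless:
 * count it as skipped and move on to the next slot.
 */
static bool
transaction_is_skipped(int64_t now_us, int64_t scheduled_us,
					   int64_t latency_limit_us)
{
	return latency_limit_us > 0 &&
		(now_us - scheduled_us) > latency_limit_us;
}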
Re: [HACKERS] Last Commitfest patches waiting review
On Thu, Oct 9, 2014 at 12:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: B-tree index build uses tuplesort.c. What's missing?

I don't think that all that much is missing. Tuplesort expects to work with an index scankey when sorting B-Tree tuples. There needs to be something like a reverse lookup of the sortsupport function. It looks like a historical oversight that would take time to fix, but wouldn't be particularly challenging. You'd need to pick out the operators from the scankey, so you'd have something like what tuplesort_begin_heap() starts off with, in tuplesort_begin_index_btree(). copytup_index() would then later need to be modified to make abbreviation occur there too, but that's no big deal.

-- Peter Geoghegan
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-09 16:01:55 +0200, Andres Freund wrote: On 2014-10-09 18:17:09 +0530, Amit Kapila wrote: On Fri, Sep 26, 2014 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote: On another point, I think it would be a good idea to rebase the bgreclaimer patch over what I committed, so that we have a clean patch against master to test with. Please find the rebased patch attached with this mail. I have taken some performance data as well and done some analysis based on the same.

Performance Data:
IBM POWER-8, 24 cores, 192 hardware threads
RAM = 492GB
max_connections = 300
Database Locale = C
checkpoint_segments = 256
checkpoint_timeout = 15min
shared_buffers = 8GB
scale factor = 5000
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5mins

I don't think OLTP really is the best test case for this. Especially not pgbench with relatively small rows *and* a uniform distribution of access. Try parallel COPY TO. Batch write loads are where I've seen this hurt badly. As an example, the attached scripts go from:

progress: 5.3 s, 20.9 tps, lat 368.917 ms stddev 49.655
progress: 10.1 s, 21.0 tps, lat 380.326 ms stddev 64.525
progress: 15.1 s, 14.1 tps, lat 568.108 ms stddev 226.040
progress: 20.4 s, 12.0 tps, lat 634.557 ms stddev 300.519
progress: 25.2 s, 17.5 tps, lat 461.738 ms stddev 136.257
progress: 30.2 s, 9.8 tps, lat 850.766 ms stddev 305.454
progress: 35.3 s, 12.2 tps, lat 670.473 ms stddev 271.075
progress: 40.2 s, 7.9 tps, lat 972.617 ms stddev 313.152
progress: 45.3 s, 14.9 tps, lat 546.056 ms stddev 211.987
progress: 50.2 s, 13.2 tps, lat 610.608 ms stddev 271.780
progress: 55.5 s, 16.9 tps, lat 468.757 ms stddev 156.516
progress: 60.5 s, 14.3 tps, lat 548.913 ms stddev 190.414
progress: 65.7 s, 9.3 tps, lat 821.293 ms stddev 353.665
progress: 70.1 s, 16.0 tps, lat 524.240 ms stddev 174.903
progress: 75.2 s, 17.0 tps, lat 485.692 ms stddev 194.273
progress: 80.2 s, 19.9 tps, lat 396.295 ms stddev 78.891
progress: 85.3 s, 18.3 tps, lat 423.744 ms stddev 105.798
progress: 90.1 s, 14.5 tps, lat 577.373 ms stddev 270.914
progress: 95.3 s, 12.0 tps, lat 649.434 ms stddev 247.001
progress: 100.3 s, 14.6 tps, lat 563.693 ms stddev 275.236
tps = 14.81 (including connections establishing)

to:

progress: 5.1 s, 18.9 tps, lat 409.766 ms stddev 75.032
progress: 10.3 s, 20.2 tps, lat 396.781 ms stddev 67.593
progress: 15.1 s, 19.1 tps, lat 418.545 ms stddev 109.431
progress: 20.3 s, 20.6 tps, lat 388.606 ms stddev 74.259
progress: 25.1 s, 19.5 tps, lat 406.591 ms stddev 109.050
progress: 30.0 s, 19.1 tps, lat 420.199 ms stddev 157.005
progress: 35.0 s, 18.4 tps, lat 421.102 ms stddev 124.019
progress: 40.3 s, 12.3 tps, lat 640.640 ms stddev 88.409
progress: 45.2 s, 12.8 tps, lat 586.471 ms stddev 145.543
progress: 50.5 s, 6.9 tps, lat 1116.603 ms stddev 285.479
progress: 56.2 s, 6.3 tps, lat 1349.055 ms stddev 381.095
progress: 60.6 s, 7.9 tps, lat 1083.745 ms stddev 452.386
progress: 65.0 s, 9.6 tps, lat 805.981 ms stddev 273.845
progress: 71.1 s, 9.6 tps, lat 798.273 ms stddev 184.108
progress: 75.2 s, 9.3 tps, lat 950.131 ms stddev 150.870
progress: 80.8 s, 8.6 tps, lat 899.389 ms stddev 135.090
progress: 85.3 s, 8.8 tps, lat 928.183 ms stddev 152.056
progress: 90.9 s, 8.0 tps, lat 929.737 ms stddev 71.155
progress: 95.7 s, 9.0 tps, lat 968.070 ms stddev 127.824
progress: 100.3 s, 8.7 tps, lat 911.767 ms stddev 130.697

just by switching shared_buffers from 1 to 8GB. I haven't tried, but I hope that with an approach like yours this might become better.

psql -f /tmp/prepare.sql
pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

CREATE OR REPLACE FUNCTION exec(text) returns text language plpgsql volatile AS $f$
BEGIN
  EXECUTE $1;
  RETURN $1;
END;
$f$;

\o /dev/null
SELECT exec('drop table if exists largedata_'||g.i||'; create unlogged table largedata_'||g.i||'(data bytea, id serial primary key);') FROM generate_series(0, 64) g(i);
\o

COPY largedata_:client_id(data) FROM '/tmp/large' BINARY;
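For context on what a background reclaimer would offload: victim selection in shared_buffers is a clock sweep over usage counts, and with a large pool a foreground backend can spin through many hot buffers before finding a victim. A toy model of that loop (illustrative, not the actual StrategyGetBuffer() code):

#include <stdint.h>

/*
 * Toy clock sweep: advance the hand, aging buffers, until one with a
 * zero usage count turns up.  The more hot buffers there are, the longer
 * a backend stalls here -- work a background process could do ahead of
 * time.  Terminates because every pass decrements the counts it skips.
 */
static int
clock_sweep(uint8_t *usage_count, int nbuffers, int *hand)
{
	for (;;)
	{
		int			victim = *hand;

		*hand = (*hand + 1) % nbuffers;
		if (usage_count[victim] == 0)
			return victim;
		usage_count[victim]--;
	}
}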
Re: [HACKERS] Deferring some AtStart* allocations?
On 2014-10-09 15:01:19 -0400, Robert Haas wrote: On Thu, Oct 9, 2014 at 8:20 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-09 08:18:18 -0400, Robert Haas wrote: On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote: Interesting - in my local profile AtStart_Inval() is more pronounced than AfterTriggerBeginQuery(). I've quickly and in an ugly fashion hacked AtStart_Inval() out of read-only queries on top of your patch. Together that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM tbl WHERE pkey = xxx' testcase. Whoa. Now that's clearly significant. Well, my guess is it'll be far less noticeable in less trivial workloads. But it does seem worthwhile. You didn't attach the patch; was that inadvertent, or was it too ugly for that? Far, far too ugly ;). I just removed the AtStart() call from xact.c and sprinkled it around relevant places instead ;) OK, here's an attempt at a real patch for that. I haven't perf-tested this.

Neato. With a really trivial SELECT:

before: tps = 28150.794776 (excluding connections establishing)
after:  tps = 29978.767703 (excluding connections establishing)

slightly more meaningful:

before: tps = 21272.400039 (including connections establishing)
after:  tps = 22290.703482 (excluding connections establishing)

So that's a noticeable win. Obviously it's going to be less for more complicated stuff, but still... I've not really looked at the patches though.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
From: Craig Ringer cr...@2ndquadrant.com It'd be interesting and useful to run this test on a debug build of PostgreSQL, i.e. one compiled against the debug version of the C library and with full debuginfo not just minimal .pdb. Although I'm not sure the user can do this now, I'll ask him anyway. How were the stacks captured - what tool? According to his mail, Windbg or userdump.exe. I'll ask him about this. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Last Commitfest patches waiting review
On Thu, Oct 9, 2014 at 12:51 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Oh, I didn't realize we don't do that already! I'm surprised, I would've expected index build to have been the first thing we'd use the SortSupport stuff in. The thing is that the most compelling numbers for sortsupport (plus the related improvements to tuplesort itself) were from the onlyKey optimization - the numbers are only so-so when you look at multi-attribute sorts, because we specialize qsort() for both of those two cases (i.e. one specialization, qsort_ssup(), only looks at datum1, while qsort_tuple() looks at everything else, through which comparetup_heap() and comparetup_datum() are called). But with the B-Tree comparator, we're only ever going to be able to use qsort_tuple() which is roughly equivalent to the so-so multi-attribute case for heap tuples (because we need to detect duplicate violations, and a few other things - no choice there). It kind of makes sense that we didn't push ourselves to get B-Tree support until now. But with sortsupport for text accelerating sorts by perhaps as much as 10 times in sympathetic (though realistic) cases, it would be crazy to have that without B-Tree support (abbreviated keys always force us to use the qsort_tuple() specialization, because the comparator tie-breaker logic must live in places like comparetup_heap()). Yeah, that seems worth doing, independently of the this patch. As I mentioned, that might be less true than you'd think. Can you write a separate patch to use SortSupport for B-tree index builds, please? Eliminating the FunctionCallInfoData overhead should shave off some some cycles from every index build. I'll look into it. Hopefully an effort to actually implement it will show that I was right, and there isn't much to it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
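For anyone skimming the thread, the datum1/abbreviation trick being discussed is: pack a fixed-size key prefix into an integer, compare that first, and fall back to the authoritative comparator only on ties. A standalone sketch (assumes a little-endian host and the GCC/Clang byte-swap builtin; strnlen() is POSIX):

#include <stdint.h>
#include <string.h>

/* pack the first 8 bytes of a NUL-terminated string for cheap compares */
static uint64_t
abbreviate(const char *s)
{
	uint64_t	v = 0;

	memcpy(&v, s, strnlen(s, sizeof(v)));
	/* byte-swap so integer order matches memcmp() order on little-endian */
	return __builtin_bswap64(v);
}

static int
cmp_abbrev(uint64_t a_ab, const char *a, uint64_t b_ab, const char *b)
{
	if (a_ab < b_ab)
		return -1;
	if (a_ab > b_ab)
		return 1;
	return strcmp(a, b);	/* abbreviated keys tied: full comparison */
}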
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
From: Andres Freund and...@2ndquadrant.com What precisely do you mean with Intel64? 64bit x86 or Itanium? 64-bit x86, i.e. x86-64. Also, what's the precise workload? Can you reproduce the problem? IIUC, each client inserts 1000 records into one table, then repeats updating all those records. I'll ask him again. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
Tomas Vondra t...@fuzzy.cz wrote: On 9.10.2014 16:55, Kevin Grittner wrote: I've tried various other tests using \timing rather than EXPLAIN, and the patched version looks even better in those cases. I have seen up to 4x the performance for a query using the patched version, higher variability in run time without the patch, and have yet to devise a benchmark where the patched version came out slower (although I admit to not being as good at creating such cases as some here). Nice. Thanks for the testing! The only case I've been able to come up with is when the hash table fits into work_mem only thanks to not counting the buckets. The new code will start batching in this case.

Hmm. If you look at the timings in my posting from 2014-10-01, I had cases where the patched version started with one batch and went to two, while the unpatched used just one batch, and the patched version was still more than twice as fast. I'm sure the on-disk batch was fully cached, however; it might work out differently if disk speed actually came into the picture.

That is mostly luck, however, because it depends on the cardinality estimates, and the best way to fix it is increasing work_mem (which is safer thanks to reducing the overhead). Also, Robert proposed a way to mitigate this: if we realize we'd have to do batching during the initial sizing, we can peek whether reducing the number of buckets (to 1/2 or maybe 1/4) would help. I believe this is a good approach, and will look into that after pgconf.eu (i.e. early November), unless someone else is interested.

Sure, but it would be good to confirm that it's actually needed first. When given a generous work_mem setting, the patched version often uses more of what it is allowed than the unpatched version (which is probably one of the reasons it tends to do better). If someone has set a high work_mem and has gotten by only because the configured amount is not all being used when it would benefit performance, they may need to adjust work_mem down to avoid memory problems. That doesn't seem to me to be a reason to reject the patch.

I'm not entirely sure I understand this paragraph. What do you mean by "configured amount is not all being used when it would benefit performance"? Can you give an example?

Again, the earlier post showed that while the unpatched used 3516kB whether it had work_mem set to 4MB or 1GB, the patched version used 3073kB when work_mem was set to 4MB and 4540kB when work_mem was set to 1GB. The extra memory allowed the patched version to stay at 1 batch, improving performance over the other setting. The only thing I can think of is the batching behavior described above.

This is in "Waiting on Author" status only because I never got an answer about why the debug code used printf() rather than elog() at a DEBUG level. Other than that, I would say this patch is "Ready for Committer". Tomas? You there?

I think I responded to that on October 2, quoting:

=== On 2.10.2014 09:50, Tomas Vondra wrote: On 2.10.2014, 2:20, Kevin Grittner wrote: The patch applied and built without problem, and passed `make check-world`. The only thing that caught my eye was the addition of debug code using printf() instead of logging at a DEBUG level. Is there any reason for that? Not really. IIRC the main reason is that the other code in nodeHash.c uses the same approach. ===

Ah, I never received that email. That tends to happen every now and then. :-( I believe it's safe to switch the logging to elog().
IMHO the printf logging is there from some very early version of the code, before elog was introduced. Or something like that. I guess it might be best to remain consistent in this patch and change that in a separate patch. I just wanted to make sure you didn't see any reason not to do so. With that addressed I will move this to Ready for Committer. Since both Heikki and Robert spent time on this patch earlier, I'll give either of them a shot at committing it if they want; otherwise I'll do it. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
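To spell out the accounting change under review: once the bucket array is charged against work_mem, the tuple budget shrinks correspondingly, which is exactly why a previously borderline hash table can now spill to a second batch. Illustrative arithmetic only, not the patch's code:

#include <stddef.h>

/*
 * How many tuples fit once the bucket array itself is counted?  Each
 * bucket costs one pointer; each tuple costs its payload plus a chain
 * pointer.
 */
static long
tuples_that_fit(size_t work_mem, size_t tuple_size, long nbuckets)
{
	size_t		bucket_overhead = (size_t) nbuckets * sizeof(void *);

	if (bucket_overhead >= work_mem)
		return 0;
	return (long) ((work_mem - bucket_overhead) /
				   (tuple_size + sizeof(void *)));
}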
Re: [HACKERS] pg_upgrade, locale and encoding
On Tue, Oct 7, 2014 at 03:52:24PM +0300, Heikki Linnakangas wrote: While looking at bug #11431, I noticed that pg_upgrade still seems to think that encoding and locale are cluster-wide properties. We got per-database locale support in 8.4, and encoding has been per-database much longer than that. pg_upgrade checks the encoding and locale of template0 in both clusters, and throws an error if they don't match. But it doesn't check the locale or encoding of postgres or template1 databases. That leads to problems if e.g. the postgres database was dropped and recreated with a different encoding or locale in the old cluster. We will merrily upgrade it, but strings in the database will be incorrectly encoded. Wow, I never thought someone would do that, but they certainly could --- good catch. I propose the attached patch, for git master. It's more complicated in back-branches, as they still support upgrading from pre-8.4 clusters. We haven't heard any complaints from the field on this, so I don't think it's worth trying to back-patch this. Agreed. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
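In outline, the fix turns the template0-only comparison into a per-database loop; something of this shape, where the struct and field names are illustrative stand-ins rather than pg_upgrade's actual ones:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* illustrative stand-in for pg_upgrade's per-database info */
typedef struct
{
	char	   *db_name;
	char	   *db_encoding;
	char	   *db_collate;
	char	   *db_ctype;
} DbLocaleInfo;

/* compare encoding and locale database by database, not just template0 */
static void
check_db_locale(const DbLocaleInfo *old_db, const DbLocaleInfo *new_db)
{
	if (strcmp(old_db->db_encoding, new_db->db_encoding) != 0 ||
		strcmp(old_db->db_collate, new_db->db_collate) != 0 ||
		strcmp(old_db->db_ctype, new_db->db_ctype) != 0)
	{
		fprintf(stderr, "encoding or locale mismatch for database \"%s\"\n",
				old_db->db_name);
		exit(1);
	}
}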
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On Thu, Oct 9, 2014 at 02:34:17PM +0400, Alexey Bashtanov wrote: Hello! Autovacuum daemon performs vacuum when the number of rows updated/deleted (n_dead_tuples) reaches some threshold. Similarly it performs analyze when the number of rows changed in any way (incl. inserted). When a table is mostly insert-only, its visibility map is not updated as vacuum threshold is almost never reached, but analyze does not update visibility map. Why could it be a bad idea to run vacuum after some number of any changes including inserts, like analyze? Or at least make it tunable by user (add a second bunch of paramters to control second vacuum threshold, disabled by default)? I agree this is a serious problem. We have discussed various options, but have not decided on anything. The TODO list has: https://wiki.postgresql.org/wiki/Todo Improve setting of visibility map bits for read-only and insert-only workloads http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deferring some AtStart* allocations?
On Thu, Oct 9, 2014 at 3:53 PM, Andres Freund and...@2ndquadrant.com wrote: OK, here's an attempt at a real patch for that. I haven't perf-tested this. Neato. With a really trivial SELECT: before: tps = 28150.794776 (excluding connections establishing) after: tps = 29978.767703 (excluding connections establishing) slightly more meaningful: before: tps = 21272.400039 (including connections establishing) after: tps = 22290.703482 (excluding connections establishing) So that's a noticeable win. Obviously it's going to be less for more complicated stuff, but still... I've not really looked at the patches though. Yeah, not bad at all for the amount of work involved. Want to disclose the actual queries? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Bruce Momjian wrote: I agree this is a serious problem. We have discussed various options, but have not decided on anything. The TODO list has: https://wiki.postgresql.org/wiki/Todo Improve setting of visibility map bits for read-only and insert-only workloads http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us I hate to repeat myself, but I think autovacuum could be modified to run actions other than vacuum and analyze. In this specific case we could be running a table scan that checks only pages that don't have the all-visible bit set, and see if it can be set. (Of course, this idea needs refinement to avoid running over and over when the bit cannot be set on some pages for whatever reason.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deferring some AtStart* allocations?
On 2014-10-09 17:02:02 -0400, Robert Haas wrote: On Thu, Oct 9, 2014 at 3:53 PM, Andres Freund and...@2ndquadrant.com wrote: OK, here's an attempt at a real patch for that. I haven't perf-tested this. Neato. With a really trivial SELECT: before: tps = 28150.794776 (excluding connections establishing) after: tps = 29978.767703 (excluding connections establishing) slightly more meaningful: before: tps = 21272.400039 (including connections establishing) after: tps = 22290.703482 (excluding connections establishing) So that's a noticeable win. Obviously it's going to be less for more complicated stuff, but still... I've not really looked at the patches though. Yeah, not bad at all for the amount of work involved. Want to disclose the actual queries? SELECT 1; SELECT * FROM smalltable WHERE id = xxx; The latter is something quite frequent in the real world, so it's not all academic... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote: Bruce Momjian wrote: I agree this is a serious problem. We have discussed various options, but have not decided on anything. The TODO list has: https://wiki.postgresql.org/wiki/Todo Improve setting of visibility map bits for read-only and insert-only workloads http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us I hate to repeat myself, but I think autovacuum could be modified to run actions other than vacuum and analyze. In this specific case we could be running a table scan that checks only pages that don't have the all-visible bit set, and see if it can be set. Isn't that *precisely* what a plain vacuum run does? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Alvaro Herrera alvhe...@2ndquadrant.com wrote: Bruce Momjian wrote: I agree this is a serious problem. We have discussed various options, but have not decided on anything. The TODO list has: https://wiki.postgresql.org/wiki/Todo Improve setting of visibility map bits for read-only and insert-only workloads http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us I hate to repeat myself, but I think autovacuum could be modified to run actions other than vacuum and analyze. In this specific case we could be running a table scan that checks only pages that don't have the all-visible bit set, and see if it can be set. (Of course, this idea needs refinement to avoid running over and over when the bit cannot be set on some pages for whatever reason.)

Wouldn't we get substantially the same thing just by counting tuple inserts toward the autovacuum vacuum threshold? I mean, unless the table is due for a wraparound-prevention autovacuum, it will only visit pages that don't have the all-visible bit set, right? And how much work would that do beyond what you're describing if none of the tuples are dead?

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
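Kevin's suggestion amounts to a one-term change in the trigger arithmetic. Schematically -- simplified from the formula autovacuum actually uses, with the insert term being the proposed addition:

#include <stdbool.h>

/*
 * Simplified trigger test: today only dead tuples count toward the
 * threshold.  Folding inserts in would make insert-only tables come due
 * for vacuum, letting their visibility map bits get set.
 */
static bool
needs_autovacuum(double reltuples, double n_dead, double n_inserted,
				 double base_thresh, double scale_factor)
{
	double		vacthresh = base_thresh + scale_factor * reltuples;

	return (n_dead + n_inserted) > vacthresh;
}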
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Andres Freund wrote: On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote: Bruce Momjian wrote: I agree this is a serious problem. We have discussed various options, but have not decided on anything. The TODO list has: https://wiki.postgresql.org/wiki/Todo Improve setting of visibility map bits for read-only and insert-only workloads http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us I hate to repeat myself, but I think autovacuum could be modified to run actions other than vacuum and analyze. In this specific case we could be running a table scan that checks only pages that don't have the all-visible bit set, and see if it can be set. Isn't that *precisely* what a plain vacuum run does? Well, it also scans for dead tuples, removes them, and needs to go through indexes to remove their references. I'm thinking in something very lightweight. Otherwise, why don't we just reduce the vacuum_scale_factor default to something very small, so that vacuum is triggered more often? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Kevin Grittner wrote: Wouldn't we get substantially the same thing just by counting tuple inserts toward the autovacuum vacuum threshold? I mean, it unless the table is due for wraparound prevention autovacuum, it will only visit pages that don't have the all-visible bit set, right? And how much work would that do beyond what you're describing if none of the tuples are dead? The problem is precisely what happens if there are some dead tuples, but not enough to reach the 20% threshold: this vacuum now has to scan the table twice and has to clean up indexes also. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] schema-only -n option in pg_restore fails
All, Crossing this over to -hackers because it's stopped being a bug and is now a TODO item. See below. For those not on pgsql-bugs, I've quoted the full bug report below my proposal.

On 10/09/2014 12:36 PM, Josh Berkus wrote: Summary: pg_restore -n attempts to restore objects to the pg_catalog schema. Versions tested: 9.3.5, 9.3.0, 9.2.4.

I explored this some with Andrew offlist. It turns out this is going to be a PITA to fix, so it should go on the big pile of TODOs for when we overhaul search_path. Here's what's happening under the hood: pg_restore generates this SQL text:

    SET search_path = schem_a, pg_catalog;

    CREATE TABLE tab_a (
        test text
    );

Since schem_a doesn't exist, it's skipped over and pg_restore attempts to create the objects in pg_catalog. So this is Yet Another Issue caused by the ten-meter-tall tar baby which is search_path.

So, my proposal for a resolution:

1) In current versions, patch the docs to explicitly say that -n does not create the schema, and that if the user doesn't create the schema, pg_restore will fail.

2) Patch 9.5's pg_restore to do CREATE SCHEMA IF NOT EXISTS when -n is used. This will be 100% backwards-compatible with current behavior.

Discuss? Original bug report follows.

On 10/09/2014 12:36 PM, Josh Berkus wrote:

Summary: pg_restore -n attempts to restore objects to the pg_catalog schema
Versions tested: 9.3.5, 9.3.0, 9.2.4
Severity: failure

Description: The -n option (or --schema) for pg_restore is supposed to allow you to restore a single schema from a custom-format pg_dump file. Instead, it attempts to restore that schema's objects to the pg_catalog schema; see the test case below. What's happening here is that the user is apparently expected to create the schema manually before doing a -n pg_restore. However, that's not what the documentation says, and it additionally doesn't make any sense if we're not giving the user the ability to restore to an alternate schema name (and so far we aren't). If the schema does not already exist, pg_restore attempts to restore to the pg_catalog schema instead, which fails.

In other words, pg_restore -n is just broken. Clearly few people use it, or we'd have had a bug report on it before now. What should happen is that pg_restore -n creates the schema if it doesn't already exist. If for some reason you think that pg_restore shouldn't create the schema (which would be user-hostile, but at least consistent), then this should fail cleanly with a "schema does not exist" error message instead of trying to restore to pg_catalog.

Test case:

1. createdb schtest
2. createdb schrestore
3. psql schtest
4. create schema schem_a; create table schem_a.tab_a ( test text ); create schema schem_b; create table schem_b.tab_b ( test text ); create schema schem_c; create table schem_c.tab_c ( test text );
5. pg_dump -Fc -f /tmp/schmtest.dump schtest
6. pg_restore -Fc -n schem_a -d schrestore /tmp/schmtest.dump
7. Output:

    pg_restore: [archiver (db)] Error while PROCESSING TOC:
    pg_restore: [archiver (db)] Error from TOC entry 171; 1259 1191591 TABLE tab_a josh
    pg_restore: [archiver (db)] could not execute query: ERROR: permission denied to create "pg_catalog.tab_a"
    DETAIL: System catalog modifications are currently disallowed.
    Command was: CREATE TABLE tab_a ( test text );
    pg_restore: [archiver (db)] could not execute query: ERROR: schema "schem_a" does not exist
    Command was: ALTER TABLE schem_a.tab_a OWNER TO josh;
    pg_restore: [archiver (db)] Error from TOC entry 2194; 0 1191591 TABLE DATA tab_a josh
    pg_restore: [archiver (db)] could not execute query: ERROR: relation "tab_a" does not exist
    Command was: COPY tab_a (test) FROM stdin;

-- Josh Berkus  PostgreSQL Experts Inc.  http://pgexperts.com
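A minimal sketch of where proposal (2) could go, assuming plain C. The helper name emit_schema_guard is invented for illustration; a real patch would live in pg_restore's archiver code, use its output routines, and quote the identifier properly (fmtId()-style quoting is elided here):

    #include <stdio.h>

    /*
     * Sketch only: before restoring the objects selected with -n, emit a
     * guard statement so the target schema exists.
     */
    static void
    emit_schema_guard(FILE *script, const char *schema_name)
    {
        /*
         * IF NOT EXISTS makes this a no-op when the schema is already there,
         * keeping the change 100% backward-compatible, as proposed above.
         */
        fprintf(script, "CREATE SCHEMA IF NOT EXISTS %s;\n", schema_name);
        fprintf(script, "SET search_path = %s, pg_catalog;\n", schema_name);
    }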
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote: Andres Freund wrote: On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote: Bruce Momjian wrote: I agree this is a serious problem. We have discussed various options, but have not decided on anything. The TODO list has: https://wiki.postgresql.org/wiki/Todo Improve setting of visibility map bits for read-only and insert-only workloads http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

I hate to repeat myself, but I think autovacuum could be modified to run actions other than vacuum and analyze. In this specific case we could be running a table scan that checks only pages that don't have the all-visible bit set, and see if it can be set.

Isn't that *precisely* what a plain vacuum run does?

Well, it also scans for dead tuples, removes them, and needs to go through indexes to remove their references.

IIRC it doesn't do most of that if there's no need, and there's no need if it's an insert-only table without rollbacks. I *do* think there are some optimizations we could make in general.

I'm thinking of something very lightweight. Otherwise, why don't we just reduce the vacuum_scale_factor default to something very small, so that vacuum is triggered more often?

The problem here is that it doesn't trigger on inserts, just on updates/deletes or rollbacks.

Greetings, Andres Freund

-- Andres Freund  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 10/8/14, 8:35 AM, Andres Freund wrote:

    +#define EXCLUSIVE_LOCK (((uint32) 1) << (31 - 1))
    +
    +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
    +#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)

There should at least be a comment where we define MAX_BACKENDS about the relationship here... or better yet, validate that MAX_BACKENDS < SHARED_LOCK_MASK during postmaster startup. (For those who think that's too pedantic, I'll argue that it's no worse than the patch verifying that MyProc != NULL in LWLockQueueSelf().)

    +/*
    + * Internal function that tries to atomically acquire the lwlock in the passed
    + * in mode.
    + *
    + * This function will not block waiting for a lock to become free - that's the
    + * caller's job.
    + *
    + * Returns true if the lock isn't free and we need to wait.
    + *
    + * When acquiring shared locks it's possible that we disturb an exclusive
    + * waiter. If that's a problem for the specific user, pass in a valid pointer
    + * for 'potentially_spurious'. Its value will be set to true if we possibly
    + * did so. The caller then has to handle that scenario.
    + */
    +static bool
    +LWLockAttemptLock(LWLock *lock, LWLockMode mode, bool *potentially_spurious)

We should invert the return of this function. The current (9.3) code returns true if the lock is actually acquired (see below), and I think that's true of other locking code as well. IMHO it makes more sense that way; plus, consistency is good. (From 9.3:)

     * LWLockConditionalAcquire - acquire a lightweight lock in the specified mode
     *
     * If the lock is not available, return FALSE with no side-effects.

-- Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-09 16:52:46 -0500, Jim Nasby wrote: On 10/8/14, 8:35 AM, Andres Freund wrote:

    +#define EXCLUSIVE_LOCK (((uint32) 1) << (31 - 1))
    +
    +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
    +#define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)

There should at least be a comment where we define MAX_BACKENDS about the relationship here... or better yet, validate that MAX_BACKENDS < SHARED_LOCK_MASK during postmaster startup. (For those who think that's too pedantic, I'll argue that it's no worse than the patch verifying that MyProc != NULL in LWLockQueueSelf().)

If you modify either, you'd better grep for them... I don't think that's going to happen anyway. Requiring it during startup would mean exposing SHARED_LOCK_MASK outside of lwlock.c, which would be ugly. We could possibly stick a StaticAssert() someplace in lwlock.c.

And no, it's not comparable at all to MyProc != NULL - the lwlock code initially *does* run when MyProc isn't set up. We just had better not conflict against any other lockers at that stage.

    +/*
    + * Internal function that tries to atomically acquire the lwlock in the passed
    + * in mode.
    + *
    + * This function will not block waiting for a lock to become free - that's the
    + * caller's job.
    + *
    + * Returns true if the lock isn't free and we need to wait.
    + *
    + * When acquiring shared locks it's possible that we disturb an exclusive
    + * waiter. If that's a problem for the specific user, pass in a valid pointer
    + * for 'potentially_spurious'. Its value will be set to true if we possibly
    + * did so. The caller then has to handle that scenario.
    + */
    +static bool
    +LWLockAttemptLock(LWLock *lock, LWLockMode mode, bool *potentially_spurious)

We should invert the return of this function. The current (9.3) code returns true if the lock is actually acquired (see below), and I think that's true of other locking code as well. IMHO it makes more sense that way; plus, consistency is good.

I don't think so. I've wondered about it as well, but the way the function is used, it's more consistent imo if it returns whether we must wait. Note that it's not an exported function.

Greetings, Andres Freund

-- Andres Freund  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
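For concreteness, here is a sketch of the check Andres mentions, written with standard C11 _Static_assert rather than the backend's StaticAssertStmt macro. The placement inside lwlock.c and the message text are assumptions; the macro values are the ones cited above. This keeps SHARED_LOCK_MASK private to lwlock.c while still catching a bad MAX_BACKENDS at compile time:

    #include <stdint.h>

    #define MAX_BACKENDS     0x7fffff                    /* 2^23 - 1, per the patch comment */
    #define EXCLUSIVE_LOCK   (((uint32_t) 1) << (31 - 1))
    #define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)

    /* Compile-time check: the shared-holder count can never overflow
     * into the exclusive bit, no matter how many backends exist. */
    _Static_assert(MAX_BACKENDS < SHARED_LOCK_MASK,
                   "shared lock mask must accommodate every backend");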
[HACKERS] Build (definition?) errors - in bootstrap
Having just git pulled from origin/master:

    $ ./configure
    $ make

Mileage:

    ...
    make -C bootstrap all
    make[3]: Entering directory `/path/to/postgresql/src/backend/bootstrap'
    gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -I. -I. -I../../../src/include -c -o bootparse.o bootparse.c
    bootparse.y: In function ‘boot_yyparse’:
    bootparse.y:311:9: warning: passing argument 1 of ‘DefineIndex’ makes integer from pointer without a cast [enabled by default]
    In file included from bootparse.y:37:0:
    ../../../src/include/commands/defrem.h:24:12: note: expected ‘Oid’ but argument is of type ‘struct IndexStmt *’
    bootparse.y:311:9: warning: passing argument 2 of ‘DefineIndex’ makes pointer from integer without a cast [enabled by default]
    In file included from bootparse.y:37:0:
    ../../../src/include/commands/defrem.h:24:12: note: expected ‘struct IndexStmt *’ but argument is of type ‘Oid’
    bootparse.y:311:9: error: too few arguments to function ‘DefineIndex’
    In file included from bootparse.y:37:0:
    ../../../src/include/commands/defrem.h:24:12: note: declared here
    bootparse.y:346:9: warning: passing argument 1 of ‘DefineIndex’ makes integer from pointer without a cast [enabled by default]
    In file included from bootparse.y:37:0:
    ../../../src/include/commands/defrem.h:24:12: note: expected ‘Oid’ but argument is of type ‘struct IndexStmt *’
    bootparse.y:346:9: warning: passing argument 2 of ‘DefineIndex’ makes pointer from integer without a cast [enabled by default]
    In file included from bootparse.y:37:0:
    ../../../src/include/commands/defrem.h:24:12: note: expected ‘struct IndexStmt *’ but argument is of type ‘Oid’
    bootparse.y:346:9: error: too few arguments to function ‘DefineIndex’
    In file included from bootparse.y:37:0:
    ../../../src/include/commands/defrem.h:24:12: note: declared here
    make[3]: *** [bootparse.o] Error 1
    make[3]: Leaving directory `/path/to/postgresql/src/backend/bootstrap'
    make[2]: *** [bootstrap-recursive] Error 2
    make[2]: Leaving directory `/path/to/postgresql/src/backend'
    make[1]: *** [all-backend-recurse] Error 2
    make[1]: Leaving directory `/path/to/postgresql/src'
    make: *** [all-src-recurse] Error 2
[HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c
I found a reference made obsolete by commit 9e85183b, which is from way back in 2000. comparetup_index_btree() says:

    /*
     * This is similar to _bt_tuplecompare(), but we have already done the
     * index_getattr calls for the first column, and we need to keep track of
     * whether any null fields are present.  Also see the special treatment
     * for equal keys at the end.
     */

I think that this comment should simply indicate that the routine is similar to comparetup_heap(), except that it takes care of the special tie-break stuff for B-Tree sorts, as well as enforcing uniqueness during unique index builds. It should not reference _bt_compare() at all (which is arguably the successor to _bt_tuplecompare()), since _bt_compare() is concerned with a bunch of stuff highly specific to the B-Tree implementation (e.g. having a hard-wired return value for comparisons involving the first data item on an internal page).

-- Peter Geoghegan
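A possible replacement along the lines Peter describes. The wording is a suggestion only, not a committed patch:

    /*
     * Suggested rewording (draft only, not from the source tree):
     *
     * This is similar to comparetup_heap(), but handles the special
     * requirements of B-Tree index builds: during unique index builds,
     * equal keys raise a "duplicate key" error, and otherwise ties are
     * broken on the tuples' heap positions so that results are
     * reproducible.
     */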
Re: [HACKERS] Build (definition?) errors - in bootstrap
Lou Picciano loupicci...@comcast.net writes: Having just git pulled from origin/master: $ ./configure $ make [ fails ]

The buildfarm doesn't seem unhappy, so I doubt there's anything wrong with the code as such. Try "make clean" or even "make distclean" and rebuild. Also, if your computer's clock is or was badly off, you may have file timestamp skews breaking things ... in which case you might need "make maintainer-clean". If it's still broken after that, you'd be best advised to fix the clock and do a complete fresh git clone.

regards, tom lane
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 10/9/14, 4:57 PM, Andres Freund wrote: If you modify either, you'd better grep for them... I don't think that's going to happen anyway. Requiring it during startup would mean exposing SHARED_LOCK_MASK outside of lwlock.c, which would be ugly. We could possibly stick a StaticAssert() someplace in lwlock.c.

Ahh, yeah, exposing it would be ugly. I just get the heebie-jeebies when I see assumptions like this, though. I fear there are a bunch of cases where changing something will break a completely unrelated part of the system with no warning. Maybe add an assert() to check it?

And no, it's not comparable at all to MyProc != NULL - the lwlock code initially *does* run when MyProc isn't set up. We just had better not conflict against any other lockers at that stage.

Ahh, can you maybe add that detail to the comment? That wasn't clear to me.

    +/*
    + * Internal function that tries to atomically acquire the lwlock in the passed
    + * in mode.
    + *
    + * This function will not block waiting for a lock to become free - that's the
    + * caller's job.
    + *
    + * Returns true if the lock isn't free and we need to wait.
    + *
    + * When acquiring shared locks it's possible that we disturb an exclusive
    + * waiter. If that's a problem for the specific user, pass in a valid pointer
    + * for 'potentially_spurious'. Its value will be set to true if we possibly
    + * did so. The caller then has to handle that scenario.
    + */
    +static bool
    +LWLockAttemptLock(LWLock *lock, LWLockMode mode, bool *potentially_spurious)

We should invert the return of this function. The current (9.3) code returns true if the lock is actually acquired (see below), and I think that's true of other locking code as well. IMHO it makes more sense that way; plus, consistency is good.

I don't think so. I've wondered about it as well, but the way the function is used, it's more consistent imo if it returns whether we must wait. Note that it's not an exported function.

ISTM that a function attempting a lock should return success, not failure. Even though it's internal now, it could certainly be made external at some point in the future. But I suppose it's ultimately a matter of preference...

-- Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On 10/9/14, 4:03 PM, Alvaro Herrera wrote: Bruce Momjian wrote: I agree this is a serious problem. We have discussed various options, but have not decided on anything. The TODO list has: https://wiki.postgresql.org/wiki/Todo Improve setting of visibility map bits for read-only and insert-only workloads http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

I hate to repeat myself, but I think autovacuum could be modified to run actions other than vacuum and analyze. In this specific case we could be running a table scan that checks only pages that don't have the all-visible bit set, and see if it can be set. (Of course, this idea needs refinement to avoid running over and over when the bit cannot be set on some pages for whatever reason.)

If we go down that road we should also think about having it proactively set hint bits...

-- Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
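For readers unfamiliar with the term, a toy sketch of what "proactively setting hint bits" means. The flag value, type names, and stub are illustrative assumptions, not the backend's actual definitions:

    #include <stdbool.h>
    #include <stdint.h>

    #define HINT_XMIN_COMMITTED 0x0100   /* illustrative flag bit */

    typedef struct
    {
        uint32_t xmin;      /* id of the inserting transaction */
        uint16_t infomask;  /* per-tuple flag bits; hints live here */
    } tuple_header_sketch;

    /* Stand-in for the (comparatively expensive) clog lookup. */
    static bool
    xid_is_committed(uint32_t xid)
    {
        (void) xid;
        return true;        /* stub: assume committed for the sketch */
    }

    /*
     * Visit a tuple we are scanning anyway and persist its commit status.
     * Once the hint is set, future readers skip the clog lookup entirely,
     * which is why setting hints proactively (instead of waiting for the
     * next reader to stumble over the tuple) can help.
     */
    static void
    set_hint_if_possible(tuple_header_sketch *tup)
    {
        if ((tup->infomask & HINT_XMIN_COMMITTED) == 0 &&
            xid_is_committed(tup->xmin))
            tup->infomask |= HINT_XMIN_COMMITTED;
    }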
Re: [HACKERS] Build (definition?) errors - in bootstrap
On Fri, Oct 10, 2014 at 8:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Lou Picciano loupicci...@comcast.net writes: Having just git pulled from origin/master: $ ./configure $ make [ fails ] The buildfarm doesn't seem unhappy, so I doubt there's anything wrong with the code as such. Try "make clean" or even "make distclean" and rebuild. Also, if your computer's clock is or was badly off, you may have file timestamp skews breaking things ... in which case you might need "make maintainer-clean". If it's still broken after that, you'd be best advised to fix the clock and do a complete fresh git clone.

Something even more violent can be done as well:

    git clean -dxf

This ensures that there are no files other than the ones tracked by your git repository, bringing the tree back to a fresh state.

Regards, -- Michael
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On 10/8/14, 5:51 PM, Peter Geoghegan wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more that I don't see any practical alternative. We need the before-row insert triggers to fire to figure out what to insert (or whether we should update instead). There is no way around that. At the same time, those triggers are at liberty to do almost anything, and so in general we have no way of totally nullifying their effects (or side effects). Surely you see the dilemma.

FWIW, if each row were handled in a subtransaction, then an insert that turned out to be an update could be rolled back... but the performance impact of going that route might be pretty horrid. :(

There's also the potential to get stuck in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into an INSERT.

-- Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On 10/10/14 12:38, Jim Nasby wrote: On 10/8/14, 5:51 PM, Peter Geoghegan wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more that I don't see any practical alternative. We need the before-row insert triggers to fire to figure out what to insert (or whether we should update instead). There is no way around that. At the same time, those triggers are at liberty to do almost anything, and so in general we have no way of totally nullifying their effects (or side effects). Surely you see the dilemma.

FWIW, if each row were handled in a subtransaction, then an insert that turned out to be an update could be rolled back... but the performance impact of going that route might be pretty horrid. :( There's also the potential to get stuck in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into an INSERT.

Perhaps you need an UPSERT trigger?
Re: [HACKERS] Last Commitfest patches waiting review
On Thu, Oct 9, 2014 at 1:13 PM, Peter Geoghegan p...@heroku.com wrote: Can you write a separate patch to use SortSupport for B-tree index builds, please? Eliminating the FunctionCallInfoData overhead should shave off some cycles from every index build. I'll look into it. Hopefully an effort to actually implement it will show that I was right, and there isn't much to it.

I was able to get this working pretty quickly, and all regression tests pass. I'm not going to post a patch just yet, because I still need to make the CLUSTER case work, and I'll probably want to refactor my approach to performing catalog lookups a bit, but the important point is that my original suspicion has been confirmed: this isn't very difficult or invasive. I should have something to post before too long.

-- Peter Geoghegan
Re: [HACKERS] schema-only -n option in pg_restore fails
On Thu, October 9, 2014 23:19, Josh Berkus wrote: All, [dump/restore -n bug]

Perhaps this (from five years ago) can be fixed too (esp. if it's only a doc-fix): http://www.postgresql.org/message-id/4833.156.83.1.81.1240955642.squir...@webmail.xs4all.nl

It's not the same problem, but it's also a failure in pick-and-choose restoring. This stuff has been broken for a long time - I got used to it...

thanks, Erik Rijkers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Wed, Oct 8, 2014 at 7:05 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, Attached you can find the next version of my LW_SHARED patchset. Now that atomics are committed, it seems like a good idea to also add their raison d'être. Since the last public version I have:

* Addressed lots of Amit's comments. Thanks!
* Performed a fair amount of testing.
* Rebased the code. The volatile removal made that not entirely trivial...
* Significantly cleaned up and simplified the code.
* Updated comments and such.
* Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case).

The feature currently consists of two patches:

1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and verbose. This also:
   * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue.
   * adds a pg_write_barrier() in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it.

2) Implement the wait-free LW_SHARED algorithm.

I have done a few performance tests for the above patches; the results are below.

Performance data:

    IBM POWER-7, 16 cores, 64 hardware threads
    RAM = 64GB
    max_connections = 210
    Database locale = C
    checkpoint_segments = 256
    checkpoint_timeout = 35min
    shared_buffers = 8GB
    Client count = number of concurrent sessions and threads (e.g. -c 8 -j 8)
    Duration of each individual run = 5 mins
    Test type: read-only pgbench with -M prepared

Other related information about the test:
a. This is the data for the median of 3 runs; the detailed data of each individual run is attached.
b. I have applied both patches to take the performance data.

Scale factor = 100 (TPS per client count):

    Patch/Clients      1       8      16      32      64     128
    HEAD           13344  106921  196629  295123  377846  333928
    PATCH          13662  106179  203960  298955  452638  465671

Scale factor = 3000 (TPS per client count):

    Patch/Clients      8      16      32      64     128     160
    HEAD           86920  152417  231668  280827  257093  255122
    PATCH          87552  160313  230677  276186  248609  244372

Observations:
a. The patch performs really well (increases of up to ~40%) in case all the data fits in shared buffers (scale factor 100).
b. In case the data doesn't fit in shared buffers but fits in RAM (scale factor 3000), there is a performance increase up to 16 clients; after that it starts dipping (in the above configuration, by up to ~4.4%).

The above data shows that the patch improves performance when there is shared-LWLock contention; however, there is a slight performance dip in the exclusive-LWLock case (at scale factor 3000, the workload needs exclusive LWLocks for the buffer mapping tables). I am not sure whether this is the worst-case dip or whether it can be higher under similar configurations, because the trend shows the dip increasing with the client count.

Brief analysis of the code w.r.t. the performance dip. Extra instructions relative to HEAD in the exclusive-lock acquire path:
a. Attempts the lock twice.
b. Atomic operations for nwaiters in LWLockQueueSelf() and LWLockAcquireCommon().
c. We now need to take the spinlock twice: once for self-queueing and again for setting releaseOK.
d. A few function calls and some extra checks.

Similarly, there seem to be a few additional instructions in the LWLockRelease() path. These probably shouldn't matter much when a backend has to wait for another exclusive locker anyway, but I am not sure what else could explain the dip in the exclusive-LWLock case.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

Attachment: perf_lwlock_contention_data_v1.ods
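To make the asymmetry Amit describes easier to see, here is a minimal standalone sketch, using C11 atomics rather than the backend's pg_atomic API, of the wait-free shared-acquisition fast path. This is an illustration of the technique only, not the patch: the real code layers wait queues, releaseOK, and potentially_spurious handling on top, and that extra work falls mostly on the exclusive path.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define EXCLUSIVE_LOCK   (((uint32_t) 1) << 30)
    #define SHARED_LOCK_MASK (~EXCLUSIVE_LOCK)   /* low bits count shared holders */

    typedef struct
    {
        _Atomic uint32_t state;
    } lwlock_sketch;

    /* Returns true if the shared lock was taken; false means "must wait". */
    static bool
    try_shared(lwlock_sketch *lock)
    {
        uint32_t old = atomic_load(&lock->state);

        while ((old & EXCLUSIVE_LOCK) == 0)
        {
            /* Add one shared holder, but only if nothing changed underneath us.
             * On failure the CAS reloads 'old' and we simply retry. */
            if (atomic_compare_exchange_weak(&lock->state, &old, old + 1))
                return true;
        }
        return false;           /* exclusive holder present: caller must queue */
    }

    static void
    release_shared(lwlock_sketch *lock)
    {
        atomic_fetch_sub(&lock->state, 1);
    }

Shared lockers thus never touch a spinlock on the fast path, which matches the big win at scale factor 100; the queueing and nwaiters bookkeeping that only the slow (exclusive-contended) path pays for is the plausible source of the dip at scale factor 3000.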
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
On 10/10/2014 04:16 AM, MauMau wrote: From: Craig Ringer cr...@2ndquadrant.com It'd be interesting and useful to run this test on a debug build of PostgreSQL, i.e. one compiled against the debug version of the C library and with full debuginfo, not just a minimal .pdb. Although I'm not sure the user can do this now, I'll ask him anyway.

It sounds like they've produced a test case, so they should be able to with a bit of luck. Or even better, send you the test case.

How were the stacks captured - what tool? According to his mail, WinDbg or userdump.exe. I'll ask him about this.

Thanks. The stack trace looks fairly sane, i.e. there's nothing obviously out of whack at a glance, but I tend to get more informative traces from Visual Studio debugging sessions.

Your next step here really needs to be to make this reproducible against a debug build. Then see if reverting the xlog scalability work actually changes the behaviour, given that you hypothesised that it could be involved.

As I said off-list, if you can narrow the test case down to something that can be reproduced more quickly, you could also git-bisect to find the commit at fault. Even if the test case takes an hour, that's still viable:

    $ git bisect start
    $ git bisect bad
    $ git bisect good REL9_3_RC1
    Bisecting: a merge base must be tested
    [e472b921406407794bab911c64655b8b82375196] Avoid deadlocks during insertion into SP-GiST indexes.
    $ git bisect good
    Bisecting: 1026 revisions left to test after this (roughly 10 steps)
    ...

Thanks to the magic of binary search.

-- Craig Ringer  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
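A quick sanity check of git's step estimate (my arithmetic, not part of the thread): bisecting n candidate revisions takes about log2(n) builds, and

    \log_2 1026 \approx 10.003

so "roughly 10 steps" for 1026 revisions is exactly the binary-search bound, even at an hour per build.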