[HACKERS] build farm machine using make -j 8 mixed results
I changed the build-farm.conf file to have the following make line:

make = 'make -j 8', # or gmake if required. can include path if necessary.

2 pass, 4 fail. Is this a build configuration you want to pursue? I can either create a new machine, or change one of my existing machines. Makes no difference to me.

roberts-imac:build-farm-4.7-j robert$ time ./run_branches.pl --run-all --test
Sat Sep 1 09:26:49 2012: buildfarm run for CHANGEME:HEAD starting
[09:26:49] checking out source ...
[09:27:59] checking if build run needed ...
[09:27:59] copying source to pgsql.54279 ...
[09:28:03] running configure ...
[09:28:25] running make ...
[09:28:45] running make check ...
Branch: HEAD
Stage Check failed with status 2
Sat Sep 1 09:29:09 2012: buildfarm run for CHANGEME:REL9_2_STABLE starting
[09:29:09] checking out source ...
[09:29:57] checking if build run needed ...
[09:29:57] copying source to pgsql.70926 ...
[09:29:58] running configure ...
[09:30:18] running make ...
[09:30:45] running make check ...
[09:31:09] running make contrib ...
[09:31:12] running make install ...
[09:31:15] running make contrib install ...
[09:31:16] setting up db cluster (C)...
[09:31:19] starting db (C)...
[09:31:20] running make installcheck (C)...
[09:31:39] restarting db (C)...
[09:31:51] running make isolation check ...
[09:32:00] restarting db (C)...
[09:32:12] running make PL installcheck (C)...
Branch: REL9_2_STABLE
Stage PLCheck-C failed with status 2
Sat Sep 1 09:32:29 2012: buildfarm run for CHANGEME:REL9_1_STABLE starting
[09:32:29] checking out source ...
[09:33:08] checking if build run needed ...
[09:33:08] copying source to pgsql.94883 ...
[09:33:10] running configure ...
[09:33:30] running make ...
[09:33:54] running make check ...
[09:34:18] running make contrib ...
[09:34:21] running make install ...
[09:34:23] running make contrib install ...
[09:34:24] setting up db cluster (C)...
[09:34:26] starting db (C)...
[09:34:27] running make installcheck (C)...
[09:34:47] restarting db (C)...
[09:34:59] running make isolation check ...
[09:35:16] restarting db (C)...
[09:35:28] running make PL installcheck (C)...
Branch: REL9_1_STABLE
Stage PLCheck-C failed with status 2
Sat Sep 1 09:35:46 2012: buildfarm run for CHANGEME:REL9_0_STABLE starting
[09:35:46] checking out source ...
[09:36:08] checking if build run needed ...
[09:36:08] copying source to pgsql.18851 ...
[09:36:10] running configure ...
[09:36:28] running make ...
[09:37:00] running make check ...
[09:37:23] running make contrib ...
[09:37:27] running make install ...
[09:37:30] running make contrib install ...
[09:37:32] setting up db cluster (C)...
[09:37:34] starting db (C)...
[09:37:35] running make installcheck (C)...
[09:37:52] restarting db (C)...
[09:38:04] running make PL installcheck (C)...
[09:38:06] restarting db (C)...
[09:38:18] running make contrib installcheck (C)...
[09:38:30] stopping db (C)...
[09:38:31] running make ecpg check ...
[09:38:43] OK
Branch: REL9_0_STABLE
All stages succeeded
Sat Sep 1 09:38:43 2012: buildfarm run for CHANGEME:REL8_4_STABLE starting
[09:38:43] checking out source ...
[09:38:57] checking if build run needed ...
[09:38:57] copying source to pgsql.45071 ...
[09:38:59] running configure ...
[09:39:19] running make ...
[09:39:46] running make check ...
[09:40:14] running make contrib ...
[09:40:17] running make install ...
[09:40:23] running make contrib install ...
[09:40:25] setting up db cluster (C)...
[09:40:30] starting db (C)...
[09:40:31] running make installcheck (C)...
[09:40:47] restarting db (C)...
[09:40:59] running make PL installcheck (C)...
[09:41:01] restarting db (C)...
[09:41:13] running make contrib installcheck (C)...
[09:41:25] stopping db (C)...
[09:41:26] running make ecpg check ...
[09:41:44] OK
Branch: REL8_4_STABLE
All stages succeeded
Sat Sep 1 09:41:44 2012: buildfarm run for CHANGEME:REL8_3_STABLE starting
[09:41:44] checking out source ...
[09:42:39] checking if build run needed ...
[09:42:39] copying source to pgsql.80222 ...
[09:42:42] running configure ...
[09:43:03] running make ...
[09:43:38] running make check ...
[09:44:04] running make contrib ...
[09:44:09] running make install ...
[09:44:14] running make contrib install ...
[09:44:17] setting up db cluster (C)...
[09:44:20] starting db (C)...
[09:44:21] running make installcheck (C)...
[09:44:39] restarting db (C)...
[09:44:51] running make PL installcheck (C)...
[09:44:52] restarting db (C)...
[09:45:05] running make contrib installcheck (C)...
Branch: REL8_3_STABLE
Stage ContribCheck-C failed with status 2
[HACKERS] Reduce the time to know trigger_file's existence
Hi, We are trying to introduce a thread that monitors the creation of the trigger_file. As and when the file is created, the process that monitors the postgres server needs to be notified through the inotify API. This is to reduce the 3-4 second delay that exists with the current implementation in postgres. As per the current implementation, the thread checks for the existence of the file every 5 seconds. If the file gets created just 1 second after the sleep, there is a wait time of 4 seconds before we know whether the file is present or not. We intend to avoid this delay by using inotify. PostgreSQL version number you are running: postgres 9.1.5 How you installed PostgreSQL: Downloaded and compiled the sources Does anyone have suggestions on the appropriate place to add inotify_add_watch to achieve our objective? Thanks in advance.
Re: [HACKERS] Some whitespaces in utility.c
On Tue, Sep 4, 2012 at 6:57 AM, Michael Paquier michael.paqu...@gmail.com wrote: Hi all, I found some whitespace characters in utility.c introduced by commit 3a0e4d3. Please find attached a patch fixing that which can be applied on postgres master (commit 2f0c7d5). That probably exists in many other places in the source as well, but it's certainly fairly ugly. So I see nothing wrong with cleaning it up when spotted. Thus, patch applied, thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] txid failed epoch increment, again, aka 6291
It seems like this has reproduced once more. And once again, there doesn't appear to be any funny business in pg_control (but the structs are pasted here for your re-check), and there are successful, sensical updates to it. The primary is running 9.0.6. However, we do have a new piece of data: there was a very brief period where txid_snapshot did report an xmin greater than 2^33, our next epoch boundary, by a few thousand transactions. That could be because the reporting function GetNextXidAndEpoch does its own epoch calculation: before the checkpoint it adjusts the epoch post-facto, and after a checkpoint that forgets to increment the epoch there is no post-facto adjustment left to apply. I've been reviewing the mechanism in CreateCheckPoint for this on and off for a couple of days, but so far I haven't come up with a convincing mechanism. However, given that it seems historically that this bug is more likely to surface than *not* surface on this system, perhaps we can try for a sometimes-reproducing test case. I'm still struggling for a hint of a solution, though, so toss your thoughts here.
$2 = {Insert = {LogwrtResult = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}}, PrevRecord = {xlogid = 0, xrecoff = 0}, curridx = 0, currpage = 0x7ff4ed04a000, currpos = 0x0, RedoRecPtr = {xlogid = 18751, xrecoff = 1200832888}, forcePageWrites = 0 '\000'}, LogwrtRqst = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}}, LogwrtResult = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}}, ckptXidEpoch = 1, ckptXid = 9904084, asyncXactLSN = {xlogid = 0, xrecoff = 0}, lastRemovedLog = 0, lastRemovedSeg = 0, Write = {LogwrtResult = {Write = {xlogid = 0, xrecoff = 0}, Flush = {xlogid = 0, xrecoff = 0}}, curridx = 0, lastSegSwitchTime = 0}, pages = 0x7ff4ed04a000, xlblocks = 0x7ff4ed0471d8, XLogCacheBlck = 1023, ThisTimeLineID = 0, RecoveryTargetTLI = 6, archiveCleanupCommand = '\000' <repeats 1023 times>, SharedRecoveryInProgress = 1 '\001', lastCheckPointRecPtr = {xlogid = 18751, xrecoff = 1671519088}, lastCheckPoint = {redo = {xlogid = 18751, xrecoff = 1200832888}, ThisTimeLineID = 6, nextXidEpoch = 1, nextXid = 9904084, nextOid = 2047524, nextMulti = 1119, nextMultiOffset = 3115, oldestXid = 4115479553, oldestXidDB = 1, time = 1346746796, oldestActiveXid = 9776547}, replayEndRecPtr = {xlogid = 18751, xrecoff = 1748623656}, recoveryLastRecPtr = {xlogid = 18751, xrecoff = 1748623656}, recoveryLastXTime = 400062234671833, info_lck = 0 '\000'}
(gdb) p ControlFile
$3 = (ControlFileData *) 0x7ff4ed046bf8
(gdb) p *ControlFile
$4 = {system_identifier = 5613733157253676693, pg_control_version = 903, catalog_version_no = 201008051, state = DB_IN_ARCHIVE_RECOVERY, time = 1346746898, checkPoint = {xlogid = 18751, xrecoff = 1072693824}, prevCheckPoint = {xlogid = 18751, xrecoff = 1072693824}, checkPointCopy = {redo = {xlogid = 18751, xrecoff = 602482536}, ThisTimeLineID = 6, nextXidEpoch = 1, nextXid = 9904084, nextOid = 2047524, nextMulti = 1119, nextMultiOffset = 3115, oldestXid = 4115479553, oldestXidDB = 1, time = 1346746496, oldestActiveXid = 9558248}, minRecoveryPoint = {xlogid = 18751, xrecoff = 1748623656}, backupStartPoint = {xlogid = 0, xrecoff = 0}, wal_level = 2, MaxConnections = 500, max_prepared_xacts = 500, max_locks_per_xact = 64, maxAlign = 8, floatFormat = 1234567, blcksz = 8192, relseg_size = 131072, xlog_blcksz = 8192, xlog_seg_size = 16777216, nameDataLen = 64, indexMaxKeys = 32, toast_max_chunk_size = 1996, enableIntTimes = 1 '\001', float4ByVal = 1 '\001', float8ByVal = 1 '\001', crc = 3725972657}
-- fdr
[HACKERS] Minor document updates
I noticed the syntax of the \copy command in the psql reference page is in an old style. ISTM it's better to update the document. Please find attached a patch. Thanks, Best regards, Etsuro Fujita psql-copy-ref.patch Description: Binary data
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
Heikki Linnakangas hlinn...@iki.fi writes: Hmm, I was thinking that when walsender gets the position it can send the WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current recovery timeline. If it has changed, refuse to send the new WAL and terminate. That would be a fairly small change, it would just close the window between requesting walsenders to terminate and them actually terminating. It looks to me like a bug fix that also applies to non-cascading situations. Is that right? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Tuesday, September 04, 2012 11:00 AM Andres Freund wrote: On Tuesday, September 04, 2012 06:20:59 AM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I can see why that would be nice, but is it really realistic? Don't we expect some more diligence in applications using this against letting such a child continue to run after ctrl-c/SIGTERMing e.g. pg_dump in comparison to closing a normal database connection? Er, what? If you kill the client, the child postgres will see connection closure and will shut down. I already tested that with the POC patch, it worked fine. Well, but that will make scripting harder because you cannot start another single-backend pg_dump before the old backend has noticed it, checkpointed, and shut down. But isn't that behavior similar to what currently happens when the server is shutting down due to CTRL-C, when new clients are not allowed to connect? This new interface is an approach similar to an embedded database, where the first API call (StartServer), or the connect itself, starts the database, and other connections might not be allowed during the shutdown state. With Regards, Amit Kapila.
Re: [HACKERS] Multiple setup steps for isolation tests
Tom Lane wrote: Kevin Grittner writes: Tom Lane wrote: The grammar changes look wrong: I think you eliminated the ability to have zero setup steps, no? Instead, setup_list should expand to either empty or setup_list setup. I tried that first, but had shift/reduce conflicts. [ scratches head ... ] Dunno what you did exactly, but the attached version works fine for me. [ slaps forehead ] Yeah, that should do it. Will apply. Thanks. -Kevin
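For what it's worth, the shape Tom describes is the standard left-recursive bison idiom with an empty alternative, which by itself should not introduce shift/reduce conflicts. A hypothetical fragment (not the actual isolationtester grammar; `add_setup` is an illustrative action):

```yacc
setup_list:
		  /* EMPTY */			{ $$ = NULL; }
		| setup_list setup		{ $$ = add_setup($1, $2); }
		;
```

Conflicts usually appear instead when the empty alternative is placed on an optional wrapper nonterminal that competes with another rule starting with the same token.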
Re: [HACKERS] index-only scans versus serializable transactions
Tom Lane wrote: Kevin Grittner writes: By not visiting the heap page for tuples, index-only scans fail to acquire all of the necessary predicate locks for correct behavior at the serializable transaction isolation level. The tag for the tuple-level predicate locks includes the xmin, to avoid possible problems with tid re-use. (This was not covered in initial pre-release versions of SSI, and testing actually hit the problem.) When an index-only scan does need to look at the heap because the visibility map doesn't indicate that the tuple is visible to all transactions, the tuple-level predicate lock is acquired. The best we can do without visiting the heap is a page level lock on the heap page, so that is what the attached patch does. If there are no objections, I will apply to HEAD and 9.2. This isn't right in detail: there are paths through the loop where tuple is not NULL at the beginning of the next iteration (specifically, consider failure of a lossy-qual recheck). I think that only results in wasted work, but it's still not operating as intended. I'd suggest moving the declaration/initialization of the tuple variable to inside the while loop, since there's no desire for its value to carry across loops. You're right. It looks to me like moving the declaration (and initialization) to more local scope (just inside the loop) fixes it. New version attached. Will apply if no further problems are found. -Kevin index-only-serializable-v2.patch Description: Binary data
[HACKERS] 9.2rc1 produces incorrect results
Hello. It took me a while to get a version of this that was independent of my data, but here it is. I don't understand what's going wrong but if you change any part of this query (or at least any part I tried), the correct result is returned. This script will reproduce it:

create table t1 (id integer primary key);
create table t2 (id integer primary key references t1 (id));
insert into t1 (id) select generate_series(1, 10); -- size matters
insert into t2 (id) values (1); -- get a known value in the table
insert into t2 (id) select g from generate_series(2, 10) g where random() < 0.01; -- size matters again
analyze t1;
analyze t2;

with A as (
  select t2.id, t2.id = 1 as is_something
  from t2
  join t1 on t1.id = t2.id
  left join pg_class pg_c on pg_c.relname = t2.id::text -- I haven't tried on a user table
  where pg_c.oid is null
), B as (
  select A.id, row_number() over (partition by A.id) as "order" -- this seems to be important, too
  from A
)
select A.id, array(select B.id from B where B.id = A.id) from A where A.is_something
union all
select A.id, array(select B.id from B where B.id = A.id) from A where A.is_something;

As you can (hopefully) see, the two UNIONed queries are identical but do not return the same values. I wish I had the skills to attach a patch to this message, but alas I do not.
Re: [HACKERS] [COMMITTERS] pgsql: Make a cut at a major-features list for 9.2.
On Mon, Sep 3, 2012 at 6:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Wed, Aug 22, 2012 at 11:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Make a cut at a major-features list for 9.2. This is open to debate of course, but it's past time we had *something* here. We have cascading replication as a major feature, which I agree it is. But I think we should add the (closely related) base backups from slave to the list as well - I've had more people say they're really going to use that one than the cascading replication. Perhaps add them both on the same bullet point? Sure, reword it however you want. I've not been paying much attention to replication stuff, so I don't really know what's important there. Done. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump incorrect output in plaintext mode
On Fri, Aug 31, 2012 at 2:05 PM, Magnus Hagander mag...@hagander.net wrote: On Aug 28, 2012 9:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Tue, Aug 28, 2012 at 6:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't see anything particularly incorrect about that. The point of the --verbose switch is to track what pg_dump is doing, and if what it's doing involves going through RestoreArchive(), why should we try to hide the fact? restoring data for table 't' makes you think it's actually restoring things. It's not. That dumping is implemented by calling an internal function called RestoreArchive() has to be an implementation detail... It certainly confuses users that we say restoring when we're not doing that... Well, why don't we just s/restoring/processing/ in the debug message, and call it good? Sure, that would work for me... I can go do that if there are no objections. Done. Are we allowed to backpatch things to 9.2 at this point that change strings for translators? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] pg_upgrade docs
This doc sentence about pg_upgrade is now inaccurate: If doing <option>--check</> with a running old server of a pre-9.1 version, and the old server is using a Unix-domain socket directory that is different from the default built into the new <productname>PostgreSQL</> installation, set <envar>PGHOST</> to point to the socket location of the old server. (This is not relevant on Windows.) The new detail is that this also affects non-live check and non-check upgrades because pg_ctl -w doesn't work for pre-9.1 servers with the socket in the current directory --- that was not known when this documentation paragraph was written. Applied doc patch attached. The wording became pretty complex so I tried to simplify it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index 9e43f3c..301222c
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** psql --username postgres --file script.s
*** 520,530 ****
  </para>
  
  <para>
! If doing <option>--check</> with a running old server of a pre-9.1 version,
! and the old server is using a Unix-domain socket directory that is
! different from the default built into the new <productname>PostgreSQL</>
! installation, set <envar>PGHOST</> to point to the socket location of the
! old server. (This is not relevant on Windows.)
  </para>
  
  <para>
--- 520,529 ----
  </para>
  
  <para>
! If using a pre-9.1 old server that is using a non-default Unix-domain
! socket directory or a default that differs from the default of the
! new cluster, set <envar>PGHOST</> to point to the old server's socket
! location. (This is not relevant on Windows.)
  </para>
  
  <para>
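As a concrete illustration of the documented advice, an invocation against a pre-9.1 old cluster with a non-default socket directory might look like this (all paths here are hypothetical; the long option names correspond to pg_upgrade's -b/-B/-d/-D switches):

```shell
# Point libpq at the old server's Unix-domain socket directory
# (not relevant on Windows):
export PGHOST=/var/run/postgresql

pg_upgrade --check \
    --old-bindir=/usr/lib/postgresql/8.4/bin \
    --new-bindir=/usr/lib/postgresql/9.2/bin \
    --old-datadir=/var/lib/postgresql/8.4/main \
    --new-datadir=/var/lib/postgresql/9.2/main
```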
Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management
On Tuesday, September 04, 2012 12:42 AM Jeff Janes wrote: On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote: This patch is based on below Todo Item: Consider adding buffers the background writer finds reusable to the free list I have tried implementing it and taken the readings for Select when all the data is in either OS buffers or Shared Buffers. The Patch has simple implementation for bgwriter or checkpoint process moving the unused buffers (unpinned with ZERO usage_count buffers) into freelist. I don't think InvalidateBuffer can be safely used in this way. It says We assume that no other backend could possibly be interested in using the page, which is not true here. As I understood and analyzed based on the above, there is a problem in the attached patch: in function InvalidateBuffer(), after UnlockBufHdr() and before the partition lock is taken, some backend may use that buffer and increase the usage count to 1; still InvalidateBuffer() will remove the buffer from the hash table and put it in the freelist. I have modified the code to address the above by checking refcount and usage_count inside the partition lock and LockBufHdr, and only after that moving it to the freelist, which is similar to InvalidateBuffer. In the actual code we can optimize the current code by using an extra parameter in InvalidateBuffer. Please let me know if I understood you correctly, or whether you meant something else by the above comment. Also, do we want to actually invalidate the buffers? If someone does happen to want one after it is put on the freelist, making it read it in again into a different buffer doesn't seem like a nice thing to do, rather than just letting it reclaim it. But even if bgwriter/checkpoint don't do it, a backend needing a new buffer will do similar things (remove from hash table) for this buffer, as this is the next victim buffer.
The main intention of doing the MoveBufferToFreeList is to avoid contention on the partition locks and BufFreeListLock among backends, which has given a performance improvement in high-contention scenarios. One problem I could see with the proposed change is that in some cases the usage count will get decremented for a buffer allocated from the free list immediately, as it can be the next victim buffer. However there can be a solution to this problem. Can you suggest some scenarios where I should do more performance tests? With Regards, Amit Kapila.

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dba19eb..87446cb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -955,6 +955,88 @@ retry:
 	StrategyFreeBuffer(buf);
 }
 
+
+/*
+ * MoveBufferToFreeList -- mark a shared buffer invalid and return it to the
+ * freelist. which is similar to InvalidateBuffer function.
+ */
+static void
+MoveBufferToFreeList(volatile BufferDesc *buf)
+{
+	BufferTag	oldTag;
+	uint32		oldHash;		/* hash value for oldTag */
+	LWLockId	oldPartitionLock;	/* buffer partition lock for it */
+	BufFlags	oldFlags;
+
+	/* Save the original buffer tag before dropping the spinlock */
+	oldTag = buf->tag;
+
+	UnlockBufHdr(buf);
+
+	/*
+	 * Need to compute the old tag's hashcode and partition lock ID. XXX is it
+	 * worth storing the hashcode in BufferDesc so we need not recompute it
+	 * here? Probably not.
+	 */
+	oldHash = BufTableHashCode(&oldTag);
+	oldPartitionLock = BufMappingPartitionLock(oldHash);
+
+	/*
+	 * Acquire exclusive mapping lock in preparation for changing the buffer's
+	 * association.
+	 */
+	LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
+
+	/* Re-lock the buffer header */
+	LockBufHdr(buf);
+
+	/* If it's changed while we were waiting for lock, do nothing */
+	if (!BUFFERTAGS_EQUAL(buf->tag, oldTag))
+	{
+		UnlockBufHdr(buf);
+		LWLockRelease(oldPartitionLock);
+		return;
+	}
+
+	/*
+	 * Validate whether we can add the buffer into freelist or not
+	 */
+	if ((buf->refcount != 0) || (buf->usage_count != 0))
+	{
+		UnlockBufHdr(buf);
+		LWLockRelease(oldPartitionLock);
+		return;
+	}
+
+	/*
+	 * Clear out the buffer's tag and flags. We must do this to ensure that
+	 * linear scans of the buffer array don't think the buffer is valid.
+	 */
+	oldFlags = buf->flags;
+	CLEAR_BUFFERTAG(buf->tag);
+	buf->flags = 0;
+	buf->usage_count = 0;
+
+	UnlockBufHdr(buf);
+
+	/*
+	 * Remove the buffer from the lookup hashtable, if it was in there.
+	 */
+	if (oldFlags & BM_TAG_VALID)
+		BufTableDelete(&oldTag, oldHash);
+
+	/*
+	 * Done with mapping lock.
+	 */
+
Re: [HACKERS] Statistics and selectivity estimation for ranges
On Mon, Aug 27, 2012 at 5:00 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 24.08.2012 18:51, Heikki Linnakangas wrote: On 20.08.2012 00:31, Alexander Korotkov wrote: New version of patch. * Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower and upper bounds histograms combined into single ranges array, instead of STATISTIC_KIND_HISTOGRAM. One worry I have about that format for the histogram is that you deserialize all the values in the histogram, before you do the binary searches. That seems expensive if stats target is very high. I guess you could deserialize them lazily to alleviate that, though. * Selectivity estimations for <, <=, >, >= using this histogram. Thanks! I'm going to do the same for this that I did for the sp-gist patch, and punt on the more complicated parts for now, and review them separately. Attached is a heavily edited version that doesn't include the length histogram, and consequently doesn't do anything smart for the << and >> operators. && is estimated using the bounds histograms. There's now a separate stakind for the empty range fraction, since it's not included in the length-histogram. I tested this on a dataset containing birth and death dates of persons that have a wikipedia page, obtained from the dbpedia.org project. I can send a copy if someone wants it. The estimates seem pretty accurate. Please take a look, to see if I messed up something. Committed this with some further changes. Addon patch is attached. Actually, I don't get your intention of introducing STATISTIC_KIND_RANGE_EMPTY_FRAC stakind. Did you plan to leave it as empty frac in distinct stakind or replace this stakind with STATISTIC_KIND_LENGTH_HISTOGRAM? In the attached patch STATISTIC_KIND_RANGE_EMPTY_FRAC is replaced with STATISTIC_KIND_LENGTH_HISTOGRAM. -- With best regards, Alexander Korotkov.
range_stat-addon-0.1.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On Mon, Aug 20, 2012 at 12:25 AM, Jeff Davis pg...@j-davis.com wrote: I am taking a look at this patch now. A few quick comments: * It looks like bounds_adjacent modifies its by-reference arguments, which is a little worrying to me. The lower/upper labels are flipped back, but the inclusivities are not. Maybe just pass by value instead? * bounds_adjacent is sensitive to the argument order. Can't it just take bound1 and bound2? Fixed. Patch is attached. -- With best regards, Alexander Korotkov. range_spgist_adjacent-0.2.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] 9.2 pg_upgrade regression tests on WIndows
On Mon, Sep 3, 2012 at 12:44:09PM -0400, Andrew Dunstan wrote: The attached very small patch allows pg_upgrade's make check to succeed on REL9_2_STABLE on my Mingw system. However, I consider the issue I mentioned earlier regarding use of forward slashes in the argument to rmdir to be a significant blocker, so I'm going to go and fix that and then pull this all together. cheers andrew

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 6f993df..57ca1df 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file,
 	else
 		retval = 0;
 
+#ifndef WIN32
 	if ((log = fopen_priv(log_file, "a+")) == NULL)
 		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
 	fprintf(log, "\n\n");
 	fclose(log);
+#endif
 
 	return retval;
 }

I am confused by this fix. If pg_ctl was keeping that log file open, wouldn't the log write fail when pg_dump or psql was run later? I am trying to understand how a later command would not also trigger an error. Is it a timing thing? If that is it, I would like to know and have that documented. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com
Re: [HACKERS] pg_dump incorrect output in plaintext mode
On Tue, Sep 4, 2012 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Aug 28, 2012 9:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, why don't we just s/restoring/processing/ in the debug message, and call it good? Are we allowed to backpatch things to 9.2 at this point that changes strings for translators? Well, not being a translator I'm not sure that I get a vote. But I'd think this is too minor to justify back-patching it. Probably - that's why I didn't even consider going back beyond 9.2. Anyway; I'll just leave it at master. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump incorrect output in plaintext mode
Magnus Hagander mag...@hagander.net writes: On Aug 28, 2012 9:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, why don't we just s/restoring/processing/ in the debug message, and call it good? Are we allowed to backpatch things to 9.2 at this point that changes strings for translators? Well, not being a translator I'm not sure that I get a vote. But I'd think this is too minor to justify back-patching it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some whitespaces in utility.c
Magnus Hagander mag...@hagander.net writes: On Tue, Sep 4, 2012 at 6:57 AM, Michael Paquier michael.paqu...@gmail.com wrote: I found some whitespace characters in utility.c introduced by commit 3a0e4d3. Please find attached a patch fixing that which can be applied on postgres master (commit 2f0c7d5). That probably exists in many other places in the source as well, but it's certainly fairly ugly. So I see nothing wrong with cleaning it up when spotted. Just as a note: we generally leave it to pgindent to fix this sort of thing. I'm not sure it's worth the effort of submitting manual patches for, unless you have reason to think the next pgindent run won't fix it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
Excerpts from Andrew Dunstan's message of mar sep 04 01:16:39 -0300 2012: And here's the first Windows buildfarm check of pg_upgrade. http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=pitta&dt=2012-09-04%2003%3A00%3A05&stg=check-pg_upgrade Great, thanks. Who's going to work now on porting the shell script to Perl? ;-) Somehow the verbose reporting of user relation files being copied does not seem exceedingly useful; and I don't remember seeing that on Linux. Should this be tweaked to avoid outputting the status message? c:\mingw\msys\1.0\home\pgrunner\bf\root\HEAD\pgsql.7020\contrib\pg_upgrade>echo ECHO is on. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
On Tue, Sep 4, 2012 at 11:42:58AM -0300, Alvaro Herrera wrote: Excerpts from Andrew Dunstan's message of mar sep 04 01:16:39 -0300 2012: And here's the first Windows buildfarm check of pg_upgrade. http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=pitta&dt=2012-09-04%2003%3A00%3A05&stg=check-pg_upgrade Great, thanks. Who's going to work now on porting the shell script to Perl? ;-) Well, we require Perl for development, but not for usage, at least not yet. There was talk of needing Perl for doing standby pg_upgrade, but there were too many concerns about that idea. Somehow the verbose reporting of user relation files being copied does not seem exceedingly useful; and I don't remember seeing that on Linux. Should this be tweaked to avoid outputting the status message? c:\mingw\msys\1.0\home\pgrunner\bf\root\HEAD\pgsql.7020\contrib\pg_upgrade>echo ECHO is on. Probably. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2rc1 produces incorrect results
Vik Reykja vikrey...@gmail.com writes: Hello. It took me a while to get a version of this that was independent of my data, but here it is. I don't understand what's going wrong but if you change any part of this query (or at least any part I tried), the correct result is returned. Huh. 9.1 gets the wrong answer too, so this isn't a (very) new bug; but curiously, 8.4 and 9.0 seem to get it right. I think this is probably related somehow to Adam Mackler's recent report --- multiple scans of the same CTE seems to be a bit of a soft spot :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
On 09/04/2012 10:42 AM, Alvaro Herrera wrote: Excerpts from Andrew Dunstan's message of mar sep 04 01:16:39 -0300 2012: And here's the first Windows buildfarm check of pg_upgrade. http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=pitta&dt=2012-09-04%2003%3A00%3A05&stg=check-pg_upgrade Great, thanks. Who's going to work now on porting the shell script to Perl? ;-) Probably me, one day ... Somehow the verbose reporting of user relation files being copied does not seem exceedingly useful; and I don't remember seeing that on Linux. Yes, it's a pain. Not sure what causes it. Should this be tweaked to avoid outputting the status message? c:\mingw\msys\1.0\home\pgrunner\bf\root\HEAD\pgsql.7020\contrib\pg_upgrade>echo ECHO is on. Already fixed. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] txid failed epoch increment, again, aka 6291
On Tue, Sep 04, 2012 at 02:07:30AM -0700, Daniel Farina wrote: It seems like this has reproduced once more. And once again, there doesn't appear to be any funny business in pg_control (but the structs are pasted here for your re-check), and there are successful sensical updates to it. The primary is running 9.0.6. What version is the standby running? However, we do have a new piece of data: there was a very brief period where txid_snapshot did report an xmin greater than 2^33, our next epoch boundary, by a few thousand transactions. That could be because the reporting function GetNextXidAndEpoch does its own epoch calculation before the checkpoint, and then after a checkpoint that forgets to increment the epoch there is no need to post-facto adjust the epoch anymore. That makes sense. I've been reviewing the mechanism in CreateCheckPoint for this on and off for a couple of days, but so far I haven't come up with a convincing mechanism. However, given that it seems historically that this bug is more likely to surface than *not* surface on this system, perhaps we can try for a sometimes-reproducing test case. I'm still struggling for a hint of a solution, though, so toss your thoughts here. The cause is not apparent to me, either, and simple tests here do not reproduce the problem. I am suspicious of xlog_redo() updating ControlData->checkPointCopy without acquiring ControlFileLock. If a restartpoint is finishing at about the same time, ControlFile->checkPointCopy.nextXid might come from the just-read checkpoint record while ControlFile->checkPointCopy.nextXidEpoch bears the value from the older checkpoint just adopted as a restartpoint. The resulting inconsistency would, however, vanish at the next ControlFile->checkPointCopy update. This does not explain the symptoms you have reported, and it cannot explain much of anything on a primary. 
(gdb) p *ControlFile $4 = {system_identifier = 5613733157253676693, pg_control_version = 903, catalog_version_no = 201008051, state = DB_IN_ARCHIVE_RECOVERY, time = 1346746898, checkPoint = This capture, it seems, is from a standby. {xlogid = 18751, xrecoff = 1072693824}, prevCheckPoint = { xlogid = 18751, xrecoff = 1072693824}, checkPointCopy = {redo = {xlogid = 18751, xrecoff = 602482536}, ThisTimeLineID = 6, nextXidEpoch = 1, nextXid = 9904084, nextOid = 2047524, nextMulti = 1119, nextMultiOffset = 3115, oldestXid = 4115479553, oldestXidDB = 1, time = 1346746496, oldestActiveXid = 9558248}, You expected checkPointCopy = { ... nextXidEpoch = 2, ... }, correct? (Well, nextXidEpoch = 3 if you count the previous missed increment.) Does pg_control on the primary also bear epoch 1 where epoch 2 is expected? Your last restartpoint was exactly five minutes before your last checkpoint, so there's no evidence of a dearth of safe restartpoint opportunities. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
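To make the epoch arithmetic Noah refers to concrete, the reporting side can be sketched roughly as below. This is a hypothetical simplification in C: the function name, signature, and field handling are illustrative, not the actual varsup.c code.

```c
#include <stdint.h>

/* Hypothetical sketch: derive a 64-bit "full" txid from the last
 * checkpoint's (epoch, nextXid) pair plus the live nextXid.  If the
 * live xid is numerically below the checkpoint's xid, the 32-bit
 * counter must have wrapped since that checkpoint, so the epoch is
 * bumped locally.  This is why txid_snapshot can briefly report the
 * correct post-wrap value even when the checkpointer failed to
 * increment the stored epoch. */
static uint64_t
full_txid(uint32_t ckpt_epoch, uint32_t ckpt_next_xid, uint32_t next_xid)
{
    uint32_t epoch = ckpt_epoch;

    if (next_xid < ckpt_next_xid)
        epoch++;                /* wraparound happened after the checkpoint */
    return ((uint64_t) epoch << 32) | next_xid;
}
```

Once a buggy checkpoint stores the post-wrap nextXid without bumping nextXidEpoch, the `next_xid < ckpt_next_xid` test stops firing and the reported epoch silently drops back, matching the brief excursion past 2^33 that Daniel observed.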
Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)
Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Also, I am a bit doubtful about the advice on sizing the connection pool as applied to small servers: surely it's not sane to recommend that a single-processor system with one disk should have max_connections = 3. At least, *I* don't think that's sane. I'm not sure it's wrong when combined with this: Remember that this sweet spot is for the number of connections that are actively doing work. ... You should always make max_connections a bit bigger than the number of connections you enable in your connection pool. That way there are always a few slots available for direct connections for system maintenance and monitoring. Where would you expect the knee to be for connections concurrently actively doing work on a single-core, single-drive system ? I don't know. But my experience with our customers is that people are often forced to set the size of the connection pool far larger than what that formula would suggest. Many people are doing transaction-level pooling, and for those people, they've got to look at how many multi-statement transactions they've got and think about what the peak value for that quantity is. It's still worth using pooling because it reduces the number of simultaneous connections, but it's not going to reduce it to the kind of values you're talking about. Also, consider that transactions aren't all the same length. Suppose 90% of your queries execute in 50ms, and 10% execute in 60s. Even though it's less efficient, you've got to set the connection pool large enough that at least some of the 50 ms queries can continue to get processed even if the maximum number of 60s queries that you ever expect to see in parallel are already running. This may seem like a theoretical problem but we have customers who use connection pools to get the number of simultaneous connections down to, say, 800. 
I guarantee you that these people do not have 200 CPUs and 400 disks, but they're smart people and they find that smaller pool sizes don't work. It is something which has to be considered, and I don't think it's theoretical at all. Here's how we deal with it. We don't use a plain FIFO queue for our transaction requests, but a prioritized FIFO with 10 levels of priority (0 to 9). The highest priority (9) is reserved for utility requests -- where a running transaction needs to spin off a related transaction to do some work for it. For the lowest level (0) we normally allocate only a single connection, and it is used for very long-running reports which we want to queue to run one-at-a-time. As examples of how we categorize queries, filling a large list in an interactive application will run at priority 3, while translating a key which must cause a description on the screen to display is run at priority 8. Normal single-row updates and deletes from an interactive application run at priority 5. Each connection in the pool has a worker thread, and is assigned a minimum priority that it will handle. When all threads are busy and transaction requests are queued, any thread completing a database transaction pulls from the front of the highest priority queue with a waiting request to run a transaction, looking only at priorities which are not beneath it. If there are no waiting requests of high enough priority, the thread waits for one to arrive. We have found that the formula I presented, when combined with transactional request queuing like I describe here gives us our best performance. 
I don't have the exact numbers in front of me at the moment, but on a machine with 16 cores and a 40-drive array (but heavily cached, so that the effective spindle count was lower than that), servicing thousands of concurrent web users with hundreds of tps, we improved performance significantly by dropping our connection pool size from about 60 to about 30, in addition to the separate pool of six which are handling logical replication from about 80 sources. That was a real-life production situation, but we ran a series of benchmarks and found that in a pretty wide spectrum of situations the formula I gave fits pretty neatly. If someone is using 800 connections for, say, a 32 core machine with a 200 drive array I would suspect that they would get a lot of benefit from a smarter connection pool. Sure, we can say, well, the fine print tells you that 2*CPUs+disks is not REALLY the formula you should use, but it's just so far off what I see in the field that I have a hard time thinking it's really helping people to give them that as a starting point. The point is that it *is* generally really close to the numbers we have seen here in both benchmarks and production, and I have gotten comments both on and off the lists from people who have told me that they tried that formula against their benchmark results and found that it fit well. Now, this may be dependent on OS or
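The prioritized dispatch Kevin describes can be sketched in miniature. This is a hypothetical illustration, not the actual pooler code: queue contents are reduced to per-level counts, and only the selection rule is shown (a freed worker serves the front of the highest-priority nonempty queue at or above its own minimum level).

```c
#define NPRIO 10    /* priority levels 0 (lowest) .. 9 (utility, highest) */

/* Return which priority level a freed worker should serve next, or -1
 * if no waiting request is at or above the worker's minimum priority
 * (in which case the worker thread blocks until one arrives).
 * waiting[p] is the number of queued transaction requests at
 * priority p. */
static int
pick_priority(const int waiting[NPRIO], int worker_min_prio)
{
    for (int p = NPRIO - 1; p >= worker_min_prio; p--)
        if (waiting[p] > 0)
            return p;
    return -1;
}
```

A pool sized by the 2*cores + effective-spindles rule of thumb can then dedicate, say, a single worker with minimum priority 0 to long-running reports while the remaining workers serve the interactive levels, which is the arrangement described above.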
Re: [HACKERS] 9.2 pg_upgrade regression tests on WIndows
On 09/04/2012 09:47 AM, Bruce Momjian wrote: On Mon, Sep 3, 2012 at 12:44:09PM -0400, Andrew Dunstan wrote: The attached very small patch allows pg_upgrade's make check to succeed on REL9_2_STABLE on my Mingw system. However, I consider the issue I mentioned earlier regarding use of forward slashes in the argument to rmdir to be a significant blocker, so I'm going to go and fix that and then pull this all together. cheers andrew diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c index 6f993df..57ca1df 100644 --- a/contrib/pg_upgrade/exec.c +++ b/contrib/pg_upgrade/exec.c @@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file, else retval = 0; +#ifndef WIN32 if ((log = fopen_priv(log_file, "a+")) == NULL) pg_log(PG_FATAL, "cannot write to log file %s\n", log_file); fprintf(log, "\n\n"); fclose(log); +#endif return retval; } I am confused by this fix. If pg_ctl was keeping that log file open, wouldn't the log write fail when pg_dump or psql was run later? I am trying to understand how later commands would not also trigger an error. Is it a timing thing? If that is it, I would like to know and have that documented. Oh, hmm. I thought it was the postmaster holding the log, but now I see that we are giving it a different log file. Maybe it is a timing thing. I'll experiment and see if a sleep cures the problem. ... Nope, still getting this after a sleep(5): cannot write to log file pg_upgrade_server_start.log Failure, exiting ... [try some more] Nope, even in a loop lasting 60s I still got this. So I'm a bit confused too. Seeing if I can narrow it down using ProcMon ... cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
Bruce Momjian br...@momjian.us writes: On Tue, Sep 4, 2012 at 11:42:58AM -0300, Alvaro Herrera wrote: Who's going to work now on porting the shell script to Perl? ;-) Well, we require Perl for development, but not for usage, at least not yet. This is a regression-test script, so that complaint doesn't seem to me to have a lot of force ... especially not when set against the fact that the shell script is useless on non-mingw Windows. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
On Tue, Sep 4, 2012 at 11:12:52AM -0400, Andrew Dunstan wrote: On 09/04/2012 10:49 AM, Bruce Momjian wrote: On Tue, Sep 4, 2012 at 11:42:58AM -0300, Alvaro Herrera wrote: Excerpts from Andrew Dunstan's message of mar sep 04 01:16:39 -0300 2012: And here's the first Windows buildfarm check of pg_upgrade. http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=pitta&dt=2012-09-04%2003%3A00%3A05&stg=check-pg_upgrade Great, thanks. Who's going to work now on porting the shell script to Perl? ;-) Well, we require Perl for development, but not for usage, at least not yet. There was talk of needing Perl for doing standby pg_upgrade, but there were too many concerns about that idea. This is a test script, not what you should use in production. I don't see any reason why we shouldn't require Perl for running the standard test. Oh, I thought he was talking about the scripts pg_upgrade creates for users to run. Sorry. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
Andrew Dunstan and...@dunslane.net writes: This is a test script, not what you should use in production. I don't see any reason why we shouldn't require Perl for running the standard test. But on the third hand ... we've taken pains to ensure that you don't *have* to have Perl to build from a tarball, and I think it is not unreasonable that build should include being able to do make check. Maybe we have to carry both this shell script and a Perl equivalent for Windows. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
On 09/04/2012 10:49 AM, Bruce Momjian wrote: On Tue, Sep 4, 2012 at 11:42:58AM -0300, Alvaro Herrera wrote: Excerpts from Andrew Dunstan's message of mar sep 04 01:16:39 -0300 2012: And here's the first Windows buildfarm check of pg_upgrade. http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=pitta&dt=2012-09-04%2003%3A00%3A05&stg=check-pg_upgrade Great, thanks. Who's going to work now on porting the shell script to Perl? ;-) Well, we require Perl for development, but not for usage, at least not yet. There was talk of needing Perl for doing standby pg_upgrade, but there were too many concerns about that idea. This is a test script, not what you should use in production. I don't see any reason why we shouldn't require Perl for running the standard test. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gistchoose vs. bloat
On Mon, Aug 20, 2012 at 9:13 PM, Alexander Korotkov aekorot...@gmail.com wrote: Current gistchoose code has a bug. I've started separate thread about it. http://archives.postgresql.org/pgsql-hackers/2012-08/msg00544.php Also, it obviously needs more comments. Current state of patch is more proof of concept than something ready. I'm going to change it in following ways: 1) We don't know how expensive user penalty function is. So, I'm going to change randomization algorithm so that it doesn't increase the number of penalty calls on average. 2) Since randomization could produce additional IO, there is probably no optimal solution for all cases. We could introduce a user-visible option which enables or disables randomization. However, the default value of this option is another question. Also, I think you should use random() rather than rand(). Thanks, will fix. New version of patch is attached. Parameter randomization was introduced. It controls whether to randomize choose. Choose algorithm was rewritten. -- With best regards, Alexander Korotkov. gist_choose_bloat-0.2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
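As a rough illustration of the idea behind the patch (not its actual algorithm, which changed between versions), randomizing gistchoose can be as simple as breaking penalty ties uniformly instead of always descending into the first minimal entry, using random() as suggested above. All names here are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sketch: choose a downlink index for a GiST insertion.
 * A deterministic chooser always returns the first index with minimal
 * penalty, concentrating inserts on one subtree and promoting bloat;
 * here ties for the minimum are broken uniformly at random via
 * reservoir sampling, so equally good subtrees share the load. */
static int
choose_subtree(const double *penalty, int n)
{
    double  best = penalty[0];
    int     ties = 1;
    int     choice = 0;

    for (int i = 1; i < n; i++)
    {
        if (penalty[i] < best)
        {
            best = penalty[i];
            ties = 1;
            choice = i;
        }
        else if (penalty[i] == best && random() % ++ties == 0)
            choice = i;         /* keep each tied entry with prob. 1/ties */
    }
    return choice;
}
```

A strictly smaller penalty still always wins, so this only spends extra random() calls on ties; whether to randomize at all could then be the user-visible option discussed above.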
Re: [HACKERS] pg_upgrade del/rmdir path fix
On 09/04/2012 11:21 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: This is a test script, not what you should use in production. I don't see any reason why we shouldn't require Perl for running the standard test. But on the third hand ... we've taken pains to ensure that you don't *have* to have Perl to build from a tarball, and I think it is not unreasonable that build should include being able to do make check. Maybe we have to carry both this shell script and a Perl equivalent for Windows. Yeah. I think it will just be another target in vcregress.pl cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] txid failed epoch increment, again, aka 6291
On Tue, Sep 4, 2012 at 8:04 AM, Noah Misch n...@leadboat.com wrote: On Tue, Sep 04, 2012 at 02:07:30AM -0700, Daniel Farina wrote: It seems like this has reproduced once more. And once again, there doesn't appear to be any funny business in pg_control (but the structs are pasted here for your re-check), and there are successful sensical updates to it. The primary is running 9.0.6. What version is the standby running? We have several, all of which so far agree on the same txid, presumably syndicated from the primary: one is 9.0.7, another is 9.0.8. Incidentally I am also currently preparing a 9.0.9 one that I was going to use for some forensics. However, we do have a new piece of data: there was a very brief period where txid_snapshot did report an xmin greater than 2^33, our next epoch boundary, by a few thousand transactions. That could be because the reporting function GetNextXidAndEpoch does its own epoch calculation before the checkpoint, and then after a checkpoint that forgets to increment the epoch there is no need to post-facto adjust the epoch anymore. That makes sense. I've been reviewing the mechanism in CreateCheckPoint for this on and off for a couple of days, but so far I haven't come up with a convincing mechanism. However, given that it seems historically that this bug is more likely to surface than *not* surface on this system, perhaps we can try for a sometimes-reproducing test case. I'm still struggling for a hint of a solution, though, so toss your thoughts here. The cause is not apparent to me, either, and simple tests here do not reproduce the problem. I am suspicious of xlog_redo() updating ControlData->checkPointCopy without acquiring ControlFileLock. If a restartpoint is finishing at about the same time, ControlFile->checkPointCopy.nextXid might come from the just-read checkpoint record while ControlFile->checkPointCopy.nextXidEpoch bears the value from the older checkpoint just adopted as a restartpoint. 
The resulting inconsistency would, however, vanish at the next ControlFile->checkPointCopy update. This does not explain the symptoms you have reported, and it cannot explain much of anything on a primary. (gdb) p *ControlFile $4 = {system_identifier = 5613733157253676693, pg_control_version = 903, catalog_version_no = 201008051, state = DB_IN_ARCHIVE_RECOVERY, time = 1346746898, checkPoint = This capture, it seems, is from a standby. Yeah. I can probably also get one from the primary. {xlogid = 18751, xrecoff = 1072693824}, prevCheckPoint = { xlogid = 18751, xrecoff = 1072693824}, checkPointCopy = {redo = {xlogid = 18751, xrecoff = 602482536}, ThisTimeLineID = 6, nextXidEpoch = 1, nextXid = 9904084, nextOid = 2047524, nextMulti = 1119, nextMultiOffset = 3115, oldestXid = 4115479553, oldestXidDB = 1, time = 1346746496, oldestActiveXid = 9558248}, You expected checkPointCopy = { ... nextXidEpoch = 2, ... }, correct? (Well, nextXidEpoch = 3 if you count the previous missed increment.) Does pg_control on the primary also bear epoch 1 where epoch 2 is expected? Well, in the aggregate life of this thing as we know it we probably expect more like 4, or even 5. It bears 1 where 2 is expected, though, to answer the original question. Your last restartpoint was exactly five minutes before your last checkpoint, so there's no evidence of a dearth of safe restartpoint opportunities. nm I might try to find the segments leading up to the overflow point and try xlogdumping them to see what we can see. If there's anything to note about the workload, I'd say that it does tend to make fairly pervasive use of long running transactions which can span probably more than one checkpoint, and the txid reporting functions, and a concurrency level of about 300 or so backends ... but per my reading of the mechanism so far, it doesn't seem like any of this should matter. 
-- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Yet another issue with pg_upgrade vs unix_socket_directories
On 9/3/12 5:00 PM, Tom Lane wrote: I went back for another try at building the Fedora packages with 9.2 branch tip ... and it still failed at pg_upgrade's make check. The reason for this is that test.sh starts a couple of random postmasters, and those postmasters expect to put their sockets in the configured default location (which is now /var/run/postgresql on Fedora), and that's not there in a minimal build environment. And if it's there, it might not be writable. I hacked it up with the attached quick-and-dirty patch, but I wonder if anyone's got a better idea. Yeah, I have resorted to putting something like export PGHOST=/tmp in all my test scripts, because the above-mentioned issues have affected Debian for a long time. Welcome to the party. ;-) It might actually be useful if the postmaster accepted PGHOST as the default value for the -k option, just like it accepts PGPORT. Then this type of setup will become much easier because clients and servers will use the same defaults. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Yet another issue with pg_upgrade vs unix_socket_directories
On Tue, Sep 4, 2012 at 01:44:59PM -0400, Peter Eisentraut wrote: On 9/3/12 5:00 PM, Tom Lane wrote: I went back for another try at building the Fedora packages with 9.2 branch tip ... and it still failed at pg_upgrade's make check. The reason for this is that test.sh starts a couple of random postmasters, and those postmasters expect to put their sockets in the configured default location (which is now /var/run/postgresql on Fedora), and that's not there in a minimal build environment. And if it's there, it might not be writable. I hacked it up with the attached quick-and-dirty patch, but I wonder if anyone's got a better idea. Yeah, I have resorted to putting something like export PGHOST=/tmp in all my test scripts, because the above-mentioned issues have affected Debian for a long time. Welcome to the party. ;-) It might actually be useful if the postmaster accepted PGHOST as the default value for the -k option, just like it accepts PGPORT. Then this type of setup will become much easier because clients and servers will use the same defaults. Interesting idea, but PGPORT controls both the tcp and unix domain socket connections. Wouldn't PGHOST control just unix domain? Is that logical? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Yet another issue with pg_upgrade vs unix_socket_directories
Peter Eisentraut pete...@gmx.net writes: Yeah, I have resorted to putting something like export PGHOST=/tmp in all my test scripts, because the above-mentioned issues have affected Debian for a long time. Welcome to the party. ;-) Yeah, my current patch for Fedora does exactly that in pg_regress, and has it force the test postmaster's unix_socket_directory as well. The problem with pg_upgrade's shell script is that it's not going through pg_regress: it launches some test postmasters directly, and also fires up psql etc directly. So it needs its own fix for this. It might actually be useful if the postmaster accepted PGHOST as the default value for the -k option, just like it accepts PGPORT. Then this type setup will become much easier because clients and servers will use the same defaults. Cute idea, but it'll fall down rather badly if PGHOST is a hostname... There's no time to redesign this stuff for 9.2, but now that I've had some exposure to the testing difficulties created by a nonstandard default socket directory, I'm more interested in trying to fix these issues in core. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade del/rmdir path fix
On 09/04/2012 10:42 AM, Alvaro Herrera wrote: Somehow the verbose reporting of user relation files being copied does not seem exceedingly useful; and I don't remember seeing that on Linux. Yeah, and it does something odd anyway when it's not writing to a terminal. Can we get rid of it, or make it only work in verbose mode? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2rc1 produces incorrect results
I wrote: Vik Reykja vikrey...@gmail.com writes: Hello. It took me a while to get a version of this that was independent of my data, but here it is. I don't understand what's going wrong but if you change any part of this query (or at least any part I tried), the correct result is returned. Huh. 9.1 gets the wrong answer too, so this isn't a (very) new bug; but curiously, 8.4 and 9.0 seem to get it right. I think this is probably related somehow to Adam Mackler's recent report --- multiple scans of the same CTE seems to be a bit of a soft spot :-( No, I'm mistaken: it's a planner bug. The plan looks like this:

                                 QUERY PLAN
---
 Result  (cost=281.29..290.80 rows=20 width=36)
   CTE a
     ->  Nested Loop  (cost=126.96..280.17 rows=19 width=4)
           ->  Merge Right Join  (cost=126.96..220.17 rows=19 width=4)
                 Merge Cond: (((pg_c.relname)::text) = ((t2.id)::text))
                 Filter: (pg_c.oid IS NULL)
                 ->  Sort  (cost=61.86..63.72 rows=743 width=68)
                       Sort Key: ((pg_c.relname)::text)
                       ->  Seq Scan on pg_class pg_c  (cost=0.00..26.43 rows=743 width=68)
                 ->  Sort  (cost=65.10..67.61 rows=1004 width=4)
                       Sort Key: ((t2.id)::text)
                       ->  Seq Scan on t2  (cost=0.00..15.04 rows=1004 width=4)
           ->  Index Scan using t1_pkey on t1  (cost=0.00..3.14 rows=1 width=4)
                 Index Cond: (id = t2.id***)
   CTE b
     ->  WindowAgg  (cost=0.78..1.12 rows=19 width=4)
           ->  Sort  (cost=0.78..0.83 rows=19 width=4)
                 Sort Key: a.id
                 ->  CTE Scan on a  (cost=0.00..0.38 rows=19 width=4)
   ->  Append  (cost=0.00..9.51 rows=20 width=36)
         ->  CTE Scan on a  (cost=0.00..4.66 rows=10 width=4)
               Filter: is_something
               SubPlan 3
                 ->  CTE Scan on b  (cost=0.00..0.43 rows=1 width=4)
                       Filter: (id = a.id***)
         ->  CTE Scan on a  (cost=0.00..4.66 rows=10 width=4)
               Filter: is_something
               SubPlan 4
                 ->  CTE Scan on b  (cost=0.00..0.43 rows=1 width=4)
                       Filter: (id = a.id***)
(30 rows)

The planner is assigning a single PARAM_EXEC slot for the parameter passed into the inner indexscan in CTE a and for the parameters passed into the two SubPlans that scan CTE b (the items I marked with *** above). 
This is safe enough so far as the two SubPlans are concerned, because they don't execute concurrently --- but when SubPlan 3 is first fired, it causes the remainder of CTE a to be computed, so that the nestloop gets iterated some more times, and that overwrites the value of a.id that already got passed down into SubPlan 3. The reason this is so hard to replicate is that the PARAM_EXEC slot can only get re-used for identical-looking Vars (same varno, varlevelsup, vartype, etc) --- so even granted that you've got the right shape of plan, minor unrelated changes in the query can stop the aliasing from occurring. Also, inner indexscans weren't using the PARAM_EXEC mechanism until 9.1, so that's why the example doesn't fail before 9.1. I've always been a bit suspicious of the theory espoused in replace_outer_var that aliasing different Params is OK: * NOTE: in sufficiently complex querytrees, it is possible for the same * varno/abslevel to refer to different RTEs in different parts of the * parsetree, so that different fields might end up sharing the same Param * number. As long as we check the vartype/typmod as well, I believe that * this sort of aliasing will cause no trouble. The correct field should * get stored into the Param slot at execution in each part of the tree. but I've never seen a provably wrong case before. Most likely, this has been broken since we introduced CTEs in 8.4: it's the decoupled timing of execution of main query and CTE that's needed to allow execution of different parts of the plan tree to overlap and thus create the risk. (I get the impression that only recently have people been writing really complex CTE queries, since we found another fundamental bug with them just recently.) I think probably the best fix is to rejigger things so that Params assigned by different executions of SS_replace_correlation_vars and createplan.c can't share PARAM_EXEC numbers. 
This will result in rather larger ecxt_param_exec_vals arrays at runtime, but the array entries aren't very large, so I don't think it'll matter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] XLogReader v2
Excerpts from Andres Freund's message of Thu Jul 19 06:29:03 -0400 2012: Hi, Attached is v2 of the patch.

Hello, I gave this code a quick read some days ago. Here's the stuff I would change:

* There is way too much #ifdef VERBOSE_DEBUG stuff for my taste. It might look better if you had macros such as elog_debug() that are defined to empty if VERBOSE_DEBUG is not defined. (The problem with such an approach is that you have to get into the business of creating one macro for each different param count, so elog_debug1(), elog_debug2() and so on. It also means you have to count the number of args in each call to ensure you're calling the right one.)

* On the code beautification front, there are a number of cuddled braces and improperly indented function declarations.

* I noticed that you have the IDENTIFICATION tag wrong in both .c and .h files: evidently you renamed the files from readxlog.[ch] to xlogreader.

* There are a few elog(PANIC) calls. I am not sure that's a very good idea. It seems to me that you should be using elog(FATAL) there instead ... or do you really want to make the whole server crash? OTOH if we want to make it a true client program, all those elog() calls need to go.

* XLogReaderRead() seems a bit too long to me. I would split it with auxiliary functions -- say read a header and read a record. (I mentioned this to Andres on IM and he says he tried that but couldn't find any nice way to do it. I may still try to do it.)

* xlogdump's Makefile trick to get all backend object files is ... ugly (an understatement). Really we need the *_desc() routines split so that it can use only those functions, and have a client-side replacement for StringInfo (discussed elsewhere) and some auxiliary functions such as relpathbackend() so that it can compile like a normal client.

* why do we pass timeline_id to xlogdump? I don't see that it's used anywhere, but maybe I'm missing something?

This is not a full review.
After a new version with these fixes is published (either by Andres or myself) some more review might find more serious issues -- I didn't hunt for architectural problems in XLogReader.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_upgrade diffs on WIndows
On 09/04/2012 03:09 PM, Andrew Dunstan wrote: I realized this morning that I might have been a bit cavalier in using dos2unix to smooth away differences in the dumpfiles produced by pg_upgrade. Attached is a dump of the diff if this isn't done, with Carriage Returns printed as '*' to make them visible. As can be seen, in function bodies dump2 has the Carriage Returns doubled. I have not had time to delve into how this comes about, and I need to attend to some income-producing activity for a bit, but I'd like to get it cleaned up ASAP. We are under the hammer for 9.2, so any help other people can give on this would be appreciated.

Actually, I have the answer - it's quite simple. We just need to open the output files in binary mode when we split the dumpall file. The attached patch fixes it. I think we should backpatch the first part to 9.0.

cheers

andrew

diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
index b905ab0..0a96dde 100644
--- a/contrib/pg_upgrade/dump.c
+++ b/contrib/pg_upgrade/dump.c
@@ -62,10 +62,10 @@ split_old_dump(void)
 	if ((all_dump = fopen(filename, "r")) == NULL)
 		pg_log(PG_FATAL, "Could not open dump file \"%s\": %s\n",
 			   filename, getErrorText(errno));
 	snprintf(filename, sizeof(filename), "%s", GLOBALS_DUMP_FILE);
-	if ((globals_dump = fopen_priv(filename, "w")) == NULL)
+	if ((globals_dump = fopen_priv(filename, PG_BINARY_W)) == NULL)
 		pg_log(PG_FATAL, "Could not write to dump file \"%s\": %s\n",
 			   filename, getErrorText(errno));
 	snprintf(filename, sizeof(filename), "%s", DB_DUMP_FILE);
-	if ((db_dump = fopen_priv(filename, "w")) == NULL)
+	if ((db_dump = fopen_priv(filename, PG_BINARY_W)) == NULL)
 		pg_log(PG_FATAL, "Could not write to dump file \"%s\": %s\n",
 			   filename, getErrorText(errno));
 	current_output = globals_dump;

diff --git a/contrib/pg_upgrade/test.sh b/contrib/pg_upgrade/test.sh
index d411ac6..3899600 100644
--- a/contrib/pg_upgrade/test.sh
+++ b/contrib/pg_upgrade/test.sh
@@ -128,10 +128,6 @@ else
 	sh ./delete_old_cluster.sh
 fi
 
-if [ "$testhost" = "Msys" ] ; then
-	dos2unix $temp_root/dump1.sql $temp_root/dump2.sql
-fi
-
 if diff -q $temp_root/dump1.sql $temp_root/dump2.sql; then
 	echo PASSED
 	exit 0
Re: [HACKERS] build farm machine using make -j 8 mixed results
Excerpts from Robert Creager's message of Sat Sep 01 12:12:51 -0400 2012: I change the build-farm.conf file to have the following make line: make = 'make -j 8', # or gmake if required. can include path if necessary. 2 pass, 4 fail. Is this a build configuration you want to pursue?

Sure, why not? Parallel building is going to become more common, so it's a good idea to investigate the failures. I would have it build only HEAD though, because we're not likely to backpatch these fixes.

I can either create a new machine, or change one of my existing machines. Makes no difference to me.

If we want to have it run only HEAD I would say you should create a new animal. Perhaps you should wait until after 9.2 has been released, though, to avoid distracting people :-)

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] build farm machine using make -j 8 mixed results
On 9/1/12 12:12 PM, Robert Creager wrote: I change the build-farm.conf file to have the following make line: make = 'make -j 8', # or gmake if required. can include path if necessary. 2 pass, 4 fail. Is this a build configuration you want to pursue?

Sure that would be useful, but it's pretty clear that the check stages don't work in parallel. I think it's because the ports conflict, but there could be any number of other problems. That said, it would be useful, in my mind, to support parallel checks. But unless someone is going to put in the work first, you should restrict your parallel runs to the build and install phases.
Re: [HACKERS] embedded list v2
Excerpts from Andres Freund's message of Thu Jun 28 17:06:49 -0400 2012: On Thursday, June 28, 2012 10:03:26 PM Andres Freund wrote: What I wonder is how hard it would be to move catcache.h's structs into the implementation. That's the reason why the old and new list implementation currently is included all over the backend... Moving them into the implementation isn't possible, but catcache.h being included just about everywhere simply isn't needed.

FWIW this got fixed during some header changes I made a couple of weeks ago. If you have similar fixes to propose, let me know.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] build farm machine using make -j 8 mixed results
Excerpts from Peter Eisentraut's message of Tue Sep 04 18:49:46 -0300 2012: On 9/1/12 12:12 PM, Robert Creager wrote: I change the build-farm.conf file to have the following make line: make = 'make -j 8', # or gmake if required. can include path if necessary. 2 pass, 4 fail. Is this a build configuration you want to pursue? Sure that would be useful, but it's pretty clear that the check stages don't work in parallel. I think it's because the ports conflict, but there could be any number of other problems.

Is that really the problem? As far as I know, buildfarm doesn't use anything like installcheck-world or similar targets; each check target is run separately, serially, by the BF script.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] build farm machine using make -j 8 mixed results
On 09/04/2012 05:49 PM, Peter Eisentraut wrote: On 9/1/12 12:12 PM, Robert Creager wrote: I change the build-farm.conf file to have the following make line: make = 'make -j 8', # or gmake if required. can include path if necessary. 2 pass, 4 fail. Is this a build configuration you want to pursue? Sure that would be useful, but it's pretty clear that the check stages don't work in parallel. I think it's because the ports conflict, but there could be any number of other problems. That said, it would be useful, in my mind, to support parallel checks. But unless someone is going to put in the work first, you should restrict your parallel runs to the build and install phases.

The buildfarm code doesn't contain a facility to use a different make incantation for each step. It's pretty much an all or nothing deal. Of course, you can hack run_build.pl to make it do that, but I highly discourage that. For one thing, it makes upgrading that much more difficult. All the tweaking is supposed to be done via the config file. I guess I could add a setting that allowed for per step make flags.

Frankly, I have had enough failures of parallel make that I think doing this would generate a significant number of non-repeatable failures (I had one just the other day that took three invocations of make to get right). So I'm not sure doing this would advance us much, although I'm open to persuasion.

cheers

andrew
Re: [HACKERS] too much pgbench init output
On 9/1/12 6:30 AM, Robert Haas wrote: On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut pete...@gmx.net wrote: When initializing a large database, pgbench writes tons of %d tuples done lines. I propose to change this to a sort of progress counter that stays on the same line, as in the attached patch. I'm not sure I like this - what if the output is being saved off to a file?

I suppose we could print \n instead of \r then.
Re: [HACKERS] Some whitespaces in utility.c
On Tue, Sep 4, 2012 at 11:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Tue, Sep 4, 2012 at 6:57 AM, Michael Paquier michael.paqu...@gmail.com wrote: I found some whitespace characters in utility.c introduced by commit 3a0e4d3. Please find attached a patch fixing that which can be applied on postgres master (commit 2f0c7d5). That probably exists in many other places in the source as well, but it's certainly fairly ugly. So I see nothing wrong with cleaning it up when spotted. Just as a note: we generally leave it to pgindent to fix this sort of thing. I'm not sure it's worth the effort of submitting manual patches for, unless you have reason to think the next pgindent run won't fix it.

Understood, thanks for telling me about that. I just read some code and bumped into it; for sure doing such maintenance all at once with pgindent saves time and effort.

-- 
Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
Guys, Is this a patch to 9.3? i.e. do we need to delay the release for this?

-- 
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
On 04.09.2012 03:02, Dimitri Fontaine wrote: Heikki Linnakangas hlinn...@iki.fi writes: Hmm, I was thinking that when walsender gets the position it can send the WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current recovery timeline. If it has changed, refuse to send the new WAL and terminate. That would be a fairly small change, it would just close the window between requesting walsenders to terminate and them actually terminating. It looks to me like a bug fix that also applies to non cascading situation. Is that right?

No, only cascading replication is affected. In a non-cascading situation, the timeline never changes in the master. It's only in cascading mode that you have a problem, where the standby can cross timelines while it's replaying the WAL, and also sending it over to a cascading standby.

- Heikki
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
On 04.09.2012 15:41, Josh Berkus wrote: Guys, Is this a patch to 9.3? i.e. do we need to delay the release for this?

It is for 9.2. I'll do a little bit more testing, and barring any issues, commit the patch. What exactly is the schedule? Do we need to do a RC2 because of this?

- Heikki
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
Heikki, It is for 9.2. I'll do a little bit more testing, and barring any issues, commit the patch. What exactly is the schedule? Do we need to do a RC2 because of this?

We're currently scheduled to release next week. If we need to do an RC2, we're going to have to do some fast rescheduling; we've already started the publicity machine.

-- 
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] build farm machine using make -j 8 mixed results
On Sep 4, 2012 6:06 PM, Andrew Dunstan and...@dunslane.net wrote: Frankly, I have had enough failures of parallel make that I think doing this would generate a significant number of non-repeatable failures (I had one just the other day that took three invocations of make to get right). So I'm not sure doing this would advance us much, although I'm open to persuasion.

Seeing as most PostgreSQL bugs appear with concurrency, I think we should leave our default config with 1 for max connections. ;-)

Parallel make failures are bugs in the dependencies as described in our make files. For the build phase, I don't recall parallel problems, and as a habit I usually use parallel makes. I would like that to be supported, and I think I've seen fixes applied when we had issues before. Cutting build times to 1/2 to 1/4 is good.

Checks and tests are harder because often they can't run in parallel. But then we shouldn't have them listed as independent prerequisites for targets. Ideally. But fixing it might not be worth the cost, since an acceptable workaround (rely upon make to serialize the test sequences in the particular order) is pretty painless (so far). Of course, having the ability to run the tests 8 at a time (or more) and reduce the time by 80% would be nice. ;-)
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
Josh Berkus j...@agliodbs.com writes: Heikki, It is for 9.2. I'll do a little bit more testing, and barring any issues, commit the patch. What exactly is the schedule? Do we need to do a RC2 because of this? We're currently scheduled to release next week. If we need to do an RC2, we're going to have to do some fast rescheduling; we've already started the publicity machine.

At this point I would argue that the only thing that should abort the launch is a bad regression. Minor bugs in new features (and this must be minor if it wasn't noticed before) don't qualify. Having said that, it'd be good to get it fixed if we can. The schedule says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed tomorrow (Wednesday)?

regards, tom lane
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
On 03.09.2012 17:40, Heikki Linnakangas wrote: On 03.09.2012 16:26, Heikki Linnakangas wrote: On 03.09.2012 16:25, Fujii Masao wrote: On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Hmm, I was thinking that when walsender gets the position it can send the WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current recovery timeline. If it has changed, refuse to send the new WAL and terminate. That would be a fairly small change, it would just close the window between requesting walsenders to terminate and them actually terminating. Yeah, sounds good. Could you implement the patch? If you don't have time, I will I'll give it a shot..

So, this is what I came up with, please review.

While testing, I bumped into another related bug: When a WAL segment is restored from the archive, we let a walsender send that whole WAL segment to a cascading standby. However, there's no guarantee that the restored WAL segment is complete. In particular, if a timeline changes within that segment, e.g. 00010004, that segment will be only partially full, and the WAL continues at segment 00020004, at the next timeline. This can also happen if you copy a partial WAL segment to the archive, for example from a crashed master server. Or if you have set up record-based WAL shipping not using streaming replication, per http://www.postgresql.org/docs/devel/static/log-shipping-alternative.html#WARM-STANDBY-RECORD. That manual page says you can only deal with whole WAL files that way, but I think with standby_mode='on', that's actually no longer true.

So all in all, it seems like a shaky assumption that once you've restored a WAL file from the archive, you're free to stream it to a cascading slave. I think it would be more robust to limit it to streaming the file only up to the point that it's been replayed - and thus verified - in the 1st standby. If everyone is OK with that change in behavior, the fix is simple.
- Heikki
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
On 04.09.2012 16:50, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Heikki, It is for 9.2. I'll do a little bit more testing, and barring any issues, commit the patch. What exactly is the schedule? Do we need to do a RC2 because of this? We're currently scheduled to release next week. If we need to do an RC2, we're going to have to do some fast rescheduling; we've already started the publicity machine. At this point I would argue that the only thing that should abort the launch is a bad regression. Minor bugs in new features (and this must be minor if it wasn't noticed before) don't qualify. Having said that, it'd be good to get it fixed if we can. The schedule says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed tomorrow (Wednesday)?

The attached patch fixes it for me. It fixes the original problem, by adding the missing locking and terminating walsenders on a target timeline change, and also changes the behavior wrt. WAL segments restored from the archive, as I just suggested in another email (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).

The test case I've been using is a master and two standbys. The first standby is set up to connect to the master with streaming replication, and the other standby is set up to connect to the 1st standby, ie. it's a cascading slave. In addition, the master is set up to do WAL archiving to a directory, and both standbys have a restore_command to read from that archive, and recovery_target_timeline='latest'. After the master and both standbys are running, I create a dummy recovery.conf file in master's data directory, with just restore_command='/bin/false' in it, and restart the master. That forces a timeline change in the master. With the patch, the 1st standby will notice the new timeline in the archive, switch to that, and reconnect to the master.
The cascading connection to the 2nd standby is terminated because of the timeline change; the 2nd standby will also scan the archive and pick up the new timeline, reconnect to the 1st standby, and be in sync again.

- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b75dab5..be01d63 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -406,7 +406,6 @@ typedef struct XLogCtlData
 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
 	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
 	TimeLineID	ThisTimeLineID;
-	TimeLineID	RecoveryTargetTLI;
 
 	/*
 	 * archiveCleanupCommand is read from recovery.conf but needs to be in
@@ -455,6 +454,8 @@ typedef struct XLogCtlData
 	XLogRecPtr	recoveryLastRecPtr;
 	/* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
 	TimestampTz recoveryLastXTime;
+	/* current effective recovery target timeline */
+	TimeLineID	RecoveryTargetTLI;
 
 	/*
 	 * timestamp of when we started replaying the current chunk of WAL data,
@@ -4467,12 +4468,17 @@ rescanLatestTimeLine(void)
 						ThisTimeLineID)));
 	else
 	{
+		/* use volatile pointer to prevent code rearrangement */
+		volatile XLogCtlData *xlogctl = XLogCtl;
+
 		/* Switch target */
 		recoveryTargetTLI = newtarget;
 		list_free(expectedTLIs);
 		expectedTLIs = newExpectedTLIs;
-		XLogCtl->RecoveryTargetTLI = recoveryTargetTLI;
+		SpinLockAcquire(&xlogctl->info_lck);
+		xlogctl->RecoveryTargetTLI = recoveryTargetTLI;
+		SpinLockRelease(&xlogctl->info_lck);
 
 		ereport(LOG,
 				(errmsg("new target timeline is %u",
@@ -7519,13 +7525,20 @@ GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch)
 }
 
 /*
- * GetRecoveryTargetTLI - get the recovery target timeline ID
+ * GetRecoveryTargetTLI - get the current recovery target timeline ID
  */
 TimeLineID
 GetRecoveryTargetTLI(void)
 {
-	/* RecoveryTargetTLI doesn't change so we need no lock to copy it */
-	return XLogCtl->RecoveryTargetTLI;
+	/* use volatile pointer to prevent code rearrangement */
+	volatile XLogCtlData *xlogctl = XLogCtl;
+	TimeLineID	result;
+
+	SpinLockAcquire(&xlogctl->info_lck);
+	result = xlogctl->RecoveryTargetTLI;
+	SpinLockRelease(&xlogctl->info_lck);
+
+	return result;
 }
 
 /*
@@ -8321,7 +8334,7 @@ CreateRestartPoint(int flags)
 	XLogRecPtr	endptr;
 
 	/* Get the current (or recent) end of xlog */
-	endptr = GetStandbyFlushRecPtr();
+	endptr = GetStandbyFlushRecPtr(NULL);
 
 	KeepLogSeg(endptr, &_logId, &_logSeg);
 	PrevLogSeg(_logId, _logSeg);
@@ -9837,14 +9850,13 @@ do_pg_abort_backup(void)
 /*
  * Get latest redo apply position.
  *
- * Optionally, returns the end byte position of the last restored
- * WAL segment. Callers not interested in that value may pass
- * NULL for restoreLastRecPtr.
+ * Optionally, returns the current recovery target timeline. Callers not
+ * interested in that may pass NULL for targetTLI.
  *
  * Exported to allow WALReceiver to read the pointer directly.
 */
 XLogRecPtr
-GetXLogReplayRecPtr(XLogRecPtr *restoreLastRecPtr)
Re: [HACKERS] too much pgbench init output
Peter Eisentraut pete...@gmx.net writes: On 9/1/12 6:30 AM, Robert Haas wrote: On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut pete...@gmx.net wrote: When initializing a large database, pgbench writes tons of %d tuples done lines. I propose to change this to a sort of progress counter that stays on the same line, as in the attached patch. I'm not sure I like this - what if the output is being saved off to a file? I suppose we could print \n instead of \r then.

Possibly off-the-wall idea: we could fix the too-much-output problem once and for all by going to a log scale:

10 tuples done
100 tuples done
1000 tuples done
10000 tuples done
100000 tuples done
...

regards, tom lane
Re: [HACKERS] build farm machine using make -j 8 mixed results
Andrew Dunstan and...@dunslane.net writes: Frankly, I have had enough failures of parallel make that I think doing this would generate a significant number of non-repeatable failures (I had one just the other day that took three invocations of make to get right). So I'm not sure doing this would advance us much, although I'm open to persuasion.

Really? I routinely use -j4 for building, and it's been a long time since I've seen failures. I can believe that for instance make check in contrib would have a problem running in parallel, but the build process per se seems reliable enough from here.

regards, tom lane
Re: [HACKERS] pg_upgrade diffs on WIndows
On 09/04/2012 03:44 PM, Andrew Dunstan wrote: On 09/04/2012 03:09 PM, Andrew Dunstan wrote: I realized this morning that I might have been a bit cavalier in using dos2unix to smooth away differences in the dumpfiles produced by pg_upgrade. Attached is a dump of the diff if this isn't done, with Carriage Returns printed as '*' to make them visible. As can be seen, in function bodies dump2 has the Carriage Returns doubled. I have not had time to delve into how this comes about, and I need to attend to some income-producing activity for a bit, but I'd like to get it cleaned up ASAP. We are under the hammer for 9.2, so any help other people can give on this would be appreciated. Actually, I have the answer - it's quite simple. We just need to open the output files in binary mode when we split the dumpall file. The attached patch fixes it. I think we should backpatch the first part to 9.0.

OK, nobody else has reacted. I've spoken to Bruce and he seems happy with it, although, TBH, when I talked to him I thought I understood it and now I'm not so sure. So we have 3 possibilities: leave it as is with an error-hiding hack in the test script, apply this patch which removes the hack and applies a fix that apparently works but which confuses us a bit, or go back to generating errors. The last choice would mean I would need to turn off pg_upgrade testing on Windows pending a fix. And we have to decide pretty much now so we can get 9.2 out the door. Thoughts?

cheers

andrew
Re: [HACKERS] build farm machine using make -j 8 mixed results
On 09/04/2012 08:37 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Frankly, I have had enough failures of parallel make that I think doing this would generate a significant number of non-repeatable failures (I had one just the other day that took three invocations of make to get right). So I'm not sure doing this would advance us much, although I'm open to persuasion. Really? I routinely use -j4 for building, and it's been a long time since I've seen failures. I can believe that for instance make check in contrib would have a problem running in parallel, but the build process per se seems reliable enough from here.

Both cases were vpath builds, which is what I usually use, if that's a useful data point. Maybe I run on lower level hardware than you do. I saw this again this afternoon after I posted the above. In both cases this was the machine that runs the buildfarm's crake. I'll try to get a handle on it.

cheers

andrew
Re: [HACKERS] pg_upgrade del/rmdir path fix
On 09/04/2012 02:25 PM, Andrew Dunstan wrote: On 09/04/2012 10:42 AM, Alvaro Herrera wrote: Somehow the verbose reporting of user relation files being copied does not seem exceedingly useful; and I don't remember seeing that on Linux. Yeah, and it does something odd anyway when it's not writing to a terminal. Can we get rid of it, or make it only work in verbose mode? The attached is an attempt to fix this. I think it handles most of what's wrong with this. (The patch is bigger because the code currently uses a variable called fileno - not a good idea.)

cheers

andrew

diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
index 33a867f..0a579d5 100644
--- a/contrib/pg_upgrade/relfilenode.c
+++ b/contrib/pg_upgrade/relfilenode.c
@@ -14,7 +14,6 @@
 #include "catalog/pg_class.h"
 #include "access/transam.h"
 
-
 static void transfer_single_new_db(pageCnvCtx *pageConverter,
 					   FileNameMap *maps, int size);
 static void transfer_relfile(pageCnvCtx *pageConverter,
@@ -136,7 +135,7 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
 	char	  **namelist = NULL;
 	int			numFiles = 0;
 	int			mapnum;
-	int			fileno;
+	int			file_no;
 	bool		vm_crashsafe_change = false;
 
 	old_dir[0] = '\0';
@@ -156,8 +155,8 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
 	{
 		if (numFiles > 0)
 		{
-			for (fileno = 0; fileno < numFiles; fileno++)
-				pg_free(namelist[fileno]);
+			for (file_no = 0; file_no < numFiles; file_no++)
+				pg_free(namelist[file_no]);
 			pg_free(namelist);
 		}
@@ -171,7 +170,34 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
 				 maps[mapnum].old_relfilenode);
 		snprintf(new_file, sizeof(new_file), "%s/%u", maps[mapnum].new_dir,
 				 maps[mapnum].new_relfilenode);
-		pg_log(PG_REPORT, OVERWRITE_MESSAGE, old_file);
+		if (!isatty(fileno(stdout)))
+		{
+			/* don't try to be cute if we're not interactive */
+			pg_log(PG_REPORT, "%s\n", old_file);
+		}
+		else
+		{
+			/*
+			 * print the largest rightmost part of the name that can fit
+			 * within the message width.
+			 */
+			int			remove = strlen(old_file) - atoi(MESSAGE_WIDTH);
+			char	   *start = old_file;
+			char	   *sep;
+
+			if (remove > 0)
+			{
+				/*
+				 * If it's a partial path, move past the first file path
+				 * separator we find, so we don't print a partial path
+				 * segment.
+				 */
+				start += remove;
+				if ((sep = strpbrk(start, "/\\")) != NULL)
+					start = sep + 1;
+			}
+			pg_log(PG_REPORT, OVERWRITE_MESSAGE, start);
+		}
 
 		/*
 		 * Copy/link the relation's primary file (segment 0 of main fork)
@@ -190,23 +216,23 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
 		snprintf(file_pattern, sizeof(file_pattern), "%u_",
 				 maps[mapnum].old_relfilenode);
 
-		for (fileno = 0; fileno < numFiles; fileno++)
+		for (file_no = 0; file_no < numFiles; file_no++)
 		{
-			char	   *vm_offset = strstr(namelist[fileno], "_vm");
+			char	   *vm_offset = strstr(namelist[file_no], "_vm");
 			bool		is_vm_file = false;
 
 			/* Is a visibility map file? (name ends with _vm) */
 			if (vm_offset && strlen(vm_offset) == strlen("_vm"))
 				is_vm_file = true;
 
-			if (strncmp(namelist[fileno], file_pattern,
+			if (strncmp(namelist[file_no], file_pattern,
 						strlen(file_pattern)) == 0 &&
 				(!is_vm_file || !vm_crashsafe_change))
 			{
 				snprintf(old_file, sizeof(old_file), "%s/%s", maps[mapnum].old_dir,
-						 namelist[fileno]);
+						 namelist[file_no]);
 				snprintf(new_file, sizeof(new_file), "%s/%u%s", maps[mapnum].new_dir,
-						 maps[mapnum].new_relfilenode, strchr(namelist[fileno], '_'));
+						 maps[mapnum].new_relfilenode, strchr(namelist[file_no], '_'));
 
 				unlink(new_file);
 				transfer_relfile(pageConverter, old_file, new_file,
@@ -225,15 +251,15 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
 		snprintf(file_pattern, sizeof(file_pattern), "%u.",
 				 maps[mapnum].old_relfilenode);
 
-		for (fileno = 0; fileno < numFiles; fileno++)
+		for (file_no = 0; file_no < numFiles; file_no++)
 		{
-			if (strncmp(namelist[fileno], file_pattern,
+			if (strncmp(namelist[file_no], file_pattern,
 						strlen(file_pattern)) == 0)
 			{
 				snprintf(old_file, sizeof(old_file), "%s/%s", maps[mapnum].old_dir,
-						 namelist[fileno]);
+						 namelist[file_no]);
 				snprintf(new_file, sizeof(new_file), "%s/%u%s", maps[mapnum].new_dir,
-						 maps[mapnum].new_relfilenode, strchr(namelist[fileno], '.'));
+						 maps[mapnum].new_relfilenode, strchr(namelist[file_no], '.'));
 
 				unlink(new_file);
 				transfer_relfile(pageConverter, old_file, new_file,
@@ -244,8 +270,8 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
 	if (numFiles > 0)
 	{
-		for (fileno = 0; fileno < numFiles; fileno++)
-			pg_free(namelist[fileno]);
+		for (file_no = 0; file_no < numFiles; file_no++)
+			pg_free(namelist[file_no]);
 		pg_free(namelist);
 	}
 }
Re: [HACKERS] pg_upgrade diffs on Windows
On Tue, Sep 4, 2012 at 08:46:53PM -0400, Andrew Dunstan wrote: On 09/04/2012 03:44 PM, Andrew Dunstan wrote: On 09/04/2012 03:09 PM, Andrew Dunstan wrote: I realized this morning that I might have been a bit cavalier in using dos2unix to smooth away differences in the dumpfiles produced by pg_upgrade. Attached is a dump of the diff if this isn't done, with Carriage Returns printed as '*' to make them visible. As can be seen, in function bodies dump2 has the Carriage Returns doubled. I have not had time to delve into how this comes about, and I need to attend to some income-producing activity for a bit, but I'd like to get it cleaned up ASAP. We are under the hammer for 9.2, so any help other people can give on this would be appreciated. Actually, I have the answer - it's quite simple. We just need to open the output files in binary mode when we split the dumpall file. The attached patch fixes it. I think we should backpatch the first part to 9.0. OK, nobody else has reacted. I've spoken to Bruce and he seems happy with it, although, TBH, when I talked to him I thought I understood it and now I'm not so sure. So we have 3 possibilities: leave it as is with an error-hiding hack in the test script, apply this patch which removes the hack and applies a fix that apparently works but which confuses us a bit, or go back to generating errors. The last choice would mean I would need to turn off pg_upgrade testing on Windows pending a fix. And we have to decide pretty much now so we can get 9.2 out the door.

I am very concerned about putting something into pg_upgrade that we don't fully understand. Adding stuff to pg_upgrade that we think we understand is risky enough, as we have seen in the pg_upgrade churn of the past week. Let's work on chat to find the complete details --- same goes for the log file change we are not sure about either. pg_upgrade is so complicated that I have learned that if we don't fully understand something, it can affect things we don't anticipate.
-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Multiple setup steps for isolation tests
Kevin Grittner wrote: Tom Lane wrote: the attached version works fine for me. Yeah, that should do it. Will apply. Pushed to master and REL9_2_STABLE. -Kevin
Re: [HACKERS] index-only scans versus serializable transactions
Kevin Grittner wrote: New version attached. Will apply if no further problems are found. Pushed to master and REL9_2_STABLE. -Kevin
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
On 04.09.2012 17:34, Heikki Linnakangas wrote: On 04.09.2012 16:50, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Heikki, It is for 9.2. I'll do a little bit more testing, and barring any issues, commit the patch. What exactly is the schedule? Do we need to do a RC2 because of this? We're currently scheduled to release next week. If we need to do an RC2, we're going to have to do some fast rescheduling; we've already started the publicity machine. At this point I would argue that the only thing that should abort the launch is a bad regression. Minor bugs in new features (and this must be minor if it wasn't noticed before) don't qualify. Having said that, it'd be good to get it fixed if we can. The schedule says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed tomorrow (Wednesday)? The attached patch fixes it for me. It fixes the original problem, by adding the missing locking and terminating walsenders on a target timeline change, and also changes the behavior wrt. WAL segments restored from the archive, as I just suggested in another email (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php). Committed that. - Heikki
Re: [HACKERS] too much pgbench init output
On Tue, Sep 4, 2012 at 8:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On 9/1/12 6:30 AM, Robert Haas wrote: On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut pete...@gmx.net wrote: When initializing a large database, pgbench writes tons of "%d tuples done" lines. I propose to change this to a sort of progress counter that stays on the same line, as in the attached patch. I'm not sure I like this - what if the output is being saved off to a file? I suppose we could print \n instead of \r then. Possibly off-the-wall idea: we could fix the too much output problem once and for all by going to a log scale:

10 tuples done
100 tuples done
1000 tuples done
10000 tuples done
100000 tuples done
...

I don't like that, because one of the things you can see by following the current output is where the checkpoint stalls are happening during the load. You'd lose the ability to notice any kind of slowdown after the first few tuples with this kind of format. Actually, this whole thing seems like a solution in search of a problem to me. We just reduced the verbosity of pgbench -i tenfold in the very recent past - I would have thought that enough to address this problem. But maybe not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)
On Tue, Sep 4, 2012 at 11:15 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: It is something which has to be considered, and I don't think it's theoretical at all. Here's how we deal with it. We don't use a plain FIFO queue for our transaction requests, but a prioritized FIFO with 10 levels of priority (0 to 9). The highest priority (9) is reserved for utility requests -- where a running transaction needs to spin off a related transaction to do some work for it. For the lowest level (0) we normally allocate only a single connection, and it is used for very long-running reports which we want to queue to run one-at-a-time. As examples of how we categorize queries, filling a large list in an interactive application will run at priority 3, while translating a key which must cause a description on the screen to display is run at priority 8. Normal single-row updates and deletes from an interactive application run at priority 5. Each connection in the pool has a worker thread, and is assigned a minimum priority that it will handle. When all threads are busy and transaction requests are queued, any thread completing a database transaction pulls from the front of the highest priority queue with a waiting request to run a transaction, looking only at priorities which are not beneath it. If there are no waiting requests of high enough priority, the thread waits for one to arrive. I well believe that with this sort of sophisticated system you can make the connection pool much smaller and get a benefit out of it. However, I think it's quite rare for people to have a system this sophisticated. I suspect that's why I typically see much larger pool sizes. Here's my other thought about this: we talk a lot about how a system with 32 cores and 40 drives can't do more than 72 things at once, and that's absolutely true. But I think much of the reason why PostgreSQL users get a benefit out of connection pooling is unrelated to that effect. 
What I think we're really working around, in many cases, is internal lock contention. That's why people are talking about adjusting formulas for 9.2. It's not that a system with 72 resources can suddenly do more than 72 things; it's that in the old world lock contention could easily make it a loser to have even half that many tasks running at once, and now that's less true. Hopefully we'll make further improvements in the future and it'll become even less true still. So is the real issue the hardware limits of the server, or is it the limits of our software? The former is certainly in the mix, but I personally believe the latter has a lot more to do with pool size selection than we typically credit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] too much pgbench init output
On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote: Actually, this whole thing seems like a solution in search of a problem to me. We just reduced the verbosity of pgbench -i tenfold in the very recent past - I would have thought that enough to address this problem. But maybe not. The problem is that a) It blasts out too much output and everything scrolls off the screen, and b) There is no indication of where the end is. These are independent problems, and I'd be happy to address them separately if there are such specific concerns attached to this. Speaking of tenfold, we could reduce the output frequency tenfold to once every 100, which would alleviate this problem for a while longer.
Re: [HACKERS] pg_upgrade diffs on Windows
On Tue, 2012-09-04 at 20:46 -0400, Andrew Dunstan wrote: OK, nobody else has reacted. I've spoken to Bruce and he seems happy with it, although, TBH, when I talked to him I thought I understood it and now I'm not so sure. So we have 3 possibilities: leave it as is with an error-hiding hack in the test script, apply this patch which removes the hack and applies a fix that apparently works but which confuses us a bit, or go back to generating errors. The last choice would mean I would need to turn off pg_upgrade testing on Windows pending a fix. And we have to decide pretty much now so we can get 9.2 out the door. I think now is not the time to cram in poorly understood changes into a release candidate. There is no requirement to have the tests running now or in time for the release, seeing also that no one has been particularly bothered about it for the past 11 months.
Re: [HACKERS] too much pgbench init output
Peter Eisentraut pete...@gmx.net writes: On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote: Actually, this whole thing seems like a solution in search of a problem to me. We just reduced the verbosity of pgbench -i tenfold in the very recent past - I would have thought that enough to address this problem. But maybe not. The problem is that a) It blasts out too much output and everything scrolls off the screen, and Robert evidently thinks that the verbosity of the output is a feature not a bug. I'm not convinced that eyeballing pgbench output is a particularly useful way to measure checkpoint stalls, but ... b) There is no indication of where the end is. Well, surely *that* can be fixed in a noncontroversial way: just print "M/N tuples done", where N is the target. regards, tom lane
Re: [HACKERS] 9.2 pg_upgrade regression tests on Windows
On Mon, Sep 3, 2012 at 12:44:09PM -0400, Andrew Dunstan wrote: The attached very small patch allows pg_upgrade's make check to succeed on REL9_2_STABLE on my Mingw system. However, I consider the issue I mentioned earlier regarding use of forward slashes in the argument to rmdir to be a significant blocker, so I'm going to go and fix that and then pull this all together.

cheers

andrew

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 6f993df..57ca1df 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file,
 	else
 		retval = 0;
 
+#ifndef WIN32
 	if ((log = fopen_priv(log_file, "a+")) == NULL)
 		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
 	fprintf(log, "\n\n");
 	fclose(log);
+#endif
 
 	return retval;
 }

OK, I worked with Andrew on this issue, and have applied the attached patch which explains what is happening in this case. Andrew's #ifndef WIN32 was the correct fix. I consider this issue closed.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
new file mode 100644
index ac46a9b..99f5006
*** a/contrib/pg_upgrade/exec.c
--- b/contrib/pg_upgrade/exec.c
*** exec_prog(const char *log_file, const ch
*** 63,70 ****
  	if (written >= MAXCMDLEN)
  		pg_log(PG_FATAL, "command too long\n");
  
! 	if ((log = fopen_priv(log_file, "a+")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  
  	pg_log(PG_VERBOSE, "%s\n", cmd);
  	fprintf(log, "command: %s\n", cmd);
--- 63,73 ----
  	if (written >= MAXCMDLEN)
  		pg_log(PG_FATAL, "command too long\n");
  
! 	if ((log = fopen_priv(log_file, "a")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  
+ #ifdef WIN32
+ 	fprintf(log, "\n\n");
+ #endif
  	pg_log(PG_VERBOSE, "%s\n", cmd);
  	fprintf(log, "command: %s\n", cmd);
*** exec_prog(const char *log_file, const ch
*** 97,106 ****
  #ifndef WIN32
  
  	/*
! 	 * Can't do this on Windows, postmaster will still hold the log file
! 	 * open if the command was "pg_ctl start".
  	 */
! 	if ((log = fopen_priv(log_file, "a+")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  	fprintf(log, "\n\n");
  	fclose(log);
--- 100,112 ----
  #ifndef WIN32
  
  	/*
! 	 * We can't do this on Windows because it will keep the "pg_ctl start"
! 	 * output filename open until the server stops, so we do the \n\n above
! 	 * on that platform.  We use a unique filename for "pg_ctl start" that
! 	 * is never reused while the server is running, so it works fine.  We
! 	 * could log these commands to a third file, but that just adds
! 	 * complexity.
! 	 */
! 	if ((log = fopen_priv(log_file, "a")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  	fprintf(log, "\n\n");
  	fclose(log);
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index 195b927..3058343
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*** extern char *output_files[];
*** 63,69 ****
  #define SERVER_STOP_LOG_FILE	SERVER_LOG_FILE
  #else
  #define SERVER_START_LOG_FILE	"pg_upgrade_server_start.log"
! /* pg_ctl stop doesn't keep the log file open, so reuse UTILITY_LOG_FILE */
  #define SERVER_STOP_LOG_FILE	UTILITY_LOG_FILE
  #endif
--- 63,73 ----
  #define SERVER_STOP_LOG_FILE	SERVER_LOG_FILE
  #else
  #define SERVER_START_LOG_FILE	"pg_upgrade_server_start.log"
! /*
!  * "pg_ctl start" keeps SERVER_START_LOG_FILE and SERVER_LOG_FILE open
!  * while the server is running, so we use UTILITY_LOG_FILE for "pg_ctl
!  * stop".
!  */
  #define SERVER_STOP_LOG_FILE	UTILITY_LOG_FILE
  #endif
Re: [HACKERS] 9.2rc1 produces incorrect results
I wrote: I think probably the best fix is to rejigger things so that Params assigned by different executions of SS_replace_correlation_vars and createplan.c can't share PARAM_EXEC numbers. This will result in rather larger ecxt_param_exec_vals arrays at runtime, but the array entries aren't very large, so I don't think it'll matter. Attached is a draft patch against HEAD for this. I think it makes the planner's handling of outer-level Params far less squishy than it's ever been, but it is rather a large change. Not sure whether to risk pushing it into 9.2 right now, or wait till after we cut 9.2.0 ... thoughts?

regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1f2bb6cc72f1242f14d55eee7cdc8e0e0d0775a9..02a0f62a53a4e3d06a3ad48d523e959d5d6b2ab7 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outPlannerGlobal(StringInfo str, const
*** 1666,1672 ****
  	WRITE_NODE_TYPE("PLANNERGLOBAL");
  
  	/* NB: this isn't a complete set of fields */
- 	WRITE_NODE_FIELD(paramlist);
  	WRITE_NODE_FIELD(subplans);
  	WRITE_BITMAPSET_FIELD(rewindPlanIDs);
  	WRITE_NODE_FIELD(finalrtable);
--- 1666,1671 ----
*** _outPlannerGlobal(StringInfo str, const
*** 1674,1679 ****
--- 1673,1679 ----
  	WRITE_NODE_FIELD(resultRelations);
  	WRITE_NODE_FIELD(relationOids);
  	WRITE_NODE_FIELD(invalItems);
+ 	WRITE_INT_FIELD(nParamExec);
  	WRITE_UINT_FIELD(lastPHId);
  	WRITE_UINT_FIELD(lastRowMarkId);
  	WRITE_BOOL_FIELD(transientPlan);
*** _outPlannerInfo(StringInfo str, const Pl
*** 1688,1693 ****
--- 1688,1694 ----
  	WRITE_NODE_FIELD(parse);
  	WRITE_NODE_FIELD(glob);
  	WRITE_UINT_FIELD(query_level);
+ 	WRITE_NODE_FIELD(plan_params);
  	WRITE_BITMAPSET_FIELD(all_baserels);
  	WRITE_NODE_FIELD(join_rel_list);
  	WRITE_INT_FIELD(join_cur_level);
*** _outRelOptInfo(StringInfo str, const Rel
*** 1754,1759 ****
--- 1755,1761 ----
  	WRITE_FLOAT_FIELD(allvisfrac, "%.6f");
  	WRITE_NODE_FIELD(subplan);
  	WRITE_NODE_FIELD(subroot);
+ 	WRITE_NODE_FIELD(subplan_params);
  	/* we don't try to print fdwroutine or fdw_private */
  	WRITE_NODE_FIELD(baserestrictinfo);
  	WRITE_NODE_FIELD(joininfo);
*** _outPlannerParamItem(StringInfo str, con
*** 1950,1956 ****
  	WRITE_NODE_TYPE("PLANNERPARAMITEM");
  
  	WRITE_NODE_FIELD(item);
! 	WRITE_UINT_FIELD(abslevel);
  }
  
  /*
--- 1952,1958 ----
  	WRITE_NODE_TYPE("PLANNERPARAMITEM");
  
  	WRITE_NODE_FIELD(item);
! 	WRITE_INT_FIELD(paramId);
  }
  
  /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 69a1b93b33746370457bff2daf4d4ece66535803..458dae0489c029bd743c75c82f8e5102067e89bf 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*** set_subquery_pathlist(PlannerInfo *root,
*** 1145,1150 ****
--- 1145,1153 ----
  	else
  		tuple_fraction = root->tuple_fraction;
  
+ 	/* plan_params should not be in use in current query level */
+ 	Assert(root->plan_params == NIL);
+ 
  	/* Generate the plan for the subquery */
  	rel->subplan = subquery_planner(root->glob, subquery,
  									root,
*** set_subquery_pathlist(PlannerInfo *root,
*** 1152,1157 ****
--- 1155,1164 ----
  									&subroot);
  	rel->subroot = subroot;
  
+ 	/* Isolate the params needed by this specific subplan */
+ 	rel->subplan_params = root->plan_params;
+ 	root->plan_params = NIL;
+ 
  	/*
  	 * It's possible that constraint exclusion proved the subquery empty. If
  	 * so, it's convenient to turn it back into a dummy path so that we will
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5d3b293b88a3ee030adae2260520eda69caad4b7..030f420c90eb37946ee333250de54af61d9b82d7 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*** static HashJoin *create_hashjoin_plan(Pl
*** 84,90 ****
  			   Plan *outer_plan, Plan *inner_plan);
  static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
  static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
! static void identify_nestloop_extparams(PlannerInfo *root, Plan *subplan);
  static List *fix_indexqual_references(PlannerInfo *root, IndexPath *index_path);
  static List *fix_indexorderby_references(PlannerInfo *root, IndexPath *index_path);
  static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
--- 84,91 ----
  			   Plan *outer_plan, Plan *inner_plan);
  static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
  static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
! static void process_subquery_nestloop_params(PlannerInfo *root,
! 							 List *subplan_params);
Re: [HACKERS] too much pgbench init output
On Tue, Sep 4, 2012 at 11:31 PM, Peter Eisentraut pete...@gmx.net wrote: On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote: Actually, this whole thing seems like a solution in search of a problem to me. We just reduced the verbosity of pgbench -i tenfold in the very recent past - I would have thought that enough to address this problem. But maybe not. The problem is that a) It blasts out too much output and everything scrolls off the screen, and b) There is no indication of where the end is. These are independent problems, and I'd be happy to address them separately if there are such specific concerns attached to this. Speaking of tenfold, we could reduce the output frequency tenfold to once every 100, which would alleviate this problem for a while longer. Well, I wouldn't object to displaying a percentage on each output line. But I don't really like the idea of having them less frequent than they already are, because if you run into a situation that makes pgbench -i run slowly, as I occasionally do, it's marginal to tell the difference between slow and completely hung even with the current level of verbosity. However, we could add a -q flag to run more quietly, or something like that. Actually, I'd even be fine with making the default quieter, though we can't use -v for verbose since that's already taken. But I'd like to preserve the option of getting the current amount of output because sometimes I need that to troubleshoot problems. Actually it'd be nice to even get a bit more output: say, a timestamp on each line, and a completion percentage... but now I'm getting greedy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] too much pgbench init output
On Wed, Sep 5, 2012 at 9:47 AM, Robert Haas robertmh...@gmail.com wrote: Actually it'd be nice to even get a bit more output: say, a timestamp on each line, and a completion percentage... but now I'm getting greedy. Maybe we need a verbosity level and print a lot less or a lot more information than what we do today. That will satisfy everyone. Hopefully. Thanks, Pavan
Re: [HACKERS] pg_upgrade diffs on Windows
Peter Eisentraut pete...@gmx.net writes: On Tue, 2012-09-04 at 20:46 -0400, Andrew Dunstan wrote: OK, nobody else has reacted. I've spoken to Bruce and he seems happy with it, although, TBH, when I talked to him I thought I understood it and now I'm not so sure. So we have 3 possibilities: leave it as is with an error-hiding hack in the test script, apply this patch which removes the hack and applies a fix that apparently works but which confuses us a bit, or go back to generating errors. The last choice would mean I would need to turn off pg_upgrade testing on Windows pending a fix. And we have to decide pretty much now so we can get 9.2 out the door. I think now is not the time to cram in poorly understood changes into a release candidate. There is no requirement to have the tests running now or in time for the release, seeing also that no one has been particularly bothered about it for the past 11 months. Also, the tests *are* passing right now. I agree, let's not risk destabilizing it. pg_upgrade is way overdue for some quiet time so we can verify a full day's buildfarm cycle on it before the release wrap. regards, tom lane
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
On Tue, 2012-09-04 at 19:34 -0700, Heikki Linnakangas wrote: The attached patch fixes it for me. It fixes the original problem, by adding the missing locking and terminating walsenders on a target timeline change, and also changes the behavior wrt. WAL segments restored from the archive, as I just suggested in another email (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php). Committed that. New compiler warnings:

xlog.c: In function ‘XLogFileRead’:
xlog.c:2785:14: error: unused variable ‘endptr’ [-Werror=unused-variable]
xlog.c:2784:25: error: unused variable ‘xlogctl’ [-Werror=unused-variable]
Re: [HACKERS] Cascading replication and recovery_target_timeline='latest'
On 04.09.2012 21:56, Peter Eisentraut wrote: On Tue, 2012-09-04 at 19:34 -0700, Heikki Linnakangas wrote: The attached patch fixes it for me. It fixes the original problem, by adding the missing locking and terminating walsenders on a target timeline change, and also changes the behavior wrt. WAL segments restored from the archive, as I just suggested in another email (http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php). Committed that. New compiler warnings: xlog.c: In function ‘XLogFileRead’: xlog.c:2785:14: error: unused variable ‘endptr’ [-Werror=unused-variable] xlog.c:2784:25: error: unused variable ‘xlogctl’ [-Werror=unused-variable] Fixed, thanks. - Heikki
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Tuesday, September 04, 2012 12:40 AM Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Mon, Sep 3, 2012 at 8:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: I have another question after thinking about that for awhile: is there any security concern there? On Unix-oid systems, we expect the kernel to restrict who can do a kill() on a postgres process. If there's any similar restriction on who can send to that named pipe in the Windows version, it's not obvious from the code. Do we have/need any restriction there? We use the default for CreateNamedPipe() which is: The ACLs in the default security descriptor for a named pipe grant full control to the LocalSystem account, administrators, and the creator owner. They also grant read access to members of the Everyone group and the anonymous account. (ref: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365150(v=vs.85).aspx) Hm. The write protections sound fine ... but what's the semantics of reading, is it like Unix pipes? If so, couldn't a random third party drain the pipe by reading from it, and thereby cause signals to be lost? When a client connects to the server-end of a named pipe, the server-end of the pipe is now dedicated to the client. No more connections will be allowed to that server pipe instance until the client has disconnected. This is from the paper http://www.blakewatts.com/namedpipepaper.html, which discusses security issues in named pipes. The function CallNamedPipe() used for sending a signal in pgkill() has the following definition: Connects to a message-type pipe (and waits if an instance of the pipe is not available), writes to and reads from the pipe, and then closes the pipe. (http://msdn.microsoft.com/en-us/library/windows/desktop/aa365144(v=vs.85).aspx) So I think based on the above 2 points it can be deduced that the signal sent by pgkill() cannot be read by anyone else. With Regards, Amit Kapila.