Re: [HACKERS] Extension Templates S03E11
On 12/03/2013 09:25 AM, Jeff Davis wrote: On Mon, 2013-12-02 at 11:07 +0200, Heikki Linnakangas wrote: There should be no difference between file-based extensions and catalog-based extensions. It's just two different ways to install the same extension. The extension author doesn't need to care about that, it's the DBA that decides which method to use to install it. I'm going to object loudly to any proposal that doesn't meet that criteria. But we're arguably already in this position today. For a SQL-only extension, the author can choose between: 1. Using a schema/namespace a. installable over libpq b. installable by non-superusers c. no special handling when it comes to administration 2. Using an extension a. convenient metadata (e.g. "requires") b. upgrade scripts c. omitted from pg_dump so can be managed separately d. relocatable e. not tied to one schema And if the author decides to change, it requires porting the extension to the other form. Note: I'm using "extension" loosely here. We might call the SQL-only, user-installable variety something else. Good point. It's not too hard to install an "extension" written as an extension as plain schema objects, though. You can just run the .sql script through psql. That's what you used to do before extensions were invented. (the scripts in contrib contain an explicit check against that, but I don't think that's common outside contrib) Another perspective is that that's already a situation we'd rather not have. Let's not make it worse by introducing a third way to install an extension, which again requires the extension author to package the extension differently. So how do we get to the point where we have clear advice to the author of a SQL-only extension? And how do we do that without asking them to port anything? Yeah, that's the crucial question of this whole thread. 
Stephen mentioned using external tools and/or metadata, but to me that sounds like it would require porting the extension away from what's on PGXN today. Why? The external tool can pick the extension in its current form from PGXN, and install it via libpq. The tool might have to jump through some hoops to do it, and we might need some new backend functionality to support it, but I don't see why the extension author needs to do anything. That said, it might make the tool easier to write if we place some new requirements for extension authors. Like, stipulate that the .sql file is in the top-level directory of the extension tarball. But the same extension would still be installable with "make; make install". - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.7
Hello, These are rather trivial and superficial comments, as I have not fully grasped the workings of this patchset. - Some patches have line offsets against master. Rebase needed. Other random comments follow, = 0001: - You assigned HeapTupleGetOid(tuple) into relid for reading at several points, with no modification, yet you left HeapTupleGetOid unreplaced elsewhere. I think a 'relid' used just for repeated reads has far smaller merit than the demerit of lowered readability. You'd better make them uniform either way. = 0002: - You identify the wal_level with the expression 'wal_level >= WAL_LEVEL_LOGICAL', but that seems somewhat improper. - In heap_insert, you added the following comment and code, > * Also, if this is a catalog, we need to transmit combocids to > * properly decode, so log that as well. > */ >need_tuple_data = RelationIsLogicallyLogged(relation); >if (RelationIsAccessibleInLogicalDecoding(relation)) >log_heap_new_cid(relation, heaptup); Actually 'is a catalog' is checked in RelationIsAccessibleInLogicalDecoding(), but either the naming or the comment should be changed for a consistent look. (I think the name of the macro is rather long yet gives only a vague illustration of the function..) - RelationIsAccessibleInLogicalDecoding and RelationIsLogicallyLogged are identical in code. Together with the name ambiguity, this is quite confusing and a source of future misuse of these macros, I suppose. Plus the names seem too long. - In heap_insert, the information conveyed in rdata[3] seems better placed in rdata[2] because of the scarcity of the additional information. Only XLOG_HEAP_CONTAINS_NEW_TUPLE seems to be needed. The same applies to heap_multi_insert and heap_update. - In heap_multi_insert, need_cids is referred to only once, so it might better be removed. - In heap_delete, at the point of adding the replica identity, as above, rdata[3] could be moved into rdata[2] by making a new type like 'xl_heap_replident'. 
- In heapam_xlog.h, the new type xl_heap_header_len is inadequate both in its naming, which is confusing, and in its construction, in which the header in xl_heap_header is no longer a header and the member name 't_len' is indecisive.. - In heapam_xlog.h, XLOG_HEAP_CONTAINS_OLD looks incomplete, and it seems to be used nowhere in this patchset. It should be removed. - log_heap_new_cid() is called at several spots just before other xlog records are inserted. I suppose this should be built into the target xlog structures. - In RecoverPreparedTransactions(), a comment is needed on the reason for calling XLogLogicalInfoActive().. - In xact.c, the comment for the member 'didLogXid' in TransactionStateData seems to differ from its meaning. It becomes true when any WAL record for the current transaction id has just been written to the WAL buffer. So the comment, > /* has xid been included in WAL record? */ would better be something like (needs correcting, as I'm not a native speaker) /* Any WAL record for this transaction has been emitted ? */ and the member name should also be something like XidIsLogged. (Not much of a change?) - The name of the function MarkCurrentTransactionIdLoggedIfAny, although irregular abbreviations are discouraged, seems too long. Isn't MarkCur(r/rent)XidLoggedIfAny sufficient? Anyway, the work involving this function seems as if it would better be done in some other way.. - The comment for RelationGetIndexAttrBitmap() should be edited for attrKind. - The macro name INDEX_ATTR_BITMAP_KEY should be INDEX_ATTR_BITMAP_FKEY, and INDEX_ATTR_BITMAP_IDENTITY_KEY should be INDEX_ATTR_BITMAP_REPLID_KEY or something. - In rel.h, the member name 'rd_idattr' would be better as 'rd_replidattr' or something like that. = 0004: - Could the macro name 'RelationIsUsedAsCatalogTable' be as simple as IsUserCatalogRelation or something? This is from the viewpoint not only of simplicity but also of similarity to other macros and/or functions with closely related functionality. 
You already call the table 'user_catalog_table' in rel.h. To be continued in the next mail. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Mon, Dec 2, 2013 at 11:41 PM, Tom Lane wrote: > Andres Freund writes: > At the same time, I'm pretty skeptical that any simple regression-test > type facility would have caught the bugs we've fixed lately ... The replication bug would have been reproducible at least; Heikki produced a simple test case able to reproduce it. For the MultiXact stuff... well, some more infrastructure in core might be needed before having a wrapper calling test scripts aimed at manipulating clusters of nodes. -- Michael
Re: [HACKERS] Logging WAL when updating hintbit
On Tue, Dec 3, 2013 at 4:28 PM, Michael Paquier wrote: > On Tue, Dec 3, 2013 at 3:30 PM, Sawada Masahiko wrote: >> On Tue, Dec 3, 2013 at 12:02 PM, Michael Paquier >> wrote: Indeed, I forgot this code path. Completing it for saving the state and xlog_redo for replay would be enough. >>> Wait a minute, I retract this argument. By using this method a master >>> server would be able to produce WAL files with inconsistent hint bit >>> data when they are replayed if log_hint_bits is changed after a >>> restart of the master. >> >> In what case does it occur? >> I think pg_rewind can refuse to rewind if log_hint_bits is changed to >> 'OFF'. >> Is this not enough? > > After more thinking... > Before performing a rewind on a node, what we need to know is that > log_hint_bits was set to true when WAL forked, because of the issue > that Robert mentioned here: > http://www.postgresql.org/message-id/519e5493.5060...@vmware.com > It does not really matter if the node used log_hint_bits set to false > in its latest state (Node to-be-rewinded might have been restarted > after WAL forked). > > So, after more thinking, yes using XLOG_PARAMETER_CHANGE and > PGC_POSTMASTER for this parameter would be enough. However on the > pg_rewind side we would need to track the value of log_hint_bits when > analyzing the WALs and ensure that it was set to true at fork point. > This is not something that the core should care about, though. Yep, pg_rewind needs to track the value of wal_log_hintbits. I think the value of wal_log_hintbits always needs to have been set to true after the fork point. And if wal_log_hintbits is set to false when we perform pg_rewind, we cannot do that. Regards, --- Sawada Masahiko
[HACKERS] Expiring tuples from aggregation in window frames efficiently
Hi folks, I was casting my mind back to an idea that came up when windowing functions were first implemented in 8.4, during some talk about how ROWS BETWEEN might be implemented in the future. This has now been implemented, but with the implementation there is some quadratic behaviour when tuples move off the top of the window frame. In this case the aggregate value must be re-calculated over all the remaining rows in the frame for every row that exits the window's frame, e.g. with SUM(value) OVER(ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING), where the frame head is always at the current row. There is a comment in nodeWindowAgg.c on line 457 explaining that it may be possible to optimise this by having functions which can "expire" tuples out of the aggregate value. These are described as "negative transition functions". Currently I'm looking over this to try to determine how hard it would be to implement, with a view that if it is within my ability then I'd like to have a patch ready for the next commitfest. The purpose of this email is just to publish my rough implementation ideas with the hope that someone can tell me if I'm on the right track or that I'm off track somewhere. Also there may be optimisation opportunities that I've not thought of. The idea would be to allow an extra function type when a user does CREATE AGGREGATE; this would be the function which removes a tuple from aggregation, and I'll call these "aggregate subtraction functions" from now on. For SUM() I think the aggregate subtraction function would just do a - instead of a +, and look pretty much the same as intN_sum(). It looks like it would be similar for AVG(), where we'd just remove from the count and the sum. COUNT(col) and COUNT(*) I would imagine would be quite similar, though from what I've managed to figure out so far the implementation of this is a bit special. 
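To make the idea above concrete, here is a minimal sketch in Python (not PostgreSQL's C, and the function names are invented for illustration) of how an aggregate subtraction function for SUM() turns the per-row recomputation into constant-time work for the frame described above:

```python
def sum_sfunc(state, value):
    """Ordinary (forward) transition function, analogous to intN_sum()."""
    return state + value

def sum_negfunc(state, value):
    """Hypothetical aggregate subtraction function: a - instead of a +."""
    return state - value

def running_sums(values):
    """Model of SUM(value) OVER (ORDER BY id ROWS BETWEEN CURRENT ROW AND
    UNBOUNDED FOLLOWING): the frame head is the current row, so exactly one
    tuple exits the frame per output row."""
    state = 0
    for v in values:                   # build the full-frame sum once
        state = sum_sfunc(state, v)
    results = []
    for v in values:
        results.append(state)
        state = sum_negfunc(state, v)  # O(1) per row instead of O(frame size)
    return results
```

Without the subtraction function, each output row would require re-aggregating the whole remaining frame, which is the quadratic behaviour described above.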
The specification of this aggregate subtraction function would be optional in CREATE AGGREGATE, and if present it would be used instead of looping over all the rows in the frame each time the head of the window frame moves to the next row. I'm thinking it would be useful to make these aggregate subtraction functions return true if they've managed to remove the tuple, and false to tell the executor to fall back and loop over each tuple in the frame. I feel this would be useful as it would allow MAX() and MIN() to at least partially get on board with this optimisation. Let me explain: If the current MAX() is 10 and we ask the aggregate subtraction function to subtract a tuple with 9, then the MAX() is still 10. So, in this case we could just compare 9 with 10, see that it is lower, and just return true, thus pretending we actually did the subtraction, when in fact no subtraction was required. The same applies to MIN(), though of course with the comparison the other way around. It may also be possible to store a counter which we increase each time MAX() receives a tuple with the current maximum value; this would be reset to 1 when MAX() receives a higher value. The aggregate subtraction function could decrement that counter each time it is asked to subtract the current maximum value and only return false when that counter reaches 0. Though I think this might mean that MAX() could take a small performance hit even when not used as a windowing aggregate. 
The pseudo logic would look something along the lines of: IN > "1" MAX() = "1", maxcount = 1 IN > "2" MAX() = "2", maxcount = 1 (max increased, reset maxcount to 1) IN > "2" MAX() = "2", maxcount = 2 OUT > "1" MAX() = "2", maxcount = 2 (value is less than current MAX, so don't touch the maxcount) OUT > "2" MAX() = "2", maxcount = 1 OUT > "2" MAX() = ???, maxcount = 0 (maxcount is 0 so return false as we need to recalculate) Tonight I'm trying to figure out how all of this could be implemented; I'll list below my rough idea of how this might be done. Please shout out if there is something out of order here. 1. Modify pg_proc.h, adding a new bool flag for the aggsubtract function. 2. In CREATE FUNCTION, allow AGGSUBTRACT to be specified similarly to how WINDOW is, only demanding that these functions return BOOLEAN. 3. Alter CREATE AGGREGATE to allow the AGGSUBTRACT function type; this will be optional. 4. Write aggregate subtraction functions for COUNT(col), COUNT(*), SUM(), AVG(), MIN() and MAX(). 5. Change eval_windowaggregates() to attempt to use the aggregate subtraction function if it exists for the aggregate, and if that function returns true, use the updated aggregate state. I think perhaps as a simple initial implementation I could skip steps 2 and 3 and for step 4 just implement int4_sum_neg(), though I'm not quite sure yet if these steps would be required for initdb to work. Also, passing volatile functions to the aggregate requires some thought. I would imagine the optimisation needs to be disabled in that case, but I'm not quite sure actually how the code sho
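The maxcount pseudo logic above can be sketched as follows (an illustrative Python model of the proposal, not the actual patch; all names are invented):

```python
class MaxState:
    """Illustrative aggregate state for MAX() with the maxcount idea."""
    def __init__(self):
        self.max = None
        self.maxcount = 0

def max_sfunc(state, value):
    """Forward transition: track the maximum and how many times it was seen."""
    if state.max is None or value > state.max:
        state.max = value
        state.maxcount = 1            # max increased, reset maxcount to 1
    elif value == state.max:
        state.maxcount += 1
    return state

def max_negfunc(state, value):
    """Subtraction: return True if the state is still valid, False to tell
    the executor to fall back and recalculate over the whole frame."""
    if value < state.max:
        return True                   # below current MAX: nothing to do
    state.maxcount -= 1
    return state.maxcount > 0         # 0 means the MAX itself just left
```

Run against the IN/OUT trace above, this model reproduces it step for step, returning False only on the final OUT > "2".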
Re: [HACKERS] Get more from indices.
I wrote: > I've modified the patch to work in such a way. Also, as ISTM the patch > is more complicated than what the patch really does, I've simplified the > patch. I've revised the patch a bit. Please find attached the patch. Thanks, Best regards, Etsuro Fujita pathkey_and_uniqueindx_v7_20131203.patch Description: Binary data
Re: [HACKERS] logical changeset generation v6.7
Hello, these are continued comments. > = 0004: > To be continued to next mail. = 0005: - In heapam.c, it seems better to replace t_self only during logical decoding. - In GetOldestXmin(), the parameter name 'alreadyLocked' would be better as simply 'nolock', since alreadyLocked seems to me to suggest that it will unlock a lock acquired beforehand. - Before that, in LogicalDecodingAcquireFreeSlot, the lock window for the procarray is extended beyond GetOldestXmin, but the procarray does not seem to be accessed during the additional period. If you are holding this lock for a purpose other than accessing the procarray, it'd be better to provide its own lock object. > LWLockAcquire(ProcArrayLock, LW_SHARED); > slot->effective_xmin = GetOldestXmin(true, true, true); > slot->xmin = slot->effective_xmin; > > if (!TransactionIdIsValid(LogicalDecodingCtl->xmin) || > NormalTransactionIdPrecedes(slot->effective_xmin, LogicalDecodingCtl->xmin)) > LogicalDecodingCtl->xmin = slot->effective_xmin; > LWLockRelease(ProcArrayLock); - In dropdb in dbcommands.c, ereport is invoked referring to the result of LogicalDecodingCountDBSlots, but it seems better to me to issue this exception within LogicalDecodingCountDBSlots, even if a goto is required. - In LogStandbySnapshot in standby.c, two complementary conditions are imposed on two identical unlocks. It might be somewhat paranoid, but it is safer written like the following, | XLogRecPtr recptr = InvalidXLogRecPtr; | | | /* LogCurrentRunningXacts should be done before unlock when logical decoding */ | if (wal_level >= WAL_LEVEL_LOGICAL) |recptr = LogCurrentRunningXacts(running); | | LWLockRelease(ProcArrayLock); | | if (recptr == InvalidXLogRecPtr) |recptr = LogCurrentRunningXacts(running); - In tqual.c, in Setup/RevertFrom DecodingSnapshots, the name CatalogSnapshotData lacks unity with the other Snapshot*Data's. = 0007: - In heapam.c, the new global variable 'RecentGlobalDataXmin' is quite similar to 'RecentGlobalXmin' and does not represent what it is. 
The name should be changed; RecentGlobalNonCatalogXmin would be preferable.. - Although simply a matter of my taste, the following part in heapam.c > if (IsSystemRelation(scan->rs_rd) > || RelationIsAccessibleInLogicalDecoding(scan->rs_rd)) > heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin); > else > heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin); would be more readable as, > if (IsSystemRelation(scan->rs_rd) > || RelationIsAccessibleInLogicalDecoding(scan->rs_rd)) > xmin = RecentGlobalXmin > else > xmin = RecentGlobalDataXmin > heap_page_prune_opt(scan->rs_rd, buffer, xmin); index_fetch_heap in indexam.c has a part similar to this, and you coded in the latter style in IndexBuildHeapScan in index.c. - In procarray.c, you should add documentation for the new parameter 'systable' of GetOldestXmin. And the name 'systable' seems somewhat confusing, since its fully-spelled meaning is 'including systables'. This name should be changed to 'include_systable', or to 'only_usertable' or something with the sense inverted.. 0008 and after to come later.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > Fix a couple of bugs in MultiXactId freezing > > > > Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look > > into a multixact to check the members against cutoff_xid. > > > ! /* > > !* This is a multixact which is not marked LOCK_ONLY, > > but which > > !* is newer than the cutoff_multi. If the update_xid > > is below the > > !* cutoff_xid point, then we can just freeze the Xmax > > in the > > !* tuple, removing it altogether. This seems simple, > > but there > > !* are several underlying assumptions: > > !* > > !* 1. A tuple marked by an multixact containing a very > > old > > !* committed update Xid would have been pruned away by > > vacuum; we > > !* wouldn't be freezing this tuple at all. > > !* > > !* 2. There cannot possibly be any live locking members > > remaining > > !* in the multixact. This is because if they were > > alive, the > > !* update's Xid would had been considered, via the > > lockers' > > !* snapshot's Xmin, as part the cutoff_xid. > > READ COMMITTED transactions can reset MyPgXact->xmin between commands, > defeating that assumption; see SnapshotResetXmin(). I have attached an > isolationtester spec demonstrating the problem. Any idea how to cheat our way out of that one given the current way heap_freeze_tuple() works (running on both primary and standby)? My only idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. We can't even realistically create a new multixact with fewer members with the current format of xl_heap_freeze. > The test spec additionally > covers a (probably-related) assertion failure, new in 9.3.2. Too bad it's too late to do anything about it for 9.3.2. :(. At least the last seems actually unrelated, I am not sure why it's 9.3.2 only. Alvaro, are you looking? 
> That was the only concrete runtime problem I found during a study of the > newest heap_freeze_tuple() and heap_tuple_needs_freeze() code. I'd even be interested in fuzzy problems ;). If 9.3 hadn't been released, the interactions between cutoff_xid/multi would have caused me to say "back to the drawing board"... I'm not surprised if further things are lurking there. > One thing that > leaves me unsure is the fact that vacuum_set_xid_limits() does no locking to > ensure a consistent result between GetOldestXmin() and GetOldestMultiXactId(). > Transactions may start or end between those calls, making the > GetOldestMultiXactId() result represent a later set of transactions than the > GetOldestXmin() result. I suspect that's fine. New transactions have no > immediate effect on either cutoff, and transaction end can only increase a > cutoff. Using a slightly-lower cutoff than the maximum safe cutoff is always > okay; consider vacuum_defer_cleanup_age. Yes, that seems fine to me, with the same reasoning. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Skip hole in log_newpage
The log_newpage function, used to WAL-log a full copy of a page, is missing the trick we normally use for full-page images to leave out the unused space on the block. That's pretty trivial to implement, so we should. The place where this matters the most is when building a new B-tree index. When wal_level > minimal, all pages in the created index are logged with log_newpage, and by default we leave 10% free space on index pages. So implementing this reduces the amount of WAL generated by index creation by roughly 10%. Anyone see a problem with this? - Heikki *** a/src/backend/access/gin/gininsert.c --- b/src/backend/access/gin/gininsert.c *** *** 435,444 ginbuildempty(PG_FUNCTION_ARGS) START_CRIT_SECTION(); GinInitMetabuffer(MetaBuffer); MarkBufferDirty(MetaBuffer); ! log_newpage_buffer(MetaBuffer); GinInitBuffer(RootBuffer, GIN_LEAF); MarkBufferDirty(RootBuffer); ! log_newpage_buffer(RootBuffer); END_CRIT_SECTION(); /* Unlock and release the buffers. */ --- 435,444 START_CRIT_SECTION(); GinInitMetabuffer(MetaBuffer); MarkBufferDirty(MetaBuffer); ! log_newpage_buffer(MetaBuffer, false); GinInitBuffer(RootBuffer, GIN_LEAF); MarkBufferDirty(RootBuffer); ! log_newpage_buffer(RootBuffer, false); END_CRIT_SECTION(); /* Unlock and release the buffers. */ *** a/src/backend/access/gist/gist.c --- b/src/backend/access/gist/gist.c *** *** 83,89 gistbuildempty(PG_FUNCTION_ARGS) START_CRIT_SECTION(); GISTInitBuffer(buffer, F_LEAF); MarkBufferDirty(buffer); ! log_newpage_buffer(buffer); END_CRIT_SECTION(); /* Unlock and release the buffer */ --- 83,89 START_CRIT_SECTION(); GISTInitBuffer(buffer, F_LEAF); MarkBufferDirty(buffer); ! log_newpage_buffer(buffer, true); END_CRIT_SECTION(); /* Unlock and release the buffer */ *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** *** 6207,6222 log_heap_update(Relation reln, Buffer oldbuf, * memory and writing them directly to smgr. If you're using buffers, call * log_newpage_buffer instead. * ! 
* Note: the NEWPAGE log record is used for both heaps and indexes, so do ! * not do anything that assumes we are touching a heap. */ XLogRecPtr log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno, ! Page page) { xl_heap_newpage xlrec; XLogRecPtr recptr; ! XLogRecData rdata[2]; /* NO ELOG(ERROR) from here till newpage op is logged */ START_CRIT_SECTION(); --- 6207,6228 * memory and writing them directly to smgr. If you're using buffers, call * log_newpage_buffer instead. * ! * If the page follows the standard page layout, with a PageHeader and unused ! * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows ! * the unused space to be left out from the WAL record, making it smaller. */ XLogRecPtr log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno, ! Page page, bool page_std) { xl_heap_newpage xlrec; XLogRecPtr recptr; ! XLogRecData rdata[3]; ! ! /* ! * Note: the NEWPAGE log record is used for both heaps and indexes, so do ! * not do anything that assumes we are touching a heap. ! */ /* NO ELOG(ERROR) from here till newpage op is logged */ START_CRIT_SECTION(); *** *** 6225,6239 log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno, xlrec.forknum = forkNum; xlrec.blkno = blkno; rdata[0].data = (char *) &xlrec; rdata[0].len = SizeOfHeapNewpage; rdata[0].buffer = InvalidBuffer; rdata[0].next = &(rdata[1]); ! rdata[1].data = (char *) page; ! rdata[1].len = BLCKSZ; ! rdata[1].buffer = InvalidBuffer; ! 
rdata[1].next = NULL; recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata); --- 6231,6288 xlrec.forknum = forkNum; xlrec.blkno = blkno; + if (page_std) + { + /* Assume we can omit data between pd_lower and pd_upper */ + uint16 lower = ((PageHeader) page)->pd_lower; + uint16 upper = ((PageHeader) page)->pd_upper; + + if (lower >= SizeOfPageHeaderData && + upper > lower && + upper <= BLCKSZ) + { + xlrec.hole_offset = lower; + xlrec.hole_length = upper - lower; + } + else + { + /* No "hole" to compress out */ + xlrec.hole_offset = 0; + xlrec.hole_length = 0; + } + } + else + { + /* Not a standard page header, don't try to eliminate "hole" */ + xlrec.hole_offset = 0; + xlrec.hole_length = 0; + } + rdata[0].data = (char *) &xlrec; rdata[0].len = SizeOfHeapNewpage; rdata[0].buffer = InvalidBuffer; rdata[0].next = &(rdata[1]); ! if (xlrec.hole_length == 0) ! { ! rdata[1].data = (char *) page; ! rdata[1].len = BLCKSZ; ! rdata[1].buffer = InvalidBuffer; ! rdata[1].next = NULL; ! } ! else ! { ! /* must skip the hole */ ! rdata[1].data = (char *) page; ! rdata[1].len = xlrec.hole_offset; ! rdata[1
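The hole-skipping logic in the patch above can be modelled roughly like this (a Python sketch of the byte layout only, not the C code; it assumes, as for ordinary full-page images, that redo zero-fills the omitted range, and the 24-byte header size is assumed for illustration):

```python
BLCKSZ = 8192
PAGE_HEADER_SIZE = 24   # assumed stand-in for SizeOfPageHeaderData

def compress_page(page, pd_lower, pd_upper, page_std=True):
    """For a standard page layout, the bytes between pd_lower and pd_upper
    are unused, so the WAL record can carry the page minus that range plus
    a (hole_offset, hole_length) pair."""
    if (page_std and pd_lower >= PAGE_HEADER_SIZE
            and pd_upper > pd_lower and pd_upper <= BLCKSZ):
        hole_offset, hole_length = pd_lower, pd_upper - pd_lower
        data = page[:hole_offset] + page[pd_upper:]
    else:
        hole_offset, hole_length = 0, 0   # no "hole" to compress out
        data = page
    return hole_offset, hole_length, data

def decompress_page(hole_offset, hole_length, data):
    """Replay side: reinsert the hole as zeros; its contents don't matter
    for a standard page."""
    if hole_length == 0:
        return data
    return data[:hole_offset] + b'\x00' * hole_length + data[hole_offset:]
```

With the default 10% fillfactor on B-tree index pages, the omitted range is roughly 10% of each page, which is where the quoted WAL savings come from.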
Re: [HACKERS] Skip hole in log_newpage
Hi, On 2013-12-03 13:03:41 +0200, Heikki Linnakangas wrote: > The log_newpage function, used to WAL-log a full copy of a page, is missing > the trick we normally use for full-page images to leave out the unused space > on the block. That's pretty trivial to implement, so we should. > > The place where this matters the most is when building a new B-tree index. > When wal_level > minimal, all pages in the created index are logged with > log_newpage, and by default we leave 10% free space on index pages. So > implementing this reduces the amount of WAL generated by index creation by > roughly 10%. Sounds like a good idea to me. > Anyone see a problem with this? I haven't looked thoroughly through all callsites, but shouldn't the vacuumlazy callsite use std = true? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Skip hole in log_newpage
On 12/03/2013 01:37 PM, Andres Freund wrote: I haven't looked thoroughly through all callsites, but shouldn't the vacuumlazy callsite use std = true? Well, it's logging an empty page, ie. a page full of zeros. I'm not sure if you'd consider that a "standard" page. Like the backup-block code in xlog.c, log_newpage actually makes a full page image without the hole if pd_lower == 0, even if you pass std = 'true', so the end result is the same. - Heikki
Re: [HACKERS] Skip hole in log_newpage
On 2013-12-03 13:57:04 +0200, Heikki Linnakangas wrote: > On 12/03/2013 01:37 PM, Andres Freund wrote: > >I haven't looked thoroughly through all callsites, but shouldn't the > >vacuumlazy callsite use std = true? > > Well, it's logging an empty page, ie. a page full of zeros. I'm not sure if > you'd consider that a "standard" page. Like the backup-block code in xlog.c, > log_newpage actually makes a full page image without the hole if pd_lower == > 0, even if you pass std = 'true', so the end result is the same. Hm. It should have been PageInit()ed and thus have sensible pd_lower/upper, right? Otherwise we'd have entered the PageIsNew() branch above it. It's obviously not critical, but it seems like a shame to write 8kb when it's not necessary. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Skip hole in log_newpage
On 12/03/2013 02:03 PM, Andres Freund wrote: On 2013-12-03 13:57:04 +0200, Heikki Linnakangas wrote: On 12/03/2013 01:37 PM, Andres Freund wrote: I haven't looked thoroughly through all callsites, but shouldn't the vacuumlazy callsite use std = true? Well, it's logging an empty page, ie. a page full of zeros. I'm not sure if you'd consider that a "standard" page. Like the backup-block code in xlog.c, log_newpage actually makes a full page image without the hole if pd_lower == 0, even if you pass std = 'true', so the end result is the same. Hm. It should have been PageInit()ed and thus have sensible pd_lower/upper, right? Otherwise we'd have entered the PageIsNew() branch above it. It's obviously not critical, but it seems like a shame to write 8kb when it's not necessary. Ah, you're right. I was thinking that that is the PageIsNew() branch, but it's not. - Heikki
[HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
Hello, I've encountered a small bug and fixed it. I guess this occurs on all major releases. I saw this happen on 9.2 and 9.4devel. Please find attached the patch and commit this. [Problem] If I mistakenly set an invalid value for listen_addresses, say '-1', and start the database server, it fails to start as follows. In my environment (RHEL6 for Intel64), it takes about 15 seconds before postgres prints the messages. This is OK. [maumau@myhost pgdata]$ pg_ctl -w start waiting for server to start... LOG: could not translate host name "-1", service "5450" to address: Temporary failure in name resolution WARNING: could not create listen socket for "-1" FATAL: could not create any TCP/IP sockets stopped waiting pg_ctl: could not start server Examine the log output. [maumau@myhost pgdata]$ When I start the server without -w and try to stop it, "pg_ctl stop" waits for 60 seconds and times out before it fails. This is what I'm seeing as a problem. I expected "pg_ctl stop" to respond quickly with success or failure depending on the timing. [maumau@myhost pgdata]$ pg_ctl start server starting ...(a few seconds later) [maumau@myhost ~]$ pg_ctl stop waiting for server to shut down. .. failed pg_ctl: server does not shut down HINT: The "-m fast" option immediately disconnects sessions rather than waiting for session-initiated disconnection. [maumau@myhost ~]$ [Cause] The problem occurs in the sequence below: 1. postmaster creates $PGDATA/postmaster.pid. 2. postmaster tries to resolve the value of listen_addresses to IP addresses. This took about 15 seconds in my failure scenario. 3. During 2, pg_ctl sends SIGTERM to postmaster. 4. postmaster terminates immediately without deleting $PGDATA/postmaster.pid. This is because it hasn't set signal handlers yet. 5. "pg_ctl stop" waits in a loop until $PGDATA/postmaster.pid disappears. But the file does not disappear and it times out. [Fix] Make pg_ctl check if postmaster is still alive, because postmaster might have crashed unexpectedly. 
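The wait loop described in [Cause]/[Fix] can be sketched like this (a hypothetical Python model, not pg_ctl's actual C code; the function name is invented, and signal 0 is used purely as an existence probe):

```python
import errno
import os
import time

def wait_for_shutdown(pidfile, timeout=60, interval=1.0):
    """Wait for postmaster.pid to disappear, but also probe whether the
    postmaster process is still alive, so a stale pid file left behind by
    a postmaster that died before installing its signal handlers does not
    make us wait until the timeout."""
    for _ in range(int(timeout / interval)):
        try:
            with open(pidfile) as f:
                pid = int(f.readline())
        except FileNotFoundError:
            return True                  # normal path: pid file removed
        try:
            os.kill(pid, 0)              # signal 0 checks existence only
        except OSError as e:
            if e.errno == errno.ESRCH:
                return True              # postmaster died; pid file stale
        time.sleep(interval)
    return False                         # genuine shutdown timeout
```

With only the pid-file check (the current behaviour described above), step 4's orphaned postmaster.pid forces the full 60-second wait; the liveness probe lets the loop return as soon as the process is gone.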
Regards MauMau pg_stop_fail.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip hole in log_newpage
On Tue, Dec 3, 2013 at 6:03 AM, Heikki Linnakangas wrote:
> The log_newpage function, used to WAL-log a full copy of a page, is missing
> the trick we normally use for full-page images to leave out the unused space
> on the block. That's pretty trivial to implement, so we should.
>
> The place where this matters the most is when building a new B-tree index.
> When wal_level > minimal, all pages in the created index are logged with
> log_newpage, and by default we leave 10% free space on index pages. So
> implementing this reduces the amount of WAL generated by index creation by
> roughly 10%.
>
> Anyone see a problem with this?

Looks good to me.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
On Mon, Dec 2, 2013 at 11:26 PM, Noah Misch wrote:
> On Mon, Dec 02, 2013 at 08:56:03PM -0500, Tom Lane wrote:
>> Noah Misch writes:
>> > ... I propose merely changing the syntax to "TABLE FOR ROWS (...)".
>>
>> Ugh :-(. Verbose and not exactly intuitive, I think. I don't like
>> any of the other options you listed much better. Still, the idea of
>> using more than one word might get us out of the bind that a single
>> word would have to be a fully reserved one.
>>
>> > ROWS FROM
>>
>> This one's a little less awful than the rest. What about "ROWS OF"?
>
> I had considered ROWS OF and liked it, but I omitted it from the list on
> account of the shift/reduce conflict from a naturally-written Bison rule.
> Distinguishing it from a list of column aliases takes extra look-ahead. We
> could force that to work. However, if we ever wish to allow an arbitrary
> from_item in the list, it would become ambiguous: is this drawing rows from
> "a" or just using an alias with a column list?
>
> WITH a AS (SELECT oid FROM pg_am ORDER BY 1) SELECT * FROM rows of(a, a);
>
> ROWS FOR is terse and conflict-free. "FOR" evokes the resemblance to looping
> over the parenthesized section with the functions acting as generators.

I like the idea of using ROWS + some additional word. I think I mildly prefer
Tom's suggestion of ROWS FROM to your suggestion of ROWS FOR, but I can live
with either one.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical changeset generation v6.7
On 2013-11-29 01:16:39 -0500, Robert Haas wrote:
> On Thu, Nov 14, 2013 at 8:46 AM, Andres Freund wrote:
> > [ new patches ]
>
> Here's an updated version of patch #2. I didn't really like the
> approach you took in the documentation, so I revised it.

Fair enough.

> Apart from that, I spent a lot of time looking at
> HeapSatisfiesHOTandKeyUpdate. I'm not very happy with your changes.
> The idea seems to be that we'll iterate through all of the HOT columns
> regardless, but that might be very inefficient. Suppose there are 100
> HOT columns, the last one is the only key column, and only the first
> one has been modified. Once we look at #1 and determine that it's not
> HOT, we should zoom forward and skip over the next 98, and only look
> at the last one; your version does not behave like that.

Well, the hot bitmap only contains indexed columns, so that's only going to
happen if there actually are indexes over all those columns. And in that
case it seems unlikely that the performance of that routine matters. That
said, keeping the old performance characteristics seems like a good idea to
me. Not sure anymore why I changed it that way.

> I've taken a crack at rewriting
> this logic, and the result looks cleaner and simpler to me, but I
> haven't tested it beyond the fact that it passes make check. See what
> you think.

Hm. I think it actually will not abort early in all cases either, but that
looks fixable. Imagine what happens if id_attrs or key_attrs is empty; ISTM
that we'll check all hot columns in that case.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extension Templates S03E11
On Tue, Dec 3, 2013 at 1:31 AM, Jeff Davis wrote:
> On Sun, 2013-12-01 at 15:48 +0100, Dimitri Fontaine wrote:
>> Jeff Davis writes:
>> > I don't see why we are trying to accommodate a case where the author
>> > doesn't offer enough full SQL scripts and offers broken downgrade
>> > scripts; or why that case is different from offering broken upgrade
>> > scripts.
>>
>> That's fair enough I guess. I will work on automating the choice of the
>> first full script to use then, for next patch version.
>
> Can we separate this feature out? It's an issue with extensions today,
> and I'm eager to make some progress after the explosion of differing
> opinions today.

+1 for separating that part out. I thought it was separated, at some point.

> Robert, do you think this is an acceptable approach to solve your pet
> peeve here:
>
> http://www.postgresql.org/message-id/CA+tgmoae3qs4qbqfxouzzfxrsxa0zy8ibsoysuutzdumpea...@mail.gmail.com

I'd need to look at exactly what's being proposed in more detail.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical changeset generation v6.7
Hi,

On 2013-11-28 21:15:18 -0500, Robert Haas wrote:
> OK, I've committed the patch to adjust the definition of
> IsSystemRelation()/IsSystemClass() and add
> IsCatalogRelation()/IsCatalogClass().

Thanks for taking care of this!

> I kibitzed your decision about
> which function to use in a few places - specifically, I made all of
> the places that cared about allow_system_table_mods use the IsSystem
> functions, and all the places that cared about invalidation messages
> use the IsCatalog functions. I don't think any of these changes are
> more than cosmetic, but I think it may reduce the chance of errors or
> inconsistencies in the face of future changes.

Agreed.

Do you think we need to do anything about the
  ERROR: cannot remove dependency on schema pg_catalog because it is a system object
thingy? Imo the current state is much more consistent than the earlier one,
but that's still a quite surprising leftover...

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
Hello,

The attached patch implements the below proposal, and moves libecpg.dll, libecpg_compat.dll, and libpgtypes.dll from the lib folder to the bin folder on Windows, where they should be.

http://www.postgresql.org/message-id/10470B394DB8486F93AC60107CC44C8B@maumau

As Andrew-san said, I don't expect it to be back-ported.

In addition, I'll remove libpq.dll from the lib folder unless somebody objects. Currently, libpq.dll is placed in both bin and lib. I guess libpq.dll was left in lib because it was considered necessary for the ECPG DLLs.

Would removing libpq.dll from lib cause any problem? No. The following modules in lib need libpq.dll when they are loaded by postgres.exe via LoadLibrary(). But the necessary libpq.dll is loaded from the bin folder: according to the DLL search order rules, DLLs are loaded from the same folder as the loading program, which is postgres.exe in our case.

dblink.dll
libpqwalreceiver.dll
postgres_fdw.dll

Regards MauMau

ECPG_DLL_not_move_libpq.patch Description: Binary data
Re: [HACKERS] Extension Templates S03E11
On Mon, Dec 2, 2013 at 7:46 PM, Robert Haas wrote:
>> Just tossing an idea out there. What if you could install an extension
>> by specifying not a local file name but a URL. Obviously there's a
>> security issue but for example we could allow only https URLs with
>> verified domain names that are in a list of approved domain names
>> specified by a GUC.
>
> That's a different feature, but I don't see anything preventing
> someone from implementing that as an extension, today, without any
> core support at all. It would only be usable in cases where the share
> directory is writable by the database server (i.e. low-security
> installations) and you'd have to make it a function call rather than
> piggybacking on CREATE EXTENSION, but neither of those things sound
> bad to me. (And if they are bad, they could be addressed by providing
> hooks or event triggers, leaving the rest of the functionality in the
> extension module.)

Well, all of this is implementable as an extension if you have write access
to the database server's share directory. This is all about UI. CREATE
EXTENSION is about having the core do the bookkeeping about which files
belong to which version of which extension.

I thought the fundamental problem the "in-catalog" extensions were trying to
solve was the issue of not having access to the filesystem. If that's the
case then being able to say create extension from http://... would solve
that.

If the fundamental problem is that you want multi-tenant databases to be
able to have different .so files visible depending on which database is
opened, then that's a bit trickier.

-- greg
Re: [HACKERS] Extension Templates S03E11
* Tom Lane (t...@sss.pgh.pa.us) wrote: > > On 3 December 2013 02:02, Dimitri Fontaine wrote: > > ISTM that the real solution to this particular problem is to decouple > > the extensions that are currently in contrib from a specific postgres > > version. > > "Problem"? It's not a bug that you get hstore 1.2 when you dump from 9.2 > and reload into 9.3; that's a feature. You wanted an upgrade, presumably, I don't buy this argument at *all* and it's not going to fly when we've got multiple versions of an extension available concurrently. I'm willing to accept that we have limitations when it comes from a packaging perspective (today) around extensions but the notion that my backup utility should intentionally omit information which is required to restore the database to the same state it was in is ridiculous. > or you'd not have been going to 9.3 in the first place. This notion that a given major version of PG only ever has one version of an extension associated with it is *also* wrong and only makes any sense for contrib extensions- which are the exception rather than the rule today. > The entire reason > why the extension mechanism works like it does is to allow that sort of > reasonably-transparent upgrade. It would not be a step forward to break > that by having pg_dump prevent it (which it would fail to do anyway, > likely, since the receiving installation might not have 1.1 available > to install). I agree that there should be a way to *allow* such an upgrade to happen transparently and perhaps we keep it the way it is for contrib extensions as a historical artifact, but other extensions are independent of the PG major version and multiple versions will be available concurrently for them and having pg_dump willfully ignore the extension version is a receipe for broken backups. Step back and consider a user who is just trying to restore his backup of his 9.2 database into a new server, also with 9.2, as quickly as he can to get his system online again. 
Having four different versions of extension X installed and available for 9.2, no clue or information about which version was installed into which databases and getting mysterious failures and errors because they're not all compatible is not what anyone is going to want to deal with in that situation. I certainly don't see extensions (outside of contrib) in the general sense as being either tied to specific PG versions or being required to maintain the same API that they started with on day 1. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Extension Templates S03E11
Robert Haas writes:
>> Can we separate this feature out? It's an issue with extensions today,
>> and I'm eager to make some progress after the explosion of differing
>> opinions today.
>
> +1 for separating that part out. I thought it was separated, at some point.

http://www.postgresql.org/message-id/caltqxtetvi-exhdbspuey3trthug51eswuod8cur2t+rxtg...@mail.gmail.com
http://www.postgresql.org/message-id/m2r4k8jpfl@2ndquadrant.fr

The only way to bugfix all the reported problems was to have regression
testing… and it became a necessary dependency of the extension templates
patch, so I just included it. My interdependent-git-branches development fu
seems to have totally disappeared after the main extension patch that needed
7 of those…

> I'd need to look exactly what's being proposed in more detail.

What I did propose is a new GUC, default_full_version:

+ default_full_version (string)
+
+   This option allows an extension author to avoid shipping all versions
+   of all scripts when shipping an extension. When a version is requested
+   and the matching script does not exist on disk, set
+   default_full_version to the first script you still ship and PostgreSQL
+   will apply the intermediate upgrade scripts as per the ALTER EXTENSION
+   UPDATE command.
+
+   For example, say you did provide the extension pair version 1.0 and are
+   now providing version 1.1. If you want both current and new users to be
+   able to install the new version, you can provide both the scripts
+   pair--1.0--1.1.sql and pair--1.1.sql, adding to the already existing
+   pair--1.0.sql.
+
+   When specifying default_version and default_full_version = 1.0 you can
+   instead provide only the scripts pair--1.0.sql and pair--1.0--1.1.sql.
+   CREATE EXTENSION pair; will then automatically use the aforementioned
+   scripts to install version 1.0 and then update it to 1.1.
What Jeff is proposing is to simplify that down and have PostgreSQL
auto-discover the upgrade cycle when the version asked for isn't directly
available with a creation script. We would keep the behavior depicted here,
just in a fully automated way.

Working on a separate patch for that, then.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Extension Templates S03E11
* Tom Dunstan (pg...@tomd.cc) wrote: > Extensions in contrib live in a weird place. Totally builtin stuff > should obviously be dumped without versions, and stuff which is > completely separate and follows its own release schedule should > obviously be versioned. I guess we consider all modules in contrib to > offer the same transparent upgrade guarantees as builtins, so they > shouldn't be versioned, but it feels like some of them should be, if > only because some aren't particularly tied in to the backend all that > tightly. But I guess that's a bogus metric, the true metric is whether > we want people to treat them as basically built-in, with the upgrade > guarantees that go along with that. Note that we don't actually make guarantees about either builtins or contrib modules when it comes to major version upgrades. The current way we package contrib and how we tie contrib releases to PG releases means that we can get away with omitting the version and saying "well, if you restore to the same PG major version then you'll get the same contrib extension version, and if you restore into a different version then obviously you want the version of contrib with that major version" but that *only* works for contrib. We need to accept that other extensions exist and that they aren't tied to PG major versions nor to our release schedule. There's a lot more extensions out there today which are *not* in contrib than there are ones which *are*. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Trust intermediate CA for client certificates
* Bruce Momjian (br...@momjian.us) wrote: > Uh, this thread actually started with Ian's feature request, and has > changed to document the current behavior. Whoops, apologies for that then- I clearly came into it (or perhaps more accurately, was brought into it) after the start of the thread. I'm happy to start a new thread for the discussion around documenting the current behavior, if that would help. :) Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote: > Stephen mentioned using external tools and/or metadata, but to me that > sounds like it would require porting the extension away from what's on > PGXN today. Not at all- and that'd be the point. An external tool could take the PGXN extension, run 'make', then 'make install' (into a userland directory), extract out the script and then, with a little help from PG, run that script in "extension creation mode" via libpq. Another option, which I generally like better, is to have a new package format for PGXN that contains the results of "make install", more-or-less, synonymous to Debian source vs. .deb packages. Perhaps we could even have psql understand that format and be able to install the extension via a backslash command instead of having an external tool, but I think an external tool for dependency tracking and downloading of necessary dependencies ala Debian would be better than teaching psql to do that. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > > Fix a couple of bugs in MultiXactId freezing > > > > > > Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look > > > into a multixact to check the members against cutoff_xid. > > > > > ! /* > > > ! * This is a multixact which is not marked > > > LOCK_ONLY, but which > > > ! * is newer than the cutoff_multi. If the > > > update_xid is below the > > > ! * cutoff_xid point, then we can just freeze > > > the Xmax in the > > > ! * tuple, removing it altogether. This seems > > > simple, but there > > > ! * are several underlying assumptions: > > > ! * > > > ! * 1. A tuple marked by an multixact containing > > > a very old > > > ! * committed update Xid would have been pruned > > > away by vacuum; we > > > ! * wouldn't be freezing this tuple at all. > > > ! * > > > ! * 2. There cannot possibly be any live locking > > > members remaining > > > ! * in the multixact. This is because if they > > > were alive, the > > > ! * update's Xid would had been considered, via > > > the lockers' > > > ! * snapshot's Xmin, as part the cutoff_xid. > > > > READ COMMITTED transactions can reset MyPgXact->xmin between commands, > > defeating that assumption; see SnapshotResetXmin(). I have attached an > > isolationtester spec demonstrating the problem. > > Any idea how to cheat our way out of that one given the current way > heap_freeze_tuple() works (running on both primary and standby)? My only > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty. > We can't even realistically create a new multixact with fewer members > with the current format of xl_heap_freeze. Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are already expected to check HEAP_XMAX_INVALID first. 
Seems doable, albeit yet another injection of complexity. > > The test spec additionally > > covers a (probably-related) assertion failure, new in 9.3.2. > > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the > last seems actually unrelated, I am not sure why it's 9.3.2 > only. Alvaro, are you looking? (For clarity, the other problem demonstrated by the test spec is also a 9.3.2 regression.) > > That was the only concrete runtime problem I found during a study of the > > newest heap_freeze_tuple() and heap_tuple_needs_freeze() code. > > I'd even be interested in fuzzy problems ;). If 9.3. wouldn't have been > released the interactions between cutoff_xid/multi would have caused me > to say "back to the drawing" board... I'm not suprised if further things > are lurking there. heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting spurious lock contention due to wraparound of the multixact space. The comment is gone, and that mechanism no longer poses a threat. However, a non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker XIDs, just updater XIDs) may cause similar spurious contention. > + /* > + * The multixact has an update hidden within. > Get rid of it. > + * > + * If the update_xid is below the cutoff_xid, > it necessarily > + * must be an aborted transaction. In a > primary server, such > + * an Xmax would have gotten marked invalid by > + * HeapTupleSatisfiesVacuum, but in a replica > that is not > + * called before we are, so deal with it in the > same way. > + * > + * If not below the cutoff_xid, then the tuple > would have been > + * pruned by vacuum, if the update committed > long enough ago, > + * and we wouldn't be freezing it; so it's > either recently > + * committed, or in-progress. Deal with this > by setting the > + * Xmax to the update Xid directly and remove > the IS_MULTI > + * bit. 
(We know there cannot be running > lockers in this > + * multi, because it's below the cutoff_multi > value.) > + */ > + > + if (TransactionIdPrecedes(update_
Re: [HACKERS] Extension Templates S03E11
* Greg Stark (st...@mit.edu) wrote: > I thought the fundamental problem the "in-catalog" extensions were > trying to solve were the issue with not having access to the > filesystem. If that's the case then being able to say create extension > from http://... would solve that. That's not really 'solved' unless you feel we can depend on that "create extension from URL" to work at pg_restore time... I wouldn't have guessed that people would accept that, but I've already been wrong about such things in this thread once. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] Recovery bug in GIN, missing full page image
While looking at Alexander's GIN patch, I noticed an ancient bug in the
WAL-logging of GIN entry-tree insertions. The entryPlaceToPage and
dataPlaceToPage functions don't make a full-page image of the page when
inserting a downlink on a non-leaf page. The comment says:

/*
 * Prevent full page write if child's split occurs. That is needed to
 * remove incomplete splits while replaying WAL
 *
 * data.updateBlkno contains new block number (of newly created right
 * page) for recently splited page.
 */

The code is doing what the comment says, but that's wrong. You can't just
skip the full page write; it's needed for torn page protection like in any
other case. The correct fix would've been to change the redo-routine to do
the incomplete split tracking for the page even if it's restored from a full
page image.

This was broken by this commit back in 2007:

commit 853d1c3103fa961ae6219f0281885b345593d101
Author: Teodor Sigaev
Date: Mon Jun 4 15:56:28 2007 +

    Fix bundle bugs of GIN:
    - Fix possible deadlock between UPDATE and VACUUM queries. Bug never was
      observed in 8.2, but it still exist there. HEAD is more sensitive to
      bug after recent "ring" of buffer improvements.
    - Fix WAL creation: if parent page is stored as is after split then
      incomplete split isn't removed during replay. This happens rather
      rare, only on large tables with a lot of updates/inserts.
    - Fix WAL replay: there was wrong test of XLR_BKP_BLOCK_* for left page
      after deletion of page. That causes wrong rightlink field: it pointed
      to deleted page.
    - add checking of match of clearing incomplete split
    - cleanup incomplete split list after proceeding

    All of this chages doesn't change on-disk storage, so backpatch...
    But second point may be an issue for replaying logs from previous
    version.

The relevant part is the "Fix WAL creation" item. I searched the archives
but couldn't find any discussion leading to this fix.
In 2010, Tom actually fixed the redo-routine in commit 4016bdef8aded77b4903c457050622a5a1815c16, along with other fixes. So all we need to do now is to fix that bogus logic in entryPlaceToPage to not skip the full-page-image. Attached is a patch for 9.3. I've whacked the code a lot in master, but the same bug is present there too. - Heikki diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c index 13ab448..32eba4c 100644 --- a/src/backend/access/gin/gindatapage.c +++ b/src/backend/access/gin/gindatapage.c @@ -381,7 +381,6 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda { Page page = BufferGetPage(buf); int sizeofitem = GinSizeOfDataPageItem(page); - int cnt = 0; /* these must be static so they can be returned to caller */ static XLogRecData rdata[3]; @@ -401,32 +400,25 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE; /* - * Prevent full page write if child's split occurs. That is needed to - * remove incomplete splits while replaying WAL - * - * data.updateBlkno contains new block number (of newly created right - * page) for recently splited page. + * For incomplete-split tracking, we need the updateBlkno information and + * the inserted item even when we make a full page image of the page, so + * put the buffer reference in a separate XLogRecData entry. 
*/ - if (data.updateBlkno == InvalidBlockNumber) - { - rdata[0].buffer = buf; - rdata[0].buffer_std = FALSE; - rdata[0].data = NULL; - rdata[0].len = 0; - rdata[0].next = &rdata[1]; - cnt++; - } + rdata[0].buffer = buf; + rdata[0].buffer_std = FALSE; + rdata[0].data = NULL; + rdata[0].len = 0; + rdata[0].next = &rdata[1]; - rdata[cnt].buffer = InvalidBuffer; - rdata[cnt].data = (char *) &data; - rdata[cnt].len = sizeof(ginxlogInsert); - rdata[cnt].next = &rdata[cnt + 1]; - cnt++; + rdata[1].buffer = InvalidBuffer; + rdata[1].data = (char *) &data; + rdata[1].len = sizeof(ginxlogInsert); + rdata[1].next = &rdata[2]; - rdata[cnt].buffer = InvalidBuffer; - rdata[cnt].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem)); - rdata[cnt].len = sizeofitem; - rdata[cnt].next = NULL; + rdata[2].buffer = InvalidBuffer; + rdata[2].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem)); + rdata[2].len = sizeofitem; + rdata[2].next = NULL; if (GinPageIsLeaf(page)) { @@ -442,7 +434,7 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda btree->curitem++; } data.nitem = btree->curitem - savedPos; - rdata[cnt].len = sizeofitem * data.nitem; + rdata[2].len = sizeofitem * data.nitem; } else { diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c index 8ead38f..f6c06
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
Noah Misch writes:
> On Mon, Dec 02, 2013 at 08:56:03PM -0500, Tom Lane wrote:
>> Ugh :-(. Verbose and not exactly intuitive, I think. I don't like
>> any of the other options you listed much better. Still, the idea of
>> using more than one word might get us out of the bind that a single
>> word would have to be a fully reserved one.
>
> I had considered ROWS OF and liked it, but I omitted it from the list on
> account of the shift/reduce conflict from a naturally-written Bison rule.
> Distinguishing it from a list of column aliases takes extra look-ahead.

Hmm, yeah, you're right --- at least one of the first two words needs to be
reserved (not something that can be a ColId, at least), or else we can't tell
them from a table name and alias. So this approach doesn't give us all that
much extra wiggle room. We do have a number of already-fully-reserved
prepositions (FOR, FROM, IN, ON, TO) but none of them seem like great
choices.

> ROWS FOR is terse and conflict-free. "FOR" evokes the resemblance to looping
> over the parenthesized section with the functions acting as generators.

Meh. I don't find that analogy compelling. After sleeping on it, your other
suggestion of TABLE OF, or possibly TABLE FROM, is starting to grow on me.

Who else has an opinion?

regards, tom lane
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund writes:
> On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
>> The test spec additionally
>> covers a (probably-related) assertion failure, new in 9.3.2.
>
> Too bad it's too late to do anything about it for 9.3.2. :(. At least the
> last seems actually unrelated, I am not sure why it's 9.3.2
> only. Alvaro, are you looking?

Is this bad enough that we need to re-wrap the release?

regards, tom lane
Re: [HACKERS] Skip hole in log_newpage
Heikki Linnakangas writes:
> The log_newpage function, used to WAL-log a full copy of a page, is
> missing the trick we normally use for full-page images to leave out the
> unused space on the block. That's pretty trivial to implement, so we should.
>
> Anyone see a problem with this?

+1. Don't forget to bump the WAL version identifier.

regards, tom lane
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On Mon, Dec 2, 2013 at 6:08 PM, Tom Dunstan wrote: > On 3 Dec 2013, at 03:37, Robert Haas wrote: >> I also like this feature. It would be really neat if a FOREIGN KEY >> constraint with a WHERE clause could use a *partial* index on the >> foreign table provided that the index would be guaranteed to be predOK >> for all versions of the foreign key checking query. That might be >> hard to implement, though. > > Well, with this patch, under the hood the FK query is doing (in the case of > RESTRICT): > > SELECT 1 FROM ONLY "public"."comment" x WHERE (the id) OPERATOR(pg_catalog.=) > "parent_id" AND (parent_entity = 'event') FOR KEY SHARE OF x; > > If we stick a partial index on the column, disable seq scans and run the > query, we get: > > tom=# create index comment_event_id on comment (parent_id) where > parent_entity = 'event'; > CREATE INDEX > tom=# set enable_seqscan = off; > SET > tom=# explain SELECT 1 FROM ONLY "public"."comment" x WHERE 20 > OPERATOR(pg_catalog.=) "parent_id" AND (parent_entity = 'event') FOR KEY > SHARE OF x; >QUERY PLAN > > LockRows (cost=0.12..8.15 rows=1 width=6) >-> Index Scan using comment_event_id on comment x (cost=0.12..8.14 > rows=1 width=6) > Index Cond: (20 = parent_id) > Filter: (parent_entity = 'event'::commentable_entity) > (4 rows) > > Is that what you had in mind? Yeah, more or less, but the key is ensuring that it wouldn't let you create the constraint in the first place if the partial index specified *didn't* match the WHERE clause. For example, suppose the partial index says WHERE parent_entity = 'event' but the constraint definition is WHERE parent_event = 'somethingelse'. That ought to fail, just as creating a regular foreign constraint will fail if there's no matching unique index. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 09:48:23 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > >> The test spec additionally > >> covers a (probably-related) assertion failure, new in 9.3.2. > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > last seems actually unrelated, I am not sure why it's 9.3.2 > > only. Alvaro, are you looking? > > Is this bad enough that we need to re-wrap the release? Tentatively I'd say no; the only risk is losing locks, AFAICS. That's much better than corrupting rows as in 9.3.1. But I'll look into it in a bit more detail as soon as I am off the phone call I am on. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
* Tom Lane (t...@sss.pgh.pa.us) wrote: > After sleeping on it, your other suggestion of TABLE OF, or possibly > TABLE FROM, is starting to grow on me. > > Who else has an opinion? Alright, for my 2c, I like having this syntax include 'TABLE' simply because it's what folks coming from Oracle might be looking for. Following from that, to keep it distinct from the spec's notion of 'TABLE', my preference is 'TABLE FROM'. I don't particularly like 'TABLE OF', nor do I like the various 'ROWS' suggestions. Thanks, Stephen
Re: [HACKERS] Extension Templates S03E11
On Tue, Dec 3, 2013 at 8:44 AM, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> > On 3 December 2013 02:02, Dimitri Fontaine wrote: >> > ISTM that the real solution to this particular problem is to decouple >> > the extensions that are currently in contrib from a specific postgres >> > version. >> >> "Problem"? It's not a bug that you get hstore 1.2 when you dump from 9.2 >> and reload into 9.3; that's a feature. You wanted an upgrade, presumably, > > I don't buy this argument at *all* and it's not going to fly when we've > got multiple versions of an extension available concurrently. I'm > willing to accept that we have limitations when it comes to a > packaging perspective (today) around extensions but the notion that my > backup utility should intentionally omit information which is required > to restore the database to the same state it was in is ridiculous. > >> or you'd not have been going to 9.3 in the first place. > > This notion that a given major version of PG only ever has one version > of an extension associated with it is *also* wrong and only makes any > sense for contrib extensions- which are the exception rather than the > rule today. > >> The entire reason >> why the extension mechanism works like it does is to allow that sort of >> reasonably-transparent upgrade. It would not be a step forward to break >> that by having pg_dump prevent it (which it would fail to do anyway, >> likely, since the receiving installation might not have 1.1 available >> to install). > > I agree that there should be a way to *allow* such an upgrade to happen > transparently and perhaps we keep it the way it is for contrib > extensions as a historical artifact, but other extensions are > independent of the PG major version and multiple versions will be > available concurrently for them and having pg_dump willfully ignore the > extension version is a recipe for broken backups.
> > Step back and consider a user who is just trying to restore his backup > of his 9.2 database into a new server, also with 9.2, as quickly as he > can to get his system online again. Having four different versions of > extension X installed and available for 9.2, no clue or information > about which version was installed into which databases and getting > mysterious failures and errors because they're not all compatible is not > what anyone is going to want to deal with in that situation. I think Tom's original idea here was that new versions of extensions *shouldn't ever* be backward-incompatible, and therefore if this problem arises it's the extension author's fault. It isn't however clear that this dream is likely to be realized in practice. For example, the only difference between hstore 1.0 and hstore 1.1 is that we dropped the => operator, for the very good reason that we have been slowly working towards deprecating => as an operator name so that we can eventually use it for the purpose that the SQL standard specifies. Given that we've done it in core, we can hardly say that no one will ever do this anywhere else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > Any idea how to cheat our way out of that one given the current way > > heap_freeze_tuple() works (running on both primary and standby)? My only > > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > > We can't even realistically create a new multixact with fewer members > > with the current format of xl_heap_freeze. > > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are > already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet > another injection of complexity. I think it's pretty much checked that way already, but the problem seems to be how to avoid checks on xid commit/abort in that case. I've complained in 20131121200517.gm7...@alap2.anarazel.de that the old pre-condition that multixacts aren't checked when they can't be relevant (via OldestVisibleM*) isn't observed anymore. So, if we re-introduce that condition again, we should be on the safe side with that, right? > > > The test spec additionally > > > covers a (probably-related) assertion failure, new in 9.3.2. > > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > last seems actually unrelated, I am not sure why it's 9.3.2 > > only. Alvaro, are you looking? > > (For clarity, the other problem demonstrated by the test spec is also a 9.3.2 > regression.) Yea, I just don't see why yet... Looking now. > heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting > spurious lock contention due to wraparound of the multixact space. The > comment is gone, and that mechanism no longer poses a threat.
However, a > non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker > XIDs, just updater XIDs) may cause similar spurious contention. Yea, I noticed that that comment was missing as well. I think what we should do now is to rework freezing in HEAD to make all this more reasonable. > Numerous comments in the vicinity (e.g. ones at MultiXactStateData) reflect a > pre-9.3 world. Most or all of that isn't new with the patch at hand, but it > does complicate study. Yea, Alvaro sent a patch for that somewhere; it seems a patch in the series got lost when foreign key locks were originally applied. I think we seriously need to put a good amount of work into the multixact.c stuff in the next months. Otherwise it will be a maintenance nightmare for a fair bit longer. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extension Templates S03E11
On Tue, Dec 3, 2013 at 8:43 AM, Dimitri Fontaine wrote: > What Jeff is proposing is to simplify that down and have PostgreSQL auto > discover the upgrade cycle when the version asked for isn't directly > available with a creation script. > > We would keep the behavior depicted here, just in a fully automated way. > > Working on a separate patch for that, then. I like the idea of making it automatic, but it won't work in all cases. For example, suppose someone ships 1.0, 1.0--1.2, 1.1, and 1.1--1.2. Since versions aren't intrinsically ordered, the system has no way of knowing whether it's preferable to run 1.0 and then 1.0--1.2 or instead run 1.1 and then 1.1--1.2. So I think we will need either to introduce a way of ordering version numbers (which Tom has previously opposed) or some concept like your default_full_version. In more normal cases, however, the system can (and probably should) figure out what was intended by choosing the *shortest* path to get to the intended version. For example, if someone ships 1.0, 1.0--1.1, 1.1, and 1.1--1.2, the system should choose to run 1.1 and then 1.1--1.2, not 1.0 and then 1.0--1.1 and then 1.1--1.2. But that can be automatic: only if there are two paths of equal length (as in the example in the previous paragraph) do we need help from the user to figure out what to do. We should also consider the possibility of a user trying to deliberately install an older release. For example, if the user has 1.0, 1.0--1.1, 1.1, 1.1--1.2, and 1.2--1.0 (a downgrade script) with default_full_version = 1.2, an attempt to install 1.0 should run just the 1.0 script, NOT 1.2 and then 1.2--1.0. Putting all that together, I'm inclined to suggest that what we really need is a LIST of version numbers, rather than just one. If one path to the version we're installing is shorter than any other, we choose it, period.
If there are multiple paths of equal length, we break the tie by choosing which version number appears first in the aforementioned list. If that still doesn't break the tie, either because none of the starting points are mentioned in that list or because there are multiple equal-length paths starting in the same place, we give up and emit an error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
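The selection rules Robert describes — shortest path wins, ties broken by a user-supplied preference list, otherwise error — can be modeled as a breadth-first search over the upgrade-script graph. A speculative Python sketch (names invented; not a proposed implementation):

```python
from collections import deque

def choose_path(scripts, target, preference):
    """scripts: names like '1.1' (full creation script) or '1.0--1.1'
    (upgrade script).  Returns the chosen version sequence, e.g.
    ['1.1', '1.2'] meaning: run the 1.1 script, then the 1.1--1.2 upgrade."""
    fulls = [s for s in scripts if "--" not in s]
    edges = {}
    for s in scripts:
        if "--" in s:
            src, dst = s.split("--")
            edges.setdefault(src, []).append(dst)
    best = {}                       # start version -> shortest path to target
    for start in fulls:             # BFS from every full-install script
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                best[start] = path
                break
            for nxt in edges.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    if not best:
        raise ValueError("no way to install version %s" % target)
    shortest = min(len(p) for p in best.values())
    candidates = [p for p in best.values() if len(p) == shortest]
    if len(candidates) == 1:
        return candidates[0]
    for pref in preference:         # tie-break with the user-supplied list
        for path in candidates:
            if path[0] == pref:
                return path
    raise ValueError("ambiguous install paths: %r" % candidates)
```

This reproduces all three of Robert's examples: the unambiguous shortest path, the tie broken by the list, and the downgrade script being ignored when installing 1.0 directly.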
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > > > The test spec additionally > > > covers a (probably-related) assertion failure, new in 9.3.2. > > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > last seems actually unrelated, I am not sure why it's 9.3.2 > > only. Alvaro, are you looking? > > (For clarity, the other problem demonstrated by the test spec is also a 9.3.2 > regression.) The backtrace for the Assert() you found is:

#4 0x004f1da5 in CreateMultiXactId (nmembers=2, members=0x1ce17d8)
   at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:708
#5 0x004f1aeb in MultiXactIdExpand (multi=6241831, xid=6019366, status=MultiXactStatusUpdate)
   at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:462
#6 0x004a5d8e in compute_new_xmax_infomask (xmax=6241831, old_infomask=4416, old_infomask2=16386, add_to_xmax=6019366, mode=LockTupleExclusive, is_update=1 '\001', result_xmax=0x7fffca02a700, result_infomask=0x7fffca02a6fe, result_infomask2=0x7fffca02a6fc)
   at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:4651
#7 0x004a2d27 in heap_update (relation=0x7f9fc45cc828, otid=0x7fffca02a8d0, newtup=0x1ce1740, cid=0, crosscheck=0x0, wait=1 '\001', hufd=0x7fffca02a850, lockmode=0x7fffca02a82c)
   at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:3304
#8 0x00646f04 in ExecUpdate (tupleid=0x7fffca02a8d0, oldtuple=0x0, slot=0x1ce12c0, planSlot=0x1ce0740, epqstate=0x1ce0120, estate=0x1cdfe98, canSetTag=1 '\001')
   at /home/andres/src/postgresql/src/backend/executor/nodeModifyTable.c:690

So imo it isn't really a new problem; it existed all along :/. We only didn't hit it with your test case before because we spuriously thought that a tuple was in-progress if *any* member of the old multi was still running in some cases, instead of just the updater. But I am pretty sure it can also be reproduced in 9.3.1.
Imo the MultiXactIdSetOldestMember() call in heap_update() needs to be moved outside of the if (satisfies_key). Everything else is vastly more complex. Alvaro, correct? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote: > On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > > Any idea how to cheat our way out of that one given the current way > > > heap_freeze_tuple() works (running on both primary and standby)? My only > > > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty. > > > We can't even realistically create a new multixact with fewer members > > > with the current format of xl_heap_freeze. > > > > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update > > XID > > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers > > are > > already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet > > another injection of complexity. > > I think its pretty much checked that way already, but the problem seems > to be how to avoid checks on xid commit/abort in that case. I've > complained in 20131121200517.gm7...@alap2.anarazel.de that the old > pre-condition that multixacts aren't checked when they can't be relevant > (via OldestVisibleM*) isn't observed anymore. > So, if we re-introduce that condition again, we should be on the safe > side with that, right? What specific commit/abort checks do you have in mind? > > > > The test spec additionally > > > > covers a (probably-related) assertion failure, new in 9.3.2. > > > > > > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the > > > last seems actually unrelated, I am not sure why it's 9.3.2 > > > only. Alvaro, are you looking? > > > > (For clarity, the other problem demonstrated by the test spec is also a > > 9.3.2 > > regression.) > > Yea, I just don't see why yet... Looking now. Sorry, my original report was rather terse. 
I speak of the scenario exercised by the second permutation in that isolationtester spec. The multixact is later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2 does freeze it to InvalidTransactionId per the code I cited in my first response on this thread, which wrongly removes a key lock. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
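As a toy model (emphatically not PostgreSQL code) of the regression Noah describes: the tuple's xmax is a multixact whose members can include a locker that is still running, and replacing the whole xmax with InvalidTransactionId discards that member — and with it the key lock:

```python
# Toy model: xmax is either None (InvalidTransactionId) or a list of
# (xid, mode) multixact members.  All names here are invented.
RUNNING_XIDS = {100}   # transactions still in progress

def freeze_931(members):
    """9.3.1 left a multixact later than the freeze cutoff untouched."""
    return members

def freeze_932(members):
    """Sketch of the 9.3.2 regression: the multixact is replaced wholesale
    with InvalidTransactionId, members and all."""
    return None

def key_lock_held_by(members, xid):
    return members is not None and any(m == xid and mode == "keyshare"
                                       for m, mode in members)

xmax = [(100, "keyshare"), (90, "update")]   # xid 100 still holds KEY SHARE
survives = key_lock_held_by(freeze_931(xmax), 100)
removed = key_lock_held_by(freeze_932(xmax), 100)
```

The model only captures the observable effect (a live KEY SHARE lock silently vanishing), not the actual freezing logic or WAL interaction.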
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 10:29:54 -0500, Noah Misch wrote: > Sorry, my original report was rather terse. I speak of the scenario exercised > by the second permutation in that isolationtester spec. The multixact is > later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2 > does freeze it to InvalidTransactionId per the code I cited in my first > response on this thread, which wrongly removes a key lock. That one is clear; I was only confused about the Assert() you reported, but I think I've explained that elsewhere. I currently don't see how to fix the erroneous freezing of lockers (not the updater, though) without changing the WAL format or synchronously waiting for all lockers to end, both of which seem like a no-go. While it's still a major bug, it seems much better than the previous case of either inaccessible or reappearing rows. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Hi, On 2013-12-03 10:29:54 -0500, Noah Misch wrote: > On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote: > > On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > > > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > > > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > > > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > > > Any idea how to cheat our way out of that one given the current way > > > > heap_freeze_tuple() works (running on both primary and standby)? My only > > > > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty. > > > > We can't even realistically create a new multixact with fewer members > > > > with the current format of xl_heap_freeze. > > > > > > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all > > > update XID > > > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax > > > consumers are > > > already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit > > > yet > > > another injection of complexity. > > > > I think its pretty much checked that way already, but the problem seems > > to be how to avoid checks on xid commit/abort in that case. I've > > complained in 20131121200517.gm7...@alap2.anarazel.de that the old > > pre-condition that multixacts aren't checked when they can't be relevant > > (via OldestVisibleM*) isn't observed anymore. > > So, if we re-introduce that condition again, we should be on the safe > > side with that, right? > > What specific commit/abort checks do you have in mind? MultiXactIdIsRunning() does a TransactionIdIsInProgress() for each member which in turn does TransactionIdDidCommit(). Similar when locking a tuple that's locked/updated without a multixact where we go for a TransactionIdIsInProgress() in XactLockTableWait(). 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Dynamic configuration via LDAP in postmaster
I need advice about the best place to add such a feature. Currently I have found that the postmaster has an event loop (including SIGHUP handling) inside PostgresMain (postgres.c) for reloading the configuration file; based on my investigation, my plan is to handle LDAP events just before SIGHUP. PS: I guess tomorrow I will start implementing this feature inside the postmaster, but before I start I would like an expert opinion about the best place for dispatching incoming messages (that the configuration has changed, etc.) from LDAP. -- Regards, Soshnikov Vasily mailto:dedok@gmail.com
Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes
The changes are very well divided into logical units, so I think I could understand the ideas. I'm not too familiar with the ecpg internals, so consider this at least a coding review.

git apply: clean, except for one finding below.
Build: no errors/warnings.
Documentation build: no errors/warnings; the changes do appear in ecpg.html.
Regression tests: all passed.
Non-Linux platforms: I can't verify, but I haven't noticed any new dependencies added.
Comments (in the source code): good.

(My) additional comments:

22.patch: tuples_left > 0 instead of just tuples_left seems safer to me in for-loops. Not sure if there's a convention for this, though.

23.patch:

git apply --verbose ~/cybertec/patches/ecpq/23.patch
/home/anton/cybertec/patches/ecpq/23.patch:494: space before tab in indent.
    /*--
/home/anton/cybertec/patches/ecpq/23.patch:495: space before tab in indent.
    translator: this string will be truncated at 149 characters expanded. */
/home/anton/cybertec/patches/ecpq/23.patch:4019: trailing whitespace.
    exec sql close :curname;

Tests - 23.patch, src/interfaces/ecpg/test/sql/cursorsubxact.pgc:

/*
 * Test the implicit RELEASE SAVEPOINT if a SAVEPOINT
 * is used with an already existing name.
 */

Shouldn't it be "... if a CURSOR is used with an already existing name"? Or just "... implicit RELEASE SAVEPOINT after an error"? I'd also appreciate a comment on where exactly the savepoint is (implicitly) released.

23.patch and 24.patch: SO_MAJOR_VERSION and also interfaces/ecpg/include/ecpglib.h are changed, thus all client applications probably need to be preprocessed & compiled against PG 9.4. I don't know how this is usually enforced; I'm mentioning it for the sake of completeness.
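On the tuples_left remark above: the reason `> 0` is the safer guard is that a counter decremented in batch-sized steps can overshoot zero, and a negative value is still "true" to a bare truthiness test. A small illustration (hypothetical, not the ecpg code):

```python
def drain(total, batch):
    """Consume `total` items in chunks of `batch`; returns (iterations, leftover)."""
    iterations, tuples_left = 0, total
    while tuples_left > 0:        # a bare `while tuples_left:` would keep looping
        tuples_left -= batch      # once the counter overshoots past zero,
        iterations += 1           # since a negative count is still truthy
        if iterations > 1000:     # safety valve for the demonstration only
            break
    return iterations, tuples_left
```

With `> 0` the loop terminates even when 10 items are drained in chunks of 4 and the counter lands on -2; with the bare test it would spin until the safety valve.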
// Antonin Houska (Tony) On 11/28/2013 03:21 PM, Boszormenyi Zoltan wrote: > 2013-11-20 14:53 keltezéssel, Boszormenyi Zoltan írta: >> 2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta: >>> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta: 2013-11-12 07:01 keltezéssel, Noah Misch írta: > On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote: >> The old contents of my GIT repository was removed so you need to >> clone it fresh. https://github.com/zboszor/ecpg-readahead.git >> I won't post the humongous patch again, since sending a 90KB >> compressed file to everyone on the list is rude. > Patches of that weight show up on a regular basis. I don't think it's > rude. OK, here it is. >>> >>> ... >>> Subsequent patches will come as reply to this email. >> >> Infrastructure changes in ecpglib/execute.c to split up >> ECPGdo and ecpg_execute() and expose the parts as >> functions internal to ecpglib. > > Rebased after killing the patch that changed the DECLARE CURSOR command tag. > All infrastructure patches are attached, some of them compressed. > > Best regards, > Zoltán Böszörményi > > > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On Fri, Nov 29, 2013 at 04:14:00PM -0500, Bruce Momjian wrote: > On Thu, Mar 28, 2013 at 03:06:30PM -0400, Steve Singer wrote: > > So to summarise: > > > > Plan A: The first patch I attached for pg_upgrade + documentation > > changes, and changing the other places that call PQconndefaults() to > > accept failures on either out of memory, or an invalid PGSERVICE > > > > Plan B: Create a new function PQconndefaults2(char * errorBuffer) or > > something similar that returned error information to the caller. > > > > Plan C: PQconndefaults() just ignores an invalid service but > > connection attempts fail because other callers of > > conninfo_add_defaults still pay attention to connection failures. > > This is the second patch I sent. > > > > Plan D: Service lookup failures are always ignored by > > conninfo_add_defaults. If you attempt to connect with a bad > > PGSERVICE set it will behave as if no PGSERVICE value was set. I > > don't think anyone explicitly proposed this yet. > > > > Plan 'D' is the only option that I'm opposed to, it will effect a > > lot more applications then ones that call PQconndefaults() and I > > feel it will confuse users. > > > > I'm not convinced plan B is worth the effort of having to maintain > > two versions of PQconndefaults() for a while to fix a corner case. > > OK, I am back to looking at this issue from March. I went with option > C, patch attached, which seems to be the most popular. It is similar to > Steve's patch, but structured differently and includes a doc patch. Patch applied. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trust intermediate CA for client certificates
On Mon, Dec 2, 2013 at 05:35:06PM -0500, Andrew Dunstan wrote: > > On 12/02/2013 04:17 PM, Tom Lane wrote: > >Bruce Momjian writes: > >>Sorry, I should have said: > >>Tom is saying that for his openssl version, a client that passed > >>an intermediate certificate had to supply a certificate _matching_ > >>something in the remote root.crt, not just signed by it. > >>At least I think that was the issue, rather than requiring the client to > >>supply a "root" certificate, meaning the client can supply an > >>intermediate or root certificate, as long as it appears in the > >>root.crt file on the remote end. > >As far as the server is concerned, anything listed in its root.crt *is* a > >trusted root CA. Doesn't matter if it's a child of some other CA. > > > But it does need to be signed by a trusted signatory. At least in my > test script (pretty ugly, but shown below for completeness), the > Intermediate CA cert is signed with the Root cert rather than being > self-signed as the Root cert is, and so if the server doesn't have > that root cert as a trusted cert the validation fails. > > In case 1, we put the root CA cert on the server and append the > intermediate CA cert to the client's cert. This succeeds. In case 2, > we put the intermediate CA cert on the server without the root CA's > cert, and use the bare client cert. This fails. In case 3, we put > both the root and the intermediate certs in the server's root.crt, > and use the bare client key, and as expected this succeeds. > > So the idea that you can just plonk any Intermediate CA cert in > root.crt and have all keys it signs validated is not true, AFAICT. > > OpenSSL version 1.0.0j was used in these tests, on a Fedora 16 box. OK, that behavior matches the behavior Ian observed and also matches my most recent doc patch. I know Tom saw something different, but unless he can reproduce it, I am thinking my doc patch is our best solution.
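Andrew's three cases can be captured in a toy trust-chain model (an assumption-laden sketch, not OpenSSL): validation walks issuers through the certs presented by the client plus the server's root.crt, and succeeds only when the walk terminates at a self-signed cert that is itself in root.crt — matching the pre-"partial chain" behavior observed in his tests:

```python
def validates(presented, store):
    """presented/store: lists of (name, issuer) pairs; a root has issuer == name.
    Succeed iff a chain from presented[0] reaches a self-signed cert in store."""
    available = {name: issuer for name, issuer in presented + store}
    store_names = {name for name, _ in store}
    name, seen = presented[0][0], set()
    while name in available and name not in seen:
        seen.add(name)
        issuer = available[name]
        if issuer == name:              # self-signed: the chain terminates here
            return name in store_names
        name = issuer
    return False                        # chain broke before reaching a root

# The three cases from Andrew's test script, in miniature.
case1 = validates([("client", "inter"), ("inter", "root")], [("root", "root")])
case2 = validates([("client", "inter")], [("inter", "root")])
case3 = validates([("client", "inter")], [("inter", "root"), ("root", "root")])
```

Case 2 fails precisely because the walk reaches "root" but no such cert exists anywhere, i.e. the intermediate alone is not enough.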
-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Extension Templates S03E11
Stephen Frost writes: > That's not really 'solved' unless you feel we can depend on that "create > extension from URL" to work at pg_restore time... I wouldn't have > guessed that people would accept that, but I've already been wrong about > such things in this thread once. Basically, with the extra software I want to build out-of-core, what you have is an externally maintained repository and the scripts are downloaded at CREATE EXTENSION time. With the Extension Template, you then have a solid cache you can rely on at pg_restore time. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add full object name to the tag field
Robert Haas wrote > On Mon, Dec 2, 2013 at 9:49 AM, Asit Mahato < > rigid.asit@ > > wrote: >> Hi all, >> >> I am a newbie. I am unable to understand the to do statement given below. >> >> Add full object name to the tag field. eg. for operators we need >> '=(integer, >> integer)', instead of just '='. >> >> please help me out with an example. >> >> Thanks and Regards, >> Asit Mahato > > Cast the OID of the operator to regoperator instead of regoper. This seems too simple an answer to be useful, and utterly confusing to the OP. The ToDo item in question is in the pg_dump/pg_restore section. It would seem to refer to this e-mail thread: "Re: pg_dump sort order for functions" http://www.postgresql.org/message-id/flat/9837222c1001120712t1e64eb2fg9f502757b6002...@mail.gmail.com#9837222c1001120712t1e64eb2fg9f502757b6002...@mail.gmail.com though there is no link to prior discussion attached to the ToDo item. The thread itself suggests there was yet prior discussion on the topic, though my quick search did not turn anything up. It seems that not all objects (though it appears functions are currently one exception) are fully descriptive in their tag/name output, which makes deterministic ordering more difficult. The goal is, say for operators, to output not only the base operator symbol (regoper) but the types associated with the left-hand and right-hand sides (regoperator). The additional type information makes the entire name unique (barring cross-schema conflicts at least, which can be mitigated) and allows for deterministic ordering. Asit, hopefully this gives you enough context to ask better questions which others can answer more fully. Also, it may help to give more details about yourself and your goals and not just throw out that you are a newbie. The latter gives people little guidance in how they should structure their help. David J.
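The point of the fuller tag can be shown with made-up dump entries: two overloads of `=` are indistinguishable by bare tag, so any sort keyed on them leaves their relative order arbitrary, while regoperator-style tags are unique and order deterministically:

```python
# Hypothetical dump entries for two overloads of the '=' operator.
entries = [
    {"tag": "=", "args": "(integer,integer)"},
    {"tag": "=", "args": "(text,text)"},
]

# Bare tags collide: sorting on them cannot distinguish the two entries.
bare = sorted(e["tag"] for e in entries)

# regoperator-style tags include the argument types, so they are unique
# and yield a reproducible dump order.
full = sorted(e["tag"] + e["args"] for e in entries)
```

The same argument applies to any object type whose bare name can be overloaded.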
Re: [HACKERS] Extension Templates S03E11
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: > Stephen Frost writes: > > That's not really 'solved' unless you feel we can depend on that "create > > extension from URL" to work at pg_restore time... I wouldn't have > > guessed that people would accept that, but I've already been wrong about > > such things in this thread once. > > Basically, with the extra software I want to build out-of-core, what you > have is an externally maintained repository and the scripts are > downloaded at CREATE EXTENSION time. I should have included above "unless we actually dump the extension objects out during pg_dump." > With the Extension Template, you then have a solid cache you can rely on > at pg_restore time. Extension templates are not needed for us to be able to dump and restore extensions. I do not understand why you continue to argue for extension templates as a solution to that problem when it's utter overkill and far worse than just dumping the extension objects out at pg_dump time and having a way to add them back as part of the extension on restore. I understand that you once proposed that and it was shot down but I think we need to move past that now that we've seen what the alternative is.. That isn't to say anything about the code or about you specifically, but, for my part, I really don't like nor see the value of sticking script blobs into the catalog as some kind of representation of database objects. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Time-Delayed Standbys
Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Greetings, CK -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote: > > I wonder if we ought to mark each page as all-visible in > > raw_heap_insert() when we first initialize it, and then clear the flag > > when we come across a tuple that isn't all-visible. We could try to > > set hint bits on the tuple before placing it on the page, too, though > > I'm not sure of the details. > > I went with the per-page approach because I wanted to re-use the vacuum > lazy function. Is there some other code that does this already? I am > trying to avoid yet-another set of routines that would need to be > maintained or could be buggy. This hint bit setting is tricky. > > And thanks much for the review! So, should I put this in the next commit fest? I still have an unknown about the buffer number to use here: ! /* XXX use 0 or real offset? */ ! ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ? ! BufferGetBlockNumber(buf) : 0, offnum); Is everyone else OK with this approach? Updated patch attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c new file mode 100644 index 951894c..44ae5d8 *** a/src/backend/access/heap/rewriteheap.c --- b/src/backend/access/heap/rewriteheap.c *** *** 107,112 --- 107,114 #include "access/rewriteheap.h" #include "access/transam.h" #include "access/tuptoaster.h" + #include "access/visibilitymap.h" + #include "commands/vacuum.h" #include "storage/bufmgr.h" #include "storage/smgr.h" #include "utils/memutils.h" *** typedef OldToNewMappingData *OldToNewMap *** 172,177 --- 174,180 /* prototypes for internal functions */ static void raw_heap_insert(RewriteState state, HeapTuple tup); + static void update_page_vm(Relation relation, Page page, BlockNumber blkno); /* *** end_heap_rewrite(RewriteState state) *** 280,285 --- 283,289 state->rs_buffer); RelationOpenSmgr(state->rs_new_rel); + update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno); PageSetChecksumInplace(state->rs_buffer, state->rs_blockno); smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno, *** raw_heap_insert(RewriteState state, Heap *** 632,637 --- 636,642 */ RelationOpenSmgr(state->rs_new_rel); + update_page_vm(state->rs_new_rel, page, state->rs_blockno); PageSetChecksumInplace(page, state->rs_blockno); smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, *** raw_heap_insert(RewriteState state, Heap *** 677,679 --- 682,704 if (heaptup != tup) heap_freetuple(heaptup); } + + static void + update_page_vm(Relation relation, Page page, BlockNumber blkno) + { + Buffer vmbuffer = InvalidBuffer; + TransactionId visibility_cutoff_xid; + + visibilitymap_pin(relation, blkno, &vmbuffer); + Assert(BufferIsValid(vmbuffer)); + + if (!visibilitymap_test(relation, blkno, &vmbuffer) && + heap_page_is_all_visible(relation, InvalidBuffer, page, + &visibility_cutoff_xid)) + { + PageSetAllVisible(page); + visibilitymap_set(relation, blkno, InvalidBlockNumber, + InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid); + 
} + ReleaseBuffer(vmbuffer); + } diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c new file mode 100644 index 7f40d89..a42511b *** a/src/backend/access/heap/visibilitymap.c --- b/src/backend/access/heap/visibilitymap.c *** visibilitymap_set(Relation rel, BlockNum *** 278,284 map[mapByte] |= (1 << mapBit); MarkBufferDirty(vmBuf); ! if (RelationNeedsWAL(rel)) { if (XLogRecPtrIsInvalid(recptr)) { --- 278,284 map[mapByte] |= (1 << mapBit); MarkBufferDirty(vmBuf); ! if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf)) { if (XLogRecPtrIsInvalid(recptr)) { diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c new file mode 100644 index fe2d9e7..4f6578f *** a/src/backend/commands/vacuumlazy.c --- b/src/backend/commands/vacuumlazy.c *** static void lazy_record_dead_tuple(LVRel *** 151,158 ItemPointer itemptr); static bool lazy_tid_reaped(ItemPointer itemptr, void *state); static int vac_cmp_itemptr(const void *left, const void *right); - static bool heap_page_is_all_visible(Relation rel, Buffer buf, - TransactionId *visibility_cutoff_xid); /* --- 151,156 *** lazy_vacuum_page(Relation onerel, BlockN *** 1197,1203 * check if the page has become all-visible. */ if (!visibilitymap_test(onerel, blkno, vmbuffer) && ! heap_page_is_all_visible(onerel, buffer, &visibility_cutoff_xid)) { Assert(BufferIsValid(*vmbuffer)); PageSe
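The check the patch reuses (heap_page_is_all_visible()) boils down to: a page's visibility-map bit may be set only if every tuple on it is visible to all current and future snapshots. A simplified sketch, with tuples modeled as dicts and hint bits, multixacts, and in-progress transactions all ignored:

```python
def page_all_visible(tuples, oldest_xmin):
    """May this rewritten heap page get its VM bit set?  Every tuple must
    have been inserted by a transaction that committed before the oldest
    xmin any snapshot could still see, and must not be deleted.  (Sketch
    only; the real check also handles hint bits and locked tuples.)"""
    return all(
        t["xmin_committed"] and t["xmin"] < oldest_xmin and t["xmax"] is None
        for t in tuples
    )
```

A single tuple inserted by a still-visible-to-someone transaction (xmin at or above the cutoff) is enough to keep the whole page's bit clear, which is why Robert suggests clearing a per-page flag as soon as such a tuple is placed.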
Re: [HACKERS] Extension Templates S03E11
Robert Haas writes: > In more normal cases, however, the system can (and probably should) > figure out what was intended by choosing the *shortest* path to get to > the intended version. For example, if someone ships 1.0, 1.0--1.1, > 1.1, and 1.1--1.2, the system should choose to run 1.1 and then > 1.1--1.2, not 1.0 and then 1.0--1.1 and then 1.1--1.2. But that can > be automatic: only if there are two paths of equal length (as in the > example in the previous paragraph) do we need help from the user to > figure out what to do. Yeah. > We should also consider the possibility of a user trying to > deliberately install an older release. For example, if the user has > 1.0, 1.0--1.1, 1.1, 1.1--1.2, and 1.2--1.0 (a downgrade script) with > default_full_version = 1.2, an attempt to install 1.0 should run just > the 1.0 script, NOT 1.2 and then 1.2--1.0. In what I did, if you want version 1.0 and we have a script --1.0.sql around, then we just use that script, never kicking the path chooser. The path chooser at CREATE EXTENSION time is only exercised when we don't have a direct script supporting the specific version you're asking for. > Putting all that together, I'm inclined to suggest that what we really > need is a LIST of version numbers, rather than just one. If one > path to the version we're installing is shorter than any other, we > choose that, period. If there are multiple paths of equal length, we That's what Jeff did propose, yes. > break the tie by choosing which version number appears first in the > aforementioned list. If that still doesn't break the tie, either > because none of the starting points are mentioned in that list or > because there are multiple equal-length paths starting in the same > place, we give up and emit an error. Jeff also mentioned tiebreakers, without going into any level of detail. 
We won't be able to just use default_version as the tiebreaker list here, because of the following example: default_version = 1.2, 1.0 create extension foo version '1.1'; With such a setup it would prefer 1.2--1.1 to 1.0--1.1, which doesn't look like what we want. Instead, we want default_version = 1.2 create_from_version_candidates = 1.0 create extension foo version '1.1'; Then the tiebreaker is the 1.0 in "create_from_version_candidates" so we would run foo--1.0.sql and then foo--1.0--1.1.sql. Comments? Barring objections, I'm going to prepare a new branch to support developing that behavior against only file-based extensions, and submit a spin-off patch to the current CF entry. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
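Robert's shortest-path rule plus the tiebreaker list discussed here can be sketched as a breadth-first search over the available scripts. This is an illustration of the proposed behavior, not PostgreSQL's actual implementation; the function and argument names are made up.

```python
def choose_install_path(full_scripts, upgrade_scripts, target, preference=()):
    """Pick the shortest script sequence that installs `target`.
    full_scripts: versions having a standalone foo--X.sql.
    upgrade_scripts: (from, to) pairs having a foo--X--Y.sql.
    A direct script always wins outright; otherwise ties between
    equally short paths go to the starting version listed first in
    `preference` (the create_from_version_candidates idea)."""
    if target in full_scripts:
        return [target]
    level = [[v] for v in full_scripts]          # BFS frontier, one step at a time
    while level:
        done = [p for p in level if p[-1] == target]
        if done:
            if len(done) > 1:                    # equal-length paths: need the tiebreaker
                done = sorted((p for p in done if p[0] in preference),
                              key=lambda p: preference.index(p[0]))
                if not done:
                    raise ValueError("ambiguous upgrade path; no tiebreaker applies")
            return done[0]
        level = [p + [dst]
                 for p in level
                 for src, dst in upgrade_scripts
                 if src == p[-1] and dst not in p]
    raise ValueError("no script path to version " + target)
```

With Robert's first example (1.0, 1.0--1.1, 1.1, 1.1--1.2), installing 1.2 runs 1.1 then 1.1--1.2; with Dimitri's example above, preferring 1.0 as the creation candidate yields foo--1.0.sql then foo--1.0--1.1.sql; and installing a version that has its own full script never enters the chooser at all.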
Re: [HACKERS] Dynamic configuration via LDAP in postmaster
On 12/03/2013 05:44 PM, Vasily Soshnikov wrote: I need advice about the best place for adding such a feature. Currently I found that 'postmaster' has an event loop (including handling SIGHUP) inside PostgresMain (postgres.c) for reloading the configuration file; based on my investigation, my plan is to handle LDAP events just before SIGHUP. PS I guess tomorrow I will start implementing this feature inside 'postmaster', but before I start I wish to get an expert opinion about the best place for dispatching incoming messages (about configuration changes etc.) from LDAP. postmaster is kept very small on purpose, because if that process dies, it will take the whole server down with it. I'd suggest writing a custom background worker that talks with the ldap server. It can then write the changes to the configuration files, and send SIGHUP to reload. - Heikki
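Heikki's suggested division of labor — a worker process that watches LDAP, rewrites a configuration file, and then signals the server to reload — can be sketched like this. The function names and file layout here are illustrative assumptions; a real worker would use the C background-worker API and PostgreSQL's own configuration-file parsing and quoting code.

```python
import os
import signal

def render_conf(settings):
    """Render a settings dict as postgresql.conf-style lines.  Single
    quotes inside values are doubled, matching the file's quoting rules."""
    lines = ["%s = '%s'" % (name, str(value).replace("'", "''"))
             for name, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"

def push_settings(settings, conf_path, postmaster_pid):
    """What the hypothetical ldap-watcher worker does when it sees a
    change: rewrite its include file, then ask the server to reload.
    Nothing runs inside postmaster itself, so a crash in this code
    cannot take the whole server down."""
    with open(conf_path, "w") as f:
        f.write(render_conf(settings))
    os.kill(postmaster_pid, signal.SIGHUP)   # triggers the config reload
```

The key design point, as Heikki notes, is that the reload path already exists (SIGHUP); the worker only needs to produce a file the existing machinery can read.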
Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes
Thanks for the review. On 2013-12-03 16:48, Antonin Houska wrote: The changes are very well divided into logical units, so I think I could understand the ideas. I'm not too familiar with the ecpg internals, so consider this at least a coding review. git apply: Clean, except for one finding below Build: no errors/warnings Documentation build: no errors/warnings. The changes do appear in ecpg.html. Regression tests: all passed Non-Linux platforms: I can't verify, but I haven't noticed any new dependencies added. Comments (in the source code): good. (My) additional comments: 22.patch tuples_left > 0 instead of just tuples_left seems to me safer in for-loops. Not sure if there's a convention for this though. I'll look at it, maybe the >0 had a reason. 23.patch git apply --verbose ~/cybertec/patches/ecpq/23.patch /home/anton/cybertec/patches/ecpq/23.patch:494: space before tab in indent. /*-- /home/anton/cybertec/patches/ecpq/23.patch:495: space before tab in indent. translator: this string will be truncated at 149 characters expanded. */ /home/anton/cybertec/patches/ecpq/23.patch:4019: trailing whitespace. exec sql close :curname; I will fix the extra spaces. Tests - 23.patch src/interfaces/ecpg/test/sql/cursorsubxact.pgc /* * Test the implicit RELEASE SAVEPOINT if a SAVEPOINT * is used with an already existing name. */ Shouldn't it be "... if a CURSOR is used with an already existing name?". Or just "... implicit RELEASE SAVEPOINT after an error"? I'd also appreciate a comment where exactly the savepoint is (implicitly) released. If you do this: SAVEPOINT A; SAVEPOINT A; /* same savepoint name */ then the operations between the two are implicitly committed to the higher subtransaction, i.e. it works as if there was a RELEASE SAVEPOINT A; before the second SAVEPOINT A; It happens to be tested with a DECLARE CURSOR statement and the runtime has to refresh its knowledge about the cursor by propagating it into a subtransaction one level higher. 
23.patch and 24.patch - SO_MAJOR_VERSION and also interfaces/ecpg/include/ecpglib.h is changed Thus all client applications probably need to be preprocessed & compiled against the PG 9.4. I don't know how this is usually enforced. I'm mentioning it for the sake of completeness. The soversion has changed because of the changes in already existing exported functions. // Antonin Houska (Tony) On 11/28/2013 03:21 PM, Boszormenyi Zoltan wrote: On 2013-11-20 14:53, Boszormenyi Zoltan wrote: On 2013-11-20 14:41, Boszormenyi Zoltan wrote: On 2013-11-12 07:15, Boszormenyi Zoltan wrote: On 2013-11-12 07:01, Noah Misch wrote: On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote: The old contents of my GIT repository were removed so you need to clone it fresh. https://github.com/zboszor/ecpg-readahead.git I won't post the humongous patch again, since sending a 90KB compressed file to everyone on the list is rude. Patches of that weight show up on a regular basis. I don't think it's rude. OK, here it is. ... Subsequent patches will come as replies to this email. Infrastructure changes in ecpglib/execute.c to split up ECPGdo and ecpg_execute() and expose the parts as functions internal to ecpglib. Rebased after killing the patch that changed the DECLARE CURSOR command tag. All infrastructure patches are attached, some of them compressed. Best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Hi Alvaro, Noah, On 2013-12-03 15:57:10 +0100, Andres Freund wrote: > On 2013-12-03 09:48:23 -0500, Tom Lane wrote: > > Andres Freund writes: > > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > >> The test spec additionally > > >> covers a (probably-related) assertion failure, new in 9.3.2. > > > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > > last seems actually unrelated, I am not sure why it's 9.3.2 > > > only. Alvaro, are you looking? > > > > Is this bad enough that we need to re-wrap the release? > > Tentatively I'd say no, the only risk is losing locks afaics. That's > much better than corrupting rows as in 9.3.1. But I'll look into it in > a bit more detail as soon as I am off the phone call I am on. After looking, I think I am revising my opinion. The broken locking behaviour (outside of freeze, which I don't see how we can fix in time), is actually bad. Would that stop us from making the release date with packages? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Allowing parallel pg_restore from pipe
On Wed, Apr 24, 2013 at 03:33:42PM -0400, Andrew Dunstan wrote: > > On 04/23/2013 07:53 PM, Timothy Garnett wrote: > >Hi All, > > > >Currently the -j option to pg_restore, which allows for > >parallelization in the restore, can only be used if the input file > >is a regular file and not, for ex., a pipe. However this is a > >pretty common occurrence for us (usually in the form of pg_dump | > >pg_restore to copy an individual database or some tables thereof > >from one machine to another). While there's no good way to > >parallelize the data load steps when reading from a pipe, the > >index and constraint building can still be parallelized and as > >they are generally CPU bound on our machines we've found quite a > >bit of speedup from doing so. > > > >Attached is two diffs off of the REL9_2_4 tag that I've been > >using. The first is a simple change that serially loads the data > >section before handing off the remainder of the restore to the > >existing parallelized restore code (the .ALT. diff). The second > >which gets more parallelization but is a bit more of a change uses > >the existing dependency analysis code to allow index building etc. > >to occur in parallel with data loading. The data loading tasks are > >still performed serially in the main thread, but non-data loading > >tasks are scheduled in parallel as their dependencies are > >satisfied (with the caveat that the main thread can only dispatch > >new tasks between data loads). > > > >Anyways, the question is if people think this is generally useful. > >If so I can clean up the preferred choice a bit and rebase it off > >of master, etc. > > > I don't think these are bad ideas at all, and probably worth doing. > Note that there are some fairly hefty changes affecting this code in > master, so your rebasing could be tricky. Is there any progress on this: doing parallel pg_restore from a pipe? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+
Re: [HACKERS] Extension Templates S03E11
Stephen Frost writes: > I understand that you once proposed that and it was shot down but I > think we need to move past that now that we've seen what the alternative > is.. That isn't to say anything about the code or about you > specifically, but, for my part, I really don't like nor see the value of > sticking script blobs into the catalog as some kind of representation of > database objects. Well yeah, I'm having quite a hard time withdrawing that proposal, which is the fourth one in three years, and that had been proposed to me on this very mailing list, and got the infamous "community buy-in" about a year ago. It's always a hard time when you're being told that the main constraints you had to work with are suddenly no more, because after all this work, we realize that imposing those constraints actually made no sense. I understand that it can happen, it still really sucks when it does. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Time-Delayed Standbys
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse wrote: > Hi Fabrizio, > > looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It > applies and compiles w/o errors or warnings. I set up a master and two > hot standbys replicating from the master, one with 5 minutes delay and > one without delay. After that I created a new database and generated > some test data: > > CREATE TABLE test (val INTEGER); > INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); > > The non-delayed standby nearly instantly had the data replicated, the > delayed standby was replicated after exactly 5 minutes. I did not > notice any problems, errors or warnings. > > Thanks for your review Christian... Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
2013/12/3 Stephen Frost > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > After sleeping on it, your other suggestion of TABLE OF, or possibly > > TABLE FROM, is starting to grow on me. > > > > Who else has an opinion? > > Alright, for my 2c, I like having this syntax include 'TABLE' simply > because it's what folks coming from Oracle might be looking for. > Following from that, to keep it distinct from the spec's notion of > 'TABLE', my preference is 'TABLE FROM'. I don't particularly like > 'TABLE OF', nor do I like the various 'ROWS' suggestions. > +1 Pavel > > Thanks, > > Stephen >
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund writes: >> On 2013-12-03 09:48:23 -0500, Tom Lane wrote: >>> Is this bad enough that we need to re-wrap the release? > After looking, I think I am revising my opinion. The broken locking > behaviour (outside of freeze, which I don't see how we can fix in time), > is actually bad. > Would that stop us from making the release date with packages? That's hardly answerable when you haven't specified how long you think it'd take to fix. In general, though, I'm going to be exceedingly unhappy if this release introduces new regressions. If we have to put off the release to fix something, maybe we'd better do so. And we'd damn well better get it right this time. regards, tom lane
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 12:22:33 -0500, Tom Lane wrote: > Andres Freund writes: > >> On 2013-12-03 09:48:23 -0500, Tom Lane wrote: > >>> Is this bad enough that we need to re-wrap the release? > > > After looking, I think I am revising my opinion. The broken locking > > behaviour (outside of freeze, which I don't see how we can fix in time), > > is actually bad. > > Would that stop us from making the release date with packages? > > That's hardly answerable when you haven't specified how long you > think it'd take to fix. There are two things that are broken as-is: 1) the freezing of multixacts: The new state is much better than the old one because the old one corrupted data, while the new one sometimes removes locks when you explicitly FREEZE. 2) The broken locking behaviour in Noah's test without the FREEZE. I don't see a realistic chance of fixing 1) in 9.3. Not even sure if it can be done without changing the freeze WAL format. But 2) should be fixed and is basically a one-liner + comments + test. Alvaro? > In general, though, I'm going to be exceedingly unhappy if this release > introduces new regressions. If we have to put off the release to fix > something, maybe we'd better do so. And we'd damn well better get it > right this time. I think that's really hard for the multixacts stuff. There's lots of stuff not really okay in there, but we can't do much about that now :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WITHIN GROUP patch
Atri Sharma writes: > Please find attached the latest patch for WITHIN GROUP. This patch is > after fixing the merge conflicts. I've started to look at this patch now. I have a couple of immediate reactions to the catalog changes: 1. I really hate the way you've overloaded the transvalue to do something that has approximately nothing to do with transition state (and haven't updated catalogs.sgml to explain that, either). Seems like it'd be cleaner to just hardwire a bool column that distinguishes regular and hypothetical input rows. And why do you need aggtranssortop for that? I fail to see the point of sorting on the flag column. 2. I also don't care for the definition of aggordnargs, which is the number of direct arguments to an ordered set function, except when it isn't. Rather than overloading it to be both a count and a couple of flags, I wonder whether we shouldn't expand aggisordsetfunc to be a three-way "aggregate kind" field (ordinary agg, ordered set, or hypothetical set function). regards, tom lane
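Tom's three-way "aggregate kind" suggestion can be sketched as an enum. The names and one-letter codes below are illustrative only, not the committed catalog definition:

```python
from enum import Enum

class AggKind(Enum):
    """Hypothetical replacement for a bool aggisordsetfunc column:
    one field that distinguishes all three kinds of aggregate."""
    ORDINARY = "n"        # plain aggregate, e.g. sum()
    ORDERED_SET = "o"     # e.g. percentile_disc(...) WITHIN GROUP (...)
    HYPOTHETICAL = "h"    # e.g. rank(...) WITHIN GROUP (...)

def takes_within_group(kind):
    # Both non-ordinary kinds use the WITHIN GROUP syntax; with the kind
    # explicit, a direct-argument count column can stay a plain count.
    return kind in (AggKind.ORDERED_SET, AggKind.HYPOTHETICAL)
```

The point of the design is that consumers branch on one explicit discriminator instead of decoding flags folded into an argument count.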
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund writes: > Any idea how to cheat our way out of that one given the current way > heap_freeze_tuple() works (running on both primary and standby)? My only > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > We can't even realistically create a new multixact with fewer members > with the current format of xl_heap_freeze. Maybe we should just bite the bullet and change the WAL format for heap_freeze (inventing an all-new record type, not repurposing the old one, and allowing WAL replay to continue to accept the old one). The implication for users would be that they'd have to update slave servers before the master when installing the update; which is unpleasant, but better than living with a known data corruption case. regards, tom lane
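The compatibility scheme Tom describes — introduce a new record type while replay keeps accepting the old one — is the standard trick for WAL-format transitions. A sketch with made-up record ids (the real ids and payloads are not these):

```python
XLOG_FREEZE_OLD = 0x30   # hypothetical id of the legacy freeze record
XLOG_FREEZE_NEW = 0x40   # hypothetical id of the replacement record

def replay(record_id, payload, handlers):
    """Dispatch one WAL record to its redo handler.  A standby running
    the updated code keeps a handler for the old record, so it can still
    replay WAL written by a not-yet-updated master; a record id with no
    handler is the PANIC-on-unrecognized-record case, which is why the
    standby must be updated before the master."""
    try:
        handler = handlers[record_id]
    except KeyError:
        raise RuntimeError("PANIC: unrecognized WAL record id 0x%x" % record_id)
    return handler(payload)
```

Dropping the old handler later (once no supported master can emit the old record) completes the transition.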
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane wrote: > Andres Freund writes: > > Any idea how to cheat our way out of that one given the current way > > heap_freeze_tuple() works (running on both primary and standby)? My only > > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > > We can't even realistically create a new multixact with fewer members > > with the current format of xl_heap_freeze. > > Maybe we should just bite the bullet and change the WAL format for > heap_freeze (inventing an all-new record type, not repurposing the old > one, and allowing WAL replay to continue to accept the old one). The > implication for users would be that they'd have to update slave servers > before the master when installing the update; which is unpleasant, but > better than living with a known data corruption case. > Agreed. It may suck, but it sucks less. How badly will it break if they do the upgrade in the wrong order, though? Will the slaves just stop (I assume this?) or is there a risk of a wrong-order upgrade causing extra breakage? And if they do shut down, would just upgrading the slave fix it, or would they then have to rebuild the slave? (actually, don't we recommend they always rebuild the slave *anyway*? In which case the problem is even smaller.) I think we've always told people to upgrade the slave first, and it's the logical thing that AFAIK most other systems require as well, so that's not an unreasonable requirement at all. I assume we'd then get rid of the old record type completely in 9.4, right? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Extension Templates S03E11
On Dec 3, 2013, at 9:14 AM, Dimitri Fontaine wrote: > I understand that it can happen, it still really sucks when it does. I have not followed this project closely, Dimitri, but I for one have appreciated your tenacity in following through on it. Extensions are awesome, thanks to you, and I’m happy to see all efforts to make it more so. Thank you. Best, David
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote: > On 2013-12-03 10:29:54 -0500, Noah Misch wrote: > > Sorry, my original report was rather terse. I speak of the scenario > > exercised > > by the second permutation in that isolationtester spec. The multixact is > > later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2 > > does freeze it to InvalidTransactionId per the code I cited in my first > > response on this thread, which wrongly removes a key lock. > > That one is clear, I was only confused about the Assert() you > reported. But I think I've explained that elsewhere. > > I currently don't see fixing the erroneous freezing of lockers (not the > updater though) without changing the wal format or synchronously waiting > for all lockers to end. Which both seem like a no-go? Not fixing it at all is the real no-go. We'd take both of those undesirables before just tolerating the lost locks in 9.3. The attached patch illustrates the approach I was describing earlier. It fixes the test case discussed above. I haven't verified that everything else in the system is ready for it, so this is just for illustration purposes. -- Noah Misch EnterpriseDB http://www.enterprisedb.com *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** *** 5384,5412 heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid, TransactionId update_xid; /* !* This is a multixact which is not marked LOCK_ONLY, but which !* is newer than the cutoff_multi. If the update_xid is below the !* cutoff_xid point, then we can just freeze the Xmax in the !* tuple, removing it altogether. This seems simple, but there !* are several underlying assumptions: !* !* 1. A tuple marked by an multixact containing a very old !* committed update Xid would have been pruned away by vacuum; we !* wouldn't be freezing this tuple at all. !* !* 2. There cannot possibly be any live locking members remaining !* in the multixact. 
This is because if they were alive, the !* update's Xid would had been considered, via the lockers' !* snapshot's Xmin, as part the cutoff_xid. !* !* 3. We don't create new MultiXacts via MultiXactIdExpand() that !* include a very old aborted update Xid: in that function we only !* include update Xids corresponding to transactions that are !* committed or in-progress. */ update_xid = HeapTupleGetUpdateXid(tuple); if (TransactionIdPrecedes(update_xid, cutoff_xid)) ! freeze_xmax = true; } } else if (TransactionIdIsNormal(xid) && --- 5384,5404 TransactionId update_xid; /* !* This is a multixact which is not marked LOCK_ONLY, but which is !* newer than the cutoff_multi. To advance relfrozenxid, we must !* eliminate any updater XID that falls below cutoff_xid. !* Affected multixacts may also contain outstanding lockers, which !* we must preserve. Forming a new multixact is tempting, but !* solving it that way requires a WAL format change. Instead, !* mark the tuple LOCK_ONLY. The updater XID is still present, !* but consuming code will notice LOCK_ONLY and assume it's gone. */ update_xid = HeapTupleGetUpdateXid(tuple); if (TransactionIdPrecedes(update_xid, cutoff_xid)) ! { ! tuple->t_infomask |= HEAP_XMAX_LOCK_ONLY; ! changed = true; ! } } } else if (TransactionIdIsNormal(xid) && -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
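The core of Noah's illustration patch is a single decision: when the multixact's updater XID precedes the freeze cutoff, set HEAP_XMAX_LOCK_ONLY rather than clearing xmax, so any still-live lockers survive. A sketch of that decision (plain integer comparison stands in for TransactionIdPrecedes, which really does modular XID arithmetic; the bit value is borrowed from htup_details.h):

```python
HEAP_XMAX_LOCK_ONLY = 0x0080   # infomask bit: xmax holds only locks, no update

def freeze_shared_xmax(update_xid, cutoff_xid, infomask):
    """If the updater XID in a shared (multixact) xmax falls below the
    freeze cutoff, keep the xmax -- clearing it would drop live lockers --
    and mark it LOCK_ONLY so readers ignore the dead updater.  Returns
    (new_infomask, changed)."""
    if update_xid < cutoff_xid:
        return infomask | HEAP_XMAX_LOCK_ONLY, True
    return infomask, False
```

This avoids the WAL-format change debated in the thread, at the cost of leaving a stale updater XID behind a flag that consumers must honor.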
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Tom Lane wrote: > Andres Freund writes: > > Any idea how to cheat our way out of that one given the current way > > heap_freeze_tuple() works (running on both primary and standby)? My only > > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > > We can't even realistically create a new multixact with fewer members > > with the current format of xl_heap_freeze. > > Maybe we should just bite the bullet and change the WAL format for > heap_freeze (inventing an all-new record type, not repurposing the old > one, and allowing WAL replay to continue to accept the old one). The > implication for users would be that they'd have to update slave servers > before the master when installing the update; which is unpleasant, but > better than living with a known data corruption case. That was my suggestion too (modulo, I admit, the bit about it being a new, separate record type.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Magnus Hagander writes: > On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane wrote: >> Maybe we should just bite the bullet and change the WAL format for >> heap_freeze (inventing an all-new record type, not repurposing the old >> one, and allowing WAL replay to continue to accept the old one). The >> implication for users would be that they'd have to update slave servers >> before the master when installing the update; which is unpleasant, but >> better than living with a known data corruption case. > Agreed. It may suck, but it sucks less. > How badly will it break if they do the upgrade in the wrong order though. > Will the slaves just stop (I assume this?) or is there a risk of a > wrong-order upgrade causing extra breakage? I assume what would happen is the slave would PANIC upon seeing a WAL record code it didn't recognize. Installing the updated version should allow it to resume functioning. Would be good to test this, but if it doesn't work like that, that'd be another bug to fix IMO. We've always foreseen the possible need to do something like this, so it ought to work reasonably cleanly. > I assume we'd then get rid of the old record type completely in 9.4, right? Yeah, we should be able to drop it in 9.4, since we'll surely have other WAL format changes anyway. regards, tom lane
Re: [HACKERS] Dynamic configuration via LDAP in postmaster
Thank you for the advice. Nowadays we (the company where I work) use such a scheme, but that scheme is not always useful at the stage of development of the back-end infrastructure. Also, we have found a better solution: have LDAP for dynamic configuration out of the box. So the question is the following: if you got such a task, where would you put the LDAP functionality inside the postgres daemon?

PS I have read the basics of the PostgreSQL ideology and I know why you guys keep the main server tiny in functionality, and this is great, but we would like to have a more complex solution. This feature may be useful not only for me and my company.

On 3 December 2013 20:54, Heikki Linnakangas wrote:
> On 12/03/2013 05:44 PM, Vasily Soshnikov wrote:
>> I need advice about where is the best place for adding such features.
>>
>> Currently I found that 'postmaster' has an event loop (including handling
>> SIGHUP) inside PostgresMain (postgres.c) for reloading the configuration
>> file; based on my investigation, my plan is to handle LDAP events just
>> before SIGHUP.
>>
>> PS I guess tomorrow I will start implementing this feature inside
>> 'postmaster', but before I start I wish to know expert opinion about the
>> best place for dispatching incoming messages (that the configuration has
>> changed, etc.) from LDAP.
>
> postmaster is kept very small on purpose, because if that process dies, it
> will take the whole server down with it. I'd suggest writing a custom
> background worker that talks with the ldap server. It can then write the
> changes to the configuration files, and send SIGHUP to reload.
>
> - Heikki

-- 
Regards,
Soshnikov Vasily
mailto:dedok@gmail.com
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane wrote: > Magnus Hagander writes: > > On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane wrote: > >> Maybe we should just bite the bullet and change the WAL format for > >> heap_freeze (inventing an all-new record type, not repurposing the old > >> one, and allowing WAL replay to continue to accept the old one). The > >> implication for users would be that they'd have to update slave servers > >> before the master when installing the update; which is unpleasant, but > >> better than living with a known data corruption case. > > > Agreed. It may suck, but it sucks less. > > > How badly will it break if they do the upgrade in the wrong order though. > > Will the slaves just stop (I assume this?) or is there a risk of a > > wrong-order upgrade causing extra breakage? > > I assume what would happen is the slave would PANIC upon seeing a WAL > record code it didn't recognize. Installing the updated version should > allow it to resume functioning. Would be good to test this, but if it > doesn't work like that, that'd be another bug to fix IMO. We've always > foreseen the possible need to do something like this, so it ought to > work reasonably cleanly. > Yeah, as long as that's tested and actually works, that sounds like an acceptable thing to deal with. > > I assume we'd then get rid of the old record type completely in 9.4, > right? > > Yeah, we should be able to drop it in 9.4, since we'll surely have other > WAL format changes anyway. > And even if not, there's no point in keeping it unless we actually support replication from 9.3 -> 9.4, I think, and I don't believe anybody has even considered working on that yet :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> > On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> > > Sorry, my original report was rather terse. I speak of the scenario
> > > exercised by the second permutation in that isolationtester spec.
> > > The multixact is later than VACUUM's cutoff_multi, so 9.3.1 does not
> > > freeze it at all. 9.3.2 does freeze it to InvalidTransactionId per
> > > the code I cited in my first response on this thread, which wrongly
> > > removes a key lock.
> >
> > That one is clear, I was only confused about the Assert() you
> > reported. But I think I've explained that elsewhere.
> >
> > I currently don't see fixing the erroneous freezing of lockers (not the
> > updater though) without changing the WAL format or synchronously waiting
> > for all lockers to end. Which both seem like a no-go?
>
> Not fixing it at all is the real no-go. We'd take both of those undesirables
> before just tolerating the lost locks in 9.3.

I think it's changing the WAL format then.

> The attached patch illustrates the approach I was describing earlier. It
> fixes the test case discussed above. I haven't verified that everything else
> in the system is ready for it, so this is just for illustration purposes.

That might be better than the current state, because the potential damage
from such unfrozen locks would be to get "could not access status of
transaction ..." errors (*).
But the problem I see with it is that a FOR UPDATE/NO KEY UPDATE lock taken
out by UPDATE is different from the respective lock taken out by SELECT FOR
SHARE:

typedef enum
{
    MultiXactStatusForKeyShare = 0x00,
    MultiXactStatusForShare = 0x01,
    MultiXactStatusForNoKeyUpdate = 0x02,
    MultiXactStatusForUpdate = 0x03,
    /* an update that doesn't touch "key" columns */
    MultiXactStatusNoKeyUpdate = 0x04,
    /* other updates, and delete */
    MultiXactStatusUpdate = 0x05
} MultiXactStatus;

Ignoring the difference that way isn't going to fly nicely.

*: which already are possible because we do not check multis properly
against OldestVisibleMXactId[] anymore.

Greetings,

Andres Freund

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Noah Misch wrote:
> > On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> > > Sorry, my original report was rather terse. I speak of the scenario
> > > exercised by the second permutation in that isolationtester spec.
> > > The multixact is later than VACUUM's cutoff_multi, so 9.3.1 does not
> > > freeze it at all. 9.3.2 does freeze it to InvalidTransactionId per
> > > the code I cited in my first response on this thread, which wrongly
> > > removes a key lock.
> >
> > That one is clear, I was only confused about the Assert() you
> > reported. But I think I've explained that elsewhere.
> >
> > I currently don't see fixing the erroneous freezing of lockers (not the
> > updater though) without changing the WAL format or synchronously waiting
> > for all lockers to end. Which both seem like a no-go?
>
> Not fixing it at all is the real no-go. We'd take both of those undesirables
> before just tolerating the lost locks in 9.3.

Well, unless I misunderstand, this is only a problem in the case that
cutoff_multi is not yet past but cutoff_xid is, and there are locker
transactions still running. So it's really a corner case. Not saying it's
impossible to hit, mind.

> The attached patch illustrates the approach I was describing earlier. It
> fixes the test case discussed above. I haven't verified that everything else
> in the system is ready for it, so this is just for illustration purposes.

Wow, this is scary. I don't oppose it in principle, but we'd better go over
the whole thing once more to ensure every place that cares is prepared to
deal with this.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 13:11:13 -0500, Tom Lane wrote:
> Andres Freund writes:
> > Any idea how to cheat our way out of that one given the current way
> > heap_freeze_tuple() works (running on both primary and standby)? My only
> > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty.
> > We can't even realistically create a new multixact with fewer members
> > with the current format of xl_heap_freeze.
>
> Maybe we should just bite the bullet and change the WAL format for
> heap_freeze (inventing an all-new record type, not repurposing the old
> one, and allowing WAL replay to continue to accept the old one). The
> implication for users would be that they'd have to update slave servers
> before the master when installing the update; which is unpleasant, but
> better than living with a known data corruption case.

I wondered about that myself. How would you suggest the format should look?
ISTM per tuple we'd need:

* OffsetNumber off
* uint16 infomask
* TransactionId xmin
* TransactionId xmax

I don't see why we'd need infomask2, but so far being overly skimpy in that
place hasn't shown itself to be the greatest idea?

Greetings,

Andres Freund

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-12-03 at 10:08 +0200, Heikki Linnakangas wrote:
> Another perspective is that that's already a situation we'd rather not
> have. Let's not make it worse by introducing a third way to install an
> extension, which again requires the extension author to package the
> extension differently.

+1.

> Why? The external tool can pick the extension in its current form from
> PGXN, and install it via libpq. The tool might have to jump through some
> hoops to do it, and we might need some new backend functionality to
> support it, but I don't see why the extension author needs to do anything.

Ideally, it would end up the same whether it was installed by either method
-- the same entries in pg_extension, etc. I assume that's what you mean by
"new backend functionality".

This sounds like Inline Extensions to me, which was previously proposed. If
I recall, that proposal trailed off because of issues with dump/reload. If
you dump the contents of the extension, it's not really an extension; but
if you don't, then the administrator can't back up the database (because he
might not have all of the extension templates for the extensions
installed). That's when the idea appeared for extension templates stored in
the catalog, so that the administrator would always have all of the
necessary templates present.

A few interesting messages I found:

http://www.postgresql.org/message-id/CA+TgmobJ-yCHt_utgJJL9WiiPssUAJWFd=3=ulrob9nhbpc...@mail.gmail.com
http://www.postgresql.org/message-id/CA+tgmoztujw7beyzf1dzdcrbg2djcvtyageychx8d52oe1g...@mail.gmail.com
http://www.postgresql.org/message-id/CA+tgmoa_0d6ef8upc03qx0unhjfzozeosno_ofucf5jgw+8...@mail.gmail.com
http://www.postgresql.org/message-id/18054.1354751...@sss.pgh.pa.us

(+Robert, Tom because I am referencing their comments)

Regards,
Jeff Davis
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund wrote:

> I wondered about that myself. How would you suggest the format to look
> like? ISTM per tuple we'd need:
>
> * OffsetNumber off
> * uint16 infomask
> * TransactionId xmin
> * TransactionId xmax
>
> I don't see why we'd need infomask2, but so far being overly skimpy in
> that place hasn't shown itself to be the greatest idea?

I don't see that we need the xmin; a simple bit flag indicating whether the
Xmin was frozen should be enough. For xmax we need more detail, as you
propose.

In infomask, are you proposing to store the complete infomask, or just the
flags being changed? Note we have a set of bits used in other WAL records,
XLHL_XMAX_IS_MULTI and friends, which perhaps we can use here.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WITHIN GROUP patch
> "Tom" == Tom Lane writes: Tom> 1. I really hate the way you've overloaded the transvalue to do Tom> something that has approximately nothing to do with transition Tom> state (and haven't updated catalogs.sgml to explain that, Tom> either). Seems like it'd be cleaner to just hardwire a bool Tom> column that distinguishes regular and hypothetical input rows. The intention here is that while the provided functions all fit the spec's idea of how inverse distribution or hypothetical set functions work, the actual implementation mechanisms are more generally applicable than that and a user-supplied function could well find something else useful to do with them. Accordingly, hardcoding stuff seemed inappropriate. Tom> And why do you need aggtranssortop for that? I fail to see the Tom> point of sorting on the flag column. It is convenient to the implementation to be able to rely on encountering the hypothetical row deterministically before (or in some cases after, as in cume_dist) its peers in the remainder of the sort order. Removing that sort would make the results of the functions incorrect. There should probably be some comments about that. oops. Tom> 2 I also don't care for the definition of aggordnargs, which is Tom> the number of direct arguments to an ordered set function, Tom> except when it isn't. Rather than overloading it to be both a Tom> count and a couple of flags, I wonder whether we shouldn't Tom> expand aggisordsetfunc to be a three-way "aggregate kind" field Tom> (ordinary agg, ordered set, or hypothetical set function). It would still be overloaded in some sense because a non-hypothetical ordered set function could still take an arbitrary number of args (using variadic "any") - there aren't any provided, but there's no good reason to disallow user-defined functions doing that - so you'd still need a special value like -1 for aggordnargs to handle that. 
-- Andrew (irc:RhodiumToad)
Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-12-03 at 09:20 -0500, Stephen Frost wrote: > * Jeff Davis (pg...@j-davis.com) wrote: > > Stephen mentioned using external tools and/or metadata, but to me that > > sounds like it would require porting the extension away from what's on > > PGXN today. > > Not at all- and that'd be the point. An external tool could take the > PGXN extension, run 'make', then 'make install' (into a userland > directory), extract out the script and then, with a little help from PG, > run that script in "extension creation mode" via libpq. What is stopping Extension Templates, as proposed, from being this special "extension creation mode"? What would be a better design? It seems like the porting issue is just a matter of finding someone to write a tool to reliably translate packages from PGXN into a form suitable to be sent using SQL commands; which we would need anyway for this special mode. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Why we are going to have to go DirectIO
All, https://lkml.org/lkml/2013/11/24/133 What this means for us: http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data It seems clear that Kernel.org, since 2.6, has been in the business of pushing major, hackish, changes to the IO stack without testing them or even thinking too hard about what the side-effects might be. This is perhaps unsurprising given that two of the largest sponsors of the Kernel -- who, incidentally, do 100% of the performance testing -- don't use the IO stack. This says to me that Linux will clearly be an undependable platform in the future with the potential to destroy PostgreSQL performance without warning, leaving us scrambling for workarounds. Too bad the alternatives are so unpopular. I don't know where we'll get the resources to implement our own storage, but it's looking like we don't have a choice. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello wrote: > On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse > wrote: >> >> Hi Fabrizio, >> >> looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It >> applies and compiles w/o errors or warnings. I set up a master and two >> hot standbys replicating from the master, one with 5 minutes delay and >> one without delay. After that I created a new database and generated >> some test data: >> >> CREATE TABLE test (val INTEGER); >> INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); >> >> The non-delayed standby nearly instantly had the data replicated, the >> delayed standby was replicated after exactly 5 minutes. I did not >> notice any problems, errors or warnings. >> > > Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> > > I currently don't see fixing the erroneous freezing of lockers (not the
> > > updater though) without changing the WAL format or synchronously waiting
> > > for all lockers to end. Which both seem like a no-go?
> >
> > Not fixing it at all is the real no-go. We'd take both of those
> > undesirables before just tolerating the lost locks in 9.3.
>
> I think it's changing the WAL format then.

I'd rather have a readily-verifiable fix that changes the WAL format than a
tricky fix that avoids doing so. So, modulo not having seen the change, +1.

> > The attached patch illustrates the approach I was describing earlier. It
> > fixes the test case discussed above. I haven't verified that everything
> > else in the system is ready for it, so this is just for illustration
> > purposes.
>
> That might be better than the current state because the potential damage
> from such unfrozen locks would be to get "could not access status of
> transaction ..." errors (*).
>
> *: which already are possible because we do not check multis properly
> against OldestVisibleMXactId[] anymore.

Separate issue. That patch adds to the ways we can end up with an extant
multixact referencing a locker XID no longer found in CLOG, but it doesn't
introduce that problem.

-- 
Noah Misch
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Why we are going to have to go DirectIO
On Tue, Dec 3, 2013 at 1:44 PM, Josh Berkus wrote: > All, > > https://lkml.org/lkml/2013/11/24/133 > > What this means for us: > > http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data > > It seems clear that Kernel.org, since 2.6, has been in the business of > pushing major, hackish, changes to the IO stack without testing them or > even thinking too hard about what the side-effects might be. This is > perhaps unsurprising given that two of the largest sponsors of the > Kernel -- who, incidentally, do 100% of the performance testing -- don't > use the IO stack. > > This says to me that Linux will clearly be an undependable platform in > the future with the potential to destroy PostgreSQL performance without > warning, leaving us scrambling for workarounds. Too bad the > alternatives are so unpopular. > > I don't know where we'll get the resources to implement our own storage, > but it's looking like we don't have a choice. This seems like a strange reaction to an article that's mostly about how Linux is now *fixing* a problem that could cause PostgreSQL to experience performance problems. I agree that we'll probably eventually need to implement our own storage layer, but this article isn't evidence of urgency so far as I can determine on first read-through. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 12/03/2013 10:46 AM, Robert Haas wrote: On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. I would agree that it is a good idea. Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 13:49:49 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> > On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > > Not fixing it at all is the real no-go. We'd take both of those
> > > undesirables before just tolerating the lost locks in 9.3.
> >
> > I think it's changing the WAL format then.
>
> I'd rather have a readily-verifiable fix that changes the WAL format than a
> tricky fix that avoids doing so. So, modulo not having seen the change, +1.

Well, who's going to write that then? I can write something up, but I
really would not like to be solely responsible for it. That means we cannot
release 9.3 on schedule anyway, right?

> > > The attached patch illustrates the approach I was describing earlier.
> > > It fixes the test case discussed above. I haven't verified that
> > > everything else in the system is ready for it, so this is just for
> > > illustration purposes.
> >
> > That might be better than the current state because the potential damage
> > from such unfrozen locks would be to get "could not access status of
> > transaction ..." errors (*).
> >
> > *: which already are possible because we do not check multis properly
> > against OldestVisibleMXactId[] anymore.
>
> Separate issue. That patch adds to the ways we can end up with an extant
> multixact referencing a locker XID no longer found in CLOG, but it doesn't
> introduce that problem.

Sure, that was an argument in favor of your idea, not against it ;).

Greetings,

Andres Freund

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
On Tue, Dec 03, 2013 at 10:03:32AM -0500, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > After sleeping on it, your other suggestion of TABLE OF, or possibly > > TABLE FROM, is starting to grow on me. > > > > Who else has an opinion? > > Alright, for my 2c, I like having this syntax include 'TABLE' simply > because it's what folks coming from Oracle might be looking for. > Following from that, to keep it distinct from the spec's notion of > 'TABLE', my preference is 'TABLE FROM'. I don't particularly like > 'TABLE OF', nor do I like the various 'ROWS' suggestions. I like having "ROWS" in there somehow, because it denotes the distinction from SQL-standard TABLE(). Suppose we were to implement the SQL-standard TABLE(), essentially just mapping it to UNNEST(). Then we'd have "TABLE (f())" that unpacks the single array returned by f(), and we'd have "TABLE FROM (f())" that unpacks the set of rows returned by f(). The word "FROM" alone does not indicate that difference the way including "ROWS" does. (I don't object to having "FROM" in addition to "ROWS".) -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 15:40:44 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > > I wondered about that myself. How would you suggest the format to look > > like? > > ISTM per tuple we'd need: > > > > * OffsetNumber off > > * uint16 infomask > > * TransactionId xmin > > * TransactionId xmax > > > > I don't see why we'd need infomask2, but so far being overly skimpy in > > that place hasn't shown itself to be the greatest idea? > I don't see that we need the xmin; a simple bit flag indicating whether > the Xmin was frozen should be enough. Yea, that would work as well. > For xmax we need more detail, as you propose. In infomask, are you > proposing to store the complete infomask, or just the flags being > changed? Note we have a set of bits used in other wal records, > XLHL_XMAX_IS_MULTI and friends, which perhaps we can use here. Tentatively the complete one. I don't think we'd win enough by using compute_infobits/fix_infomask_from_infobits and we'd need to extend the bits stored in there unless we are willing to live with not transporting XMIN/XMAX_COMMITTED which doesn't seem like a good idea. Btw, why is it currently ok to modify the tuple in heap_freeze_tuple() without being in a critical section? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-03 13:46:28 -0500, Robert Haas wrote: > On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello > wrote: > > On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse > > wrote: > >> > >> Hi Fabrizio, > >> > >> looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It > >> applies and compiles w/o errors or warnings. I set up a master and two > >> hot standbys replicating from the master, one with 5 minutes delay and > >> one without delay. After that I created a new database and generated > >> some test data: > >> > >> CREATE TABLE test (val INTEGER); > >> INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); > >> > >> The non-delayed standby nearly instantly had the data replicated, the > >> delayed standby was replicated after exactly 5 minutes. I did not > >> notice any problems, errors or warnings. > >> > > > > Thanks for your review Christian... > > So, I proposed this patch previously and I still think it's a good > idea, but it got voted down on the grounds that it didn't deal with > clock drift. I view that as insufficient reason to reject the > feature, but others disagreed. I really fail to see why clock drift should be this patch's responsibility. It's not like the world would go under^W data corruption would ensue if the clocks drift. Your standby would get delayed imprecisely. Big deal. From what I know of potential users of this feature, they would set it to at the very least 30min - that's WAY above the range for acceptable clock-drift on servers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/03/2013 10:44 AM, Josh Berkus wrote: All, https://lkml.org/lkml/2013/11/24/133 What this means for us: http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data It seems clear that Kernel.org, since 2.6, has been in the business of pushing major, hackish, changes to the IO stack without testing them or even thinking too hard about what the side-effects might be. This is perhaps unsurprising given that two of the largest sponsors of the Kernel -- who, incidentally, do 100% of the performance testing -- don't use the IO stack. This says to me that Linux will clearly be an undependable platform in the future with the potential to destroy PostgreSQL performance without warning, leaving us scrambling for workarounds. Too bad the alternatives are so unpopular. I don't know where we'll get the resources to implement our own storage, but it's looking like we don't have a choice. This seems rather half cocked. I read the article. They found a problem, that really will only affect a reasonably small percentage of users, created a test case, reported it, and a patch was produced. Kind of like how we do it. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian wrote:
> On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
>> > I wonder if we ought to mark each page as all-visible in
>> > raw_heap_insert() when we first initialize it, and then clear the flag
>> > when we come across a tuple that isn't all-visible. We could try to
>> > set hint bits on the tuple before placing it on the page, too, though
>> > I'm not sure of the details.
>>
>> I went with the per-page approach because I wanted to re-use the vacuum
>> lazy function. Is there some other code that does this already? I am
>> trying to avoid yet another set of routines that would need to be
>> maintained or could be buggy. This hint bit setting is tricky.
>>
>> And thanks much for the review!
>
> So, should I put this in the next commit fest?

+1.

> I still have an unknown about the buffer number to use here:
>
> !     /* XXX use 0 or real offset? */
> !     ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> !         BufferGetBlockNumber(buf) : 0, offnum);
>
> Is everyone else OK with this approach? Updated patch attached.

I started looking at this a little more the other day but got bogged down
in other things before I got through it all. I think we're going to want to
revise this so that it doesn't go through the functions that currently
assume that they're always dealing with a shared buffer, but I haven't
figured out exactly what the most elegant way of doing that is yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Parallel Select query performance and shared buffers
We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or table joins. When we send concurrent Select queries to these tables, query performance doesn't scale out with the number of CPU cores. We find that complex Select queries scale out better than simpler ones. We also find that increasing the block size from 8 KB to 32 KB, or increasing shared_buffers to include the working set, mitigates the problem to some extent. For our experiments, we chose an 8-core machine with 68 GB of memory from Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set shared_buffers to 4 GB. We then generated 1, 2, 4, and 8 separate tables using the data generator from the industry standard TPC-H benchmark. Each table we generated, called lineitem-1, lineitem-2, etc., had about 750 MB of data. Next, we sent 1, 2, 4, and 8 concurrent Select queries to these tables to observe the scale-out behavior. Our expectation was that since this machine had 8 cores, our run times would stay constant throughout. We also expected the machine's CPU utilization to go up to 100% at 8 concurrent queries. Neither of those assumptions held true. We found that query run times degraded as we increased the number of concurrent Select queries, and that CPU utilization flattened out at less than 50% for the simpler queries. Full results with a block size of 8 KB are below:

                       select count(*)  TPC-H Simple (#6)[2]  TPC-H Complex (#1)[1]
 1 Table  / 1 query        1.5 s             2.5 s                 8.4 s
 2 Tables / 2 queries      1.5 s             2.5 s                 8.4 s
 4 Tables / 4 queries      2.0 s             2.9 s                 8.8 s
 8 Tables / 8 queries      3.3 s             4.0 s                 9.6 s

We then increased the block size (BLCKSZ) from 8 KB to 32 KB and recompiled PostgreSQL. This change had a positive impact on query completion times.
Here are the new results with a block size of 32 KB:

                       select count(*)  TPC-H Simple (#6)[2]  TPC-H Complex (#1)[1]
 1 Table  / 1 query        1.5 s             2.3 s                 8.0 s
 2 Tables / 2 queries      1.5 s             2.3 s                 8.0 s
 4 Tables / 4 queries      1.6 s             2.4 s                 8.1 s
 8 Tables / 8 queries      1.8 s             2.7 s                 8.3 s

As a quick aside, we also repeated the same experiment on an EC2 instance with 16 CPU cores, and found that the scale-out behavior became worse there. (We also tried increasing shared_buffers to 30 GB. This change completely solved the scaling problem on this instance type, but hurt our performance on the hi1.4xlarge instances.) Unfortunately, increasing the block size from 8 to 32 KB has other implications for some of our customers. Could you help us out with the problem here? What can we do to identify the problem's root cause? Can we work around it? Thank you, Metin [1] http://examples.citusdata.com/tpch_queries.html#query-1 [2] http://examples.citusdata.com/tpch_queries.html#query-6
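For anyone wanting to reproduce this, a rough driver script might look like the following. This is only a sketch: the table names (lineitem_1 .. lineitem_N), the database name, and the connection details are assumptions, not taken from the report, and with DRY_RUN=1 (the default here) it only prints the psql invocations instead of running them.

```shell
#!/bin/sh
# Sketch: fire N identical sequential scans at N separate tables in
# parallel and time the whole batch. Table and database names are
# hypothetical; adjust them to match your own TPC-H load.
DB=${DB:-tpch}
N=${1:-8}                 # number of concurrent queries / tables
DRY_RUN=${DRY_RUN:-1}     # set DRY_RUN=0 to actually run the queries

run_query() {
    q="SELECT count(*) FROM lineitem_$1;"
    if [ "$DRY_RUN" = 1 ]; then
        # Just show what would be executed.
        echo "psql -d $DB -c \"$q\""
    else
        psql -d "$DB" -c "$q" > /dev/null
    fi
}

start=$(date +%s)
i=1
while [ "$i" -le "$N" ]; do
    run_query "$i" &      # one backend per table, all started together
    i=$((i + 1))
done
wait                      # block until every query has finished
end=$(date +%s)
echo "$N concurrent queries finished in $((end - start)) s"
```

Comparing the wall-clock times for N = 1, 2, 4, 8 across `select count(*)` and the two TPC-H queries should reproduce the flattening described above, assuming the tables fit in memory.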
[HACKERS] Problem with displaying "wide" tables in psql
Hi. psql definitely has a problem with displaying "wide" tables. Even in expanded mode, they look horrible. So I tried to solve this problem. Before the patch:

postgres=# \x 1
Expanded display (expanded) is on.
postgres=# \pset border 2
Border style (border) is 2.
postgres=# select * from pg_stats;
+-[ RECORD 1 ]---+-- ... --+
| schemaname | pg_catalog
| tablename  | pg_proc

... and after:

+-[ RECORD 1 ]-----+---------------------------------------+
| schemaname       | pg_catalog                            |
| tablename        | pg_proc                               |
| attname          | proname                               |
| inherited        | f                                     |
| null_frac        | 0                                     |
| avg_width        | 64                                    |
| n_distinct       | -0.823159                             |
| most_common_vals | {max,min,overlaps,has_column_privileg
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund wrote: > On 2013-12-03 13:49:49 -0500, Noah Misch wrote: > > On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote: > > > On 2013-12-03 13:14:38 -0500, Noah Misch wrote: > > > > Not fixing it at all is the real no-go. We'd take both of those > > > > undesirables > > > > before just tolerating the lost locks in 9.3. > > > > > > I think it's changing the wal format then. > > > > I'd rather have a readily-verifiable fix that changes WAL format than a > > tricky fix that avoids doing so. So, modulo not having seen the change, +1. > > Well, who's going to write that then? I can write something up, but I > really would not like to be solely responsible for it. I will give this a shot. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Problem with displaying "wide" tables in psql
Hello

do you know the pager less trick? http://merlinmoncure.blogspot.cz/2007/10/better-psql-with-less.html

Regards

Pavel Stehule

2013/12/3 Sergey Muraviov
> Hi.
>
> psql definitely has a problem with displaying "wide" tables.
> Even in expanded mode, they look horrible.
> So I tried to solve this problem.
>
> [...]
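For the archives, the trick in that post boils down to a couple of environment variables (quoted from memory of the linked article, so treat the exact flag set as a suggestion and double-check against the post):

```shell
# Page psql output through less with line-chopping enabled. -S is the
# flag that matters for wide tables: it chops long lines instead of
# wrapping them, so rows can be scrolled sideways with the arrow keys.
# The remaining flags are quality-of-life options from the post.
PAGER=less
LESS="-iMSx4 -FX"
export PAGER LESS
```

With these exported before starting psql, a wide `select * from pg_stats;` stays on one screen line per row instead of wrapping into an unreadable mess.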
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/03/2013 10:59 AM, Joshua D. Drake wrote: > This seems rather half cocked. I read the article. They found a problem, > that really will only affect a reasonably small percentage of users, > created a test case, reported it, and a patch was produced. "Users with at least one file bigger than 50% of RAM" is unlikely to be a small percentage. > > Kind of like how we do it. I like to think we'd have at least researched the existing literature on 2Q algorithms (which is extensive) before implementing our own. Oh, wait, we *did*. Nor is this the first ill-considered performance hack pushed into mainline kernels without any real testing. It's not even the first *that year*. While I am angry over this -- no matter what Kernel.org fixes now, we're going to have to live with their mistake for 3 years -- the DirectIO thing isn't just me; when I've gone to Linux Kernel events to talk about IO, that's the response I've gotten from most Linux hackers: "you shouldn't be using the filesystem, use DirectIO and implement your own storage." That's why they don't feel that it's a problem to break the IO stack; they really don't believe that anyone who cares about performance should be using it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote: > This sounds like Inline Extensions to me, which was previously proposed. I've not looked at that proposal very carefully, but I agree that what we're describing is a lot closer to 'inline extensions' than 'extension templates'. > If I recall, that proposal trailed off because of issues with > dump/reload. If you dump the contents of the extension, it's not really > an extension; but if you don't, then the administrator can't back up the > database (because he might not have all of the extension templates for > the extensions installed). That's when the idea appeared for extension > templates stored in the catalog, so that the administrator would always > have all of the necessary templates present. When it comes to dump/reload, I'd much rather see a mechanism which uses our deep understanding of the extension's objects (as database objects) to implement the dump/reload than a text blob which is carried forward from major version to major version and may even fail to run. I realize that's different from extension files which are out on the filesystem, but I do not see that as a bad thing. Thanks, Stephen
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
Noah Misch writes: > On Tue, Dec 03, 2013 at 10:03:32AM -0500, Stephen Frost wrote: >> Alright, for my 2c, I like having this syntax include 'TABLE' simply >> because it's what folks coming from Oracle might be looking for. >> Following from that, to keep it distinct from the spec's notion of >> 'TABLE', my preference is 'TABLE FROM'. I don't particularly like >> 'TABLE OF', nor do I like the various 'ROWS' suggestions. > I like having "ROWS" in there somehow, because it denotes the distinction from > SQL-standard TABLE(). Suppose we were to implement the SQL-standard TABLE(), > essentially just mapping it to UNNEST(). Then we'd have "TABLE (f())" that > unpacks the single array returned by f(), and we'd have "TABLE FROM (f())" > that unpacks the set of rows returned by f(). The word "FROM" alone does not > indicate that difference the way including "ROWS" does. Hm ... fair point, except that "ROWS" doesn't seem to suggest the right thing either, at least not to me. After further thought I've figured out what's been grating on me about Noah's suggestions: he suggests that we're distinguishing "TABLE [FROM ELEMENTS]" from "TABLE FROM ROWS", but this is backwards. What UNNEST() really does is take an array, extract the elements, and make a table of those. Similarly, what our feature does is take a set (the result of a set-returning function), extract the rows, and make a table of those. So what would seem appropriate to me is "TABLE [FROM ARRAY]" versus "TABLE FROM SET". Now I find either of those phrases to be one word too many, but the key point is that I'd probably prefer something involving SET over something involving ROWS. (Both of those are unreserved_keyword, so this doesn't move the ball at all in terms of finding an unambiguous syntax.) Another issue is that if you are used to the Oracle syntax, in which an UNNEST() is presumed, it's not exactly clear that TABLE ROWS, or any other phrase including TABLE, *doesn't* also imply an UNNEST. 
So to me that's kind of a strike against Stephen's preference --- I'm thinking we might be better off not using the word TABLE. regards, tom lane
Re: [HACKERS] Extension Templates S03E11
Stephen Frost writes: > When it comes to dump/reload, I'd much rather see a mechanism which uses > our deep understanding of the extension's objects (as database objects) > to implement the dump/reload than a text blob which is carried forward > from major version to major version and may even fail to run. Note that we're already doing that in the binary_upgrade code path. I agree that generalizing that approach sounds like a better idea than keeping a text blob around. regards, tom lane
Re: [HACKERS] 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
On Tue, Dec 3, 2013 at 02:01:52PM -0500, Robert Haas wrote: > On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian wrote: > > On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote: > >> > I wonder if we ought to mark each page as all-visible in > >> > raw_heap_insert() when we first initialize it, and then clear the flag > >> > when we come across a tuple that isn't all-visible. We could try to > >> > set hint bits on the tuple before placing it on the page, too, though > >> > I'm not sure of the details. > >> > >> I went with the per-page approach because I wanted to re-use the vacuum > >> lazy function. Is there some other code that does this already? I am > >> trying to avoid yet-another set of routines that would need to be > >> maintained or could be buggy. This hint bit setting is tricky. > >> > >> And thanks much for the review! > > > > So, should I put this in the next commit fest? > > +1. OK, I added it to the January commit fest. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Time-Delayed Standbys
On 18 October 2013 19:03, Fabrízio de Royes Mello wrote: > The attached patch is a continuation of Robert's work [1]. Reviewing v2... > I made some changes: > - use of Latches instead of pg_usleep, so we don't have to wakeup regularly. OK > - call HandleStartupProcInterrupts() before CheckForStandbyTrigger() because > might change the trigger file's location OK > - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and > XLOG_XACT_COMMIT_COMPACT checks Why just those? Why not aborts and restore points also? > - don't care about clockdrift because it's an admin problem. A few minor points:

* The code with the comment "Clear any previous recovery delay time" is in the wrong place; move it down or remove it completely. Setting the delay to zero doesn't prevent calling recoveryDelay(), so that logic looks wrong anyway.
* The loop exit in recoveryDelay() is inelegant; it should break if <= 0.
* There's a spelling mistake in the sample.
* The patch has stray whitespace in one place.

And one important point:

* The delay loop happens AFTER we check for a pause. Which means if the user notices a problem on a commit, then hits the pause button on the standby, the pause will have no effect and the next commit will be applied anyway. Maybe just one commit, but it's an off-by-one error that removes the benefit of the patch. So I think we need to test this again after we finish delaying:

    if (xlogctl->recoveryPause)
        recoveryPausesHere();

We need to explain in the docs that this is intended only for use in a live streaming deployment. It will have little or no meaning in a PITR. I think recovery_time_delay should be called _apply_delay to highlight the point that it is the apply of records that is delayed, not the receipt. And hence the need to document that sync rep is NOT slowed down by setting this value.
And to make the name consistent with other parameters, I suggest min_standby_apply_delay. We also need to document caveats about the patch, in that it only delays on timestamped WAL records, and other records may be applied sooner than the delay in some circumstances, so it is not a way to avoid all cancellations. We also need to document that the behaviour of the patch is to apply all data received as quickly as possible once triggered, so the specified delay does not slow down promoting the server to a master. That might also be seen as a negative behaviour, since promoting the standby effectively sets recovery_time_delay to zero. I will handle the additional documentation, if you can update the patch with the main review comments. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
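For illustration, the standby-side configuration under discussion would presumably end up looking something like this. This is only a sketch: the final parameter name (min_standby_apply_delay vs. recovery_time_delay) is exactly what is being debated in this thread, and the connection string is made up.

```
# recovery.conf on the standby (illustrative, not final syntax)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'

# Hold WAL apply back by five minutes. Receipt of WAL (and hence sync
# rep) is not delayed, only the apply; pausing recovery within this
# window can stop a mistaken commit from ever reaching the standby.
min_standby_apply_delay = '5min'
```

The point of the caveats above is that this window only reliably covers timestamped records such as commits, and that promotion applies everything immediately.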
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote: > On Tue, 2013-12-03 at 09:20 -0500, Stephen Frost wrote: > > * Jeff Davis (pg...@j-davis.com) wrote: > > > Stephen mentioned using external tools and/or metadata, but to me that > > > sounds like it would require porting the extension away from what's on > > > PGXN today. > > > > Not at all- and that'd be the point. An external tool could take the > > PGXN extension, run 'make', then 'make install' (into a userland > > directory), extract out the script and then, with a little help from PG, > > run that script in "extension creation mode" via libpq. > > What is stopping Extension Templates, as proposed, from being this > special "extension creation mode"? What would be a better design? The extra catalog tables which store SQL scripts in text columns are one of my main objections to the as-proposed Extension Templates. I view those scripts as a poor man's definition of database objects which are defined properly in the catalog already. The other big issue is that there isn't an easy way to see how we could open up the ability to create extensions to non-superusers with this approach. What I think we should really be mulling over is whether we need anything further when it comes to non-superuser extensions: a new namespace (eg: schemas for extensions, or maybe a prefix for user extensions, or just a notion of ownership which gets combined with the name when doing operations with an extension)? a new name (not extensions, but something else)? > It seems like the porting issue is just a matter of finding someone to > write a tool to reliably translate packages from PGXN into a form > suitable to be sent using SQL commands; which we would need anyway for > this special mode. That's what I was thinking and hoping. :) Of course, we haven't yet figured out exactly what we want this special mode to look like, so it's a bit tricky to ask anyone to write such a tool.
I keep thinking this should be something like: create a schema, set the search path to that schema, run the extension script more-or-less as is, then 'register' that schema as being an extension with a certain version. That 'registration' process could also handle renaming the schema, if the user wants the extension in a different schema (or perhaps the initial schema was some kind of "temporary" schema) or moving the objects into an existing schema, if that's what is requested. Thanks, Stephen
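As a straw-man for what a client-side tool could emit for that flow, something like the sketch below. Every server-side command here is hypothetical: no "REGISTER AS EXTENSION" syntax exists; it merely stands in for whatever the registration step would become.

```shell
# Generate the SQL a hypothetical installer tool would send over libpq:
# build a scratch schema, replay the extension's script into it, then
# "register" the schema as an extension. The ALTER SCHEMA ... REGISTER
# command is invented here purely to illustrate the proposed flow.
gen_install_sql() {
    ext=$1 ver=$2 script=$3
    printf '%s\n' \
        "CREATE SCHEMA ${ext}_install;" \
        "SET search_path = ${ext}_install;" \
        "\\i $script" \
        "ALTER SCHEMA ${ext}_install REGISTER AS EXTENSION $ext VERSION '$ver';"
}

gen_install_sql myext 1.0 myext--1.0.sql
```

The registration step is where renaming the scratch schema, or moving the objects into an existing schema, would happen.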
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Another issue is that if you are used to the Oracle syntax, in which an > UNNEST() is presumed, it's not exactly clear that TABLE ROWS, or any other > phrase including TABLE, *doesn't* also imply an UNNEST. So to me that's > kind of a strike against Stephen's preference --- I'm thinking we might be > better off not using the word TABLE. I see the concern there, but I would think a bit of documentation around that would help them find UNNEST quickly, if that's what they're really looking for. On the flip side, I imagine it could be jarring seeing 'TABLE FROM' when you're used to Oracle's 'TABLE'. I haven't got any great suggestions about how to incorporate 'SET', and I do still like 'TABLE' as that's what we're building, but I'll be happy to have this capability even if it's 'TABLE FROM SET ROWS THING'. Thanks, Stephen