Re: Remove page-read callback from XLogReaderState.
Rebased. I had intentionally left some duplicate code in XLogNeedData, but changed my mind and removed it; that keeps the function small and clear. regards. -- Kyotaro Horiguchi NTT Open Source Software Center

From 7c4ce152d248546c8f56057febae6b17b6fa71bb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 5 Sep 2019 20:21:55 +0900
Subject: [PATCH v9 1/4] Move callback-call from ReadPageInternal to XLogReadRecord.

The current WAL record reader reads page data using a callback function. Although that is not much of a problem on its own, it becomes one if we add tasks such as encryption that must be performed on page data before the WAL reader consumes it. To avoid the record reader having to grow a new code path for every new callback, this patch separates the page reader from the WAL record reading facility by turning the current WAL record reader into a state machine. As the first step of that change, this patch moves the page reader function out of ReadPageInternal; the remaining tasks of that function are taken over by the new function XLogNeedData. As a result, XLogPageRead calls the page reader callback function directly, according to the feedback from XLogNeedData.
---
 src/backend/access/transam/xlog.c         |  16 +-
 src/backend/access/transam/xlogreader.c   | 273 ++
 src/backend/access/transam/xlogutils.c    |  10 +-
 .../replication/logical/logicalfuncs.c    |   2 +-
 src/backend/replication/walsender.c       |  10 +-
 src/bin/pg_rewind/parsexlog.c             |  16 +-
 src/bin/pg_waldump/pg_waldump.c           |   8 +-
 src/include/access/xlogreader.h           |  23 +-
 src/include/access/xlogutils.h            |   2 +-
 src/include/replication/logicalfuncs.h    |   2 +-
 src/test/recovery/t/011_crash_recovery.pl |   1 +
 11 files changed, 213 insertions(+), 150 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b602e28f61..a7630c643a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -884,7 +884,7 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 						 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
-static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
+static bool XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										bool fetching_ckpt, XLogRecPtr tliRecPtr);
@@ -4249,7 +4249,6 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 	XLogRecord *record;
 	XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
 
-	/* Pass through parameters to XLogPageRead */
 	private->fetching_ckpt = fetching_ckpt;
 	private->emode = emode;
 	private->randAccess = (RecPtr != InvalidXLogRecPtr);
@@ -11540,7 +11539,7 @@ CancelBackup(void)
  * XLogPageRead() to try fetching the record from another source, or to
  * sleep and retry.
  */
-static int
+static bool
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 			 XLogRecPtr targetRecPtr, char *readBuf)
 {
@@ -11599,7 +11598,8 @@ retry:
 			readLen = 0;
 			readSource = 0;
 
-			return -1;
+			xlogreader->readLen = -1;
+			return false;
 		}
 	}
@@ -11694,7 +11694,8 @@ retry:
 		goto next_record_is_invalid;
 	}
 
-	return readLen;
+	xlogreader->readLen = readLen;
+	return true;
 
 next_record_is_invalid:
 	lastSourceFailed = true;
@@ -11708,8 +11709,9 @@ next_record_is_invalid:
 	/* In standby-mode, keep trying */
 	if (StandbyMode)
 		goto retry;
-	else
-		return -1;
+
+	xlogreader->readLen = -1;
+	return false;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index c8b0d2303d..66f3bc5597 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -34,8 +34,8 @@
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...)
 			pg_attribute_printf(2, 3);
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
-static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
-							 int reqLen);
+static bool XLogNeedData(XLogReaderState *state, XLogRecPtr pageptr,
+						 int reqLen, bool header_inclusive);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 								  XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
@@ -244,7 +244,6 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	uint32		targetRecOff;
 	uint32		pageHeaderSize;
 	bool		gotheader;
-	int			readOff;
 
 	/*
	 * randAccess indicates whether to
Re: Fix comment in XLogFileInit()
On Mon, Oct 21, 2019 at 5:28 PM Amit Kapila wrote: > > On Mon, Oct 21, 2019 at 12:28 PM Fujii Masao wrote: > > > > I found that the argument name of XLogFileInit() is wrong in its comment. > > Attached is the patch that fixes that typo. > > > > LGTM. Thanks for checking! Committed. Regards, -- Fujii Masao
Re: pgbench - extend initialization phase control
On Thu, Oct 17, 2019 at 8:09 PM Fabien COELHO wrote:
>
> Hello,
>
> > Failed regression test. It's necessary to change the first "a" in "allowed
> > step characters are" to an uppercase "A" in the regression test of
> > 002_pgbench_no_server.pl.
>
> Argh. I think I ran the test, then stupidly updated the message afterwards
> to better match best practices, without rechecking :-(
>
> > The behavior of "g" is different between v12 and the patch, and
> > backward compatibility is lost. In v12, BEGIN and COMMIT are issued
> > simply by choosing "g". It's a problem that backward compatibility is
> > lost.
>
> Somehow yes, but I do not see this as an actual problem from a functional
> point of view: it just means that if you use 'dtgvp' with the newer
> version and the inserts fail, then they are not inside an explicit
> transaction, so previous inserts are not cleaned up. However, this is a
> pretty unlikely case, and anyway the error is reported, so any user would
> be expected not to go on after the initialization phase.
>
> So basically I do not see the very small regression for an unlikely corner
> case causing any problem in practice.
>
> The benefit of controlling where begin/end actually occur is that it may
> have an impact on performance, and it allows one to check that.

I still fail to understand the benefit of adding the () settings. Could you clarify what the () settings are useful for? Are you thinking of executing all initialization SQL statements within a single transaction, e.g., -I (dtgp), for some reason?

> > When using ( and ) with -I, the documentation should indicate that double
> > quotes are required,
>
> Or single quotes, or a backslash, if launched from the command line. I added
> a mention of escaping or protection in the doc in that case.

What about using, for example, b (BEGIN) and c (COMMIT) instead, to avoid such a restriction?

> > and "v" should not be enclosed in ( and ).
>
> That is a PostgreSQL limitation, which may evolve.
There could be others.

> I updated the doc to say that some commands may not work inside an
> explicit transaction.

I think that it's better to check whether "v" is enclosed in () at the start of pgbench, and report an error if it is. Otherwise, if -I (dtgv) is specified, pgbench reports an error only after the time-consuming data generation has been performed, and of course that data generation is rolled back. Regards, -- Fujii Masao
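As a side note on the quoting issue discussed above: the proposed parentheses are shell metacharacters, so an unquoted -I dt(g)vp never reaches pgbench at all. A quick demonstration with a plain shell (pgbench itself is not invoked here; dt(g)vp is just an example step string from the discussion):

```shell
# Unquoted parentheses are parsed by the shell and cause a syntax error
# before any command runs:
sh -c 'printf %s dt(g)vp' 2>/dev/null && echo "unquoted: ok" || echo "unquoted: shell error"

# Quoted (single, double, or backslash-escaped), the string passes through:
sh -c 'printf "%s\n" "dt(g)vp"'
```

This is why the documentation has to mention quoting, and why single-character step names such as b/c would sidestep the problem entirely.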
Re: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...
On Tue, Oct 15, 2019 at 11:40:47AM -0500, Justin Pryzby wrote:
> Not only can't I DROP the _ccold indexes, but dropping the table doesn't
> cause them to be dropped, and then I can't even \d them anymore:

Yes, I can confirm the report. In this scenario the reindex is waiting for the first transaction to finish before step 5, and the cancellation causes the follow-up process (set_dead and the next steps) not to be done. So at this stage the swap has actually happened. I am still analyzing the report in depth, but you don't have any problems with a plain index when interrupting at this stage, and the old index can be cleanly dropped with the new one present, so my first thought is that we are just missing some more dependency cleanup at the swap phase when dealing with a partition index.

-- Michael
Re: Fix of fake unlogged LSN initialization
At Mon, 21 Oct 2019 14:03:47 +0900, Michael Paquier wrote in > On Sat, Oct 19, 2019 at 05:03:00AM +, tsunakawa.ta...@fujitsu.com wrote: > > The attached trivial patch fixes the initialization of the fake > > unlogged LSN. Currently, BootstrapXLOG() in initdb sets the initial > > fake unlogged LSN to FirstNormalUnloggedLSN (=1000), but the > > recovery and pg_resetwal sets it to 1. The patch modifies the > > latter two cases to match initdb. > > > > I don't know if this do actual harm, because the description of > > FirstNormalUnloggedLSN doesn't give me any idea. > > From xlogdefs.h added by 9155580: > /* > * First LSN to use for "fake" LSNs. > * > * Values smaller than this can be used for special per-AM purposes. > */ > #define FirstNormalUnloggedLSN ((XLogRecPtr) 1000) > > So it seems to me that you have caught a bug here, and that we had > better back-patch to v12 so as recovery and pg_resetwal don't mess up > with AMs using lower values than that. +1 -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: Re: Bug about drop index concurrently
>That suggests you're doing a lot of 'drop index concurrently', right?

Not exactly. In the actual scenario I didn't run that many 'drop index concurrently' commands on the master; I mostly just executed a lot of queries on the standby. Every 'drop index concurrently' on the master has a certain probability of triggering this; it is small, but it can happen.

>Yes, but we can't really do that, I'm afraid.
>
>We certainly can't do that on the master because we simply don't have
>the necessary information about locks from the standby, and we really
>don't want to have it, because with a busy standby that might be quite a
>bit of data (plus the standby would have to wait for the master to
>confirm each lock acquisition, which seems pretty terrible, I think).
>

Yeah, we can't do this; we would lose too much to achieve it.

>On the standby, we don't really have an idea that there's a drop
>index running - we only get information about AE locks, and a bunch of
>catalog updates. I don't think we have a way to determine this is a drop
>index in concurrent mode :-(
>
>More precisely, the master sends information about AccessExclusiveLock
>in the XLOG_STANDBY_LOCK wal record (in the xl_standby_locks struct). And
>when the standby replays that, it should acquire those locks.
>
>For regular DROP INDEX we send this:
>
>rmgr: Standby ... desc: LOCK xid db 16384 rel 16385
>rmgr: Standby ... desc: LOCK xid db 16384 rel 20573
>... catalog changes ...
>rmgr: Transaction ... desc: COMMIT 2019-10-23 22:42:27.950995 CEST;
> rels: base/16384/20573; inval msgs: catcache 32 catcache 7 catcache
> 6 catcache 50 catcache 49 relcache 20573 relcache 16385 snapshot 2608
>
>while for DROP INDEX CONCURRENTLY we send this
>
>rmgr: Heap ... desc: INPLACE ... catalog update
>rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
> relcache 21288 relcache 16385
>rmgr: Heap ... desc: INPLACE ... catalog update
>rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
> relcache 21288 relcache 16385
>rmgr: Standby ... desc: LOCK xid 10326 db 16384 rel 21288
>... catalog updates ...
>rmgr: Transaction ... desc: COMMIT 2019-10-23 23:47:10.042568 CEST;
> rels: base/16384/21288; inval msgs: catcache 32 catcache 7 catcache 6
> catcache 50 catcache 49 relcache 21288 relcache 16385 snapshot 2608
>

Yeah, you are right; I see this.

>So just a single lock on the index, but not the lock on the relation
>itself (which makes sense, because for DROP INDEX CONCURRENTLY we don't get
>an exclusive lock on the table).
>
>I'm not quite familiar with this part of the code, but the SELECT
>backends are clearly getting a stale list of indexes from relcache, and
>try to open an index which was already removed by the redo. We do
>acquire the lock on the index itself, but that's not sufficient :-(
>
>Not sure how to fix this. I wonder if we could invalidate the relcache
>for the relation at some point. --method one --
>Or maybe we could add additional
>information to the WAL to make the redo wait for all lock waiters, just
>like on the master. But that might be tricky because of deadlocks, and
>because the redo could easily get "stuck" waiting for long-running
>queries. --method two --

For method one, I don't think this can solve the problem at all, because we can't predict on the standby when the master will run 'drop index concurrently'.

I also have a way to reproduce this bug manually, to make it easier to understand the problem. You can follow the steps below:

1. Open a connection on the standby, then use gdb to attach to the backend process.
2. Set a breakpoint at plancat.c:196 in get_relation_info.
3. Execute 'explain select ...' on this backend; in gdb you will see that the breakpoint has been hit, and the query will hang.
4. Execute 'drop index concurrently' on the master. (A plain 'drop index' would be blocked, because there is a query on our standby and the master cannot get the exclusive lock.)
5. On the standby, let gdb continue. We will see an error (ERROR: could not open relation with OID XXX) in the standby client.

So the query started on the standby first, and the 'drop index concurrently' was executed by the master afterwards. No matter how fast we invalidate the relcache, there is no way to prevent this error, because there is always a moment when the index has already been dropped by the master. In other words, there is no way to predict the master's 'drop index concurrently' on the standby.

For method two, you are right: deadlock is possible, and the most unbearable part is that it would delay log apply on the standby, resulting in inconsistent data between the master and the standby.

All in all, I think this bug is a flaw in the postgres design. We need to think carefully about how to handle it better; we could even learn from other database products. I hope I can help you. Thanks.
Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held
On Thu, Oct 24, 2019 at 11:42:04AM +0900, Michael Paquier wrote: > Please see the attached. Justin, does it fix your problems regarding > the locks? Confirmed. Thanks, Justin
Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held
On Wed, Oct 23, 2019 at 07:18:33PM +0900, Michael Paquier wrote:
> I can confirm that this is an issue related to session locks which are
> not cleaned up when in an out-of-transaction state, a state that can be
> reached between a transaction commit or start while holding at least
> one session lock within one single command of VACUUM, CIC or REINDEX
> CONCURRENTLY.

Please let me back-pedal a bit on this one after sleeping on it. Actually, if you look at CIC and VACUUM, those code paths are much more careful than REINDEX CONCURRENTLY about the position of CHECK_FOR_INTERRUPTS(), in that their checks happen only within a transaction context. In the case of REINDEX CONCURRENTLY and the failure reported here, the current code is careless: it depends of course on the timing of statement_timeout, but session locks would remain behind when hitting an interruption at the beginning of phase 2 or 3 in indexcmds.c. So the answer is simple: by moving the interrupt checks inside a transaction context, the problem gets solved. This also fixes a second issue, as the original code would cause xact.c to generate some useless warnings. Please see the attached. Justin, does it fix your problems regarding the locks? For me it does.

> The failure is actually pretty easy to reproduce if you
> add an elog(ERROR) after a CommitTransactionCommand() call and then
> shut down the connection. I am starting a new thread about that. The
> problem is larger than it looks, and has existed for a long time.

I am still wondering if we could put more safeguards in this area though...
-- Michael

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4448b986b6..e1bd81a1d6 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3057,11 +3057,16 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			newIndexId = lfirst_oid(lc2);
 		Oid			heapId;
 
-		CHECK_FOR_INTERRUPTS();
-
 		/* Start new transaction for this index's concurrent build */
 		StartTransactionCommand();
 
+		/*
+		 * Check for user-requested abort. This is inside a transaction
+		 * so as xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
 		/* Set ActiveSnapshot since functions in the indexes may need it */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
@@ -3101,10 +3106,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		TransactionId limitXmin;
 		Snapshot	snapshot;
 
-		CHECK_FOR_INTERRUPTS();
-
 		StartTransactionCommand();
 
+		/*
+		 * Check for user-requested abort. This is inside a transaction
+		 * so as xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
 		heapId = IndexGetRelation(newIndexId, false);
 
 		/*
@@ -3166,6 +3176,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			newIndexId = lfirst_oid(lc2);
 		Oid			heapId;
 
+		/*
+		 * Check for user-requested abort. This is inside a transaction
+		 * so as xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
 		CHECK_FOR_INTERRUPTS();
 
 		heapId = IndexGetRelation(oldIndexId, false);
@@ -3221,7 +3236,13 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			oldIndexId = lfirst_oid(lc);
 		Oid			heapId;
 
+		/*
+		 * Check for user-requested abort. This is inside a transaction
+		 * so as xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
 		CHECK_FOR_INTERRUPTS();
+
 		heapId = IndexGetRelation(oldIndexId, false);
 		index_concurrently_set_dead(heapId, oldIndexId);
 	}
Re: Re: Re: Bug about drop index concurrently
On Wed, Oct 23, 2019 at 02:38:45PM +0800, 李杰(慎追) wrote:

I'm a bit confused. You shouldn't see any crashes and/or core files in this scenario, for two reasons. Firstly, I assume you're running a regular build without asserts. Secondly, I had to add an extra assert to trigger the failure. So what core are you talking about?

Sorry, I should have explained it more clearly. I saw the core file because I modified the postgres source code and added an Assert to it.

OK

Also, it's not clear to me what you mean by "bug in the standby" or no lock in the drop index concurrently. Can you explain?

"Bug in the standby" means that we built a master-standby setup; when we executed a large number of queries on the standby and ran 'drop index concurrently' on the master, we got an ERROR on the standby. It does not happen every time, but it does appear.

"No lock in the drop index concurrently": I think this happens because strong enough locks are not taken when executing 'drop index concurrently'.

OK, thanks for the clarification. Yes, it won't appear every time; it's likely timing-sensitive (I'll explain in a minute). Hmm, so you observe the issue with regular queries, not just EXPLAIN ANALYZE?

Yeah, we have seen this error frequently.

That suggests you're doing a lot of 'drop index concurrently', right?

Of course, we considered applying the master's method of waiting for queries holding the lock to the standby, but worried about delaying the standby's log apply, so we gave up on that.

I don't understand? What method?

When I analyzed this problem to find its cause, I also executed 'drop index concurrently' and 'explain select * from xxx' on the master, but the bug did not appear as expected. So I went to the source code. I found that there is a mechanism on the master such that when 'drop index concurrently' is executed, it waits until every transaction that saw the old index state has finished. The source code is as follows:

WaitForLockers(heaplocktag, AccessExclusiveLock);

Therefore, I think that if this mechanism were also present on the standby, the error would not appear. But I worried about delaying the standby's log apply, so we gave up on that.

Yes, but we can't really do that, I'm afraid.

We certainly can't do that on the master because we simply don't have the necessary information about locks from the standby, and we really don't want to have it, because with a busy standby that might be quite a bit of data (plus the standby would have to wait for the master to confirm each lock acquisition, which seems pretty terrible, I think).

On the standby, we don't really have an idea that there's a drop index running - we only get information about AE locks, and a bunch of catalog updates. I don't think we have a way to determine this is a drop index in concurrent mode :-(

More precisely, the master sends information about AccessExclusiveLock in the XLOG_STANDBY_LOCK wal record (in the xl_standby_locks struct). And when the standby replays that, it should acquire those locks.

For regular DROP INDEX we send this:

rmgr: Standby ... desc: LOCK xid db 16384 rel 16385
rmgr: Standby ... desc: LOCK xid db 16384 rel 20573
... catalog changes ...
rmgr: Transaction ... desc: COMMIT 2019-10-23 22:42:27.950995 CEST;
  rels: base/16384/20573; inval msgs: catcache 32 catcache 7 catcache
  6 catcache 50 catcache 49 relcache 20573 relcache 16385 snapshot 2608

while for DROP INDEX CONCURRENTLY we send this:

rmgr: Heap ... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
  relcache 21288 relcache 16385
rmgr: Heap ... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
  relcache 21288 relcache 16385
rmgr: Standby ... desc: LOCK xid 10326 db 16384 rel 21288
... catalog updates ...
rmgr: Transaction ... desc: COMMIT 2019-10-23 23:47:10.042568 CEST;
  rels: base/16384/21288; inval msgs: catcache 32 catcache 7 catcache 6
  catcache 50 catcache 49 relcache 21288 relcache 16385 snapshot 2608

So just a single lock on the index, but not the lock on the relation itself (which makes sense, because for DROP INDEX CONCURRENTLY we don't get an exclusive lock on the table).

I'm not quite familiar with this part of the code, but the SELECT backends are clearly getting a stale list of indexes from relcache, and try to open an index which was already removed by the redo. We do acquire the lock on the index itself, but that's not sufficient :-(

Not sure how to fix this. I wonder if we could invalidate the relcache for the relation at some point. Or maybe we could add additional information to the WAL to make the redo wait for all lock waiters, just like on the master. But that might be tricky because of deadlocks, and because the redo could easily get "stuck" waiting for long-running queries.

regards

-- Tomas Vondra
Re: Transparent Data Encryption (TDE) and encrypted files
On Thu, Oct 10, 2019 at 10:40:37AM -0400, Stephen Frost wrote:
> > Some people ask for indexable encrypted columns, but I tend to explain to
> > them how impractical and inefficient that is. You can support hash indexes
> > if you don't salt the encrypted data, but that greatly weakens the
> > encryption by allowing attackers to use dictionary attacks and other brute
> > force techniques efficiently. And you can't support b-tree > and < without
> > very complex encryption schemes
> > (https://en.wikipedia.org/wiki/Homomorphic_encryption).
>
> I'm not sure why you wouldn't salt the hash..? That's pretty important,
> imv, and, of course, you have to store the salt, but that shouldn't be
> that big of a deal, I wouldn't think. Agreed that you can't support
> b-tree (even with complex encryption schemes...; I've read some papers
> about how just > and < is enough to be able to glean a good bit of info;
> not super relevant to the overall discussion here, so I won't go hunt
> them down right now, but if there's interest, I can try to do so).

Yes, you can add salt to the value you store in the hash index, but when you are looking for a matching value, how do you know what salt to use to find it in the index?

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Re: WIP: System Versioned Temporal Table
On 23/10/2019 17:56, Surafel Temesgen wrote:
>
> Hi all,
>
> Temporal tables are one of the main new features added in SQL standard
> 2011. From that I would like to implement system-versioned temporal
> tables, which keep past and present data so old data can be queried.
>
Excellent! I've been wanting this feature for a long time now. We're the last major database to not have it. I tried my hand at doing it in core, but ended up having better success with an extension: https://github.com/xocolatl/periods/

> I propose to implement it as below
>
> CREATE
>
> In CREATE TABLE only one table is created, and both historical and
> current data are stored in it. In order to make history and current
> data coexist, the row end time column is added implicitly to the primary
> key. Regarding performance, one can partition the table by the row end
> time column so that history data doesn't slow performance.
>
If we're going to be implicitly adding stuff to the PK, we also need to add that stuff to the other unique constraints, no? And I think it would be better to add both the start and the end column to these keys. Most of the temporal queries will be accessing both.

> INSERT
>
> On insert, the row start time and row end time columns behave like a
> kind of generated stored column, except that they store the current
> transaction time and the highest value supported by the data type
> (+infinity), respectively.
>
You're forcing these columns to be timestamp without time zone. If you're going to force a datatype here, it should absolutely be timestamp with time zone. However, I would like to see it handle both kinds of timestamps as well as a simple date.

> DELETE and UPDATE
>
> The old data is inserted with the row end time column set to the current
> transaction time.
>
I don't see any error handling for transaction anomalies. In READ COMMITTED, you can easily end up with a case where the end time comes before the start time. I don't even see anything constraining the start time to be strictly less than the end time. Such a constraint will be necessary for application-time periods (which your patch doesn't address at all, but that's okay).

> SELECT
>
> If the query doesn't contain a filter condition that includes the system
> time column, a filter condition is added in early optimization that
> filters out history data.
>
> Attached is a WIP patch that implements just the above, done on top
> of commit b8e19b932a99a7eb5a. The temporal clause isn't implemented yet,
> so one can use a regular filter condition for the time being.
>
> NOTE: I implement the SQL standard syntax, except that it is PERIOD FOR
> SYSTEM TIME rather than PERIOD FOR SYSTEM_TIME in the CREATE TABLE
> statement, and system time columns are not selected unless explicitly
> asked for.
>
Why aren't you following the standard syntax here?

> Any enlightenment?
>
There are quite a lot of typos and other things that aren't written "the Postgres way". But before I comment on any of that, I'd like to see the features implemented correctly according to the SQL standard.
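For reference, the SQL:2011 surface being discussed (with the standard's SYSTEM_TIME spelling) looks roughly like the sketch below. Table and column names are illustrative, the exact column syntax follows the standard rather than any PostgreSQL patch, and no released PostgreSQL version accepts this:

```sql
CREATE TABLE accounts (
    id        integer PRIMARY KEY,
    balance   numeric,
    sys_start timestamp WITH TIME ZONE GENERATED ALWAYS AS ROW START,
    sys_end   timestamp WITH TIME ZONE GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (sys_start, sys_end)
) WITH SYSTEM VERSIONING;

-- The standard's temporal query clauses:
SELECT * FROM accounts
  FOR SYSTEM_TIME AS OF TIMESTAMP '2019-10-01 00:00:00';

SELECT * FROM accounts;  -- current rows only; history is filtered out
```

This is the syntax shape the review above is asking the patch to match, including the underscore in SYSTEM_TIME.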
WIP: System Versioned Temporal Table
Hi all,

Temporal tables are one of the main new features added in SQL standard 2011. From that I would like to implement system-versioned temporal tables, which keep past and present data so old data can be queried. I propose to implement it as below.

CREATE

In CREATE TABLE only one table is created, and both historical and current data are stored in it. In order to make history and current data coexist, the row end time column is added implicitly to the primary key. Regarding performance, one can partition the table by the row end time column so that history data doesn't slow performance.

INSERT

On insert, the row start time and row end time columns behave like a kind of generated stored column, except that they store the current transaction time and the highest value supported by the data type (+infinity), respectively.

DELETE and UPDATE

The old data is inserted with the row end time column set to the current transaction time.

SELECT

If the query doesn't contain a filter condition that includes the system time column, a filter condition is added in early optimization that filters out history data.

Attached is a WIP patch that implements just the above, done on top of commit b8e19b932a99a7eb5a. The temporal clause isn't implemented yet, so one can use a regular filter condition for the time being.

NOTE: I implement the SQL standard syntax, except that it is PERIOD FOR SYSTEM TIME rather than PERIOD FOR SYSTEM_TIME in the CREATE TABLE statement, and system time columns are not selected unless explicitly asked for.

Any enlightenment?
regards
Surafel

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 6bc4e4c036..c229c90e2d 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -167,6 +167,7 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
 
 		cpy->has_not_null = constr->has_not_null;
 		cpy->has_generated_stored = constr->has_generated_stored;
+		cpy->is_system_versioned = constr->is_system_versioned;
 
 		if ((cpy->num_defval = constr->num_defval) > 0)
 		{
@@ -484,6 +485,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 			return false;
 		if (constr1->has_generated_stored != constr2->has_generated_stored)
 			return false;
+		if (constr1->is_system_versioned != constr2->is_system_versioned)
+			return false;
 		n = constr1->num_defval;
 		if (n != (int) constr2->num_defval)
 			return false;
@@ -864,6 +867,7 @@ BuildDescForRelation(List *schema)
 
 	constr->has_not_null = true;
 	constr->has_generated_stored = false;
+	constr->is_system_versioned = false;
 	constr->defval = NULL;
 	constr->missing = NULL;
 	constr->num_defval = 0;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c9d024ead5..340f0340bd 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -330,6 +330,120 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldContext);
 }
 
+/*
+ * Set row start time in row start time columns for a tuple.
+ */
+void
+ExecSetRowStartTime(EState *estate, TupleTableSlot *slot)
+{
+	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	int			natts = tupdesc->natts;
+	MemoryContext oldContext;
+	Datum	   *values;
+	bool	   *nulls;
+
+	Assert(tupdesc->constr && tupdesc->constr->is_system_versioned);
+
+	oldContext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+	values = palloc(sizeof(*values) * natts);
+	nulls = palloc(sizeof(*nulls) * natts);
+
+	slot_getallattrs(slot);
+	memcpy(nulls, slot->tts_isnull, sizeof(*nulls) * natts);
+
+	for (int i = 0; i < natts; i++)
+	{
+		Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+		/*
+		 * We set infinity for row end time columns for a tuple because row
+		 * end time is not yet known.
+		 */
+		if (attr->attgenerated == ATTRIBUTE_ROW_START_TIME)
+		{
+			Datum		val;
+
+			val = GetCurrentTransactionStartTimestamp();
+
+			values[i] = val;
+			nulls[i] = false;
+		}
+		else if (attr->attgenerated == ATTRIBUTE_ROW_END_TIME)
+		{
+			Datum		val;
+
+			val = DirectFunctionCall3(timestamp_in,
+									  CStringGetDatum("infinity"),
+									  ObjectIdGetDatum(InvalidOid),
+									  Int32GetDatum(-1));
+
+			values[i] = val;
+			nulls[i] = false;
+		}
+		else
+			values[i] = datumCopy(slot->tts_values[i], attr->attbyval, attr->attlen);
+	}
+
+	ExecClearTuple(slot);
+	memcpy(slot->tts_values, values, sizeof(*values) * natts);
+	memcpy(slot->tts_isnull, nulls, sizeof(*nulls) * natts);
+	ExecStoreVirtualTuple(slot);
+	ExecMaterializeSlot(slot);
+
+	MemoryContextSwitchTo(oldContext);
+}
+
+/*
+ * Set row end time in row end time columns for a tuple.
+ */
+void
+ExecSetRowEndTime(EState *estate, TupleTableSlot *slot)
+{
+	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+	TupleDesc	tupdesc =
Re: v12 pg_basebackup fails against older servers (take two)
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Tue, Oct 22, 2019 at 09:53:45AM -0400, Stephen Frost wrote: > > Yeah.. Something along those lines definitely seems like it'd be better > > as that would force anyone adding new options to explicitly say which > > server version the option makes sense for. Would it make sense to have a > > minimum and a maximum (and a "currently live" or some such indicator, so > > we aren't changing the max every release)? > > Yeah. A maximum may help to handle properly the cycling of deprecated > options in connstrs, so I see your point. Not sure that this > "currently-live" indicator is something to care about if we know > already the range of versions supported by a parameter and the > version of the backend for a live connection. My take is that it > would be more consistent to have a PG_MAJORVERSION_NUM for this > purpose in pg_config.h as well (I honestly don't like much the > existing tweaks for the major version numbers like "PG_VERSION_NUM / > 100" in pg_basebackup.c & co for example). If we were to have a > maximum, couldn't there also be issues when it comes to link a binary > with a version of libpq which has been compiled with a version of > Postgres older than the version of the binary? For example, imagine a > version of libpq compiled with v11, used to link to a pg_basebackup > from v12.. (@_@) Erm, your last concern is exactly why I was saying we'd have a 'currently live' indicator- so that it wouldn't be an issue to have an older library connecting from a new application to a newer database. > > The other thought I had was if we should, perhaps, be skipping settings > > whose values haven't been changed from the default value. Currently, we > > end up with a bunch of stuff that, in my experience anyway, just ends up > > being confusing to people, without any particular benefit, like > > 'sslcompression=0' when SSL wasn't used, or 'krbsrvname=postgres' when > > Kerberos/GSSAPI wasn't used... 
> > Couldn't this become a problem if we were to change the default for > some parameters? There has been a lot of talks for example about how > bad sslmode's default it for one, even if nobody has actually pulled > the trigger to change it. That really depends on if we think that users will expect the new-default behavior to be used, or the old-default to be. If the user didn't set anything explicitly when they ran the command in the first place, then it would seem like they intended and expected the defaults to be used. Perhaps that's an even better answer- just only put into the recovery.conf file what the user actually set instead of a bunch of other stuff... Thanks, Stephen signature.asc Description: PGP signature
Re: pgbench - refactor init functions with buffers
I am able to apply the v2 patch with "patch -p1 " - +static void +executeStatementExpect(PGconn *con, const char *sql, const ExecStatusType expected, bool errorOK) +{ I think some instances like this need 80 column alignment? - in initCreatePKeys(): + for (int i = 0; i < lengthof(DDLINDEXes); i++) + { + resetPQExpBuffer(); + appendPQExpBufferStr(, DDLINDEXes[i]); I think you can simply use printfPQExpBuffer() for the first append, similar to what you have used in createPartitions(), which is a combination of both reset and append. - The pgbench tap tests are also running fine. Regards, Jeevan Ladhe On Tue, Oct 22, 2019 at 8:57 PM Fabien COELHO wrote: > > >> The patch does not apply on master, needs rebase. > >> > >> Hmmm. "git apply pgbench-buffer-1.patch" works for me on current master. > >> > >>> Also, I got some whitespace errors. > >> > >> It possible, but I cannot see any. Could you be more specific? > > > > For me it failing, see below: > > > > $ git log -1 > > commit ad4b7aeb84434c958e2df76fa69b68493a889e4a > > Same for me, but it works: > >Switched to a new branch 'test' >sh> git apply ~/pgbench-buffer-2.patch >sh> git st > On branch test > Changes not staged for commit: ... > modified: src/bin/pgbench/pgbench.c > >sh> file ~/pgbench-buffer-2.patch >.../pgbench-buffer-2.patch: unified diff output, ASCII text > >sh> sha1sum ~/pgbench-buffer-2.patch >eab8167ef3ec5eca814c44b30e07ee5631914f07 ... > > I suspect that your mailer did or did not do something with the > attachment. Maybe try with "patch -p1 < foo.patch" at the root. > > -- > Fabien. >
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Oct 23, 2019 at 12:32 AM Alexey Kondratov wrote: > > On 22.10.2019 20:22, Tomas Vondra wrote: > > > > I think the patch should do the simplest thing possible, i.e. what it > > does today. Otherwise we'll never get it committed. > > > > I have to agree with Tomas that keeping things as simple as possible > should be a main priority right now. Otherwise, the entire patch set > will pass the next release cycle without being committed even partially. > At the same time, it resolves an important problem from my perspective. It > moves I/O overhead from primary to replica using large transaction > streaming, which is a nice-to-have feature I guess. > > Later it would be possible to replace the logical apply worker with a > bgworkers pool in a separate patch, if we decide that it is a viable > solution. Anyway, regarding Amit's questions: > > - I doubt that maintaining a separate buffer on the apply side before > spilling to disk would help enough. We already have ReorderBuffer with > the logical_work_mem limit, and if we exceeded that limit on the sender > side, then most probably we exceed it on the applier side as well, > I think on the sender side, the limit is for un-filtered changes (which means on the ReorderBuffer which has all the changes) whereas, on the receiver side, we will only have the requested changes which can make a difference? > excepting the case when this new buffer size will be significantly > higher than logical_work_mem to keep multiple open xacts. > I am not sure but I think we can have different controlling parameters on the subscriber-side. > - I still think that we should optimize the database for commits, not > rollbacks. A BGworkers pool is dramatically slower for a rollbacks-only > load, though being at least twice as fast for commits-only. I do not > know how it will perform with real-life load, but this drawback may be > inappropriate for such a general-purpose database as Postgres.
> > - Tomas' implementation of streaming with spilling does not have this > bias between commits/aborts. However, it has a noticeable performance > drop (~x5 slower compared with master [1]) for large transaction > consisting of many small rows. Although it is not of an order of > magnitude slower. > Did you ever identify the reason why it was slower in that case? I can see the numbers shared by you and Dilip which shows that the BGWorker pool is a really good idea and will work great for commit-mostly workload whereas the numbers without that are not very encouraging, maybe we have not benchmarked enough. This is the reason I am trying to see if we can do something to get the benefits similar to what is shown by your idea. I am not against doing something simple for the first version and then enhance it later, but it won't be good if we commit it with regression in some typical cases and depend on the user to use it when it seems favorable to its case. Also, sometimes it becomes difficult to generate enthusiasm to enhance the feature once the main patch is committed. I am not telling that always happens or will happen in this case. It is better if we put some energy and get things as good as possible in the first go itself. I am as much interested as you, Tomas or others are, otherwise, I wouldn't have spent a lot of time on this to disentangle it from 2PC patch which seems to get stalled due to lack of interest. > Another thing is it that about a year ago I have found some problems > with MVCC/visibility and fixed them somehow [1]. If I get it correctly > Tomas adapted some of those fixes into his patch set, but I think that > this part should be reviewed carefully again. > Agreed, I have read your emails and could see that you have done very good work on this project along with Tomas. But unfortunately, it didn't get committed. 
At this stage, we are working on just the first part of the patch, which is to allow the data to spill once it crosses logical_decoding_work_mem on the master side. I think there will be more problems to discuss and solve once that is done. > I would be glad to check > it, but now I am a little bit confused with all the patch set variants > in the thread. Which is the last one? Is it still dependent on 2pc decoding? > I think the latest patches posted by Dilip are not dependent on 2PC decoding, but I haven't studied them yet. You can find those at [1][2]. As per discussion in this thread, we are also trying to see if we can get some part of the patch series committed first; the latest patches corresponding to that are posted at [3]. [1] - https://www.postgresql.org/message-id/CAFiTN-vHoksqvV4BZ0479NhugGe4QHq_ezngNdDd-YRQ_2cwug%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAFiTN-vT%2B42xRbkw%3DhBnp44XkAyZaKZVA5hcvAMsYth3rk7vhg%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAFiTN-vkFB0RBEjVkLWhdgTYShSrSu3kCYObMghgXEwKA1FXRA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held
On Sat, Oct 19, 2019 at 11:41:06AM +0900, Michael Paquier wrote: > FWIW, I have spent an hour or two poking at this issue the last couple > of days, so I am not ignoring both, just not progressing as much as I would > have liked... My initial lookup leads me to think that something is > going wrong with the cleanup of the session-level lock on the parent > table taken in the first transaction doing the REINDEX CONCURRENTLY. I can confirm that this is an issue related to session locks which are not cleaned up when in an out-of-transaction state, a state that can be reached between a transaction commit or start while holding at least one session lock within a single VACUUM, CIC or REINDEX CONCURRENTLY command. The failure is actually pretty easy to reproduce if you add an elog(ERROR) after a CommitTransactionCommand() call and then shut down the connection. I am starting a new thread about that. The problem is larger than it looks, and has existed for a long time. -- Michael signature.asc Description: PGP signature
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Oct 22, 2019 at 10:42 PM Tomas Vondra wrote: > > On Tue, Oct 22, 2019 at 10:30:16AM +0530, Dilip Kumar wrote: > > > >I have moved it out as a separate patch (0003) so that if we need that > >we need this for the streaming transaction then we can keep this. > >> > > I'm OK with moving it to a separate patch. That being said I think the > ability to control memory usage for individual subscriptions is very > useful. Saying "We don't need such a parameter" is essentially equivalent > to saying "One size fits all" and I think we know that's not true. > > Imagine a system with multiple subscriptions, some of them mostly > replicating OLTP changes, but one or two replicating tables that are > updated in batches. What we'd want is to allow a higher limit for the > batch subscriptions, but a much lower limit for the OLTP ones (which they > should never hit in practice). > This point is not clear to me. The changes are recorded in ReorderBuffer which doesn't have any filtering aka it will have all the changes irrespective of the subscriber. How will it make a difference to have different limits? > With a single global GUC, you'll either have a high value - risking > OOM when the OLTP subscriptions happen to decode a batch update - or a > low value affecting the batch subscriptions. > > It's not strictly necessary (and we already have such a limit), so I'm OK > with treating it as an enhancement for the future. > I am fine too if its usage is clear. I might be missing something here. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Parallel leader process info in EXPLAIN
Hi, While working on some slides explaining EXPLAIN, I couldn't resist the urge to add the missing $SUBJECT. The attached 0001 patch gives the following:

 Gather ... time=0.146..33.077 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4425
   ->  Parallel Seq Scan on public.t ... time=19.421..30.092 rows=0 loops=3)
         Filter: (t.i = 42)
         Rows Removed by Filter: 33
         Leader: actual time=0.013..32.025 rows=1 loops=1    <--- NEW
           Buffers: shared hit=1546                          <--- NEW
         Worker 0: actual time=29.069..29.069 rows=0 loops=1
           Buffers: shared hit=1126
         Worker 1: actual time=29.181..29.181 rows=0 loops=1
           Buffers: shared hit=1753

Without that, you have to deduce what work was done in the leader, but I'd rather just show it. The 0002 patch adjusts Sort for consistency with that scheme, so you get:

 Sort ... time=84.303..122.238 rows=33 loops=3)
   Output: t1.i
   Sort Key: t1.i
   Leader: Sort Method: external merge  Disk: 5864kB    <--- DIFFERENT
   Worker 0: Sort Method: external merge  Disk: 3376kB
   Worker 1: Sort Method: external merge  Disk: 4504kB
   Leader: actual time=119.624..165.949 rows=426914 loops=1
   Worker 0: actual time=61.239..90.984 rows=245612 loops=1
   Worker 1: actual time=72.046..109.782 rows=327474 loops=1

Without the "Leader" label, it's not really clear to the uninitiated whether you're looking at combined, average or single-process numbers. Of course there are some more things that could be reported in a similar way eventually, such as filter counters and hash join details. For the XML/JSON/YAML formats, I decided to use a "Worker" element with number -1 to indicate the leader. Perhaps there should be a separate "Leader" element instead? Thoughts?

0001-Show-parallel-leader-stats-in-EXPLAIN-output.patch Description: Binary data
0002-Improve-EXPLAIN-of-Sort-in-parallel-queries.patch Description: Binary data
Re: dropdb --force
On Tue, Oct 22, 2019 at 4:51 PM Pavel Stehule wrote: > > út 22. 10. 2019 v 5:09 odesílatel Amit Kapila > napsal: >> >> >> CountOtherDBBackends is called from other places as well, so I don't >> think it is advisable to change the sleep time in that function. >> Also, I don't want to add a parameter for it. I think you have a >> point that in some cases we might end up sleeping for 100ms when we >> could do with less sleeping time, but I think it is true to some >> extent today as well. I think we can anyway change it in the future >> if there is a problem with the sleep timing, but for now, I think we >> can just call CountOtherDBBackends after sending SIGTERM and call it >> good. You might want to add a futuristic note in the code. >> > > ok. > > I removed sleeping from TerminateOtherDBBackends(). > > If you want to change any logic there, please, do it without any hesitations. > Maybe I don't see, what you think. > Fair enough, I will see if I need to change anything. In the meantime, can you look into the thread related to CountDBSubscriptions [1]? I think the problem reported there will be more serious after your patch, so it is better if we can fix it before this patch. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BqhLkCYG2oy9xug9ur_j%3DG2wQNRYAyd%2B-kZfZ1z42pLw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Bug about drop index concurrently
Ah, this is great. I am sorry that my English did not let you clearly understand my earlier mail; I am glad that you understand it now, and I sincerely hope I can help. I am also a Postgres fan, a freshly graduated student. We have both confirmed that this bug appears only on the standby and not on the master, but it does affect the use of the standby. For this bug I proposed two options: one is to disable this feature (drop index concurrently) on the standby, the other is to make the standby wait for running queries the way the master does, though that may delay log application. Since the bug appears on the standby, each method has advantages and disadvantages, so I hope you will discuss them. Sincerely, adger -- From: Tomas Vondra Sent: Wednesday, October 23, 2019 06:49 To: 李杰(慎追) Cc: pgsql-hackers Subject: Re: Bug about drop index concurrently On Fri, Oct 18, 2019 at 05:00:54PM +0200, Tomas Vondra wrote: >Hi, > >I can trivially reproduce this - it's enough to create a master-standby >setup, and then do this on the master > > CREATE TABLE t (a int, b int); > INSERT INTO t SELECT i, i FROM generate_series(1,1) s(i); > >and run pgbench with this script > > DROP INDEX CONCURRENTLY IF EXISTS t_a_idx; > CREATE INDEX t_a_idx ON t(a); > >while on the standby there's another pgbench running this script > > EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1; > >and it fails pretty fast for me. With an extra assert(false) added to >src/backend/access/common/relation.c I get a backtrace like this: > > Program terminated with signal SIGABRT, Aborted.
> #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-22.fc30.x86_64
> (gdb) bt
> #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
> #1  0x7c32e457a895 in abort () from /lib64/libc.so.6
> #2  0x00a58579 in ExceptionalCondition (conditionName=0xacd9bc "!(0)", errorType=0xacd95b "FailedAssertion", fileName=0xacd950 "relation.c", lineNumber=64) at assert.c:54
> #3  0x0048d1bd in relation_open (relationId=38216, lockmode=1) at relation.c:64
> #4  0x005082e4 in index_open (relationId=38216, lockmode=1) at indexam.c:130
> #5  0x0080ac3f in get_relation_info (root=0x21698b8, relationObjectId=16385, inhparent=false, rel=0x220ce60) at plancat.c:196
> #6  0x008118c6 in build_simple_rel (root=0x21698b8, relid=1, parent=0x0) at relnode.c:292
> #7  0x007d485d in add_base_rels_to_query (root=0x21698b8, jtnode=0x2169478) at initsplan.c:114
> #8  0x007d48a3 in add_base_rels_to_query (root=0x21698b8, jtnode=0x21693e0) at initsplan.c:122
> #9  0x007d8fad in query_planner (root=0x21698b8, qp_callback=0x7ded97 , qp_extra=0x7fffa4834f10) at planmain.c:168
> #10 0x007dc316 in grouping_planner (root=0x21698b8, inheritance_update=false, tuple_fraction=0) at planner.c:2048
> #11 0x007da7ca in subquery_planner (glob=0x220d078, parse=0x2168f78, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1012
> #12 0x007d942c in standard_planner (parse=0x2168f78, cursorOptions=256, boundParams=0x0) at planner.c:406
> #13 0x007d91e8 in planner (parse=0x2168f78, cursorOptions=256, boundParams=0x0) at planner.c:275
> #14 0x008e1b0d in pg_plan_query (querytree=0x2168f78, cursorOptions=256, boundParams=0x0) at postgres.c:878
> #15 0x00658683 in ExplainOneQuery (query=0x2168f78, cursorOptions=256, into=0x0, es=0x220cd90, queryString=0x21407b8 "explain analyze select * from t where a = 1;", params=0x0, queryEnv=0x0) at explain.c:367
> #16 0x00658386 in ExplainQuery (pstate=0x220cc28, stmt=0x2141728, queryString=0x21407b8 "explain analyze select * from t where a = 1;", params=0x0, queryEnv=0x0, dest=0x220cb90) at explain.c:255
> #17 0x008ea218 in standard_ProcessUtility (pstmt=0x21425c0, queryString=0x21407b8 "explain analyze select * from t where a = 1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90, completionTag=0x7fffa48355c0 "") at utility.c:675
> #18 0x008e9a45 in ProcessUtility (pstmt=0x21425c0, queryString=0x21407b8 "explain analyze select * from t where a = 1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90, completionTag=0x7fffa48355c0 "") at utility.c:360
> #19 0x008e8a0c in PortalRunUtility (portal=0x219c278, pstmt=0x21425c0, isTopLevel=true, setHoldSnapshot=true, dest=0x220cb90, completionTag=0x7fffa48355c0 "") at pquery.c:1175
> #20 0x008e871a in FillPortalStore (portal=0x219c278, isTopLevel=true) at pquery.c:1035
> #21 0x008e8075 in PortalRun (portal=0x219c278,
Re: Re: Bug about drop index concurrently
> >I'm a bit confused. You shouldn't see any crashes and/or core files in >this scenario, for two reasons. Firstly, I assume you're running a >regular build without asserts. Secondly, I had to add an extra assert to >trigger the failure. So what core are you talking about? > Sorry, I should have explained it more clearly. I saw the core file because I modified the postgres source code and added an Assert to it. > >Also, it's not clear to me what do you mean by "bug in the standby" or >no lock in the drop index concurrently. Can you explain? > By "bug in the standby" I mean that we built a master-standby instance, and while we executed a large number of queries on the standby, we executed 'drop index concurrently' on the master and got an 'error' on the standby. It does not happen 100% of the time, but it does appear. By "no lock in the drop index concurrently" I mean that I think this happens because 'drop index concurrently' does not take strong enough locks. >Hmm, so you observe the issue with regular queries, not just EXPLAIN >ANALYZE? Yeah, we have seen this error frequently. >>Of course, we considered applying the method of waiting to detect the >>query lock on the master to the standby, but worried about affecting >>the standby application log delay, so we gave up that. >> >I don't understand? What method? > While trying to find the cause of this problem, I also executed 'drop index concurrently' and 'explain select * from xxx' on the master, but the bug did not appear there as expected, so I went to analyze the source code. I found that the master has a mechanism whereby 'drop index concurrently' waits until every transaction that saw the old index state has finished. The source code is as follows: WaitForLockers(heaplocktag, AccessExclusiveLock); Therefore, I think that if this wait were also performed on the standby, the error would not appear. But I worried about delaying log application on the standby, so we gave up on that.
-- From: Tomas Vondra Sent: Wednesday, October 23, 2019 01:47 To: 李杰(慎追) Cc: pgsql-hackers Subject: Re: Re: Bug about drop index concurrently On Mon, Oct 21, 2019 at 10:36:04AM +0800, 李杰(慎追) wrote: >Thanks for the quick reply. And sorry I haven’t got back to you sooner >. > >I have seen this backtrace in the core file, and I also looked at the >bug in the standby because there is no lock in the drop index >concurrently. > I'm a bit confused. You shouldn't see any crashes and/or core files in this scenario, for two reasons. Firstly, I assume you're running a regular build without asserts. Secondly, I had to add an extra assert to trigger the failure. So what core are you talking about? Also, it's not clear to me what you mean by "bug in the standby" or no lock in the drop index concurrently. Can you explain? >However, when our business will perform a large number of queries in >the standby, this problem will occur more frequently. So we are trying >to solve this problem, and the solution we are currently dealing with >is to ban it. > Hmm, so you observe the issue with regular queries, not just EXPLAIN ANALYZE? >Of course, we considered applying the method of waiting to detect the >query lock on the master to the standby, but worried about affecting >the standby application log delay, so we gave up that. > I don't understand? What method? >If you have a better solution in the future, please push it to the new >version, or email it, thank you very much. > regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: v12 pg_basebackup fails against older servers (take two)
On Tue, Oct 22, 2019 at 09:53:45AM -0400, Stephen Frost wrote: > Yeah.. Something along those lines definitely seems like it'd be better > as that would force anyone adding new options to explicitly say which > server version the option makes sense for. Would it make sense to have a > minimum and a maximum (and a "currently live" or some such indicator, so > we aren't changing the max every release)? Yeah. A maximum may help to handle properly the cycling of deprecated options in connstrs, so I see your point. Not sure that this "currently-live" indicator is something to care about if we know already the range of versions supported by a parameter and the version of the backend for a live connection. My take is that it would be more consistent to have a PG_MAJORVERSION_NUM for this purpose in pg_config.h as well (I honestly don't like much the existing tweaks for the major version numbers like "PG_VERSION_NUM / 100" in pg_basebackup.c & co for example). If we were to have a maximum, couldn't there also be issues when it comes to link a binary with a version of libpq which has been compiled with a version of Postgres older than the version of the binary? For example, imagine a version of libpq compiled with v11, used to link to a pg_basebackup from v12.. (@_@) > The other thought I had was if we should, perhaps, be skipping settings > whose values haven't been changed from the default value. Currently, we > end up with a bunch of stuff that, in my experience anyway, just ends up > being confusing to people, without any particular benefit, like > 'sslcompression=0' when SSL wasn't used, or 'krbsrvname=postgres' when > Kerberos/GSSAPI wasn't used... Couldn't this become a problem if we were to change the default for some parameters? There has been a lot of talks for example about how bad sslmode's default it for one, even if nobody has actually pulled the trigger to change it. -- Michael signature.asc Description: PGP signature
Re: configure fails for perl check on CentOS8
Hello. At Mon, 21 Oct 2019 08:29:39 -0400, Andrew Dunstan wrote in > > On 10/20/19 7:36 PM, Tom Lane wrote: > > Andrew Dunstan writes: > >> On 10/20/19 1:23 PM, Tom Lane wrote: > >>> The right way to fix it, likely, is to add CFLAGS_SL while performing this > >>> particular autoconf test, as that would replicate the switches used for > >>> plperl (and it turns out that adding -fPIC does solve this problem). > >>> But the configure script doesn't currently know about CFLAGS_SL, so we'd > >>> have to do some refactoring to approach it that way. Moving that logic > >>> from the platform-specific Makefiles into configure doesn't seem > >>> unreasonable, but it would make the patch bigger. > >> Sounds like a plan. I agree it's annoying to have to do something large > >> for something so trivial. > > Turns out it's not really that bad. We just have to transfer the > > responsibility for setting CFLAGS_SL from the platform Makefiles > > to the platform template files. (As a bonus, it'd be possible to > > allow users to override CFLAGS_SL during configure, as they can > > do for CFLAGS. But I didn't mess with that here.) > > > > I checked that this fixes the Fedora build problem, but I've not > > really tested it on any other platform. Still, there's not that > > much to go wrong, one would think. > > > > LGTM However it's done, it looks good to me, and the problem is actually gone. Thanks! regards. -- Kyotaro Horiguchi NTT Open Source Software Center