Re: Remove page-read callback from XLogReaderState.

2019-10-23 Thread Kyotaro Horiguchi
Rebased.

I had intentionally left some duplicate code in XLogNeedData, but I
changed my mind and removed it. That makes the function smaller and
clearer.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 7c4ce152d248546c8f56057febae6b17b6fa71bb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 5 Sep 2019 20:21:55 +0900
Subject: [PATCH v9 1/4] Move callback-call from ReadPageInternal to
 XLogReadRecord.

The current WAL record reader reads page data using a callback
function.  Although that is not much of a problem on its own, it
becomes one if we want to add tasks such as encryption, which must be
performed on page data before the WAL reader consumes it.  To avoid
forcing the record reader facility to grow a new code path for every
new callback, this patch separates the page reader from the WAL record
reading facility by turning the current WAL record reader into a state
machine.

As the first step of that change, this patch moves the page reader
call out of ReadPageInternal; the remaining tasks of that function are
taken over by the new function XLogNeedData.  As a result, the page
read callback is called directly from XLogReadRecord, according to the
feedback from XLogNeedData.
---
 src/backend/access/transam/xlog.c |  16 +-
 src/backend/access/transam/xlogreader.c   | 273 ++
 src/backend/access/transam/xlogutils.c|  10 +-
 .../replication/logical/logicalfuncs.c|   2 +-
 src/backend/replication/walsender.c   |  10 +-
 src/bin/pg_rewind/parsexlog.c |  16 +-
 src/bin/pg_waldump/pg_waldump.c   |   8 +-
 src/include/access/xlogreader.h   |  23 +-
 src/include/access/xlogutils.h|   2 +-
 src/include/replication/logicalfuncs.h|   2 +-
 src/test/recovery/t/011_crash_recovery.pl |   1 +
 11 files changed, 213 insertions(+), 150 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b602e28f61..a7630c643a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -884,7 +884,7 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 		 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
-static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
+static bool	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool fetching_ckpt, XLogRecPtr tliRecPtr);
@@ -4249,7 +4249,6 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 	XLogRecord *record;
 	XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
 
-	/* Pass through parameters to XLogPageRead */
 	private->fetching_ckpt = fetching_ckpt;
 	private->emode = emode;
 	private->randAccess = (RecPtr != InvalidXLogRecPtr);
@@ -11540,7 +11539,7 @@ CancelBackup(void)
  * XLogPageRead() to try fetching the record from another source, or to
  * sleep and retry.
  */
-static int
+static bool
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 			 XLogRecPtr targetRecPtr, char *readBuf)
 {
@@ -11599,7 +11598,8 @@ retry:
 			readLen = 0;
 			readSource = 0;
 
-			return -1;
+			xlogreader->readLen = -1;
+			return false;
 		}
 	}
 
@@ -11694,7 +11694,8 @@ retry:
 		goto next_record_is_invalid;
 	}
 
-	return readLen;
+	xlogreader->readLen = readLen;
+	return true;
 
 next_record_is_invalid:
 	lastSourceFailed = true;
@@ -11708,8 +11709,9 @@ next_record_is_invalid:
 	/* In standby-mode, keep trying */
 	if (StandbyMode)
 		goto retry;
-	else
-		return -1;
+
+	xlogreader->readLen = -1;
+	return false;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index c8b0d2303d..66f3bc5597 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -34,8 +34,8 @@
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...)
 			pg_attribute_printf(2, 3);
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
-static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
-			 int reqLen);
+static bool XLogNeedData(XLogReaderState *state, XLogRecPtr pageptr,
+		 int reqLen, bool header_inclusive);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
@@ -244,7 +244,6 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	uint32		targetRecOff;
 	uint32		pageHeaderSize;
 	bool		gotheader;
-	int			readOff;
 
 	/*
 	 * randAccess indicates whether to 

Re: Fix comment in XLogFileInit()

2019-10-23 Thread Fujii Masao
On Mon, Oct 21, 2019 at 5:28 PM Amit Kapila  wrote:
>
> On Mon, Oct 21, 2019 at 12:28 PM Fujii Masao  wrote:
> >
> > I found that the argument name of XLogFileInit() is wrong in its comment.
> > Attached is the patch that fixes that typo.
> >
>
> LGTM.

Thanks for checking! Committed.

Regards,

-- 
Fujii Masao




Re: pgbench - extend initialization phase control

2019-10-23 Thread Fujii Masao
On Thu, Oct 17, 2019 at 8:09 PM Fabien COELHO  wrote:
>
>
> Hello,
>
> > Failed regression test. It's necessary to change the first a in “allowed
> > step characters are” to uppercase A in the regression test of
> > 002_pgbench_no_server.pl.
>
> Argh. I think I ran the test, then stupidly updated the message afterwards
> to better match best practices, without rechecking:-(
>
> > The behavior of "g" is different between v12 and the patche, and
> > backward compatibility is lost. In v12, BEGIN and COMMIT are specified
> > only by choosing "g". It's a problem that backward compatibility is
> > lost.
>
> Somehow yes, but I do not see this as an actual problem from a functional
> point of view: it just means that if you use a 'dtgvp' with the newer
> version and if the inserts were to fail, then they are not under an
> explicit transaction, so previous inserts are not cleaned up. However,
> this is a pretty unlikely case, and anyway the error is reported, so any
> user would be expected not to go on after the initialization phase.
>
> So basically I do not see the very small regression for an unlikely corner
> case to induce any problem in practice.
>
> The benefit of controlling where begin/end actually occur is that it may
> have an impact on performance, and it allows to check that.

I still fail to understand the benefit of adding the () settings.
Could you clarify which cases the () settings are useful for? Are you
thinking of executing all the initialization SQL statements within a
single transaction, e.g., -I (dtgp), for some reason?

> > When using ( and ) with the -I, the documentation should indicate that 
> > double
> > quotes are required,
>
> Or single quotes, or backslash, if launch from the command line. I added a
> mention of escaping or protection in the doc in that case.

What about using, for example, b (BEGIN) and c (COMMIT) instead
to avoid such a restriction?

> > and  "v" not be able to enclose in ( and ).
>
> That is a postgresql limitation, which may evolve. There could be others.
> I updated the doc to say that some commands may not work inside an
> explicit transaction.

I think that it's better to check whehter "v" is enclosed with () or not
at the beginning of pgbench, and report an error if it is. Otherwise,
if -I (dtgv) is specified, pgbench reports an error after time-consuming
data generation is performed, and of course that data generation is
rollbacked.

Regards,

-- 
Fujii Masao




Re: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could not find tuple for parent of relation ...

2019-10-23 Thread Michael Paquier
On Tue, Oct 15, 2019 at 11:40:47AM -0500, Justin Pryzby wrote:
> Not only can't I DROP the _ccold indexes, but also dropping the table doesn't
> cause them to be dropped, and then I can't even slash dee them anymore:

Yes, I can confirm the report.  In this scenario the reindex is
waiting for the first transaction to finish before step 5, and the
cancellation causes the follow-up steps not to be done (set_dead and
the next ones).  So at this stage the swap has actually happened.  I
am still analyzing the report in depth, but you don't have any
problems with a plain index when interrupting at this stage, and the
old index can be cleanly dropped with the new one present, so my first
thought is that we are just missing some more dependency cleanup at
the swap phase when dealing with a partition index.
--
Michael




Re: Fix of fake unlogged LSN initialization

2019-10-23 Thread Kyotaro Horiguchi
At Mon, 21 Oct 2019 14:03:47 +0900, Michael Paquier  wrote 
in 
> On Sat, Oct 19, 2019 at 05:03:00AM +, tsunakawa.ta...@fujitsu.com wrote:
> > The attached trivial patch fixes the initialization of the fake
> > unlogged LSN.  Currently, BootstrapXLOG() in initdb sets the initial
> > fake unlogged LSN to FirstNormalUnloggedLSN (=1000), but the
> > recovery and pg_resetwal sets it to 1.  The patch modifies the
> > latter two cases to match initdb. 
> > 
> > I don't know if this do actual harm, because the description of
> > FirstNormalUnloggedLSN doesn't give me any idea. 
> 
> From xlogdefs.h added by 9155580:
> /*
>  * First LSN to use for "fake" LSNs.
>  *
>  * Values smaller than this can be used for special per-AM purposes.
>  */
> #define FirstNormalUnloggedLSN  ((XLogRecPtr) 1000)
> 
> So it seems to me that you have caught a bug here, and that we had
> better back-patch to v12 so as recovery and pg_resetwal don't mess up
> with AMs using lower values than that.

+1
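
For reference, the fix under discussion boils down to roughly the
following (a sketch of the idea, not the actual patch text):

/*
 * In StartupXLOG(), when the counter is not restored from the control
 * file, and likewise where pg_resetwal fills in its control-file values:
 * start the fake-LSN counter where initdb's BootstrapXLOG() starts it,
 * instead of at 1, so that values below FirstNormalUnloggedLSN remain
 * reserved for special per-AM purposes.
 */
XLogCtl->unloggedLSN = FirstNormalUnloggedLSN;		/* was 1 */
ControlFile.unloggedLSN = FirstNormalUnloggedLSN;	/* pg_resetwal side */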

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Re: Re: Bug about drop index concurrently

2019-10-23 Thread 李杰(慎追)
>That suggests you're doing a lot of 'drop index concurrently', right?

Not exactly. In the actual scenario I did not really run that many 'drop
index concurrently' commands on the master; I just executed a lot of queries
on the standby. Every execution of 'drop index concurrently' on the master
has a certain probability of triggering the problem; although it is small,
it can appear.
>Yes, but we can't really do that, I'm afraid.
>
>We certainly can't do that on the master because we simply don't have
>the necessary information about locks from the standby, and we really
>don't want to have it, because with a busy standby that might be quite a
>bit of data (plus the standby would have to wait for the master to
>confirm each lock acquisition, I think which seems pretty terrible).
>
Yeah, we can't do this; we would lose too much in order to achieve it.

>On the standby, we don't really have an idea that there's a drop
>index running - we only get information about AE locks, and a bunch of
>catalog updates. I don't think we have a way to determine this is a drop
>index in concurrent mode :-(
>
>More precisely, the master sends information about AccessExclusiveLock
>in XLOG_STANDBY_LOCK wal record (in xl_standby_locks struct). And when
>the standby replays that, it should acquire those locks.
>
>For regular DROP INDEX we send this:
>
>rmgr: Standby ... desc: LOCK xid  db 16384 rel 16385 
>rmgr: Standby ... desc: LOCK xid  db 16384 rel 20573 
>... catalog changes ...
>rmgr: Transaction ... desc: COMMIT 2019-10-23 22:42:27.950995 CEST;
>   rels: base/16384/20573; inval msgs: catcache 32 catcache 7 catcache
> 6 catcache 50 catcache 49 relcache 20573 relcache 16385 snapshot 2608
>
>while for DROP INDEX CONCURRENTLY we send this
>
>rmgr: Heap... desc: INPLACE ... catalog update
>rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
>relcache 21288 relcache 16385
>rmgr: Heap... desc: INPLACE ... catalog update
>rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
>  relcache 21288 relcache 16385
>rmgr: Standby ... desc: LOCK xid 10326 db 16384 rel 21288 
>... catalog updates ...
>rmgr: Transaction ... desc: COMMIT 2019-10-23 23:47:10.042568 CEST;
>   rels: base/16384/21288; inval msgs: catcache 32 catcache 7 catcache 6
>   catcache 50 catcache 49 relcache 21288 relcache 16385 snapshot 2608
>
Yeah, you are right, I got it.

>So just a single lock on the index, but not the lock on the relation
>itself (which makes sense, because for DROP INDEX CONCURRENTLY we don't get
>an exclusive lock on the table).
>
>I'm not quite familiar with this part of the code, but the SELECT
>backends are clearly getting a stale list of indexes from relcache, and
>try to open an index which was already removed by the redo. We do
>acquire the lock on the index itself, but that's not sufficient :-(
>
>Not sure how to fix this. I wonder if we could invalidate the relcache
>for the relation at some point.
--method one --
 >Or maybe we could add additional
>information to the WAL to make the redo wait for all lock waiters, just
>like on the master. But that might be tricky because of deadlocks, and
>because  the redo could easily get "stuck" waiting for a long-running
>queries.
--method two --

Regarding method one, I don't think this can solve the problem at all.

Because we cannot predict on the standby when the master will run 'drop index
concurrently'. I also have a way to reproduce this bug manually, to make the
problem easier to understand. You can follow the steps below:
1. Open a connection on the standby, then use gdb to attach to its backend
process.
2. Set a breakpoint at plancat.c:196 in get_relation_info.
3. Execute 'explain select ...' on this backend; in gdb you will see that the
breakpoint has been hit, and the query will hang.
4. Execute 'drop index concurrently' on the master. (If plain 'drop index'
were used it would be blocked, because with a query running on our standby
the exclusive lock cannot be obtained.)
5. On the standby, let gdb continue.
We will see an error (ERROR: could not open relation with OID XXX) in the
standby client.

Therefore, we can see that the query is executed first on the standby, and
then the 'drop index concurrently' is executed on the master. So no matter
how quickly we invalidate the relcache, there is no way to prevent this error
from happening, because there is always a moment at which the index has
already been dropped by the master. In other words, there is no way to
predict the master's 'drop index concurrently' on the standby.

Regarding method two, you are right: a deadlock is possible, and the most
unbearable part is that it would delay log apply on the standby, resulting
in the standby's data lagging behind the master's.

All in all, I think this bug is a flaw in the Postgres design. We need to
think carefully about how to handle it better; we could even learn from other
database products. I hope I can help you.

Thanks

Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held

2019-10-23 Thread Justin Pryzby
On Thu, Oct 24, 2019 at 11:42:04AM +0900, Michael Paquier wrote:
> Please see the attached.  Justin, does it fix your problems regarding
> the locks?

Confirmed.

Thanks,
Justin




Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held

2019-10-23 Thread Michael Paquier
On Wed, Oct 23, 2019 at 07:18:33PM +0900, Michael Paquier wrote:
> I can confirm that this is an issue related to session locks which are
> not cleaned up when in an out-of-transaction state, state that can be
> reached between a transaction commit or start while holding at least
> one session lock within one single command of VACUUM, CIC or REINDEX
> CONCURRENTLY.

Please let me back-pedal a bit on this one after sleeping on it.
Actually, if you look at CIC and VACUUM, those code paths are much
more careful than REINDEX CONCURRENTLY regarding the position of
CHECK_FOR_INTERRUPTS(), in that the checks happen only within a
transaction context.  In the case of REINDEX CONCURRENTLY and the
failure reported here, the current code is careless: it depends of
course on the timing of statement_timeout, but session locks would
remain behind when an interruption is hit at the beginning of phase 2
or 3 in indexcmds.c.  So the answer is simple: by moving the interrupt
checks inside a transaction context, the problem gets solved.  This
also fixes a second issue, as the original code would cause xact.c to
generate some useless warnings.

Please see the attached.  Justin, does it fix your problems regarding
the locks?  For me it does.

> The failure is actually pretty easy to reproduce if you
> add an elog(ERROR) after a CommitTransactionCommand() call and then
> shut down the connection.  I am starting a new thread about that.  The
> problem is larger than it looks, and exists for a long time.

I am still wondering if we could put more safeguards in this area
though...
--
Michael
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4448b986b6..e1bd81a1d6 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3057,11 +3057,16 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			newIndexId = lfirst_oid(lc2);
 		Oid			heapId;
 
-		CHECK_FOR_INTERRUPTS();
-
 		/* Start new transaction for this index's concurrent build */
 		StartTransactionCommand();
 
+		/*
+		 * Check for user-requested abort.  This is inside a transaction
+		 * so that xact.c does not issue a useless WARNING, and so that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
 		/* Set ActiveSnapshot since functions in the indexes may need it */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
@@ -3101,10 +3106,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		TransactionId limitXmin;
 		Snapshot	snapshot;
 
-		CHECK_FOR_INTERRUPTS();
-
 		StartTransactionCommand();
 
+		/*
+		 * Check for user-requested abort.  This is inside a transaction
+		 * so that xact.c does not issue a useless WARNING, and so that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
 		heapId = IndexGetRelation(newIndexId, false);
 
 		/*
@@ -3166,6 +3176,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			newIndexId = lfirst_oid(lc2);
 		Oid			heapId;
 
+		/*
+		 * Check for user-requested abort.  This is inside a transaction
+		 * so that xact.c does not issue a useless WARNING, and so that
+		 * session-level locks are cleaned up on abort.
+		 */
 		CHECK_FOR_INTERRUPTS();
 
 		heapId = IndexGetRelation(oldIndexId, false);
@@ -3221,7 +3236,13 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			oldIndexId = lfirst_oid(lc);
 		Oid			heapId;
 
+		/*
+		 * Check for user-requested abort.  This is inside a transaction
+		 * so that xact.c does not issue a useless WARNING, and so that
+		 * session-level locks are cleaned up on abort.
+		 */
 		CHECK_FOR_INTERRUPTS();
+
 		heapId = IndexGetRelation(oldIndexId, false);
 		index_concurrently_set_dead(heapId, oldIndexId);
 	}




Re: Re: Re: Bug about drop index concurrently

2019-10-23 Thread Tomas Vondra

On Wed, Oct 23, 2019 at 02:38:45PM +0800, 李杰(慎追) wrote:


I'm a bit confused. You shouldn't see any crashes and/or core files in
this scenario, for two reasons. Firstly, I assume you're running a
regular build without asserts. Secondly, I had to add an extra assert
to trigger the failure. So what core are you talking about?


Sorry, I should explain it more clearly.  I saw the core file because I
modified the postgres source code and added Assert to it.




OK


Also, it's not clear to me what do you mean by "bug in the standby" or
no lock in the drop index concurrently. Can you explain?


"bug in the standby" means that we built a master-slave instance, when
we executed a large number of queries on the standby, we executed 'drop
index concurrently' on the master so that get ‘error’ in the standby.
Although it is not 100%, it will appear.  no lock in the drop index
concurrently :::  I think this is because there are not enough advanced
locks when executing ‘ drop index concurrently’.



OK, thanks for the clarification. Yes, it won't appear every time, it's
likely timing-sensitive (I'll explain in a minute).


Hmm, so you observe the issue with regular queries, not just EXPLAIN
ANALYZE?


yeah, we have seen this error frequently.



That suggests you're doing a lot of 'drop index concurrently', right?


Of course, we considered applying to the standby the master's method of
waiting for queries that hold the lock, but we worried about increasing
the standby's log-apply delay, so we gave up on that.


I don't understand? What method?



I analyzed this problem. While trying to find its cause, I also executed
'drop index concurrently' and 'explain select * from xxx' on the master,
but the bug did not appear there as expected.  So I went to analyze the
source code. I found that there is a mechanism on the master such that,
when 'drop index concurrently' is executed, it waits until every
transaction that saw the old index state has finished. The source code
is as follows:

WaitForLockers(heaplocktag, AccessExclusiveLock);

Therefore, I think that if this mechanism were also available on the
standby, the error would not appear. But I worried about affecting the
standby's log-apply delay, so we gave up on that.



Yes, but we can't really do that, I'm afraid.

We certainly can't do that on the master because we simply don't have
the necessary information about locks from the standby, and we really
don't want to have it, because with a busy standby that might be quite a
bit of data (plus the standby would have to wait for the master to
confirm each lock acquisition, I think which seems pretty terrible).

On the standby, we don't really have an idea that there's a drop
index running - we only get information about AE locks, and a bunch of
catalog updates. I don't think we have a way to determine this is a drop
index in concurrent mode :-(

More precisely, the master sends information about AccessExclusiveLock
in XLOG_STANDBY_LOCK wal record (in xl_standby_locks struct). And when
the standby replays that, it should acquire those locks.

For regular DROP INDEX we send this:

rmgr: Standby ... desc: LOCK xid  db 16384 rel 16385 
rmgr: Standby ... desc: LOCK xid  db 16384 rel 20573 
... catalog changes ...

rmgr: Transaction ... desc: COMMIT 2019-10-23 22:42:27.950995 CEST;
   rels: base/16384/20573; inval msgs: catcache 32 catcache 7 catcache
   6 catcache 50 catcache 49 relcache 20573 relcache 16385 snapshot 2608

while for DROP INDEX CONCURRENTLY we send this

rmgr: Heap... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
 relcache 21288 relcache 16385
rmgr: Heap... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
 relcache 21288 relcache 16385
rmgr: Standby ... desc: LOCK xid 10326 db 16384 rel 21288 
... catalog updates ...

rmgr: Transaction ... desc: COMMIT 2019-10-23 23:47:10.042568 CEST;
  rels: base/16384/21288; inval msgs: catcache 32 catcache 7 catcache 6
  catcache 50 catcache 49 relcache 21288 relcache 16385 snapshot 2608

So just a single lock on the index, but not the lock on the relation
itself (which makes sense, because for DROP INDEX CONCURRENTLY we don't
get an exclusive lock on the table).

I'm not quite familiar with this part of the code, but the SELECT
backends are clearly getting a stale list of indexes from relcache, and
try to open an index which was already removed by the redo. We do
acquire the lock on the index itself, but that's not sufficient :-(

Not sure how to fix this. I wonder if we could invalidate the relcache
for the relation at some point. Or maybe we could add additional
information to the WAL to make the redo wait for all lock waiters, just
like on the master. But that might be tricky because of deadlocks, and
because  the redo could easily get "stuck" waiting for a long-running
queries.


regards

--
Tomas Vondra   

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-23 Thread Bruce Momjian
On Thu, Oct 10, 2019 at 10:40:37AM -0400, Stephen Frost wrote:
> > Some people ask for indexable encrypted columns, but I tend to explain to
> > them how impractical and inefficient that is. You can support hash indexes
> > if you don't salt the encrypted data, but that greatly weakens the
> > encryption by allowing attackers to use dictionary attacks and other brute
> > force techniques efficiently. And you can't support b-tree > and < without
> > very complex encryption schemes (
> > https://en.wikipedia.org/wiki/Homomorphic_encryption).
> 
> I'm not sure why you wouldn't salt the hash..?  That's pretty important,
> imv, and, of course, you have to store the salt but that shouldn't be
> that big of a deal, I wouldn't think.  Agreed that you can't support
> b-tree (even with complex encryption schemes..., I've read some papers
> about how just  is enough to be able to glean a good bit of info
> from, not super relevant to the overall discussion here so I won't go
> hunt them down right now, but if there's interest, I can try to do so).

Yes, you can add salt to the value you store in the hash index, but when
you are looking for a matching value, how do you know what salt to use
to find it in the index?
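
To make the trade-off concrete, here is a toy illustration in C (the hash
below is only a stand-in for a real keyed digest, not a proposed scheme):

#include <stdint.h>

/* Toy digest: FNV-1a over salt || value; illustration only. */
static uint64_t
toy_digest(const char *salt, const char *value)
{
	uint64_t	h = 1469598103934665603ULL;

	for (const char *p = salt; *p; p++)
	{
		h ^= (uint8_t) *p;
		h *= 1099511628211ULL;
	}
	for (const char *p = value; *p; p++)
	{
		h ^= (uint8_t) *p;
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Per-row random salt: the index stores toy_digest(salt_i, value_i), so an
 * equality probe for a given value has no single digest it can search for.
 * One fixed salt: the probe can compute toy_digest(fixed_salt, value) and
 * use the index, but equal plaintexts then always produce equal digests,
 * which is exactly what dictionary and frequency attacks exploit.
 */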

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: WIP: System Versioned Temporal Table

2019-10-23 Thread Vik Fearing
On 23/10/2019 17:56, Surafel Temesgen wrote:
>
> Hi all ,
>
> Temporal tables are one of the main new features added in the SQL
> standard 2011. From that, I would like to implement system-versioned
> temporal tables, which keep past and present data so that old data can
> be queried.
>

Excellent!  I've been wanting this feature for a long time now.  We're
the last major database to not have it.


I tried my hand at doing it in core, but ended up having better success
at an extension: https://github.com/xocolatl/periods/


> I propose to implement it like below.
>
> CREATE
>
> In CREATE TABLE only one table is created, and both historical and
> current data will be stored in it. In order to make history and current
> data co-exist, the row end time column will be added implicitly to the
> primary key. Regarding performance, one can partition the table by the
> row end time column so that history data does not slow things down.
>

If we're going to be implicitly adding stuff to the PK, we also need to
add that stuff to the other unique constraints, no?  And I think it
would be better to add both the start and the end column to these keys. 
Most of the temporal queries will be accessing both.


> INSERT
>
> On insert, the row start time column and row end time column behave
> like a kind of generated stored column, except that they store the
> current transaction time and the highest value supported by the data
> type (+infinity), respectively.
>

You're forcing these columns to be timestamp without time zone.  If
you're going to force a datatype here, it should absolutely be timestamp
with time zone.  However, I would like to see it handle both kinds of
timestamps as well as a simple date.


> DELETE and UPDATE
>
> The old data is inserted with the row end time column set to the
> current transaction time.
>

I don't see any error handling for transaction anomalies.  In READ
COMMITTED, you can easily end up with a case where the end time comes
before the start time.  I don't even see anything constraining start
time to be strictly inferior to the end time.  Such a constraint will be
necessary for application-time periods (which your patch doesn't address
at all but that's okay).


> SELECT
>
> If the query doesn't contain a filter condition that includes the
> system time column, a filter condition that filters out history data
> will be added during early optimization.
>
> Attached is a WIP patch that implements just the above, done on top
> of commit b8e19b932a99a7eb5a. Temporal clauses aren't implemented yet,
> so one can use a regular filter condition for the time being.
>
> NOTE: I implement the SQL standard syntax, except that it is PERIOD
> FOR SYSTEM TIME rather than PERIOD FOR SYSTEM_TIME in the CREATE TABLE
> statement, and the system time columns are not selected unless
> explicitly asked for.
>

Why aren't you following the standard syntax here?


> Any enlightenment?
>

There are quite a lot of typos and other things that aren't written "the
Postgres way". But before I comment on any of that, I'd like to see the
features be implemented correctly according to the SQL standard.





WIP: System Versioned Temporal Table

2019-10-23 Thread Surafel Temesgen
Hi all ,

Temporal tables are one of the main new features added in the SQL standard
2011. From that, I would like to implement system-versioned temporal tables,
which keep past and present data so that old data can be queried. I propose
to implement it as below.

CREATE

In CREATE TABLE only one table is created, and both historical and current
data will be stored in it. In order to make history and current data
co-exist, the row end time column will be added implicitly to the primary
key. Regarding performance, one can partition the table by the row end time
column so that history data does not slow things down.

INSERT

On insert, the row start time column and row end time column behave like a
kind of generated stored column, except that they store the current
transaction time and the highest value supported by the data type
(+infinity), respectively.

DELETE and UPDATE

The old data is inserted with the row end time column set to the current
transaction time.

SELECT

If the query doesn't contain a filter condition that includes the system
time column, a filter condition that filters out history data will be added
during early optimization.

Attached is a WIP patch that implements just the above, done on top of
commit b8e19b932a99a7eb5a. Temporal clauses aren't implemented yet, so one
can use a regular filter condition for the time being.

NOTE: I implement the SQL standard syntax, except that it is PERIOD FOR
SYSTEM TIME rather than PERIOD FOR SYSTEM_TIME in the CREATE TABLE
statement, and the system time columns are not selected unless explicitly
asked for.

Any enlightenment?

regards

Surafel
diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 6bc4e4c036..c229c90e2d 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -167,6 +167,7 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
 
 		cpy->has_not_null = constr->has_not_null;
 		cpy->has_generated_stored = constr->has_generated_stored;
+		cpy->is_system_versioned = constr->is_system_versioned;
 
 		if ((cpy->num_defval = constr->num_defval) > 0)
 		{
@@ -484,6 +485,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 			return false;
 		if (constr1->has_generated_stored != constr2->has_generated_stored)
 			return false;
+		if (constr1->is_system_versioned != constr2->is_system_versioned)
+			return false;
 		n = constr1->num_defval;
 		if (n != (int) constr2->num_defval)
 			return false;
@@ -864,6 +867,7 @@ BuildDescForRelation(List *schema)
 
 		constr->has_not_null = true;
 		constr->has_generated_stored = false;
+		constr->is_system_versioned= false;
 		constr->defval = NULL;
 		constr->missing = NULL;
 		constr->num_defval = 0;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c9d024ead5..340f0340bd 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -330,6 +330,120 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldContext);
 }
 
+/*
+ * Set row start time in row start time columns for a tuple.
+ */
+void
+ExecSetRowStartTime(EState *estate, TupleTableSlot *slot)
+{
+	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	int			natts = tupdesc->natts;
+	MemoryContext oldContext;
+	Datum	   *values;
+	bool	   *nulls;
+
+	Assert(tupdesc->constr && tupdesc->constr->is_system_versioned);
+
+	oldContext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+	values = palloc(sizeof(*values) * natts);
+	nulls = palloc(sizeof(*nulls) * natts);
+
+	slot_getallattrs(slot);
+	memcpy(nulls, slot->tts_isnull, sizeof(*nulls) * natts);
+
+	for (int i = 0; i < natts; i++)
+	{
+		Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+		/*
+		 * The row start time column gets the current transaction start time;
+		 * the row end time column gets +infinity, because the row end time is
+		 * not yet known.
+		 */
+		if (attr->attgenerated == ATTRIBUTE_ROW_START_TIME)
+		{
+			Datum		val;
+
+			val = GetCurrentTransactionStartTimestamp();
+
+			values[i] = val;
+			nulls[i] = false;
+		}
+		else if (attr->attgenerated == ATTRIBUTE_ROW_END_TIME)
+		{
+			Datum		val;
+
+			val = DirectFunctionCall3(timestamp_in,
+	CStringGetDatum("infinity"),
+	ObjectIdGetDatum(InvalidOid),
+	Int32GetDatum(-1));
+
+			values[i] = val;
+			nulls[i] = false;
+		}
+		else
+			values[i] = datumCopy(slot->tts_values[i], attr->attbyval, attr->attlen);
+	}
+
+	ExecClearTuple(slot);
+	memcpy(slot->tts_values, values, sizeof(*values) * natts);
+	memcpy(slot->tts_isnull, nulls, sizeof(*nulls) * natts);
+	ExecStoreVirtualTuple(slot);
+	ExecMaterializeSlot(slot);
+
+	MemoryContextSwitchTo(oldContext);
+}
+
+/*
+ * Set row end time in row end time columns for a tuple.
+ */
+void
+ExecSetRowEndTime(EState *estate, TupleTableSlot *slot)
+{
+	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+	TupleDesc	tupdesc = 

Re: v12 pg_basebackup fails against older servers (take two)

2019-10-23 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Oct 22, 2019 at 09:53:45AM -0400, Stephen Frost wrote:
> > Yeah..  Something along those lines definitely seems like it'd be better
> > as that would force anyone adding new options to explicitly say which
> > server version the option makes sense for.  Would it make sense to have a
> > minimum and a maximum (and a "currently live" or some such indicator, so
> > we aren't changing the max every release)?
> 
> Yeah.  A maximum may help to handle properly the cycling of deprecated
> options in connstrs, so I see your point.  Not sure that this
> "currently-live" indicator is something to care about if we know
> already the range of versions supported by a parameter and the
> version of the backend for a live connection.  My take is that it
> would be more consistent to have a PG_MAJORVERSION_NUM for this
> purpose in pg_config.h as well (I honestly don't like much the
> existing tweaks for the major version numbers like "PG_VERSION_NUM / 
> 100" in pg_basebackup.c & co for example).  If we were to have a
> maximum, couldn't there also be issues when it comes to link a binary
> with a version of libpq which has been compiled with a version of
> Postgres older than the version of the binary?  For example, imagine a
> version of libpq compiled with v11, used to link to a pg_basebackup
> from v12..  (@_@)

Erm, your last concern is exactly why I was saying we'd have a
'currently live' indicator: so that it wouldn't be an issue to have an
older library connecting from a new application to a newer database.
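
To make that concrete, the kind of per-option table being discussed could
look roughly like this; the structure and field names are invented for
illustration and are not existing libpq or pg_basebackup code:

/* Hypothetical per-option metadata; names invented for illustration. */
typedef struct ConnOptionVersions
{
	const char *keyword;		/* connection-string option name */
	int			min_version;	/* first PG_VERSION_NUM that understands it */
	int			max_version;	/* last one, or 0 meaning "currently live" */
} ConnOptionVersions;

With max_version == 0 meaning "still live", adding a new option only
requires recording when it appeared, the table does not need to be touched
on every release, and a tool writing a connection string for an older
server can simply drop anything outside the range.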

> > The other thought I had was if we should, perhaps, be skipping settings
> > whose values haven't been changed from the default value.  Currently, we
> > end up with a bunch of stuff that, in my experience anyway, just ends up
> > being confusing to people, without any particular benefit, like
> > 'sslcompression=0' when SSL wasn't used, or 'krbsrvname=postgres' when
> > Kerberos/GSSAPI wasn't used...
> 
> Couldn't this become a problem if we were to change the default for
> some parameters?  There has been a lot of talks for example about how
> bad sslmode's default is, for one, even if nobody has actually pulled
> the trigger to change it.

That really depends on whether we think that users will expect the
new-default behavior to be used, or the old-default one.  If the user
didn't set anything explicitly when they ran the command in the first
place, then it would seem like they intended and expected the defaults
to be used.  Perhaps that's an even better answer: just put into the
recovery.conf file only what the user actually set, instead of a bunch
of other stuff...

Thanks,

Stephen




Re: pgbench - refactor init functions with buffers

2019-10-23 Thread Jeevan Ladhe
I am able to apply the v2 patch with "patch -p1 "

-

+static void
+executeStatementExpect(PGconn *con, const char *sql, const ExecStatusType
expected, bool errorOK)
+{

I think some instances like this need 80 column alignment?

-

in initCreatePKeys():
+ for (int i = 0; i < lengthof(DDLINDEXes); i++)
+ {
+ resetPQExpBuffer();
+ appendPQExpBufferStr(, DDLINDEXes[i]);

I think you can simply use printfPQExpBuffer() for the first append,
similar to what you have used in createPartitions(), which is a
combination of both reset and append.
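
For what it's worth, the suggested simplification would look roughly like
this (a sketch against the existing pqexpbuffer API, with a hypothetical
buffer variable, not the patch itself):

PQExpBufferData query;

initPQExpBuffer(&query);
for (int i = 0; i < lengthof(DDLINDEXes); i++)
{
	/* printfPQExpBuffer() resets the buffer before formatting into it */
	printfPQExpBuffer(&query, "%s", DDLINDEXes[i]);
	/* ... append any index tablespace options here, then run it ... */
	executeStatement(con, query.data);
}
termPQExpBuffer(&query);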

-

The pgbench tap tests are also running fine.

Regards,
Jeevan Ladhe

On Tue, Oct 22, 2019 at 8:57 PM Fabien COELHO  wrote:

>
> >> The patch does not apply on master, needs rebase.
> >>
> >> Hmmm. "git apply pgbench-buffer-1.patch" works for me on current master.
> >>
> >>> Also, I got some whitespace errors.
> >>
> >> It possible, but I cannot see any. Could you be more specific?
> >
> > For me it failing, see below:
> >
> > $ git log -1
> > commit ad4b7aeb84434c958e2df76fa69b68493a889e4a
>
> Same for me, but it works:
>
>Switched to a new branch 'test'
>sh> git apply ~/pgbench-buffer-2.patch
>sh> git st
> On branch test
> Changes not staged for commit: ...
>  modified:   src/bin/pgbench/pgbench.c
>
>sh> file ~/pgbench-buffer-2.patch
>.../pgbench-buffer-2.patch: unified diff output, ASCII text
>
>sh> sha1sum ~/pgbench-buffer-2.patch
>eab8167ef3ec5eca814c44b30e07ee5631914f07 ...
>
> I suspect that your mailer did or did not do something with the
> attachment. Maybe try with "patch -p1 < foo.patch" at the root.
>
> --
> Fabien.
>


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-23 Thread Amit Kapila
On Wed, Oct 23, 2019 at 12:32 AM Alexey Kondratov
 wrote:
>
> On 22.10.2019 20:22, Tomas Vondra wrote:
> >
> > I think the patch should do the simplest thing possible, i.e. what it
> > does today. Otherwise we'll never get it committed.
> >
>
> I have to agree with Tomas, that keeping things as simple as possible
> should be a main priority right now. Otherwise, the entire patch set
> will pass next release cycle without being committed at least partially.
> In the same time, it resolves important problem from my perspective. It
> moves I/O overhead from primary to replica using large transactions
> streaming, which is a nice to have feature I guess.
>
> Later it would be possible to replace logical apply worker with
> bgworkers pool in a separated patch, if we decide that it is a viable
> solution. Anyway, regarding the Amit's questions:
>
> - I doubt that maintaining a separate buffer on the apply side before
> spilling to disk would help enough. We already have ReorderBuffer with
> logical_work_mem limit, and if we exceeded that limit on the sender
> side, then most probably we exceed it on the applier side as well,
>

I think on the sender side the limit is for un-filtered changes
(which means on the ReorderBuffer, which has all the changes), whereas
on the receiver side we will only have the requested changes, which
can make a difference?

> excepting the case when this new buffer size will be significantly
> higher then logical_work_mem to keep multiple open xacts.
>

I am not sure but I think we can have different controlling parameters
on the subscriber-side.

> - I still think that we should optimize database for commits, not
> rollbacks. BGworkers pool is dramatically slower for rollbacks-only
> load, though being at least twice as faster for commits-only. I do not
> know how it will perform with real life load, but this drawback may be
> inappropriate for such a general purpose database like Postgres.
>
> - Tomas' implementation of streaming with spilling does not have this
> bias between commits/aborts. However, it has a noticeable performance
> drop (~x5 slower compared with master [1]) for large transaction
> consisting of many small rows. Although it is not of an order of
> magnitude slower.
>

Did you ever identify the reason why it was slower in that case?  I
can see the numbers shared by you and Dilip which shows that the
BGWorker pool is a really good idea and will work great for
commit-mostly workload whereas the numbers without that are not very
encouraging, maybe we have not benchmarked enough.  This is the reason
I am trying to see if we can do something to get the benefits similar
to what is shown by your idea.

I am not against doing something simple for the first version and then
enhance it later, but it won't be good if we commit it with regression
in some typical cases and depend on the user to use it when it seems
favorable to its case.  Also, sometimes it becomes difficult to
generate enthusiasm to enhance the feature once the main patch is
committed.  I am not telling that always happens or will happen in
this case.  It is better if we put some energy and get things as good
as possible in the first go itself.  I am as much interested as you,
Tomas or others are, otherwise, I wouldn't have spent a lot of time on
this to disentangle it from the 2PC patch, which seems to have stalled due
to lack of interest.

> Another thing is it that about a year ago I have found some problems
> with MVCC/visibility and fixed them somehow [1]. If I get it correctly
> Tomas adapted some of those fixes into his patch set, but I think that
> this part should be reviewed carefully again.
>

Agreed, I have read your emails and could see that you have done very
good work on this project along with Tomas.  But unfortunately, it
didn't get committed.  At this stage, we are working on just the first
part of the patch which is to allow the data to spill once it crosses
the logical_decoding_work_mem limit on the master side.  I think there
will be more problems to discuss and solve once that is done.

> I would be glad to check
> it, but now I am a little bit confused with all the patch set variants
> in the thread. Which is the last one? Is it still dependent on 2pc decoding?
>

I think the latest patches posted by Dilip are not dependent on the
2PC decoding patch, but I haven't studied them yet.  You can find those
at [1][2].  As per discussion in this thread, we are also trying to
see if we can make some part of the patch-series committed first, the
latest patches corresponding to which are posted at [3].

[1] - 
https://www.postgresql.org/message-id/CAFiTN-vHoksqvV4BZ0479NhugGe4QHq_ezngNdDd-YRQ_2cwug%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAFiTN-vT%2B42xRbkw%3DhBnp44XkAyZaKZVA5hcvAMsYth3rk7vhg%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAFiTN-vkFB0RBEjVkLWhdgTYShSrSu3kCYObMghgXEwKA1FXRA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held

2019-10-23 Thread Michael Paquier
On Sat, Oct 19, 2019 at 11:41:06AM +0900, Michael Paquier wrote:
> FWIW, I have spent an hour or two poking at this issue the last couple
> of days so I am not ignoring both, not as much as I would have liked
> but well...  My initial lookup leads me to think that something is
> going wrong with the cleanup of the session-level lock on the parent
> table taken in the first transaction doing the REINDEX CONCURRENTLY.

I can confirm that this is an issue related to session locks which are
not cleaned up when in an out-of-transaction state, a state that can be
reached between a transaction commit and the next transaction start
while holding at least one session lock, within a single command of
VACUUM, CIC or REINDEX CONCURRENTLY.  The failure is actually pretty
easy to reproduce if you add an elog(ERROR) after a
CommitTransactionCommand() call and then shut down the connection.  I
am starting a new thread about that.  The problem is larger than it
looks, and has existed for a long time.
--
Michael




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-23 Thread Amit Kapila
On Tue, Oct 22, 2019 at 10:42 PM Tomas Vondra
 wrote:
>
> On Tue, Oct 22, 2019 at 10:30:16AM +0530, Dilip Kumar wrote:
> >
> >I have moved it out as a separate patch (0003) so that if we need that
> >we need this for the streaming transaction then we can keep this.
> >>
>
> I'm OK with moving it to a separate patch. That being said I think
> ability to control memory usage for individual subscriptions is very
> useful. Saying "We don't need such parameter" is essentially equivalent
> to saying "One size fits all" and I think we know that's not true.
>
> Imagine a system with multiple subscriptions, some of them mostly
> replicating OLTP changes, but one or two replicating tables that are
> updated in batches. What we'd have is to allow higher limit for the
> batch subscriptions, but much lower limit for the OLTP ones (which they
> should never hit in practice).
>

This point is not clear to me.  The changes are recorded in the
ReorderBuffer, which doesn't do any filtering, i.e. it will have all
the changes irrespective of the subscriber.  How will it make a
difference to have different limits?

> With a single global GUC, you'll either have a high value - risking
> OOM when the OLTP subscriptions happen to decode a batch update, or a
> low value affecting the batch subscriotions.
>
> It's not strictly necessary (and we already have such limit), so I'm OK
> with treating it as an enhancement for the future.
>

I am fine too if its usage is clear.  I might be missing something here.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Parallel leader process info in EXPLAIN

2019-10-23 Thread Thomas Munro
Hi,

While working on some slides explaining EXPLAIN, I couldn't resist the
urge to add the missing $SUBJECT.  The attached 0001 patch gives the
following:

Gather  ... time=0.146..33.077 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=4425
  ->  Parallel Seq Scan on public.t ... time=19.421..30.092 rows=0 loops=3)
Filter: (t.i = 42)
Rows Removed by Filter: 33
Leader: actual time=0.013..32.025 rows=1 loops=1  <--- NEW
  Buffers: shared hit=1546<--- NEW
Worker 0: actual time=29.069..29.069 rows=0 loops=1
  Buffers: shared hit=1126
Worker 1: actual time=29.181..29.181 rows=0 loops=1
  Buffers: shared hit=1753

Without that, you have to deduce what work was done in the leader, but
I'd rather just show it.

The 0002 patch adjusts Sort for consistency with that scheme, so you get:

Sort  ... time=84.303..122.238 rows=33 loops=3)
   Output: t1.i
   Sort Key: t1.i
   Leader:  Sort Method: external merge  Disk: 5864kB   <--- DIFFERENT
   Worker 0:  Sort Method: external merge  Disk: 3376kB
   Worker 1:  Sort Method: external merge  Disk: 4504kB
   Leader: actual time=119.624..165.949 rows=426914 loops=1
   Worker 0: actual time=61.239..90.984 rows=245612 loops=1
   Worker 1: actual time=72.046..109.782 rows=327474 loops=1

Without the "Leader" label, it's not really clear to the uninitiated
whether you're looking at combined, average or single process numbers.

Of course there are some more things that could be reported in a
similar way eventually, such as filter counters and hash join details.

For the XML/JSON/YAML formats, I decided to use a <Worker> element
with -1 to indicate the leader.
Perhaps there should be a <Leader> element instead?

Thoughts?


0001-Show-parallel-leader-stats-in-EXPLAIN-output.patch
Description: Binary data


0002-Improve-EXPLAIN-of-Sort-in-parallel-queries.patch
Description: Binary data


Re: dropdb --force

2019-10-23 Thread Amit Kapila
On Tue, Oct 22, 2019 at 4:51 PM Pavel Stehule  wrote:
>
> út 22. 10. 2019 v 5:09 odesílatel Amit Kapila  
> napsal:
>>
>>
>> CountOtherDBBackends is called from other places as well, so I don't
>> think it is advisable to change the sleep time in that function.
>> Also, I don't want to add a parameter for it.  I think you have a
>> point that in some cases we might end up sleeping for 100ms when we
>> could do with less sleeping time, but I think it is true to some
>> extent today as well.  I think we can anyway change it in the future
>> if there is a problem with the sleep timing, but for now, I think we
>> can just call CountOtherDBBackends after sending SIGTERM and call it
>> good.  You might want to add a futuristic note in the code.
>>
>
> ok.
>
> I removed sleeping from TerminateOtherDBBackends().
>
> If you want to change any logic there, please, do it without any hesitations. 
> Maybe I don't see, what you think.
>

Fair enough, I will see if I need to change anything.  In the
meantime, can you look into the thread related to CountDBSubscriptions [1]?
I think the problem reported there will be more serious after your
patch, so it is better if we can fix it before this patch.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BqhLkCYG2oy9xug9ur_j%3DG2wQNRYAyd%2B-kZfZ1z42pLw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Bug about drop index concurrently

2019-10-23 Thread 李杰(慎追)
Ha ha, this is great. I am rather ashamed of my English; it did not let you
understand my mail clearly.

Now I am very glad that you understand it. I sincerely hope that I can help
you. I am also a Postgres fan, and a recent graduate.

We have both confirmed that this bug only appears on the standby and does
not appear on the master. But it does affect the use of the standby.

For this bug, I proposed two options: one is to disable this feature (drop
index concurrently); the other is to make the standby wait for its SELECTs,
as the master does, but that may increase the log-apply delay. Because this
bug appears on the standby, I think both methods have advantages and
disadvantages, so I hope you can discuss them; that would help a lot.

Sincerely 

adger



--
From: Tomas Vondra
Sent: Wednesday, October 23, 2019 06:49
To: 李杰(慎追)
Cc: pgsql-hackers
Subject: Re: Bug about drop index concurrently

On Fri, Oct 18, 2019 at 05:00:54PM +0200, Tomas Vondra wrote:
>Hi,
>
>I can trivially reproduce this - it's enough to create a master-standby
>setup, and then do this on the master
>
> CREATE TABLE t (a int, b int);
> INSERT INTO t SELECT i, i FROM generate_series(1,1) s(i);
>
>and run pgbench with this script
>
> DROP INDEX CONCURRENTLY IF EXISTS t_a_idx;
> CREATE INDEX t_a_idx ON t(a);
>
>while on the standby there's another pgbench running this script
>
> EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1;
>
>and it fails pretty fast for me. With an extra assert(false) added to
>src/backend/access/common/relation.c I get a backtrace like this:
>
>   Program terminated with signal SIGABRT, Aborted.
>   #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
>   Missing separate debuginfos, use: dnf debuginfo-install 
> glibc-2.29-22.fc30.x86_64
>   (gdb) bt
>   #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
>   #1  0x7c32e457a895 in abort () from /lib64/libc.so.6
>   #2  0x00a58579 in ExceptionalCondition (conditionName=0xacd9bc 
> "!(0)", errorType=0xacd95b "FailedAssertion", fileName=0xacd950 "relation.c", 
> lineNumber=64) at assert.c:54
>   #3  0x0048d1bd in relation_open (relationId=38216, lockmode=1) at 
> relation.c:64
>   #4  0x005082e4 in index_open (relationId=38216, lockmode=1) at 
> indexam.c:130
>   #5  0x0080ac3f in get_relation_info (root=0x21698b8, 
> relationObjectId=16385, inhparent=false, rel=0x220ce60) at plancat.c:196
>   #6  0x008118c6 in build_simple_rel (root=0x21698b8, relid=1, 
> parent=0x0) at relnode.c:292
>   #7  0x007d485d in add_base_rels_to_query (root=0x21698b8, 
> jtnode=0x2169478) at initsplan.c:114
>   #8  0x007d48a3 in add_base_rels_to_query (root=0x21698b8, 
> jtnode=0x21693e0) at initsplan.c:122
>   #9  0x007d8fad in query_planner (root=0x21698b8, 
> qp_callback=0x7ded97 , qp_extra=0x7fffa4834f10) at 
> planmain.c:168
>   #10 0x007dc316 in grouping_planner (root=0x21698b8, 
> inheritance_update=false, tuple_fraction=0) at planner.c:2048
>   #11 0x007da7ca in subquery_planner (glob=0x220d078, 
> parse=0x2168f78, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at 
> planner.c:1012
>   #12 0x007d942c in standard_planner (parse=0x2168f78, 
> cursorOptions=256, boundParams=0x0) at planner.c:406
>   #13 0x007d91e8 in planner (parse=0x2168f78, cursorOptions=256, 
> boundParams=0x0) at planner.c:275
>   #14 0x008e1b0d in pg_plan_query (querytree=0x2168f78, 
> cursorOptions=256, boundParams=0x0) at postgres.c:878
>   #15 0x00658683 in ExplainOneQuery (query=0x2168f78, 
> cursorOptions=256, into=0x0, es=0x220cd90, queryString=0x21407b8 "explain 
> analyze select * from t where a = 1;", params=0x0, queryEnv=0x0) at 
> explain.c:367
>   #16 0x00658386 in ExplainQuery (pstate=0x220cc28, stmt=0x2141728, 
> queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
> params=0x0, queryEnv=0x0, dest=0x220cb90) at explain.c:255
>   #17 0x008ea218 in standard_ProcessUtility (pstmt=0x21425c0, 
> queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
>   completionTag=0x7fffa48355c0 "") at utility.c:675
>   #18 0x008e9a45 in ProcessUtility (pstmt=0x21425c0, 
> queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
>   completionTag=0x7fffa48355c0 "") at utility.c:360
>   #19 0x008e8a0c in PortalRunUtility (portal=0x219c278, 
> pstmt=0x21425c0, isTopLevel=true, setHoldSnapshot=true, dest=0x220cb90, 
> completionTag=0x7fffa48355c0 "") at pquery.c:1175
>   #20 0x008e871a in FillPortalStore (portal=0x219c278, 
> isTopLevel=true) at pquery.c:1035
>   #21 0x008e8075 in PortalRun (portal=0x219c278, 
> 

Re: Re: Bug about drop index concurrently

2019-10-23 Thread 李杰(慎追)
>
>I'm a bit confused. You shouldn't see any crashes and/or core files in
>this scenario, for two reasons. Firstly, I assume you're running a
>regular build without asserts. Secondly, I had to add an extra assert to
>trigger the failure. So what core are you talking about?
>
Sorry, I should explain it more clearly.
I saw the core file because I modified the postgres source code and added 
Assert to it.
>
>Also, it's not clear to me what do you mean by "bug in the standby" or
>no lock in the drop index concurrently. Can you explain?
>
"bug in the standby" means that we built a master-slave instance, when we 
executed a large number of queries on the standby, we executed 'drop index 
concurrently' on the master so that get ‘error’ in the standby. Although it is 
not 100%, it will appear.
no lock in the drop index concurrently :::  I think this is because there are 
not enough advanced locks when executing ‘ drop index concurrently’.

>Hmm, so you observe the issue with regular queries, not just EXPLAIN
>ANALYZE?

yeah, we have seen this error frequently.

>>Of course, we considered applying the method of waiting to detect the
>>query lock on the master to the standby, but worried about affecting
>>the standby application log delay, so we gave up that.
>>
>I don't understand? What method?
>

I analyzed this problem. While trying to find its cause, I also executed
'drop index concurrently' and 'explain select * from xxx' on the master,
but the bug did not appear there as expected.
So I went to analyze the source code. I found that there is a mechanism on
the master such that, when 'drop index concurrently' is executed, it waits
until every transaction that saw the old index state has finished. The
source code is as follows:


WaitForLockers(heaplocktag, AccessExclusiveLock);

Therefore, I think that if this mechanism were also available on the
standby, the error would not appear. But I worried about affecting the
standby's log-apply delay, so we gave up on that.






--
From: Tomas Vondra
Sent: Wednesday, October 23, 2019 01:47
To: 李杰(慎追)
Cc: pgsql-hackers
Subject: Re: Re: Bug about drop index concurrently

On Mon, Oct 21, 2019 at 10:36:04AM +0800, 李杰(慎追) wrote:
>Thanks for the quick reply.  And sorry I haven’t got back to you sooner.
>
>I have seen this backtrace in the core file, and I also looked at the
>bug in the standby because there is no lock in the drop index
>concurrently.
>

I'm a bit confused. You shouldn't see any crashes and/or core files in
this scenario, for two reasons. Firstly, I assume you're running a
regular build without asserts. Secondly, I had to add an extra assert to
trigger the failure. So what core are you talking about?

Also, it's not clear to me what do you mean by "bug in the standby" or
no lock in the drop index concurrently. Can you explain?

>However, when our business will perform a large number of queries in
>the standby, this problem will occur more frequently. So we are trying
>to solve this problem, and the solution we are currently dealing with
>is to ban it.
>

Hmm, so you observe the issue with regular queries, not just EXPLAIN
ANALYZE?

>Of course, we considered applying the method of waiting to detect the
>query lock on the master to the standby, but worried about affecting
>the standby application log delay, so we gave up that.
>

I don't understand? What method?

>If you have a better solution in the future, please push it to the new
>version, or email it, thank you very much.
>


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: v12 pg_basebackup fails against older servers (take two)

2019-10-23 Thread Michael Paquier
On Tue, Oct 22, 2019 at 09:53:45AM -0400, Stephen Frost wrote:
> Yeah..  Something along those lines definitely seems like it'd be better
> as that would force anyone adding new options to explicitly say which
> server version the option makes sense for.  Would it make sense to have a
> minimum and a maximum (and a "currently live" or some such indicator, so
> we aren't changing the max every release)?

Yeah.  A maximum may help to handle properly the cycling of deprecated
options in connstrs, so I see your point.  Not sure that this
"currently-live" indicator is something to care about if we know
already the range of versions supported by a parameter and the
version of the backend for a live connection.  My take is that it
would be more consistent to have a PG_MAJORVERSION_NUM for this
purpose in pg_config.h as well (I honestly don't like much the
existing tweaks for the major version numbers like "PG_VERSION_NUM / 
100" in pg_basebackup.c & co for example).  If we were to have a
maximum, couldn't there also be issues when it comes to link a binary
with a version of libpq which has been compiled with a version of
Postgres older than the version of the binary?  For example, imagine a
version of libpq compiled with v11, used to link to a pg_basebackup
from v12..  (@_@)

> The other thought I had was if we should, perhaps, be skipping settings
> whose values haven't been changed from the default value.  Currently, we
> end up with a bunch of stuff that, in my experience anyway, just ends up
> being confusing to people, without any particular benefit, like
> 'sslcompression=0' when SSL wasn't used, or 'krbsrvname=postgres' when
> Kerberos/GSSAPI wasn't used...

Couldn't this become a problem if we were to change the default for
some parameters?  There has been a lot of talk, for example, about how
bad sslmode's default is, for one, even if nobody has actually pulled
the trigger to change it.
--
Michael




Re: configure fails for perl check on CentOS8

2019-10-23 Thread Kyotaro Horiguchi
Hello.

At Mon, 21 Oct 2019 08:29:39 -0400, Andrew Dunstan 
 wrote in 
> 
> On 10/20/19 7:36 PM, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> On 10/20/19 1:23 PM, Tom Lane wrote:
> >>> The right way to fix it, likely, is to add CFLAGS_SL while performing this
> >>> particular autoconf test, as that would replicate the switches used for
> >>> plperl (and it turns out that adding -fPIC does solve this problem).
> >>> But the configure script doesn't currently know about CFLAGS_SL, so we'd
> >>> have to do some refactoring to approach it that way.  Moving that logic
> >>> from the platform-specific Makefiles into configure doesn't seem
> >>> unreasonable, but it would make the patch bigger.
> >> Sounds like a plan. I agree it's annoying to have to do something large
> >> for something so trivial.
> > Turns out it's not really that bad.  We just have to transfer the
> > responsibility for setting CFLAGS_SL from the platform Makefiles
> > to the platform template files.  (As a bonus, it'd be possible to
> > allow users to override CFLAGS_SL during configure, as they can
> > do for CFLAGS.  But I didn't mess with that here.)
> >
> > I checked that this fixes the Fedora build problem, but I've not
> > really tested it on any other platform.  Still, there's not that
> > much to go wrong, one would think.
> >
> 
> LGTM

However it's done, it looks good to me, and the problem is indeed gone.
Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center