Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-05 Thread Masahiko Sawada
On Sat, Nov 4, 2017 at 7:04 AM, Alexander Korotkov
 wrote:
> On Wed, Nov 1, 2017 at 5:55 AM, Masahiko Sawada 
> wrote:
>>
>> On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
>>  wrote:
>> > On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada 
>> > wrote:
>> >>
>> >> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas 
>> >> wrote:
>> >> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>> >> >  wrote:
>> >> >> Hello. I made some bugfixes and rewrote the patch.
>> >> >
>> >> > I don't think it's a good idea to deliberately leave the state of the
>> >> > standby different from the state of the master on the theory that it
>> >> > won't matter.  I feel like that's something that's likely to come
>> >> > back
>> >> > to bite us.
>> >>
>> >> I agree with Robert. What happens if we intentionally don't apply the
>> >> truncation WAL and then switch over? If we insert a tuple on the new
>> >> master server into a block that has been truncated on the old master,
>> >> won't the WAL apply on the new standby fail? I guess there are such
>> >> corner cases causing failures of WAL replay after switch-over.
>> >
>> >
>> > Yes, that looks dangerous.  One approach to cope with that could be
>> > teaching the heap redo function to handle such situations.  But I expect
>> > this approach to be criticized for bad design, and I understand the
>> > fairness of this criticism.
>> >
>> > However, from the user's perspective, the current behavior of
>> > hot_standby_feedback is just broken: it both increases bloat and doesn't
>> > guarantee that a read-only query on the standby won't be cancelled
>> > because of vacuum.  Therefore, we should be looking for a solution: if
>> > one approach isn't good enough, then we should look for another.
>> >
>> > I can propose the following alternative approach: teach read-only
>> > queries on hot standby to tolerate concurrent relation truncation.
>> > Then, when a non-existent heap page is accessed on hot standby, we know
>> > that it was deleted by concurrent truncation and can assume it to be
>> > empty.  Any thoughts?
>> >
>>
>> Did you also mean that applying WAL for AccessExclusiveLock is always
>> skipped on standby servers to allow scans to access the relation?
>
>
> Definitely not every AccessExclusiveLock WAL record should be skipped,
> only those that were emitted during heap truncation.  There are other
> cases when AccessExclusiveLock WAL records are emitted, for instance,
> during DDL operations.  But I'd like to focus on AccessExclusiveLock WAL
> records caused by VACUUM for now.  It's understandable for users that DDL
> might cancel a read-only query on the standby: if you're running a long
> report query, you should hold off on your DDL.  But VACUUM is a different
> story.  It runs automatically when you do normal DML queries.
>
> AccessExclusiveLock WAL records from VACUUM could either not be emitted
> at all, or be somehow distinguished and skipped on the standby.  I
> haven't thought it out at that level of detail yet.
>

I understood. I'm concerned by the fact that so far we cannot distinguish
whether an AccessExclusiveLock WAL record came from vacuum truncation or
from some other operation requiring AccessExclusiveLock. So I agree that
we need to invent a way to do that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-05 Thread Robert Haas
On Fri, Nov 3, 2017 at 5:57 PM, Alexander Korotkov
 wrote:
> On Thu, Nov 2, 2017 at 5:56 AM, Robert Haas  wrote:
>> > I can propose the following alternative approach: teach read-only
>> > queries on hot standby to tolerate concurrent relation truncation.
>> > Then, when a non-existent heap page is accessed on hot standby, we know
>> > that it was deleted by concurrent truncation and can assume it to be
>> > empty.  Any thoughts?
>>
>> Sounds like it might break MVCC?
>
> I don't see why it would be broken.  VACUUM truncates the heap only when
> the tail to be truncated is already empty.  When the truncate WAL record
> is applied, the previous WAL records deleting all the tuples in that tail
> have already been applied.  Thus, even if MVCC were broken and we missed
> some tuples after heap truncation, they were gone before the truncation
> anyway.

Ah - I was thinking of the TRUNCATE command, rather than truncation by
VACUUM.  Your argument makes sense, although the case where the
relation is truncated and later re-extended might need some thought.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-03 Thread Alexander Korotkov
On Wed, Nov 1, 2017 at 5:55 AM, Masahiko Sawada 
wrote:

> On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
>  wrote:
> > On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada 
> > wrote:
> >>
> >> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas 
> >> wrote:
> >> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
> >> >  wrote:
> >> >> Hello. I made some bugfixes and rewrote the patch.
> >> >
> >> > I don't think it's a good idea to deliberately leave the state of the
> >> > standby different from the state of the master on the theory that it
> >> > won't matter.  I feel like that's something that's likely to come back
> >> > to bite us.
> >>
> >> I agree with Robert. What happens if we intentionally don't apply the
> >> truncation WAL and then switch over? If we insert a tuple on the new
> >> master server into a block that has been truncated on the old master,
> >> won't the WAL apply on the new standby fail? I guess there are such
> >> corner cases causing failures of WAL replay after switch-over.
> >
> >
> > Yes, that looks dangerous.  One approach to cope with that could be
> > teaching the heap redo function to handle such situations.  But I expect
> > this approach to be criticized for bad design, and I understand the
> > fairness of this criticism.
> >
> > However, from the user's perspective, the current behavior of
> > hot_standby_feedback is just broken: it both increases bloat and doesn't
> > guarantee that a read-only query on the standby won't be cancelled
> > because of vacuum.  Therefore, we should be looking for a solution: if
> > one approach isn't good enough, then we should look for another.
> >
> > I can propose the following alternative approach: teach read-only
> > queries on hot standby to tolerate concurrent relation truncation.
> > Then, when a non-existent heap page is accessed on hot standby, we know
> > that it was deleted by concurrent truncation and can assume it to be
> > empty.  Any thoughts?
> >
>
> Did you also mean that applying WAL for AccessExclusiveLock is always
> skipped on standby servers to allow scans to access the relation?


Definitely not every AccessExclusiveLock WAL record should be skipped,
only those that were emitted during heap truncation.  There are other
cases when AccessExclusiveLock WAL records are emitted, for instance,
during DDL operations.  But I'd like to focus on AccessExclusiveLock WAL
records caused by VACUUM for now.  It's understandable for users that DDL
might cancel a read-only query on the standby: if you're running a long
report query, you should hold off on your DDL.  But VACUUM is a different
story.  It runs automatically when you do normal DML queries.

AccessExclusiveLock WAL records from VACUUM could either not be emitted at
all, or be somehow distinguished and skipped on the standby.  I haven't
thought it out at that level of detail yet.
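
As a strawman for the "somehow distinguished" option, the standby-lock WAL
record could carry a reason flag that redo checks.  This is only a sketch,
not actual PostgreSQL code: the real xl_standby_lock record has just
xid/dbOid/relOid, and the flag name here is hypothetical.

/* Sketch: a widened standby-lock record letting redo skip only the
 * locks taken for vacuum truncation.  Types inlined for illustration. */
typedef unsigned int Oid;
typedef unsigned int TransactionId;

#define XLS_LOCK_FOR_VACUUM_TRUNCATE 0x01	/* hypothetical flag bit */

typedef struct xl_standby_lock_sketch
{
	TransactionId xid;			/* xid of the lock holder */
	Oid			dbOid;			/* database of the locked relation */
	Oid			relOid;			/* the locked relation */
	unsigned char flags;		/* hypothetical: why the lock was taken */
} xl_standby_lock_sketch;

Redo would then cancel conflicting queries only when the flag is unset,
leaving vacuum-truncation locks to be handled some other way.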

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-03 Thread Alexander Korotkov
On Thu, Nov 2, 2017 at 5:56 AM, Robert Haas  wrote:

> On Tue, Oct 31, 2017 at 2:47 PM, Alexander Korotkov
>  wrote:
> > However, from the user's perspective, the current behavior of
> > hot_standby_feedback is just broken: it both increases bloat and doesn't
> > guarantee that a read-only query on the standby won't be cancelled
> > because of vacuum.  Therefore, we should be looking for a solution: if
> > one approach isn't good enough, then we should look for another.
> >
> > I can propose the following alternative approach: teach read-only
> > queries on hot standby to tolerate concurrent relation truncation.
> > Then, when a non-existent heap page is accessed on hot standby, we know
> > that it was deleted by concurrent truncation and can assume it to be
> > empty.  Any thoughts?
>
> Sounds like it might break MVCC?


I don't see why it would be broken.  VACUUM truncates the heap only when
the tail to be truncated is already empty.  When the truncate WAL record
is applied, the previous WAL records deleting all the tuples in that tail
have already been applied.  Thus, even if MVCC were broken and we missed
some tuples after heap truncation, they were gone before the truncation
anyway.
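
For illustration, the WAL a typical VACUUM emits around truncation looks
roughly like this (a sketch; record names are from the heap2 and smgr
resource managers, and details vary by version):

  1. XLOG_HEAP2_CLEAN for each tail page  -- dead tuples are removed
  2. XLOG_SMGR_TRUNCATE                   -- the file is cut at the new size

Since replay is strictly ordered, by the time record 2 is applied on the
standby, the pages it removes have already been emptied by record 1.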

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-01 Thread Robert Haas
On Tue, Oct 31, 2017 at 2:47 PM, Alexander Korotkov
 wrote:
> However, from the user's perspective, the current behavior of
> hot_standby_feedback is just broken: it both increases bloat and doesn't
> guarantee that a read-only query on the standby won't be cancelled
> because of vacuum.  Therefore, we should be looking for a solution: if
> one approach isn't good enough, then we should look for another.
>
> I can propose the following alternative approach: teach read-only
> queries on hot standby to tolerate concurrent relation truncation.
> Then, when a non-existent heap page is accessed on hot standby, we know
> that it was deleted by concurrent truncation and can assume it to be
> empty.  Any thoughts?

Sounds like it might break MVCC?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-31 Thread Masahiko Sawada
On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
 wrote:
> On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada 
> wrote:
>>
>> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas 
>> wrote:
>> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>> >  wrote:
>> >> Hello. I made some bugfixes and rewrote the patch.
>> >
>> > I don't think it's a good idea to deliberately leave the state of the
>> > standby different from the state of the master on the theory that it
>> > won't matter.  I feel like that's something that's likely to come back
>> > to bite us.
>>
>> I agree with Robert. What happens if we intentionally don't apply the
>> truncation WAL and then switch over? If we insert a tuple on the new
>> master server into a block that has been truncated on the old master,
>> won't the WAL apply on the new standby fail? I guess there are such
>> corner cases causing failures of WAL replay after switch-over.
>
>
> Yes, that looks dangerous.  One approach to cope with that could be
> teaching the heap redo function to handle such situations.  But I expect
> this approach to be criticized for bad design, and I understand the
> fairness of this criticism.
>
> However, from the user's perspective, the current behavior of
> hot_standby_feedback is just broken: it both increases bloat and doesn't
> guarantee that a read-only query on the standby won't be cancelled
> because of vacuum.  Therefore, we should be looking for a solution: if
> one approach isn't good enough, then we should look for another.
>
> I can propose the following alternative approach: teach read-only
> queries on hot standby to tolerate concurrent relation truncation.
> Then, when a non-existent heap page is accessed on hot standby, we know
> that it was deleted by concurrent truncation and can assume it to be
> empty.  Any thoughts?
>

Did you also mean that applying WAL for AccessExclusiveLock is always
skipped on standby servers to allow scans to access the relation?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-31 Thread Alexander Korotkov
On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada 
wrote:

> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas 
> wrote:
> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
> >  wrote:
> >> Hello. I made some bugfixes and rewrote the patch.
> >
> > I don't think it's a good idea to deliberately leave the state of the
> > standby different from the state of the master on the theory that it
> > won't matter.  I feel like that's something that's likely to come back
> > to bite us.
>
> I agree with Robert. What happens if we intentionally don't apply the
> truncation WAL and then switch over? If we insert a tuple on the new
> master server into a block that has been truncated on the old master,
> won't the WAL apply on the new standby fail? I guess there are such
> corner cases causing failures of WAL replay after switch-over.
>

Yes, that looks dangerous.  One approach to cope with that could be
teaching the heap redo function to handle such situations.  But I expect
this approach to be criticized for bad design, and I understand the
fairness of this criticism.

However, from the user's perspective, the current behavior of
hot_standby_feedback is just broken: it both increases bloat and doesn't
guarantee that a read-only query on the standby won't be cancelled because
of vacuum.  Therefore, we should be looking for a solution: if one
approach isn't good enough, then we should look for another.

I can propose the following alternative approach: teach read-only queries
on hot standby to tolerate concurrent relation truncation.  Then, when a
non-existent heap page is accessed on hot standby, we know that it was
deleted by concurrent truncation and can assume it to be empty.  Any
thoughts?
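
A minimal standalone sketch of that reading rule (simplified types, not a
patch against heapam; the function name is made up for illustration):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

typedef unsigned int BlockNumber;

typedef struct Relation
{
	BlockNumber nblocks;		/* relation size as currently replayed */
} Relation;

/*
 * A standby read that tolerates concurrent truncation: a block past the
 * current end of the relation is served as an all-zeroes ("empty") page
 * instead of raising an error.  This is safe for vacuum truncation
 * because vacuum only cuts off pages it has already emptied, and those
 * emptying WAL records replay before the truncate record.
 */
static bool
read_block_tolerating_truncate(const Relation *rel, BlockNumber blkno,
							   char *page)
{
	if (blkno >= rel->nblocks)
	{
		memset(page, 0, BLCKSZ);	/* truncated away: pretend it's empty */
		return false;				/* no tuples can be found here */
	}
	/* ... the normal buffer-manager read path would go here ... */
	return true;
}

int
main(void)
{
	Relation	rel = {100};
	char		page[BLCKSZ];

	if (!read_block_tolerating_truncate(&rel, 150, page))
		printf("block 150 was truncated away; scanned as empty\n");
	return 0;
}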

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-30 Thread Masahiko Sawada
On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas  wrote:
> On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>  wrote:
>> Hello. I made some bugfixes and rewrote the patch.
>
> I don't think it's a good idea to deliberately leave the state of the
> standby different from the state of the master on the theory that it
> won't matter.  I feel like that's something that's likely to come back
> to bite us.

I agree with Robert. What happens if we intentionally don't apply the
truncation WAL and then switch over? If we insert a tuple on the new
master server into a block that has been truncated on the old master,
won't the WAL apply on the new standby fail? I guess there are such
corner cases causing failures of WAL replay after switch-over.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-30 Thread Robert Haas
On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
 wrote:
> Hello. I made some bugfixes and rewrote the patch.

I don't think it's a good idea to deliberately leave the state of the
standby different from the state of the master on the theory that it
won't matter.  I feel like that's something that's likely to come back
to bite us.

Giving LockAcquireExtended() an argument that causes some
AccessExclusiveLocks, but not all, to be logged also seems pretty ugly,
especially because there are no comments whatsoever explaining the
rationale.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-24 Thread Alexander Korotkov
On Tue, Oct 24, 2017 at 10:56 AM, Ivan Kartyshov  wrote:

> Hello. I made some bugfixes and rewrote the patch.
>
> Simon Riggs wrote on 2017-09-05 14:44:
>
>> As Alexander says, simply skipping truncation if standby is busy isn't
>> a great plan.
>>
>> If we defer an action on standby replay, when and by whom will we apply
>> it? What happens if the standby is shut down or crashes while an action
>> is pending?
>>
>
> After a crash, the standby server will go back to the nearest checkpoint
> and replay all its WAL to reach a consistent state, and then open for
> read-only load. (All collected WAL will be replayed in the right order;
> I mean both the WAL that truncates the table and the WAL that makes the
> table grow.)


> --- a/src/backend/storage/lmgr/lock.c
> +++ b/src/backend/storage/lmgr/lock.c
> @@ -49,6 +49,8 @@
>  #include "utils/ps_status.h"
>  #include "utils/resowner_private.h"
>
> +#include 
> +#include 
>
>  /* This configuration variable is used to set the lock table size */
>  int max_locks_per_xact; /* set by guc.c */


Why do you need these includes?

> + FreeFakeRelcacheEntry(rel);
> + } else
> + {
> + XLogFlush(lsn);
> + }


This code violates our coding style.  According to it, "else" shouldn't
share its line with braces.

Also, the patch definitely needs some general comment explaining what's
going on.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-24 Thread Ivan Kartyshov

Hello. I made some bugfixes and rewrote the patch.

Simon Riggs wrote on 2017-09-05 14:44:

As Alexander says, simply skipping truncation if standby is busy isn't
a great plan.

If we defer an action on standby replay, when and by whom will we apply
it? What happens if the standby is shut down or crashes while an action
is pending?


After a crash, the standby server will go back to the nearest checkpoint
and replay all its WAL to reach a consistent state, and then open for
read-only load.

(All collected WAL will be replayed in the right order; I mean both the
WAL that truncates the table and the WAL that makes the table grow.)

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 9a5fde0..68decab 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -31,6 +31,9 @@
 #include "storage/smgr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "storage/lmgr.h"
+#include "storage/procarray.h"
+#include "access/transam.h"
 
 /*
  * We keep a list of all relations (represented as RelFileNode values)
@@ -498,50 +501,56 @@ smgr_redo(XLogReaderState *record)
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
-		/*
-		 * Forcibly create relation if it doesn't exist (which suggests that
-		 * it was dropped somewhere later in the WAL sequence).  As in
-		 * XLogReadBufferForRedo, we prefer to recreate the rel and replay the
-		 * log as best we can until the drop is seen.
-		 */
-		smgrcreate(reln, MAIN_FORKNUM, true);
-
-		/*
-		 * Before we perform the truncation, update minimum recovery point to
-		 * cover this WAL record. Once the relation is truncated, there's no
-		 * going back. The buffer manager enforces the WAL-first rule for
-		 * normal updates to relation files, so that the minimum recovery
-		 * point is always updated before the corresponding change in the data
-		 * file is flushed to disk. We have to do the same manually here.
-		 *
-		 * Doing this before the truncation means that if the truncation fails
-		 * for some reason, you cannot start up the system even after restart,
-		 * until you fix the underlying situation so that the truncation will
-		 * succeed. Alternatively, we could update the minimum recovery point
-		 * after truncation, but that would leave a small window where the
-		 * WAL-first rule could be violated.
-		 */
-		XLogFlush(lsn);
-
-		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
+		if (!ConditionalLockRelationOid(reln->smgr_rnode.node.relNode, AccessExclusiveLock))
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
+			/*
+			 * Forcibly create relation if it doesn't exist (which suggests that
+			 * it was dropped somewhere later in the WAL sequence).  As in
+			 * XLogReadBufferForRedo, we prefer to recreate the rel and replay the
+			 * log as best we can until the drop is seen.
+			 */
+			smgrcreate(reln, MAIN_FORKNUM, true);
+
+			/*
+			 * Before we perform the truncation, update minimum recovery point to
+			 * cover this WAL record. Once the relation is truncated, there's no
+			 * going back. The buffer manager enforces the WAL-first rule for
+			 * normal updates to relation files, so that the minimum recovery
+			 * point is always updated before the corresponding change in the data
+			 * file is flushed to disk. We have to do the same manually here.
+			 *
+			 * Doing this before the truncation means that if the truncation fails
+			 * for some reason, you cannot start up the system even after restart,
+			 * until you fix the underlying situation so that the truncation will
+			 * succeed. Alternatively, we could update the minimum recovery point
+			 * after truncation, but that would leave a small window where the
+			 * WAL-first rule could be violated.
+			 */
+			XLogFlush(lsn);
 
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
-		}
+			if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
+			{
+				smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
+
+				/* Also tell xlogutils.c about it */
+				XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			}
 
-		/* Truncate FSM and VM too */
-		rel = CreateFakeRelcacheEntry(xlrec->rnode);
+			/* Truncate FSM and VM too */
+			rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
-		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
-			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
-		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
-			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+			if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
+				smgrexists(reln, FSM_FORKNUM))
+				FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
+				smgrexists(reln, VISIBILITYMAP_FORKNUM))
+				visibilitymap_truncate(rel, xlrec->blkno);
 
-		FreeFakeRelcacheEntry(rel);
+			FreeFakeRelcacheEntry(rel);
+		} else
+		{
+			XLogFlush(lsn);
+		}

Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-15 Thread i . kartyshov

Alexander Korotkov wrote on 2017-09-05 00:33:

I have the following comments on this patch:
1) You shouldn't use ">=" to compare xids.  You should use the
TransactionIdFollowsOrEquals() function, which handles transaction id
wraparound correctly.
I fixed it, and in addition I added a GUC parameter that can turn off this
behavior. This parameter can be set in the postgresql.conf configuration
file or with the ALTER SYSTEM command. For the changes to take effect,
call the pg_reload_conf() function or reload the database server (see the
example below).

Changes are in vacuum_lazy_truncate_v3.patch
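
For example, assuming the GUC were named vacuum_truncation (the actual
name used by the patch isn't quoted in this message), turning the
behavior off would look like:

	-- hypothetical GUC name, for illustration only
	ALTER SYSTEM SET vacuum_truncation = off;
	SELECT pg_reload_conf();	-- or reload/restart the server
	SHOW vacuum_truncation;		-- verify the new setting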


2) As I understood, this patch makes heap truncate happen only when no
concurrent transactions are running on either master or replicas with
hot_standby_feedback enabled.  For a busy system, that would literally
mean "never do heap truncate after vacuum".
Yes, the difficulty is that if we have a lot of replicas and the requests
on them overlap each other, then truncation will never happen on a loaded
system.



In any case, it's certainly a bad idea to disable replaying
AccessExclusiveLock *every time*; we should skip taking
AccessExclusiveLock only while writing the XLOG_SMGR_TRUNCATE record.
Besides that, this code violates our coding style.
The patch (standby_truncate_skip_v2.patch) fixes this behaviour: now it
skips writing XLOG_STANDBY_LOCK records (which carry the
AccessExclusiveLock) only while autovacuum truncates the table.


Amit Kapila wrote on 2017-09-05 12:52:

I think one deficiency of this solution is that it will keep on
generating extra WAL even if standby doesn't need it (as standby has
successfully truncated the relation).  I don't know if we can just get
away by saying that an additional WAL record per vacuum cycle is
harmless especially when that doesn't serve any purpose (like for the
cases when standby is always able to truncate the heap on first WAL
record).  Am I missing some part of solution which avoids it?
You are right: this implementation generates extra WAL, and that could
matter in some cases. If you have any solutions to avoid it, I'll be glad
to hear them.


Simon Riggs wrote on 2017-09-05 14:44:

If we defer an action on standby replay, when and by whom will we apply
it? What happens if the standby is shut down or crashes while an action
is pending?

Perhaps altering the way truncation requires an AccessExclusiveLock
would help workloads on both master and standby? If a Relation knew it
had a pending truncation then scans wouldn't need to go past newsize.
Once older lockers have gone we could simply truncate without the
lock. Would need a few pushups but seems less scary than trying to
make pending standby actions work well enough to commit.
Yes, that is the first idea that came to my mind, and the most difficult
one. I think that in a few days I'll finish a patch implementing those
ideas.



--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 9a5fde0..a6dc369 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -31,6 +31,9 @@
 #include "storage/smgr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "storage/lmgr.h"
+#include "storage/procarray.h"
+#include "access/transam.h"
 
 /*
  * We keep a list of all relations (represented as RelFileNode values)
@@ -495,9 +498,21 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		LOCKTAG		locktag;
+		VirtualTransactionId *backends;
+		VirtualTransactionId *backends2;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
+		SET_LOCKTAG_RELATION(locktag, reln->smgr_rnode.node.dbNode,
+			 reln->smgr_rnode.node.relNode);
+
+		backends = GetLockConflicts(&locktag, AccessExclusiveLock);
+		backends2 = GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid);
+
+		if (!VirtualTransactionIdIsValid(*backends) && !VirtualTransactionIdIsValid(*backends2))
+		{
+
 		/*
 		 * Forcibly create relation if it doesn't exist (which suggests that
 		 * it was dropped somewhere later in the WAL sequence).  As in
@@ -542,6 +557,10 @@ smgr_redo(XLogReaderState *record)
 			visibilitymap_truncate(rel, xlrec->blkno);
 
 		FreeFakeRelcacheEntry(rel);
+		} else
+		{
+			XLogFlush(lsn);
+		}
 	}
 	else
 		elog(PANIC, "smgr_redo: unknown op code %u", info);
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 45b1859..3c25907 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -61,6 +61,8 @@
 #include "utils/pg_rusage.h"
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
+#include "catalog/storage_xlog.h"
+#include "access/rmgr.h"
 
 
 /*
@@ -317,6 +319,32 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		new_rel_pages = vacrelstats->old_rel_pages;
 		new_rel_tuples = vacrelstats->old_rel_tuples;
 	}
+	else
+	{
+		if 

Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-05 Thread Simon Riggs
On 4 September 2017 at 09:06, Alexander Korotkov
 wrote:

> Aborting a read-only query on the standby because of vacuum on the
> master seems rather unexpected behaviour for the user when hot standby
> feedback is on.  I think we should work on this problem for v11.

Happy to help. My suggestion would be to discuss a potential theory of
operation and then code a patch.

As Alexander says, simply skipping truncation if standby is busy isn't
a great plan.

If we defer an action on standby replay, when and by whom will we apply
it? What happens if the standby is shut down or crashes while an action
is pending?

Perhaps altering the way truncation requires an AccessExclusiveLock
would help workloads on both master and standby? If a Relation knew it
had a pending truncation then scans wouldn't need to go past newsize.
Once older lockers have gone we could simply truncate without the
lock. Would need a few pushups but seems less scary than trying to
make pending standby actions work well enough to commit.
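
A rough data-structure sketch of that idea (names are hypothetical and
not from any posted patch):

#include <stdio.h>

typedef unsigned int BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Remember a pending truncation target on the relation instead of taking
 * AccessExclusiveLock immediately.  Scans stop at the pending size; once
 * the last older locker is gone, the file can be cut without conflicting
 * with anyone.
 */
typedef struct RelTruncState
{
	BlockNumber nblocks;			/* physical size of the relation */
	BlockNumber pending_trunc_to;	/* target size, or InvalidBlockNumber */
} RelTruncState;

/* Upper bound for a scan: never read past a pending truncation point. */
static BlockNumber
scan_limit(const RelTruncState *r)
{
	if (r->pending_trunc_to != InvalidBlockNumber &&
		r->pending_trunc_to < r->nblocks)
		return r->pending_trunc_to;
	return r->nblocks;
}

int
main(void)
{
	RelTruncState r = {1000, 700};	/* 1000 pages, truncation to 700 pending */

	printf("scan stops at block %u\n", scan_limit(&r));	/* prints 700 */
	return 0;
}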

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-05 Thread Amit Kapila
On Tue, Sep 5, 2017 at 3:03 AM, Alexander Korotkov
 wrote:
> On Mon, Sep 4, 2017 at 2:04 PM,  wrote:
>>
>> Our clients complain about this issue and therefore I want to raise the
>> discussion and suggest several solutions to this problem:
>>
>> I. Why does PG use FATAL when ERROR is enough to release the lock that
>> raises the lock conflict?
>> "If (RecoveryConflictPending && DoingCommandRead)"
>>
>> II. Do we really need to truncate the table on hot standby exactly at the
>> same time when truncate on master occurs?
>>
>> In my case conflict happens when the autovacuum truncates table tbl1 on
>> master while backend on replica is performing a long transaction involving
>> the same table tbl1. This happens because truncate takes an
>> AccessExclusiveLock. To tackle this issue we have several options:
>>
>> 1. We can postpone the truncate on the master until all the replicas have
>> finished their transactions (in this case, feedback requests to the master
>> should be sent frequently)
>> Patch 1
>> vacuum_lazy_truncate.patch
>
>
> I have the following comments on this patch:
> 1) You shouldn't use ">=" to compare xids.  You should use the
> TransactionIdFollowsOrEquals() function, which handles transaction id
> wraparound correctly.
> 2) As I understood, this patch makes heap truncate happen only when no
> concurrent transactions are running on either master or replicas with
> hot_standby_feedback enabled.  For a busy system, that would literally
> mean "never do heap truncate after vacuum".
>
>> 2. Maybe there is an option somehow not to send AccessExclusiveLock and
>> not to truncate table on the replica right away. We could try to wait a
>> little and truncate tbl1 on replica again.
>>
>> Here is a patch that makes replica skip truncate WAL record if some
>> transaction using the same table is already running on replica. And it also
>> forces master to send truncate_wal to replica with actual table length every
>> time autovacuum starts.
>> Patch 2
>> standby_truncate_skip_v1.patch
>>
>> In this case the transaction which is running for several hours won’t be
>> interrupted because of truncate. Even if for some reason we haven’t
>> truncated this table tbl1 right away, nothing terrible will happen. The
>> next truncate WAL record will reduce the file size to the actual length
>> (if some inserts or updates have been performed on master).
>
>
> Since you wrote this patch at my request, let me clarify its idea a
> little more.
>
> Currently, lazy_truncate_heap() is very careful about getting
> AccessExclusiveLock to truncate the heap.  It wants neither to block
> other transactions nor to wait for this lock too long.  If the lock
> isn't acquired after some number of tries, lazy_truncate_heap() gives
> up.  However, once lazy_truncate_heap() acquires AccessExclusiveLock on
> the master, it is replayed on the replicas, where it will conflict with
> read-only transactions, if any.  That leads to unexpected behaviour when
> hot_standby_feedback is on.
>
> The idea of fixing that is to apply the same logic for getting
> AccessExclusiveLock on the standby as on the master: give up if somebody
> has been holding a conflicting lock for long enough.  That allows the
> standby to have more free pages at the end of the heap than the master
> has.  That shouldn't lead to errors, since those pages are empty, but it
> lets the standby waste some extra space.  In order to mitigate this
> deficiency, we generate XLOG_SMGR_TRUNCATE records more frequently: at
> the finish of every vacuum.  Therefore, even if the standby gets some
> extra space of empty pages, it will be corrected during further vacuum
> cycles.
>

I think one deficiency of this solution is that it will keep on
generating extra WAL even if the standby doesn't need it (as the standby
has successfully truncated the relation).  I don't know if we can just
get away with saying that an additional WAL record per vacuum cycle is
harmless, especially when it doesn't serve any purpose (as in the cases
where the standby is always able to truncate the heap on the first WAL
record).  Am I missing some part of the solution which avoids it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-04 Thread Alexander Korotkov
On Mon, Sep 4, 2017 at 2:04 PM,  wrote:

> Our clients complain about this issue and therefore I want to raise the
> discussion and suggest several solutions to this problem:
>
> I. Why does PG use FATAL when ERROR is enough to release the lock that
> raises the lock conflict?
> "If (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on hot standby exactly at the
> same time when truncate on master occurs?
>
> In my case conflict happens when the autovacuum truncates table tbl1 on
> master while backend on replica is performing a long transaction involving
> the same table tbl1. This happens because truncate takes an
> AccessExclusiveLock. To tackle this issue we have several options:
>
> 1. We can postpone the truncate on the master until all the replicas have
> finished their transactions (in this case, feedback requests to the master
> should be sent frequently)
> Patch 1
> vacuum_lazy_truncate.patch
>

I have the following comments on this patch:
1) You shouldn't use ">=" to compare xids.  You should use the
TransactionIdFollowsOrEquals() function, which handles transaction id
wraparound correctly (see the sketch after these comments).
2) As I understood, this patch makes heap truncate happen only when no
concurrent transactions are running on either master or replicas with
hot_standby_feedback enabled.  For a busy system, that would literally
mean "never do heap truncate after vacuum".

> 2. Maybe there is an option somehow not to send AccessExclusiveLock and not
> to truncate table on the replica right away. We could try to wait a little
> and truncate tbl1 on replica again.
>
> Here is a patch that makes replica skip truncate WAL record if some
> transaction using the same table is already running on replica. And it also
> forces master to send truncate_wal to replica with actual table length
> every time autovacuum starts.
> Patch 2
> standby_truncate_skip_v1.patch
>
> In this case the transaction which is running for several hours won’t be
> interrupted because of truncate. Even if for some reason we haven’t
> truncated this table tbl1 right away, nothing terrible will happen. The
> next truncate WAL record will reduce the file size to the actual length
> (if some inserts or updates have been performed on master).
>

Since you wrote this patch at my request, let me clarify its idea a
little more.

Currently, lazy_truncate_heap() is very careful about getting
AccessExclusiveLock to truncate the heap.  It wants neither to block
other transactions nor to wait for this lock too long.  If the lock
isn't acquired after some number of tries, lazy_truncate_heap() gives
up.  However, once lazy_truncate_heap() acquires AccessExclusiveLock on
the master, it is replayed on the replicas, where it will conflict with
read-only transactions, if any.  That leads to unexpected behaviour when
hot_standby_feedback is on.

The idea of fixing that is to apply the same logic for getting
AccessExclusiveLock on the standby as on the master: give up if somebody
has been holding a conflicting lock for long enough.  That allows the
standby to have more free pages at the end of the heap than the master
has.  That shouldn't lead to errors, since those pages are empty, but it
lets the standby waste some extra space.  In order to mitigate this
deficiency, we generate XLOG_SMGR_TRUNCATE records more frequently: at
the finish of every vacuum.  Therefore, even if the standby gets some
extra space of empty pages, it will be corrected during further vacuum
cycles.

> @@ -397,7 +425,6 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
>   appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
>   read_rate, write_rate);
>   appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
> -
>   ereport(LOG,
>   (errmsg_internal("%s", buf.data)));
>   pfree(buf.data);
> @@ -1820,7 +1847,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
>   vacrelstats->pages_removed += old_rel_pages - new_rel_pages;
>   vacrelstats->rel_pages = new_rel_pages;
>
> - ereport(elevel,
> + ereport(WARNING,
>   (errmsg("\"%s\": truncated %u to %u pages",
>   RelationGetRelationName(onerel),
>   old_rel_pages, new_rel_pages),


Ivan, what are these changes supposed to do?

> diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
> index 2b26173..f04888e 100644
> --- a/src/backend/storage/lmgr/lock.c
> +++ b/src/backend/storage/lmgr/lock.c
> @@ -816,7 +816,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
>   XLogStandbyInfoActive())
>   {
>   LogAccessExclusiveLockPrepare();
> - log_lock = true;
> +// log_lock = true;
>   }
>
>   /*


In any case, it's certainly a bad idea to disable replaying
AccessExclusiveLock *every time*; we should skip taking AccessExclusiveLock
only while writing the XLOG_SMGR_TRUNCATE record.  Besides that, this code
violates our coding style.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-04 Thread Alexander Korotkov
On Mon, Sep 4, 2017 at 4:51 PM, Alex Ignatov 
wrote:

> In this situation this parameter (max_standby_streaming_delay) won't
> help, because of subsequent statements on the standby (the following is
> from the documentation and from our experience): if one query has
> resulted in significant delay, subsequent conflicting queries will have
> much less grace time until the standby server has caught up again. And
> you never know how to set this parameter except to -1, which means an
> infinitely delayed standby.
>
> In our experience, only autovacuum on the master took an
> AccessExclusiveLock that raised this FATAL message on the standby. Once
> that AccessExclusiveLock reaches the standby and
> max_standby_streaming_delay > -1, you will definitely, sooner or later,
> get this FATAL on recovery.
> With this patch we try to get rid of the AccessExclusiveLock applied on
> the standby while we have an active statement on it.
>

+1,
Aborting a read-only query on the standby because of vacuum on the master
seems rather unexpected behaviour for the user when hot standby feedback
is on.  I think we should work on this problem for v11.  Backpatching the
documentation for previous releases would be good too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-04 Thread Alex Ignatov


-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
Sent: Monday, September 4, 2017 3:32 PM
To: i.kartys...@postgrespro.ru
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

On Mon, Sep 4, 2017 at 4:34 PM,  <i.kartys...@postgrespro.ru> wrote:
> Our clients complain about this issue and therefore I want to raise 
> the discussion and suggest several solutions to this problem:
>
> I. Why does PG use FATAL when ERROR is enough to release the lock that
> raises the lock conflict?
> "If (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on hot standby exactly at 
> the same time when truncate on master occurs?
>
> In my case conflict happens when the autovacuum truncates table tbl1 
> on master while backend on replica is performing a long transaction 
> involving the same table tbl1. This happens because truncate takes an 
> AccessExclusiveLock. To tackle this issue we have several options:
>
> 1. We can postpone the truncate on the master until all the replicas 
> have finished their transactions (in this case, feedback requests to 
> the master should be sent frequently) Patch 1 
> vacuum_lazy_truncate.patch
>
> 2. Maybe there is an option somehow not to send AccessExclusiveLock 
> and not to truncate table on the replica right away. We could try to 
> wait a little and truncate tbl1 on replica again.
>

Can max_standby_streaming_delay help in this situation (point number - 2)?


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Hello!
In this situation this parameter (max_standby_streaming_delay) won't help,
because of subsequent statements on the standby (the following is from the
documentation and from our experience): if one query has resulted in
significant delay, subsequent conflicting queries will have much less
grace time until the standby server has caught up again. And you never
know how to set this parameter except to -1, which means an infinitely
delayed standby.

In our experience, only autovacuum on the master took an
AccessExclusiveLock that raised this FATAL message on the standby. Once
that AccessExclusiveLock reaches the standby and
max_standby_streaming_delay > -1, you will definitely, sooner or later,
get this FATAL on recovery.
With this patch we try to get rid of the AccessExclusiveLock applied on
the standby while we have an active statement on it.
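
For reference, the standby-side settings under discussion can be adjusted
like this (illustrative values only; the right numbers are
workload-specific):

	ALTER SYSTEM SET hot_standby_feedback = on;           -- report xmin back to the master
	ALTER SYSTEM SET max_standby_streaming_delay = '30s'; -- -1 means wait forever
	SELECT pg_reload_conf();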



--
Alex Ignatov 
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company






Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-04 Thread Amit Kapila
On Mon, Sep 4, 2017 at 4:34 PM,   wrote:
> Our clients complain about this issue and therefore I want to raise the
> discussion and suggest several solutions to this problem:
>
> I. Why does PG use FATAL when ERROR is enough to release the lock that
> raises the lock conflict?
> "If (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on hot standby exactly at the
> same time when truncate on master occurs?
>
> In my case conflict happens when the autovacuum truncates table tbl1 on
> master while backend on replica is performing a long transaction involving
> the same table tbl1. This happens because truncate takes an
> AccessExclusiveLock. To tackle this issue we have several options:
>
> 1. We can postpone the truncate on the master until all the replicas have
> finished their transactions (in this case, feedback requests to the master
> should be sent frequently)
> Patch 1
> vacuum_lazy_truncate.patch
>
> 2. Maybe there is an option somehow not to send AccessExclusiveLock and not
> to truncate table on the replica right away. We could try to wait a little
> and truncate tbl1 on replica again.
>

Can max_standby_streaming_delay help in this situation (point number - 2)?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

