Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Thu, Nov 29, 2018 at 3:44 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > On Fri, Aug 24, 2018 at 5:53 PM Alexander Korotkov wrote:
> > >
> > > Given I've had no feedback on this idea yet, I'll try to implement a PoC
> > > patch for that. It doesn't look difficult to do, and we'll see how it
> > > works.
>
> Unfortunately, the current version of the patch doesn't pass the tests and
> fails on initdb. But maybe you already have the PoC you were talking about,
> which will also incorporate the feedback from this thread? For now I'll move
> it to the next CF.

Finally, I managed to write a PoC. Looking at the list of problems I enumerated in [1], this PoC is aimed at 1 and 3.

> 1) Data corruption on file truncation error (explained in [1]).
> 2) Expensive scanning of the whole shared buffers before file truncation.
> 3) Cancellation of read-only queries on standby even if hot_standby_feedback
> is on, caused by replication of AccessExclusiveLock.

Issue 2 is a fairly independent problem and could be addressed later.

Basically, this patch does the following:

1. Introduces a new flag, BM_DIRTY_BARRIER, which prevents a dirty buffer from being written out.

2. Implements two-phase truncation of relation buffers. The first phase runs prior to file truncation and marks dirty buffers past the truncation point as BM_DIRTY_BARRIER. The second phase runs after file truncation and actually wipes out the buffers past the truncation point.

3. If an exception happens during file truncation, the BM_DIRTY_BARRIER flag is released from the buffers. Thus, no data corruption should happen here. If the file truncation was partially complete, the file might be extended again by a write of a dirty buffer. I'm not sure how likely that is, but such extension could lead to errors again. Still, this shouldn't cause data corruption.

4. Having too many buffers marked as BM_DIRTY_BARRIER would paralyze the buffer manager. This is why we allow no more than NBuffers/2 buffers to be marked as BM_DIRTY_BARRIER. If the limit is exceeded, dirty buffers are simply written out during the first phase.

5. lazy_truncate_heap() now takes ExclusiveLock instead of AccessExclusiveLock. This part is not really complete. At least, we need to ensure that reads past the truncation point, caused by read-only queries running concurrently with the truncation, don't lead to real errors.

Any thoughts?

1. https://www.postgresql.org/message-id/CAPpHfdtD3U2DpGZQJNe21s9s1s-Va7NRNcP1isvdCuJxzYypcg%40mail.gmail.com

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-POC-fix-relation-truncation-1.patch
Description: Binary data
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hi,

On 2018-11-29 13:45:15 +0100, Dmitry Dolgov wrote:
> > On Fri, Aug 24, 2018 at 5:53 PM Alexander Korotkov wrote:
> > >
> > > Given I've had no feedback on this idea yet, I'll try to implement a PoC
> > > patch for that. It doesn't look difficult to do, and we'll see how it
> > > works.
>
> Unfortunately, the current version of the patch doesn't pass the tests and
> fails on initdb. But maybe you already have the PoC you were talking about,
> which will also incorporate the feedback from this thread? For now I'll move
> it to the next CF.

Given that nothing has happened since, I'm going to close this CF entry as returned with feedback.

Greetings,

Andres Freund
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
> On Fri, Aug 24, 2018 at 5:53 PM Alexander Korotkov wrote:
> >
> > Given I've had no feedback on this idea yet, I'll try to implement a PoC
> > patch for that. It doesn't look difficult to do, and we'll see how it
> > works.

Unfortunately, the current version of the patch doesn't pass the tests and fails on initdb. But maybe you already have the PoC you were talking about, which will also incorporate the feedback from this thread? For now I'll move it to the next CF.
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Mon, Aug 27, 2018 at 11:38 AM, Alexander Korotkov wrote: > The aspect I'm more concerned here about is whether we miss ability > for detecting some of IO errors, if we don't distinguish new pages > from pages whose tuples were removed by vacuum. My main concern is correctness. If we ever have valid-looking buffers in shared_buffers after the corresponding data has been truncated away on disk, we've got to make sure that nobody ever confuses one of them with an actually-valid buffer. Reading over your algorithm, I can't convince myself that you have that case nailed down tightly enough. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hi! Thank you for the feedback.

On Sun, Aug 26, 2018 at 4:09 AM Robert Haas wrote:
> On Tue, Aug 21, 2018 at 9:10 AM, Alexander Korotkov wrote:
> > After heap truncation using this algorithm, shared buffers may contain
> > past-EOF buffers. But those buffers are empty (no used items) and clean.
> > So, read-only queries shouldn't hint those buffers dirty because there
> > are no used items. Normally, these buffers will just be evicted from the
> > shared buffer arena. If relation extension happens shortly after heap
> > truncation, some of those buffers could be found after the relation
> > extension. I think this situation could be handled. For instance, we can
> > teach vacuum to mark a page as new once all the tuples are gone.
>
> I think this all sounds pretty dangerous and fragile, especially in
> view of the pluggable storage work. If we start to add new storage
> formats, deductions based on the specifics of the current heap's
> hint-bit behavior may turn out not to be valid. Now maybe you could
> speculate that it won't matter because perhaps truncation will work
> differently in other storage formats too, but it doesn't sound to me
> like we'd be wise to bet on it working out that way.

Hmm, I'm not especially concerned about pluggable storage here. Pluggable storage engines decide for themselves how they manage vacuum, including relation truncation if needed. They might or might not reuse the relation truncation function we have for heap. What we should provide for that relation truncation function is an understandable and predictable interface. If the relation truncation function cuts off trailing relation pages that were previously marked as new, that looks fair enough to me. The aspect I'm more concerned about here is whether we lose the ability to detect some I/O errors if we don't distinguish new pages from pages whose tuples were removed by vacuum.

> IIRC, Andres had some patches revising shared buffer management that
> allowed the ending block number to be maintained in shared memory.
> I'm waving my hands here, but with that kind of a system you can
> imagine that maybe there could also be a flag bit indicating whether a
> truncation is in progress. So, reduce the number of pages and set the
> bit; then zap all the pages above that value that are still present in
> shared_buffers; then clear the bit. Or maybe we don't even need the
> bit, but I think we do need some kind of race-free mechanism to make
> sure that we never try to read pages that either have been truncated
> away or are in the process of being truncated away.

If we had some per-relation shared memory information, then we could make this work even without a special write-barrier bit. Instead, we could place a mark on the whole relation saying "please hold off on writes past the following pending truncation point". Also, having the ending block number in shared memory could save us from trying to read blocks past EOF. So, I'm sure that once Andres's work on revising shared buffer management is complete, we will be able to solve these problems better.

But there is also the question of time. As I understand it, revising shared buffer management could not realistically get committed to PostgreSQL 12. And we have a pretty nasty set of problems here. It would be nice to do something about them during this release cycle. But for sure, we should keep in mind how this solution should be revised once we have the new shared buffer management.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Aug 21, 2018 at 9:10 AM, Alexander Korotkov wrote:
> After heap truncation using this algorithm, shared buffers may contain
> past-EOF buffers. But those buffers are empty (no used items) and clean.
> So, read-only queries shouldn't hint those buffers dirty because there
> are no used items. Normally, these buffers will just be evicted from the
> shared buffer arena. If relation extension happens shortly after heap
> truncation, some of those buffers could be found after the relation
> extension. I think this situation could be handled. For instance, we can
> teach vacuum to mark a page as new once all the tuples are gone.

I think this all sounds pretty dangerous and fragile, especially in view of the pluggable storage work. If we start to add new storage formats, deductions based on the specifics of the current heap's hint-bit behavior may turn out not to be valid. Now maybe you could speculate that it won't matter because perhaps truncation will work differently in other storage formats too, but it doesn't sound to me like we'd be wise to bet on it working out that way.

IIRC, Andres had some patches revising shared buffer management that allowed the ending block number to be maintained in shared memory. I'm waving my hands here, but with that kind of a system you can imagine that maybe there could also be a flag bit indicating whether a truncation is in progress. So, reduce the number of pages and set the bit; then zap all the pages above that value that are still present in shared_buffers; then clear the bit. Or maybe we don't even need the bit, but I think we do need some kind of race-free mechanism to make sure that we never try to read pages that either have been truncated away or are in the process of being truncated away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Aug 21, 2018 at 4:10 PM Alexander Korotkov wrote:
> After reading [1] and [2] I gather that there are at least 3 different
> issues with heap truncation:
> 1) Data corruption on file truncation error (explained in [1]).
> 2) Expensive scanning of the whole shared buffers before file truncation.
> 3) Cancellation of read-only queries on standby even if
> hot_standby_feedback is on, caused by replication of AccessExclusiveLock.
>
> It seems that fixing any of those issues requires a redesign of heap
> truncation. So, ideally, a redesign of heap truncation should fix all of
> the issues above, or at least it should be understood how the rest of the
> issues can be fixed later using the new design.
>
> I would like to share some sketchy thoughts of mine about a new heap
> truncation design. Let's imagine we introduced a dirty_barrier buffer
> flag, which prevents a dirty buffer from being written out (and
> correspondingly evicted). Then the truncation algorithm could look like
> this:
>
> 1) Acquire ExclusiveLock on the relation.
> 2) Calculate the truncation point using count_nondeletable_pages(), while
> simultaneously placing the dirty_barrier flag on dirty buffers and saving
> their numbers to an array. Assuming no writes are performed concurrently,
> no to-be-truncated-away pages should be written out from this point.
> 3) Truncate the data files.
> 4) Iterate over the past-truncation-point buffers and clear the dirty and
> dirty_barrier flags from them (using the numbers we saved to the array in
> step #2).
> 5) Release the relation lock.
> *) If an exception happens after step #2, iterate over the
> past-truncation-point buffers and clear the dirty_barrier flags from them
> (using the numbers we saved to the array in step #2).
>
> After heap truncation using this algorithm, shared buffers may contain
> past-EOF buffers. But those buffers are empty (no used items) and clean.
> So, read-only queries shouldn't hint those buffers dirty because there
> are no used items. Normally, these buffers will just be evicted from the
> shared buffer arena. If relation extension happens shortly after heap
> truncation, some of those buffers could be found after the relation
> extension. I think this situation could be handled. For instance, we can
> teach vacuum to mark a page as new once all the tuples are gone.
>
> We're taking only an exclusive lock here. And assuming we teach our scans
> to treat a page-past-EOF situation as no-visible-tuples-found, concurrent
> read-only queries will work concurrently with the heap truncation. Also,
> we don't have to scan the whole shared buffers: only past-truncation-point
> buffers are scanned in step #2. Later, flags are cleared only from the
> past-truncation-point dirty buffers. Data corruption on truncation error
> also shouldn't happen, because we don't throw away any dirty buffers
> before ensuring that the data files were successfully truncated.
>
> The problem I see with this approach so far is that placing too many
> dirty_barrier flags can affect concurrent activity. To cope with that we
> may, for instance, truncate the relation in multiple iterations when we
> find too many past-truncation-point dirty buffers.
>
> Any thoughts?

Given I've had no feedback on this idea yet, I'll try to implement a PoC patch for that. It doesn't look difficult to do, and we'll see how it works.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Fri, Aug 17, 2018 at 9:55 PM Tom Lane wrote:
> But then you are injecting bad pages into the shared buffer arena.
> In any case, depending on that behavior seems like a bad idea, because
> it's a pretty questionable kluge in itself.
>
> Another point is that the truncation code attempts to remove all
> to-be-truncated-away pages from the shared buffer arena, but that only
> works if nobody else is loading such pages into shared buffers
> concurrently. In the presence of concurrent scans, we might be left
> with valid-looking buffers for pages that have been truncated away
> on-disk. That could cause all sorts of fun later. Yeah, the buffers
> should contain only dead tuples ... but, for example, they might not
> be hinted dead. If somebody sets one of those hint bits and then
> writes the buffer back out to disk, you've got real problems.

I don't yet see how that's possible... count_nondeletable_pages() ensures that every to-be-truncated-away page doesn't contain any used item. From reading [1] I gather that we might even have live tuples if truncation failed. But if truncation didn't fail, we shouldn't get this hint-bit problem. Please correct me if I'm wrong.

After reading [1] and [2] I gather that there are at least 3 different issues with heap truncation:
1) Data corruption on file truncation error (explained in [1]).
2) Expensive scanning of the whole shared buffers before file truncation.
3) Cancellation of read-only queries on standby even if hot_standby_feedback is on, caused by replication of AccessExclusiveLock.

It seems that fixing any of those issues requires a redesign of heap truncation. So, ideally, a redesign of heap truncation should fix all of the issues above, or at least it should be understood how the rest of the issues can be fixed later using the new design.

I would like to share some sketchy thoughts of mine about a new heap truncation design. Let's imagine we introduced a dirty_barrier buffer flag, which prevents a dirty buffer from being written out (and correspondingly evicted). Then the truncation algorithm could look like this:

1) Acquire ExclusiveLock on the relation.
2) Calculate the truncation point using count_nondeletable_pages(), while simultaneously placing the dirty_barrier flag on dirty buffers and saving their numbers to an array. Assuming no writes are performed concurrently, no to-be-truncated-away pages should be written out from this point.
3) Truncate the data files.
4) Iterate over the past-truncation-point buffers and clear the dirty and dirty_barrier flags from them (using the numbers we saved to the array in step #2).
5) Release the relation lock.
*) If an exception happens after step #2, iterate over the past-truncation-point buffers and clear the dirty_barrier flags from them (using the numbers we saved to the array in step #2).

After heap truncation using this algorithm, shared buffers may contain past-EOF buffers. But those buffers are empty (no used items) and clean. So, read-only queries shouldn't hint those buffers dirty because there are no used items. Normally, these buffers will just be evicted from the shared buffer arena. If relation extension happens shortly after heap truncation, some of those buffers could be found after the relation extension. I think this situation could be handled. For instance, we can teach vacuum to mark a page as new once all the tuples are gone.

We're taking only an exclusive lock here. And assuming we teach our scans to treat a page-past-EOF situation as no-visible-tuples-found, concurrent read-only queries will work concurrently with the heap truncation. Also, we don't have to scan the whole shared buffers: only past-truncation-point buffers are scanned in step #2. Later, flags are cleared only from the past-truncation-point dirty buffers. Data corruption on truncation error also shouldn't happen, because we don't throw away any dirty buffers before ensuring that the data files were successfully truncated.

The problem I see with this approach so far is that placing too many dirty_barrier flags can affect concurrent activity. To cope with that we may, for instance, truncate the relation in multiple iterations when we find too many past-truncation-point dirty buffers.

Any thoughts?

Links.
1. https://www.postgresql.org/message-id/flat/5BBC590AE8DF4ED1A170E4D48F1B53AC%40tunaPC
2. https://www.postgresql.org/message-id/flat/CAHGQGwE5UqFqSq1%3DkV3QtTUtXphTdyHA-8rAj4A%3DY%2Be4kyp3BQ%40mail.gmail.com

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Alexander Korotkov writes: > On Fri, Aug 17, 2018 at 9:55 PM Tom Lane wrote: >> Another point is that the truncation code attempts to remove all >> to-be-truncated-away pages from the shared buffer arena, but that only >> works if nobody else is loading such pages into shared buffers >> concurrently. In the presence of concurrent scans, we might be left >> with valid-looking buffers for pages that have been truncated away >> on-disk. That could cause all sorts of fun later. Yeah, the buffers >> should contain only dead tuples ... but, for example, they might not >> be hinted dead. If somebody sets one of those hint bits and then >> writes the buffer back out to disk, you've got real problems. > Thank you for the explanation. I see that injecting past OEF pages > into shared buffers doesn't look good. I start thinking about letting > caller of ReadBuffer() (or its variation) handle past OEF situation. That'd still have the same race condition, though: between the time we start to drop the doomed pages from shared buffers, and the time we actually perform ftruncate, concurrent scans could re-load such pages into shared buffers. Could it work to ftruncate first and flush shared buffers after? Probably not, I think the write-back-dirty-hint-bits scenario breaks that one. If this were easy, we'd have fixed it years ago :-(. It'd sure be awfully nice not to need AEL during autovacuum, even transiently; but I'm not sure how we get there without adding an unpleasant amount of substitute locking in table scans. regards, tom lane
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Fri, Aug 17, 2018 at 9:55 PM Tom Lane wrote:
> Alexander Korotkov writes:
> > On Fri, Aug 17, 2018 at 8:38 PM Tom Lane wrote:
> >> Alexander Korotkov writes:
> >>> Yes, that's correct. On standby read-only queries can tolerate
> >>> concurrent heap truncation.
>
> >> Uh, what???
>
> > VACUUM truncates heap relation only after deletion of all the tuples
> > from tail pages.
>
> Right; the problem I'm worried about is possible accesses to
> already-removed pages by concurrent scans.
>
> > And in md.c we already have logic to return zeroed pages, when trying
> > to read past relation in recovery.
>
> But then you are injecting bad pages into the shared buffer arena.
> In any case, depending on that behavior seems like a bad idea, because
> it's a pretty questionable kluge in itself.
>
> Another point is that the truncation code attempts to remove all
> to-be-truncated-away pages from the shared buffer arena, but that only
> works if nobody else is loading such pages into shared buffers
> concurrently. In the presence of concurrent scans, we might be left
> with valid-looking buffers for pages that have been truncated away
> on-disk. That could cause all sorts of fun later. Yeah, the buffers
> should contain only dead tuples ... but, for example, they might not
> be hinted dead. If somebody sets one of those hint bits and then
> writes the buffer back out to disk, you've got real problems.
>
> Perhaps there's some gold to be mined by treating truncation locks
> differently from other AELs, but I don't think you can just ignore
> them on the standby, any more than they can be ignored on the master.

Thank you for the explanation. I see that injecting past-EOF pages into shared buffers doesn't look good. I'm starting to think about letting the caller of ReadBuffer() (or a variation of it) handle the past-EOF situation.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Alexander Korotkov writes: > On Fri, Aug 17, 2018 at 8:38 PM Tom Lane wrote: >> Alexander Korotkov writes: >>> Yes, that's correct. On standby read-only queries can tolerate >>> concurrent heap truncation. >> Uh, what??? > VACUUM truncates heap relation only after deletion of all the tuples > from tail pages. Right; the problem I'm worried about is possible accesses to already-removed pages by concurrent scans. > And in md.c we already have logic to return zeroed pages, when trying > to read past relation in recovery. But then you are injecting bad pages into the shared buffer arena. In any case, depending on that behavior seems like a bad idea, because it's a pretty questionable kluge in itself. Another point is that the truncation code attempts to remove all to-be-truncated-away pages from the shared buffer arena, but that only works if nobody else is loading such pages into shared buffers concurrently. In the presence of concurrent scans, we might be left with valid-looking buffers for pages that have been truncated away on-disk. That could cause all sorts of fun later. Yeah, the buffers should contain only dead tuples ... but, for example, they might not be hinted dead. If somebody sets one of those hint bits and then writes the buffer back out to disk, you've got real problems. Perhaps there's some gold to be mined by treating truncation locks differently from other AELs, but I don't think you can just ignore them on the standby, any more than they can be ignored on the master. regards, tom lane
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Fri, Aug 17, 2018 at 8:38 PM Tom Lane wrote:
> Alexander Korotkov writes:
> > On Fri, Aug 17, 2018 at 6:41 PM Andres Freund wrote:
> >> There's another patch, which I thought Alexander was referring to, that
> >> does something a bit smarter. On a super short skim it seems to
> >> introduce a separate type of AEL lock that's not replicated, by my
> >> reading?
>
> > Yes, that's correct. On standby read-only queries can tolerate
> > concurrent heap truncation.
>
> Uh, what???

VACUUM truncates the heap relation only after deleting all the tuples from the tail pages. So, on standby, the heap truncation record will be replayed only after the heap tuple deletion records are replayed. Therefore, if a query on standby should see some of those tuples, the WAL replay stop (or query cancel) should happen before the corresponding tuples are deleted, thanks to our recovery-conflict-with-snapshot logic. By the time we're going to replay the heap truncation record, no query should see tuples on the tail pages.

And in md.c we already have logic to return zeroed pages when trying to read past the end of the relation in recovery.

    /*
     * Short read: we are at or past EOF, or we read a partial block at
     * EOF.  Normally this is an error; upper levels should never try to
     * read a nonexistent block.  However, if zero_damaged_pages is ON or
     * we are InRecovery, we should instead return zeroes without
     * complaining.  This allows, for example, the case of trying to
     * update a block that was later truncated away.
     */
    if (zero_damaged_pages || InRecovery)
        MemSet(buffer, 0, BLCKSZ);
    else
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                        blocknum, FilePathName(v->mdfd_vfd),
                        nbytes, BLCKSZ)));

But I'm concerned that FileSeek() could return an error. And I also wonder what _mdfd_getseg() would do on a truncated segment. It seems that in recovery it would automatically extend the relation, which is unacceptable for this purpose. So, sorry for the bother: this patch definitely needs to be revised.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Alexander Korotkov writes: > On Fri, Aug 17, 2018 at 6:41 PM Andres Freund wrote: >> There's another patch, which I thought Alexander was referring to, that >> does something a bit smarger. On a super short skim it seems to >> introduce a separate type of AEL lock that's not replicated, by my >> reading? > Yes, that's correct. On standby read-only queries can tolerate > concurrent heap truncation. Uh, what??? regards, tom lane
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Fri, Aug 17, 2018 at 6:41 PM Andres Freund wrote:
> On 2018-08-17 11:35:40 -0400, Tom Lane wrote:
> > Andres Freund writes:
> > > On 2018-08-17 18:00:20 +0300, Alexander Korotkov wrote:
> > >> So, do we have any objections to committing this?
>
> > > I think this needs more review by other senior hackers in the community.
>
> > TBH it sounds like a horrible hack. Disable vacuum truncation?
>
> There's another patch, which I thought Alexander was referring to, that
> does something a bit smarter. On a super short skim it seems to
> introduce a separate type of AEL lock that's not replicated, by my
> reading?

Yes, that's correct. On standby, read-only queries can tolerate concurrent heap truncation, so there is no point in replicating the AEL to the standby. Previous versions of the patch tried to achieve that by introducing a flag, but it appears that the correct way to do this is simply a new, non-replicated type of AEL lock.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On 2018-08-17 11:35:40 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2018-08-17 18:00:20 +0300, Alexander Korotkov wrote:
> >> So, do we have any objections to committing this?
>
> > I think this needs more review by other senior hackers in the community.
>
> TBH it sounds like a horrible hack. Disable vacuum truncation?

There's another patch, which I thought Alexander was referring to, that does something a bit smarter. On a super short skim it seems to introduce a separate type of AEL lock that's not replicated, by my reading?

Greetings,

Andres Freund
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Andres Freund writes: > On 2018-08-17 18:00:20 +0300, Alexander Korotkov wrote: >> So, do we have any objections to committing this? > I think this needs more review by other senior hackers in the community. TBH it sounds like a horrible hack. Disable vacuum truncation? That can't be a good idea. If it were something we were doing behind the scenes as a temporary expedient, maybe we could hold our noses and do it for awhile. But once you introduce a reloption based on this, you can't ever get rid of it. I think we need to spend more effort looking for a less invasive, less compromised solution. regards, tom lane
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hi, On 2018-08-17 18:00:20 +0300, Alexander Korotkov wrote: > So, do we have any objections to committing this? I think this needs more review by other senior hackers in the community. Greetings, Andres Freund
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Thu, Aug 16, 2018 at 2:16 PM Alexander Korotkov wrote:
> On Tue, Aug 14, 2018 at 12:05 PM Masahiko Sawada wrote:
> >
> > On Wed, Feb 28, 2018 at 11:24 PM, Ivan Kartyshov wrote:
> > > The main goal of my changes is to let long read-only transactions run
> > > on replica if hot_standby_feedback is turned on.
> >
> > FWIW the idea proposed on the thread[1], which allows us to disable
> > the heap truncation by vacuum per table, might help this issue. Since
> > the original problem on that thread is a performance problem, an
> > alternative proposal might be better there, but I think that approach
> > would be helpful for this issue too, and it is simple. A downside of
> > that idea is that there is additional work for the user to configure
> > the reloption for each table.
>
> Thank you for pointing out this thread. It's possible to cope with this
> downside by introducing a GUC that would serve as the default when the
> reloption is not defined. But the real downside is the inability to
> truncate the heap. So, options are good if they provide a workaround for
> users, but that shouldn't stop us from fixing the issues around heap
> truncation.

So, do we have any objections to committing this?

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hi!

On Tue, Aug 14, 2018 at 12:05 PM Masahiko Sawada wrote:
> > On Wed, Feb 28, 2018 at 11:24 PM, Ivan Kartyshov wrote:
> > > The main goal of my changes is to let long read-only transactions run
> > > on replica if hot_standby_feedback is turned on.
>
> FWIW the idea proposed on the thread[1], which allows us to disable
> the heap truncation by vacuum per table, might help this issue. Since
> the original problem on that thread is a performance problem, an
> alternative proposal might be better there, but I think that approach
> would be helpful for this issue too, and it is simple. A downside of
> that idea is that there is additional work for the user to configure
> the reloption for each table.

Thank you for pointing out this thread. It's possible to cope with this downside by introducing a GUC that would serve as the default when the reloption is not defined. But the real downside is the inability to truncate the heap. So, options are good if they provide a workaround for users, but that shouldn't stop us from fixing the issues around heap truncation.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Wed, Feb 28, 2018 at 11:24 PM, Ivan Kartyshov wrote:
> Thank you for your valuable comments. I've made a few adjustments.

Thank you for working on this!

> The main goal of my changes is to let long read-only transactions run on
> replica if hot_standby_feedback is turned on.

FWIW the idea proposed on the thread[1], which allows us to disable the heap truncation by vacuum per table, might help this issue. Since the original problem on that thread is a performance problem, an alternative proposal might be better there, but I think that approach would be helpful for this issue too, and it is simple. A downside of that idea is that there is additional work for the user to configure the reloption for each table.

[1] https://www.postgresql.org/message-id/CAHGQGwE5UqFqSq1%3DkV3QtTUtXphTdyHA-8rAj4A%3DY%2Be4kyp3BQ%40mail.gmail.com

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hi!

On Fri, Aug 10, 2018 at 5:07 PM Alexander Korotkov wrote:
> On Thu, Aug 9, 2018 at 11:26 PM Ivan Kartyshov wrote:
> > Alexander Korotkov wrote on 2018-06-20 20:54:
> > > Thinking about that more I found that adding a vacuum mark as an extra
> > > argument to LockAcquireExtended is also wrong. It would still be hard
> > > to determine whether we should log the lock in LogStandbySnapshot().
> > > We'd have to add that flag to the shared memory table of locks. And
> > > that looks like a kludge.
> > >
> > > Therefore, I'd like to propose another approach: introduce a new lock
> > > level. So, AccessExclusiveLock will be split into two:
> > > AccessExclusiveLocalLock and AccessExclusiveLock. Unlike
> > > AccessExclusiveLock, AccessExclusiveLocalLock will not be logged to
> > > the standby, and will be used for heap truncation.
> > >
> > > I expect some resistance to my proposal, because fixing this "small
> > > bug" doesn't deserve a new lock level. But the current behavior of
> > > hot_standby_feedback is buggy. From the user's perspective, the
> > > hot_standby_feedback option just doesn't work: it causes the master to
> > > bloat without protecting standby queries from cancellation. We need to
> > > fix that, but I don't see another proper way to do it except adding a
> > > new lock level...
> >
> > Your offer is very interesting; it made the patch smaller and more
> > beautiful. So I updated the patch and made changes in accordance with
> > the proposed concept of the special AccessExclusiveLocalLock.
> >
> > I would like to hear your opinion on this implementation.
>
> In general this patch looks good to me. It seems that the comments and
> documentation need minor enhancements. I'll make them and post the
> revised patch.

Find the revised patch attached. It contains some enhancements in
comments, formatting and documentation. BTW, I decided that we should
enumerate the ACCESS EXCLUSIVE LOCAL lock before the ACCESS EXCLUSIVE
lock, because we enumerate locks in ascending order of strength.
So, since ACCESS EXCLUSIVE is WAL-logged, it's definitely "stronger". I
think the introduction of a new lock level is a significant change and
can't be backpatched. But I think it's worth backpatching a note to the
documentation which clarifies why hot_standby_feedback might have
unexpected behavior.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

hsfeedback_locallock-2.patch
Description: Binary data

hsfeedback_backpatch_note.patch
Description: Binary data
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hi!

On Thu, Aug 9, 2018 at 11:26 PM Ivan Kartyshov wrote:
> Alexander Korotkov wrote on 2018-06-20 20:54:
> > Thinking about that more I found that adding a vacuum mark as an extra
> > argument to LockAcquireExtended is also wrong. It would still be hard
> > to determine whether we should log the lock in LogStandbySnapshot().
> > We'd have to add that flag to the shared memory table of locks. And
> > that looks like a kludge.
> >
> > Therefore, I'd like to propose another approach: introduce a new lock
> > level. So, AccessExclusiveLock will be split into two:
> > AccessExclusiveLocalLock and AccessExclusiveLock. Unlike
> > AccessExclusiveLock, AccessExclusiveLocalLock will not be logged to
> > the standby, and will be used for heap truncation.
> >
> > I expect some resistance to my proposal, because fixing this "small
> > bug" doesn't deserve a new lock level. But the current behavior of
> > hot_standby_feedback is buggy. From the user's perspective, the
> > hot_standby_feedback option just doesn't work: it causes the master to
> > bloat without protecting standby queries from cancellation. We need to
> > fix that, but I don't see another proper way to do it except adding a
> > new lock level...
>
> Your offer is very interesting; it made the patch smaller and more
> beautiful. So I updated the patch and made changes in accordance with
> the proposed concept of the special AccessExclusiveLocalLock.
>
> I would like to hear your opinion on this implementation.

In general this patch looks good to me. It seems that the comments and
documentation need minor enhancements. I'll make them and post the
revised patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hello, thank you for your review.

Alexander Korotkov wrote on 2018-06-20 20:54:
> Thinking about that more I found that adding a vacuum mark as an extra
> argument to LockAcquireExtended is also wrong. It would still be hard
> to determine whether we should log the lock in LogStandbySnapshot().
> We'd have to add that flag to the shared memory table of locks. And
> that looks like a kludge.
>
> Therefore, I'd like to propose another approach: introduce a new lock
> level. So, AccessExclusiveLock will be split into two:
> AccessExclusiveLocalLock and AccessExclusiveLock. Unlike
> AccessExclusiveLock, AccessExclusiveLocalLock will not be logged to
> the standby, and will be used for heap truncation.
>
> I expect some resistance to my proposal, because fixing this "small
> bug" doesn't deserve a new lock level. But the current behavior of
> hot_standby_feedback is buggy. From the user's perspective, the
> hot_standby_feedback option just doesn't work: it causes the master to
> bloat without protecting standby queries from cancellation. We need to
> fix that, but I don't see another proper way to do it except adding a
> new lock level...

Your offer is very interesting; it made the patch smaller and more
beautiful. So I updated the patch and made changes in accordance with the
proposed concept of the special AccessExclusiveLocalLock.

> I don't yet understand what the problem is here and why this patch
> should solve it. As I understand the idea of this patch, GetOldestXmin()
> will initialize the xmin of the walsender with the lowest xmin of the
> replication slots. But the xmin of replication slots will be taken into
> account anyway by GetSnapshotData() called by vacuum.
>
> > How to test:
> > Make async replica, turn on feedback, reduce max_standby_streaming_delay
> > and aggressive autovacuum.
>
> You forgot to mention that one should set up the replication using a
> replication slot. Naturally, if a replication slot doesn't exist, then
> the master shouldn't keep dead tuples for a disconnected standby,
> because the master doesn't know whether the standby will reconnect or
> is gone forever.
I tried to solve hot_standby_feedback without replication slots, and I
found a small window in which vacuum on the master may truncate heap
pages that the standby needs for a just-started transaction.

How to test:
Make an async replica, turn on feedback and reduce
max_standby_streaming_delay. Make autovacuum more aggressive:

autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test1:
Here we put load on the master and simulate a long transaction with
repeated 1-second SEQSCANs on the replica (by calling pg_sleep with a
1-second duration every 6 seconds).

REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 1s
  hot_standby_feedback = on

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (master.sql: update test set value = value;)

REPLICA:
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  SELECT pg_sleep(value) FROM test;
  \watch 6

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version lazy_vacuum_truncation passed without fatal
errors. Only sometimes an error occurs, because this test is too
synthetic:

ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.

This is caused by ResolveRecoveryConflictWithSnapshot being raised while
redoing some visibility flags; to avoid this conflict we can run test2 or
increase max_standby_streaming_delay.
Test2:
Here we put load on the master and simulate a long transaction on the
replica (by taking a LOCK on the table).

REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 1s
  hot_standby_feedback = on

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (master.sql: update test set value = value;)

REPLICA:
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  LOCK TABLE test IN ACCESS SHARE MODE;
  select * from test;
  \watch 6

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version lazy_vacuum_truncation passed without fatal errors.

I would like to hear your opinion on this implementation.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 73934e5..73975cc 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -854,8 +854,8 @@ ERROR: could not serialize access due to read/write dependencies among transact
-        Conflicts with the ACCESS EXCLUSIVE lock
-        mode only.
+        Conflicts with the ACCESS EXCLUSIVE and
+
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Thu, Mar 1, 2018 at 3:24 AM, Ivan Kartyshov wrote:
> Could you give me your ideas on these patches.

Hi Ivan,

Not sure if this is expected at this stage or not, but just in case you
didn't know... the tests under src/test/subscription seem to be broken:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/409358734

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Hi!

Thank you for your work on this issue.

On Wed, Feb 28, 2018 at 5:25 PM Ivan Kartyshov wrote:
> Thank you for your valuable comments. I've made a few adjustments.
>
> The main goal of my changes is to let long read-only transactions run on
> replica if hot_standby_feedback is turned on.
>
> Patch1 - hsfeedback_av_truncate.patch is made to stop
> ResolveRecoveryConflictWithLock occurring on the replica after autovacuum
> lazily truncates the heap on the master, cutting some pages at the end.
> When hot_standby_feedback is on, we know that autovacuum does not remove
> anything superfluous that could be needed on the standby, so there is no
> need to raise any ResolveRecoveryConflict*.
>
> 1) Add to xl_standby_locks and xl_smgr_truncate an isautovacuum flag,
> which tells us that autovacuum generated them.
>
> 2) When autovacuum decides to trim the table (using lazy_truncate_heap),
> it takes AccessExclusiveLock and sends this lock to the replica, but the
> replica should ignore AccessExclusiveLock if hot_standby_feedback=on.
>
> 3) When the autovacuum truncate WAL message is replayed on a replica, it
> takes ExclusiveLock on the table, so as not to interfere with read-only
> requests.

I see that you use the IsAutoVacuumWorkerProcess() function to determine
whether this access exclusive lock is caused by vacuum. And I see at
least two issues with that.

1) That wouldn't work for manually run vacuums. I understand that query
cancellation caused by autovacuum is probably the most annoying case, but
an unnecessarily partial bug fix is undesirable.

2) Access exclusive locks are logged not only immediately by the process
that took them: all such locks are also logged in LogStandbySnapshot().
LogStandbySnapshot() is called by the checkpointer, bgwriter and others.
So IsAutoVacuumWorkerProcess() wouldn't help in such cases. I understand
that heap truncation is typically fast, and the probability that some
process will dump all the access exclusive locks while truncation is in
progress is low.
But nevertheless we need to handle that properly. Based on this, I think
we should give up using IsAutoVacuumWorkerProcess() to determine locks
caused by vacuum.

Thinking about that more I found that adding a vacuum mark as an extra
argument to LockAcquireExtended is also wrong. It would still be hard to
determine whether we should log the lock in LogStandbySnapshot(). We'd
have to add that flag to the shared memory table of locks. And that looks
like a kludge.

Therefore, I'd like to propose another approach: introduce a new lock
level. So, AccessExclusiveLock will be split into two:
AccessExclusiveLocalLock and AccessExclusiveLock. Unlike
AccessExclusiveLock, AccessExclusiveLocalLock will not be logged to the
standby, and will be used for heap truncation.

I expect some resistance to my proposal, because fixing this "small bug"
doesn't deserve a new lock level. But the current behavior of
hot_standby_feedback is buggy. From the user's perspective, the
hot_standby_feedback option just doesn't work: it causes the master to
bloat without protecting standby queries from cancellation. We need to
fix that, but I don't see another proper way to do it except adding a new
lock level...

> Patch2 - hsfeedback_noninvalide_xmin.patch
> When walsender is initialized, its xmin in PROCARRAY is set to
> GetOldestXmin() in order to prevent autovacuum running on the master
> from truncating a relation and removing some pages that are required by
> the replica. This might happen if the master's autovacuum and the
> replica's query started simultaneously, and the replica has not yet
> reported its xmin value.

I don't yet understand what the problem is here and why this patch should
solve it. As I understand the idea of this patch, GetOldestXmin() will
initialize the xmin of the walsender with the lowest xmin of the
replication slots. But the xmin of replication slots will be taken into
account anyway by GetSnapshotData() called by vacuum.

> How to test:
> Make async replica, turn on feedback, reduce max_standby_streaming_delay
> and make autovacuum aggressive.
You forgot to mention that one should set up the replication using a
replication slot. Naturally, if a replication slot doesn't exist, then
the master shouldn't keep dead tuples for a disconnected standby, because
the master doesn't know whether the standby will reconnect or is gone
forever.

> autovacuum = on
> autovacuum_max_workers = 1
> autovacuum_naptime = 1s
> autovacuum_vacuum_threshold = 1
> autovacuum_vacuum_cost_delay = 0
>
> Test:
> Here we start the replica and begin a repeatable read transaction on the
> table, then we stop the replica's postmaster to prevent the walreceiver
> worker from starting (on master startup) and sending the master its
> transaction xmin over the hot_standby_feedback message.
>
> MASTER                                    REPLICA
> start
> CREATE TABLE test AS (SELECT id, 1 AS value FROM
>   generate_series(1,1000) id);
>                                           stop
>                                           start

I guess you meant the start of a standby session here, not the
postmaster. Otherwise, I don't understand how table test would reach the
standby.

> BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> SELECT
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Thank you for your valuable comments. I've made a few adjustments.

The main goal of my changes is to let long read-only transactions run on
the replica if hot_standby_feedback is turned on.

Patch1 - hsfeedback_av_truncate.patch is made to stop
ResolveRecoveryConflictWithLock occurring on the replica after autovacuum
lazily truncates the heap on the master, cutting some pages at the end.
When hot_standby_feedback is on, we know that autovacuum does not remove
anything superfluous that could be needed on the standby, so there is no
need to raise any ResolveRecoveryConflict*.

1) Add to xl_standby_locks and xl_smgr_truncate an isautovacuum flag,
which tells us that autovacuum generated them.

2) When autovacuum decides to trim the table (using lazy_truncate_heap),
it takes AccessExclusiveLock and sends this lock to the replica, but the
replica should ignore AccessExclusiveLock if hot_standby_feedback=on.

3) When the autovacuum truncate WAL message is replayed on a replica, it
takes ExclusiveLock on the table, so as not to interfere with read-only
requests.

We have two cases of resolving ResolveRecoveryConflictWithLock if the
timers (max_standby_streaming_delay and max_standby_archive_delay) have
run out:
- the backend is idle in transaction (waiting for input): in this case
  the backend will be sent SIGTERM;
- the backend's transaction is running a query: in this case the running
  transaction will be aborted.

How to test:
Make an async replica, turn on feedback and reduce
max_standby_streaming_delay. Make autovacuum more aggressive:

autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test1:
Here we put load on the master and simulate a long transaction with
repeated 1-second SEQSCANs on the replica (by calling pg_sleep with a
1-second duration every 6 seconds).
REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 1s
  hot_standby_feedback = on

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (master.sql: update test set value = value;)

REPLICA:
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  SELECT pg_sleep(value) FROM test;
  \watch 6

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version lazy_vacuum_truncation passed without fatal
errors. Only sometimes an error occurs, because this test is too
synthetic:

ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.

This is caused by ResolveRecoveryConflictWithSnapshot being raised while
redoing some visibility flags; to avoid this conflict we can run test2 or
increase max_standby_streaming_delay.

Test2:
Here we put load on the master and simulate a long transaction on the
replica (by taking a LOCK on the table).

REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 1s
  hot_standby_feedback = on

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (master.sql: update test set value = value;)

REPLICA:
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  LOCK TABLE test IN ACCESS SHARE MODE;
  select * from test;
  \watch 6

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version lazy_vacuum_truncation passed without fatal errors.

Test3:
Here we put load on the master and simulate a long transaction with
repeated 1-second SEQSCANs on the replica (by calling pg_sleep with a
1-second duration every 6 seconds).
REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 4s
  hot_standby_feedback = on

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 200 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (master.sql: update test set value = value;)

REPLICA:
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  SELECT pg_sleep(value) FROM test;

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version lazy_vacuum_truncation passed without fatal errors.

This way we can make transactions with SEQSCAN, INDEXSCAN or BITMAPSCAN.

Patch2 - hsfeedback_noninvalide_xmin.patch
When walsender is initialized, its xmin in PROCARRAY is set to
GetOldestXmin() in order to prevent autovacuum running on the master from
truncating a relation and removing some pages that are required by the
replica. This might happen if the master's autovacuum and the replica's
query started simultaneously, and the replica has not yet reported its
xmin value.

How to test:
Make async replica, turn on feedback, reduce max_standby_streaming_delay
and aggressive