Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2019-08-30 Thread Nikhil Sontakke
Hi,

> > This patch has been around for some time now, the last version fails to
> > apply cleanly and in-depth reviews have happened.  I am moving that to
> > the next CF, waiting on its author.
>
> Unfortunately, nothing was changed since then, so there is already some amount
> of unaddressed review feedback. I'll move this patch to "Returned with
> feedback".
>

Craig Ringer mentioned this thread to me recently.

This effort has seen decent reviews from Craig, Andres and Michael
already. So, I thought of refreshing it to work against the latest
master HEAD.

PFA, main patch as well as the test patch (I named the test patch v17
to be consistent with the main patch). The major grouse with the test
patch AFAICS was the use of the non-Windows-compliant timersub()
function. I have now used the INSTR_TIME_SET_CURRENT/INSTR_TIME_SUBTRACT
family of portable macros instead.

Please let me know what you think of the above.

Regards,
Nikhil
-- 
Nikhil Sontakke
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/


0002-libpq_batch_tests_community_master.v17.patch
Description: Binary data


0001-libpq_batch_support_commmunity_master.v17.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2019-01-04 Thread Nikhil Sontakke
Hi Arseny,

> I hadn't checked whether my concerns were addressed in the latest
> version though.
>

I'd like to believe that the latest patch set tries to address some
(if not all) of your concerns. Can you please take a look and let me
know?

Regards,
Nikhil

--
Nikhil Sontakke
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: [HACKERS] logical decoding of two-phase transactions

2019-01-04 Thread Nikhil Sontakke
Hi Tomas,

> Thanks for the updated patch - I've started working on a review, with
> the hope of getting it committed sometime in 2019-01. But the patch
> bit-rotted again a bit (probably due to d3c09b9b), which broke the last
> part. Can you post a fixed version?
>

PFA, updated patch set.

Regards,
Nikhil

> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Nikhil Sontakke
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/


0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.Jan4.patch
Description: Binary data


0002-Support-decoding-of-two-phase-transactions-at-PREPAR.Jan4.patch
Description: Binary data


0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.Jan4.patch
Description: Binary data


0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2018-11-29 Thread Nikhil Sontakke
Hi,


>
> PFA, latest patchset which implements the above.
>

The newly added test_decoding test was failing due to a slight
expected output mismatch. The attached patch-set corrects that.

Regards,
Nikhil

> Regards,
> Nikhil
> > regards
> >
> > --
> > Tomas Vondra  http://www.2ndQuadrant.com
> > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
>  Nikhil Sontakke   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.Nov30.patch
Description: Binary data


0002-Support-decoding-of-two-phase-transactions-at-PREPAR.Nov30.patch
Description: Binary data


0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.Nov30.patch
Description: Binary data


0004-Teach-test_decoding-plugin-to-work-with-2PC.Nov30.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2018-11-29 Thread Nikhil Sontakke
Hi Tomas,

> Any progress on the issues discussed in the last couple of messages?
> That is:
>
> 1) removing of the sleep() from tests
>

Done. Now the test_decoding plugin takes a new option "check-xid". We
will pass the XID which is going to be aborted via this option. The
test_decoding plugin will wait for this XID to abort and exit when
that happens. This removes any arbitrary sleep dependencies.


> 2) changes to systable_getnext() wrt. TransactionIdIsInProgress()
>

Done.

> 3) adding asserts / checks to codepaths not going through systable_*
>

Done. All the heap_* get API calls now assert that they are not being
invoked with a valid CheckXidAlive value.

> 4) (not) adding this as a per-plugin option
>
> 5) handling cases where the downstream does not have 2PC enabled
>
struct OutputPluginOptions now has an enable_twophase field, which will
be set by the plugin at init time, similar to the way output_type is
set to binary/text now.

> I guess it'd be good an updated patch or further discussion before
> continuing the review efforts.
>

PFA, latest patchset which implements the above.

Regards,
Nikhil
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patch
Description: Binary data


0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patch
Description: Binary data


0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patch
Description: Binary data


0004-Teach-test_decoding-plugin-to-work-with-2PC.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2018-08-07 Thread Nikhil Sontakke
er
>   must be stable.
>
> - filter_prepare_cb callback existence is checked in both decode.c and
>   in filter_prepare_cb_wrapper.
>
> +   /*
> +* The transaction may or may not exist (during restarts for example).
> +* Anyways, 2PC transactions do not contain any reorderbuffers. So allow
> +* it to be created below.
> +*/
>
>
> Code around looks sane, but I think that ReorderBufferTXN for our xact
> must *not* exist at this moment: if we are going to COMMIT/ABORT
> PREPARED it, it must have been replayed and RBTXN purged immediately
> after. Also, instead of misty '2PC transactions do not contain any
> reorderbuffers' I would say something like 'create dummy
> ReorderBufferTXN to pass it to the callback'.
>
> - In DecodeAbort:
> +   /*
> +* If it's ROLLBACK PREPARED then handle it via callbacks.
> +*/
> +   if (TransactionIdIsValid(xid) &&
> +   !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
> +
>
> How xid can be invalid here?
>
>
> - It might be worthwile to put the check
> +   !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
> +   parsed->dbId == ctx->slot->data.database &&
> +   !FilterByOrigin(ctx, origin_id) &&
>
> which appears 3 times now into separate function.
>
>
> +* two-phase transactions - we either have to have all of them or none.
> +* The filter_prepare callback is optional, but can only be defined when
>
> Kind of controversial (all of them or none, but optional), might be
> formulated more accurately.
>
>
> +   /*
> +* Capabilities of the output plugin.
> +*/
> +   boolenable_twophase;
>
> I would rename this to 'supports_twophase' since this is not an option
> but a description of the plugin capabilities.
>
>
> +   /* filter_prepare is optional, but requires two-phase decoding */
> +   if ((ctx->callbacks.filter_prepare_cb != NULL) && (!ctx->enable_twophase))
> +   ereport(ERROR,
> +   (errmsg("Output plugin does not support two-phase decoding, but "
> +   "registered filter_prepared callback.")));
>
> Don't think we need to check that...
>
>
> +* Otherwise call either PREPARE (for twophase transactions) or COMMIT
> +* (for regular ones).
> +*/
> +   if (rbtxn_rollback(txn))
> +   rb->abort(rb, txn, commit_lsn);
>
> This is the dead code since we don't have decoding of in-progress xacts
> yet.
>

Yes, the above check can be done away with.


>
> Third patch:
> +/*
> + * An xid value pointing to a possibly ongoing or a prepared transaction.
> + * Currently used in logical decoding.  It's possible that such transactions
> + * can get aborted while the decoding is ongoing.
> + */
>
> I would explain here that this xid is checked for abort after each
> catalog scan, and sent for the details to SetupHistoricSnapshot.
>
>
> +   /*
> +* If CheckXidAlive is valid, then we check if it aborted. If it did, we
> +* error out
> +*/
> +   if (TransactionIdIsValid(CheckXidAlive) &&
> +   !TransactionIdIsInProgress(CheckXidAlive) &&
> +   !TransactionIdDidCommit(CheckXidAlive))
> +   ereport(ERROR,
> +   (errcode(ERRCODE_TRANSACTION_ROLLBACK),
> +errmsg("transaction aborted during system catalog scan")));
>
> Probably centralize checks in one function? As well as 'We don't expect
> direct calls to heap_fetch...' ones.
>
>
> P.S. Looks like you have torn the thread chain: In-Reply-To header of
> mail [1] is missing. Please don't do that.
>

That wasn't me. I was also annoyed and surprised to see a new email
thread separate from the earlier containing 100 or so messages.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-08-02 Thread Nikhil Sontakke
>> They can be, but currently they might not be. So this requires at least
>> big fat warning in docs and description on how to access user catalogs
>> from plugins correctly (ie to always use systable_* API on them). It
>> would be nice if we could check for it in Assert builds at least.
>

Ok, modified the sgml documentation for the above.

> Yea, I agree. I think we should just consider putting similar checks in
> the general scan APIs. With an unlikely() and the easy predictability of
> these checks, I think we should be fine, overhead-wise.
>

Ok, added unlikely() checks in the heap_* scan APIs.

Revised patchset attached.

Regards,
Nikhils

> Greetings,
>
> Andres Freund



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patch
Description: Binary data


0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patch
Description: Binary data


0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patch
Description: Binary data


0004-Teach-test_decoding-plugin-to-work-with-2PC.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2018-08-01 Thread Nikhil Sontakke
Hi,

PFA, latest patchset which incorporates the additional feedback.

>>> There's an additional test case in
>>> 0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch which
>>> uses a sleep in the "change" plugin API to allow a concurrent rollback
>>> on the 2PC being currently decoded. Andres generally doesn't like this
>>> approach :-), but there are no timing/interlocking issues now, and the
>>> sleep just helps us do a concurrent rollback, so it might be ok now,
>>> all things considered. Anyways, it's an additional patch for now.
>>
>> Yea, I still don't think it's ok. The tests won't be reliable.  There's
>> ways to make this reliable, e.g. by forcing a lock to be acquired that's
>> externally held or such. Might even be doable just with a weird custom
>> datatype.
>>
>
> Ok, I will look at ways to do away with the sleep.
>

The attached patchset implements a non-sleep-based approach by
sending the 2PC XID to the pg_logical_slot_get_changes() function as
an option for the test_decoding plugin. So, an example invocation
will now look like:

SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');

If the test_decoding pg_decode_change() API sees a valid xid
argument, it will wait for that transaction to be aborted. Another
backend can then come in and merrily abort this ongoing 2PC in the
background. Once it's aborted, the pg_decode_change API will go ahead
and hit an ERROR in the systable scan APIs. That should take care of
Andres' concern about using sleep in the tests. The relevant TAP test
has been added to this patchset.

>>> @@ -423,6 +423,16 @@ systable_getnext(SysScanDesc sysscan)
>>>   else
>>>   htup = heap_getnext(sysscan->scan, ForwardScanDirection);
>>>
>>> + /*
>>> +  * If CheckXidAlive is valid, then we check if it aborted. If it did, we
>>> +  * error out
>>> +  */
>>> + if (TransactionIdIsValid(CheckXidAlive) &&
>>> + TransactionIdDidAbort(CheckXidAlive))
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_TRANSACTION_ROLLBACK),
>>> +  errmsg("transaction aborted during system catalog scan")));
>>> +
>>>   return htup;
>>>  }
>>
>> Don't we have to check TransactionIdIsInProgress() first? C.f. header
>> comments in tqual.c.  Note this is also not guaranteed to be correct
>> after a crash (where no clog entry will exist for an aborted xact), but
>> we probably shouldn't get here in that case - but better be safe.
>>
>> I suspect it'd be better reformulated as
>>   TransactionIdIsValid(CheckXidAlive) &&
>>   !TransactionIdIsInProgress(CheckXidAlive) &&
>>   !TransactionIdDidCommit(CheckXidAlive)
>>
>> What do you think?
>>

Modified the checks as per the above suggestion.

> I was wondering if anything else would be needed for user-defined
> catalog tables..
>

We don't need to do anything else for user-defined catalog tables
since they will also get accessed via the systable_* scan APIs.


>
> Hmm, lemme see if we can do it outside of the plugin. But note that a
> plugin might want to decode some 2PC at prepare time and another are
> "commit prepared" time.
>

The test_decoding pg_decode_filter_prepare() API implements a simple
filter strategy now. If the GID contains a substring "nodecode", then
it filters out decoding of such a 2PC at prepare time. Have added
steps to test this in the relevant test case in this patch.

I believe this patchset handles all pending issues along with relevant
test cases. Comments, further feedback appreciated.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patch
Description: Binary data


0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patch
Description: Binary data


0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patch
Description: Binary data


0004-Teach-test_decoding-plugin-to-work-with-2PC.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2018-07-27 Thread Nikhil Sontakke
>> PFA, latest patchset, which completely removes the earlier
>> LogicalLock/LogicalUnLock implementation using groupDecode stuff and
>> uses the newly suggested approach of checking the currently decoded
>> XID for abort in systable_* API functions. Much simpler to code and
>> easier to test as well.
>
> So, leaving the fact that it might not actually be correct aside ;), you
> seem to be ok with the approach?
>

;-)

Yes, I do like the approach. Do you think there are locations other
than the systable_* APIs which might need such checks?


>> There's an additional test case in
>> 0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch which
>> uses a sleep in the "change" plugin API to allow a concurrent rollback
>> on the 2PC being currently decoded. Andres generally doesn't like this
>> approach :-), but there are no timing/interlocking issues now, and the
>> sleep just helps us do a concurrent rollback, so it might be ok now,
>> all things considered. Anyways, it's an additional patch for now.
>
> Yea, I still don't think it's ok. The tests won't be reliable.  There's
> ways to make this reliable, e.g. by forcing a lock to be acquired that's
> externally held or such. Might even be doable just with a weird custom
> datatype.
>

Ok, I will look at ways to do away with the sleep.


>> diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
>> index 9d08775687..67c5810bf7 100644
>> --- a/src/backend/access/index/genam.c
>> +++ b/src/backend/access/index/genam.c
>> @@ -423,6 +423,16 @@ systable_getnext(SysScanDesc sysscan)
>>   else
>>   htup = heap_getnext(sysscan->scan, ForwardScanDirection);
>>
>> + /*
>> +  * If CheckXidAlive is valid, then we check if it aborted. If it did, we
>> +  * error out
>> +  */
>> + if (TransactionIdIsValid(CheckXidAlive) &&
>> + TransactionIdDidAbort(CheckXidAlive))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_TRANSACTION_ROLLBACK),
>> +  errmsg("transaction aborted during system catalog scan")));
>> +
>>   return htup;
>>  }
>
> Don't we have to check TransactionIdIsInProgress() first? C.f. header
> comments in tqual.c.  Note this is also not guaranteed to be correct
> after a crash (where no clog entry will exist for an aborted xact), but
> we probably shouldn't get here in that case - but better be safe.
>
> I suspect it'd be better reformulated as
>   TransactionIdIsValid(CheckXidAlive) &&
>   !TransactionIdIsInProgress(CheckXidAlive) &&
>   !TransactionIdDidCommit(CheckXidAlive)
>
> What do you think?
>

tqual.c does seem to mention this for a non-MVCC snapshot, so might as
well do it this way. The caching of the fetched XID should not make
these checks too expensive anyway.

>
> I think it'd also be good to add assertions to codepaths not going
> through systable_* asserting that
> !TransactionIdIsValid(CheckXidAlive). Alternatively we could add an
> if (unlikely(TransactionIdIsValid(CheckXidAlive)) && ...)
> branch to those too.
>

I was wondering if anything else would be needed for user-defined
catalog tables..

>
>
>> From 80fc576bda483798919653991bef6dc198625d90 Mon Sep 17 00:00:00 2001
>> From: Nikhil Sontakke 
>> Date: Wed, 13 Jun 2018 16:31:15 +0530
>> Subject: [PATCH 4/5] Teach test_decoding plugin to work with 2PC
>>
>> Includes a new option "enable_twophase". Depending on this options
>> value, PREPARE TRANSACTION will either be decoded or treated as
>> a single phase commit later.
>
> FWIW, I don't think I'm ok with doing this on a per-plugin-option basis.
> I think this is something that should be known to the outside of the
> plugin. More similar to how binary / non-binary support works.  Should
> also be able to inquire the output plugin whether it's supported (cf
> previous similarity).
>

Hmm, lemme see if we can do it outside of the plugin. But note that a
plugin might want to decode some 2PC at prepare time and another at
"commit prepared" time.

We also need to take care to not break logical replication if the
other node is running non-2PC enabled code. We tried to optimize the
COMMIT/ABORT handling by adding sub flags to the existing protocol. I
will test that as well.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-07-26 Thread Nikhil Sontakke
Hi,

>> I think we even can just do something like a global
>> TransactionId check_if_transaction_is_alive = InvalidTransactionId;
>> and just set it up during decoding. And then just check it whenever it's
>> not set to InvalidTransactionId.
>>
>>
>
> Ok. I will work on something along these lines and re-submit the set of 
> patches.
>
PFA, latest patchset, which completely removes the earlier
LogicalLock/LogicalUnLock implementation using groupDecode stuff and
uses the newly suggested approach of checking the currently decoded
XID for abort in systable_* API functions. Much simpler to code and
easier to test as well.

Out of the patchset, the systable_* API based XID checking is
implemented in
0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patch, so
it might help to take a look at that patch first for any additional
feedback on this approach.

There's an additional test case in
0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch which
uses a sleep in the "change" plugin API to allow a concurrent rollback
on the 2PC being currently decoded. Andres generally doesn't like this
approach :-), but there are no timing/interlocking issues now, and the
sleep just helps us do a concurrent rollback, so it might be ok now,
all things considered. Anyways, it's an additional patch for now.

Comments, feedback appreciated.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patch
Description: Binary data


0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patch
Description: Binary data


0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patch
Description: Binary data


0004-Teach-test_decoding-plugin-to-work-with-2PC.patch
Description: Binary data


0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2018-07-23 Thread Nikhil Sontakke
Hi Andres,

>> We can find out if the snapshot is a logical decoding one by virtue of
>> its "satisfies" function pointing to HeapTupleSatisfiesHistoricMVCC.
> 
> I think we even can just do something like a global
> TransactionId check_if_transaction_is_alive = InvalidTransactionId;
> and just set it up during decoding. And then just check it whenever it's
> not set to InvalidTransactionId.
> 
> 

Ok. I will work on something along these lines and re-submit the set of 
patches. 

Regards, 
Nikhils


Re: [HACKERS] logical decoding of two-phase transactions

2018-07-23 Thread Nikhil Sontakke
Hi Andres,

>> > what I'm proposing is that that various catalog access functions throw a
>> > new class of error, something like "decoding aborted transactions".
>>
>> When will this error be thrown by the catalog functions? How will it
>> determine that it needs to throw this error?
>
> The error check would have to happen at the end of most systable_*
> functions. They'd simply do something like
>
> if (decoding_in_progress_xact && TransactionIdDidAbort(xid_of_aborted))
>ereport(ERROR, (errcode(DECODING_ABORTED_XACT), errmsg("oops")));
>
> i.e. check whether the transaction to be decoded still is in
> progress. As that would happen before any potentially wrong result can
> be returned (as the check happens at the tail end of systable_*),
> there's no issue with wrong state in the syscache etc.
>

Oh, ok. The systable_* functions use the passed-in snapshot and return
tuples matching it. They do not typically have access to the current
XID being worked upon.

We can find out if the snapshot is a logical decoding one by virtue of
its "satisfies" function pointing to HeapTupleSatisfiesHistoricMVCC.

>
>> The catalog scan will NOT error out but will return metadata which
>> causes the insert-decoding change apply callback to error out.
>
> Why would it not throw an error?
>

In your scheme, it will throw an error, indeed. We'd need to make the
"being-currently-decoded-XID" visible to these systable_* functions
and then this scheme will work.

Regards,
Nikhils

> Greetings,
>
> Andres Freund



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-07-20 Thread Nikhil Sontakke
Hi Andres,


> So what if we, at the begin / end of cache miss handling, re-check if
> the to-be-decoded transaction is still in-progress (or has
> committed). And we throw an error if that happened. That error is then
> caught in reorderbuffer, the in-progress-xact aborted callback is
> called, and processing continues (there's a couple nontrivial details
> here, but it should be doable).
>
> The biggest issue is what constitutes a "cache miss". It's fairly
> trivial to do this for syscache / relcache, but that's not sufficient:
> there's plenty cases where catalogs are accessed without going through
> either. But as far as I can tell if we declared that all historic
> accesses have to go through systable_beginscan* - which'd imo not be a
> crazy restriction - we could put the checks at that layer.
>

Documenting that historic accesses go through systable_* APIs does
seem reasonable. In our earlier discussions, we felt asking plugin
writers to do anything along these lines was too onerous and
cumbersome to expect.

> That'd require that an index lookup can't crash if the corresponding
> heap entry doesn't exist (etc), but that's something we need to handle
> anyway.  The issue that multiple separate catalog lookups need to be
> coherent (say Robert's pg_class exists, but pg_attribute doesn't
> example) is solved by virtue of the the pg_attribute lookups failing if
> the transaction aborted.
>
> Am I missing something here?
>

Are you suggesting we have a:

PG_TRY()
{
Catalog_Access();
}
PG_CATCH()
{
Abort_Handling();
}

here?

Regards,
Nikhils



Re: [HACKERS] logical decoding of two-phase transactions

2018-07-19 Thread Nikhil Sontakke
 little large issue :-)

> To try to be a little clearer about my overall position, I am
> suggesting that you (1) abandon the current approach and (2) make sure
> that everything is done by making sufficient preparations in advance
> of any abort rather than trying to cope after it's already started.  I
> am also suggesting that, to get there, it might be helpful to (a)
> contemplate communication and active cooperation between the running
> process and the decoding process(es), but it might turn out not to be
> needed and I don't know exactly what needs to be communicated, (b)
> consider whether it there's a reasonable way to make it look to other
> parts of the system like the aborted transaction is still running, but
> this also might turn out not to be the right approach, (c) consider
> whether logical decoding already does or can be made to use historical
> catalog snapshots that only see command IDs prior to the current one
> so that incompletely-made changes by the last CID aren't seen if an
> abort happens.  I think there is a good chance that a full solution
> involves more than one of these things, and maybe some other things I
> haven't thought about.  These are ideas, not a plan.
>

I will think more along the above lines and see if we can get something workable.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

2018-06-17 Thread Nikhil Sontakke
Hi Alvaro,


>> There was a slight oversight in the twophase_gid length calculation in
>> the XactLogCommitRecord() code path in the cf5a1890592 commit. The
>> corresponding XactLogAbortRecord() code path was ok. PFA, a small
>> patch to fix the oversight.
>
> Forgot to add: maybe it would be useful to have tests in core where
> these omissions become evident.  Do you have some?
>
Thanks for the commit.

I do have some tests. They are part of the "logical decoding of 2PC"
patch which adds the needed infrastructure to *actually* use this code
present in the core as of now. I am going to submit it in the upcoming
commitfest.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

2018-06-14 Thread Nikhil Sontakke
Hi Heikki,


>
> Pushed, thanks for the review!
>

There was a slight oversight in the twophase_gid length calculation in
the XactLogCommitRecord() code path in the cf5a1890592 commit. The
corresponding XactLogAbortRecord() code path was ok. PFA, a small
patch to fix the oversight.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


gid_length.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2018-04-10 Thread Nikhil Sontakke
>> Anyways, will now wait for the next commitfest/opportunity to try to
>> get this in.
>
> It looks like this patch should be in the Needs Review state so I have
> done that and moved it to the next CF.
>
Thanks David,

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-04-09 Thread Nikhil Sontakke
> I object.  And I'm negatively surprised that this is even considered.
>

I am also a bit surprised..

> This is a complicated patch that has been heavily reworked in the last
> few days to, among other things, address objections that have first been
> made months ago ([1]). There were nontrivial bugs less than a day ago. It
> has not received a lot of reviews since these changes. This isn't an
> area you've previously been involved in to a significant degree.
>

I thought all the points that you had raised in [1] had been addressed
satisfactorily. Let me know if that's not the case. The last few days,
the focus was on making the decodegroup locking implementation a bit
more robust.

Anyways, will now wait for the next commitfest/opportunity to try to
get this in.

>
> [1] 
> http://archives.postgresql.org/message-id/20180209211025.d7jxh43fhqnevhji%40alap3.anarazel.de

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-04-05 Thread Nikhil Sontakke
Hi Tomas,

> 1) There's a race condition in LogicalLockTransaction. The code does
> roughly this:
>
>   if (!BecomeDecodeGroupMember(...))
>  ... bail out ...
>
>   Assert(MyProc->decodeGroupLeader);
>   lwlock = LockHashPartitionLockByProc(MyProc->decodeGroupLeader);
>   ...
>
> but AFAICS there is no guarantee that the transaction does not commit
> (or even abort) right after the become decode group member. In which
> case LogicalDecodeRemoveTransaction might have already reset our pointer
> to a leader to NULL. In which case the Assert() and lock will fail.
>
> I've initially thought this can be fixed by setting decodeLocked=true in
> BecomeDecodeGroupMember, but that's not really true - that would fix the
> race for aborts, but not commits. LogicalDecodeRemoveTransaction skips
> the wait for commits entirely, and just resets the flags anyway.
>
> So this needs a different fix, I think. BecomeDecodeGroupMember also
> needs the leader PGPROC pointer, but it does not have the issue because
> it gets it as a parameter. I think the same thing would work for here
> too - that is, use the AssignDecodeGroupLeader() result instead.
>

That's a good catch. One of the earlier patches had a check for this
(it also had an ill-placed assert above though) which we removed as
part of the ongoing review.

Instead of doing the above, we can just re-check if the
decodeGroupLeader pointer has become NULL and if so, re-assert that
the leader has indeed gone away before returning false. I propose a
diff like below.

 /*
  * If we were able to add ourself, then Abort processing will
- * interlock with us.
+ * interlock with us. If the leader was done in the meanwhile
+ * it could have removed us and gone away as well.
  */
-   Assert(MyProc->decodeGroupLeader);
+   if (MyProc->decodeGroupLeader == NULL)
+   {
+       Assert(BackendXidGetProc(txn->xid) == NULL);
+       return false;
+   }



>
> 2) BecomeDecodeGroupMember sets the decodeGroupLeader=NULL when the
> leader does not match the parameters, despite enforcing it by Assert()
> at the beginning. Let's remove that assignment.
>

Ok, done.

>
> 3) I don't quite understand why BecomeDecodeGroupMember does the
> cross-check using PID. In which case would it help?
>

When I wrote this support, I had written it with the intention of
supporting both 2PC (in which case pid is 0) and in-progress regular
transactions. That's why the presence of PID in these functions. The
current use case is just for 2PC, so we could remove it.

>
> 4) AssignDecodeGroupLeader still sets pid, which is never read. Remove.
>

Ok, will do.

>
> 5) ReorderBufferCommitInternal does elog(LOG) about interrupting the
> decoding of aborted transaction only in one place. There are about three
> other places where we check LogicalLockTransaction. Seems inconsistent.
>

Note that I have added it for the OPTIONAL test_decoding test cases
(which AFAIK we don't plan to commit in that state) which demonstrate
concurrent rollback interlocking with the lock/unlock APIs. The first
ELOG was enough to catch the interaction. If we think these elogs
should be present in the code, then yes, I can add them elsewhere as
well as part of an earlier patch.

>
> 6) The comment before LogicalLockTransaction is somewhat inaccurate,
> because it talks about adding/removing the backend to the group, but
> that's not what's happening. We join the group on the first call and
> then we only tweak the decodeLocked flag.
>

True.

>
> 7) I propose minor changes to a couple of comments.
>

Ok, I am looking at your provided patch and incorporating relevant
changes from it. Will submit a patch set soon.

Regards,
Nikhils

-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-04-03 Thread Nikhil Sontakke
Hi Tomas,

>> Unfortunately, this does segfault for me in `make check` almost
immediately. Try

This is due to the new ERROR handling code that I added today for the
lock/unlock APIs. Will fix.

>> Also, current value for LOGICALREP_IS_COMMIT is 1, but previous code expected
flags to be zero, so this way logical replication between postgres-10 and
postgres-with-2pc-decoding will be broken.

Good point. Will also test pg-10 to pg-11 logical replication to
ensure that there are no issues.

>> So I think we need a subscription parameter to enable/disable this,
defaulting to 'disabled'.

Ok, will add it to "CREATE SUBSCRIPTION". BTW, we should have
allowed storing options in an array form for a subscription. We might
add more options in the future, and adding fields one by one doesn't
seem that extensible.


> 1) twophase.c
> -
>
> I think this comment is slightly inaccurate:
>
>  /*
>  * Coordinate with logical decoding backends that may be already
>  * decoding this prepared transaction. When aborting a transaction,
>  * we need to wait for all of them to leave the decoding group. If
>  * committing, we simply remove all members from the group.
>  */
>
> Strictly speaking, we're not waiting for the workers to leave the
> decoding group, but to set decodeLocked=false. That is, we may proceed
> when there still are members, but they must be in unlocked state.
>

Agreed. Will modify it to mention that it will wait only if some of
the backends are in locked state.

>
> 2) reorderbuffer.c
> --
>
> I've already said it before, I find the "flags" bitmask and rbtxn_*
> macros way less readable than individual boolean flags. It was claimed
> this was done on Andres' request, but I don't see that in the thread. I
> admit it's rather subjective, though.
>

Yeah, this is a little subjective.

> I see ReorederBuffer only does the lock/unlock around apply_change and
> RelationIdGetRelation. That seems insufficient - RelidByRelfilenode can
> do heap_open on pg_class, for example. And I guess we need to protect
> rb->message too, because who knows what the plugin does in the callback?
>
> Also, we should not allocate gid[GIDSIZE] for every transaction. For
> example subxacts never need it, and it seems rather wasteful to allocate
> 200B when the rest of the struct has only ~100B. This is particularly
> problematic considering ReorderBufferTXN is not spilled to disk when
> reaching the memory limit. It needs to be allocated ad-hoc only when
> actually needed.
>

OK, will look at allocating GID only when needed.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-04-02 Thread Nikhil Sontakke



> 
> Quick thought: Should be simple to release lock when interacting with network.

I don’t think this will be that simple. The network calls will typically happen 
from inside the plugins and we don’t want to make plugin authors responsible 
for that. 

> Could also have abort signal lockers.

With the decodegroup locking we do have access to all the decoding backend
pids, so we could signal them. But I am not sure signaling will work if the
plugin is in the midst of a network call.

I agree with Petr. With this decodegroup lock implementation we have an
inexpensive but workable implementation for locking around the plugin call.
Sure, the abort will be penalized, but it's bounded by the WAL sender
timeout or a max of one change apply cycle. As he mentioned, if we can
optimize this later we can do so without changing plugin coding semantics.

Regards,
Nikhils

> 
> Andres
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Feature Request - DDL deployment with logical replication

2018-03-31 Thread Nikhil Sontakke
Hi Jeremy,

> My whole point is that in most architectures, DBAs decide to deploy the same
> SQL on providers and subscribers.  Yes it isn't perfect, but IMO, it is very
> helpful to try to automate that idea, as opposed to trying to actually
> replicate DDL at the low level.  The latter is better, yes, but seems to
> have proven extremely difficult.  Hence, why you see the advent of functions
> to pipe DDL through the replication stream.
>

In the current commitfest, the community is working to get logical
decoding of 2PC into core.

Once something like that gets in, then for the majority of DDLs
(those which work inside transaction blocks), one of the use cases of that
functionality could be to trap these DDLs and convert them into
implicit 2PC commands. Details need to be worked out, but we would get
all the logical replication cluster nodes in sync with each other and
issue a PREPARE transaction involving this DDL on all nodes in the
logical replication cluster. If any of the nodes is not able to
successfully prepare this DDL, then we can rollback or else commit the
2PC, thus moving the entire logical cluster consistently in terms of
schema changes.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-03-30 Thread Nikhil Sontakke
Hi Petr, Andres and Tomas


>>> Thanks. I think the README is a good start, but I think we also need to
>>> improve the comments, which is usually more detailed than the README.
>>> For example, it's not quite acceptable that LogicalLockTransaction and
>>> LogicalUnlockTransaction have about no comments, especially when it's
>>> meant to be public API for decoding plugins.
>>

Tomas, thanks for providing your review comments based patch. I will include the
documentation that you have provided in that patch for the APIs. Will
also look at
your decodeGroupLocking related comments and submit a fresh patch soon.

>> FWIW, for me that's ground to not accept the feature. Burdening output
>> plugins with this will make their development painful (because they'll
>> have to adapt regularly) and correctness doubful (there's nothing
>> checking for the lock being skipped).  Another way needs to be found.
>>
>
> I have to agree with Andres here.
>

Ok. Let's have another go at alleviating this issue then.

> I as wondering how to hide this. Best idea I had so far would be to put
> it in heap_beginscan (and index_beginscan given that catalog scans use
> is as well) behind some condition. That would also improve performance
> because locking would not need to happen for syscache hits. The problem
> is however how to inform the heap_beginscan about the fact that we are
> in 2PC decoding. We definitely don't want to change all the scan apis
> for this. I wonder if we could add some kind of property to Snapshot
> which would indicate this fact - logical decoding is using it's own
> snapshots it could inject the information about being inside the 2PC
> decoding.
>

The idea of adding that info in the Snapshot itself is interesting. We
could introduce a logicalxid field in SnapshotData to point to the XID
that the decoding backend is interested in. This could be added only
for the 2PC case. Support in the future for in-progress transactions
could use this field as well. If it's a valid XID, we could call
LogicalLockTransaction/LogicalUnlockTransaction on that XID from
heap_beginscan/heap_endscan respectively. I can also look at what
other *_beginscan APIs would need this as well.
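To make the idea concrete, here is a minimal self-contained C sketch of how a
logicalxid field in the snapshot could drive the lock/unlock calls from the scan
entry points. All names here (SnapshotStub, the *Stub functions) are hypothetical
stand-ins for illustration, not the actual patch API:

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical extra field carried by the decoding snapshot. */
typedef struct SnapshotStub
{
    TransactionId logicalxid;   /* XID being decoded, or Invalid */
} SnapshotStub;

int locks_held = 0;

/* Stand-ins for the proposed LogicalLockTransaction/Unlock API. */
void LogicalLockTransactionStub(TransactionId xid)   { (void) xid; locks_held++; }
void LogicalUnlockTransactionStub(TransactionId xid) { (void) xid; locks_held--; }

/* What heap_beginscan/heap_endscan would do with the new snapshot field:
 * lock only when a decoding XID is present, stay a NOOP otherwise. */
void
stub_beginscan(const SnapshotStub *snap)
{
    if (TransactionIdIsValid(snap->logicalxid))
        LogicalLockTransactionStub(snap->logicalxid);
}

void
stub_endscan(const SnapshotStub *snap)
{
    if (TransactionIdIsValid(snap->logicalxid))
        LogicalUnlockTransactionStub(snap->logicalxid);
}
```

A normal snapshot (Invalid logicalxid) pays no locking cost at all, which is the
main attraction of carrying the flag in the snapshot rather than in the scan APIs.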

Regards,
Nikhils



Re: [HACKERS] logical decoding of two-phase transactions

2018-03-01 Thread Nikhil Sontakke
Hi Andres and Craig,

>> In the future you should number them. Right now they appear to be out of
>> order in your email.  I suggest using git format-patch, that does all
>> the necessary work for you.
>>
> Yep, specially git format-patch with a -v argument, so the whole patchset is
> visibly versioned and sorts in the correct order.
>

I did try to use *_Number.patch to convey the sequence, but admittedly
it's pretty lame.

I will re-submit with "git format-patch" soon.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-02-12 Thread Nikhil Sontakke
Hi Andres,

> First off: This patch has way too many different types of changes as
> part of one huge commit. This needs to be split into several
> pieces. First the cleanups (e.g. the fields -> flag changes), then the
> individual infrastructure pieces (like the twophase.c changes, best
> split into several pieces as well, the locking stuff), then the main
> feature, then support for it in the output plugin.  Each should have an
> individual explanation about why the change is necessary and not a bad
> idea.
>

Ok, I will break this patch into multiple logical pieces and re-submit.

>
> On 2018-02-06 17:50:40 +0530, Nikhil Sontakke wrote:
>> @@ -46,6 +48,9 @@ typedef struct
>>   boolskip_empty_xacts;
>>   boolxact_wrote_changes;
>>   boolonly_local;
>> + booltwophase_decoding;
>> + booltwophase_decode_with_catalog_changes;
>> + int decode_delay; /* seconds to sleep after every 
>> change record */
>
> This seems too big a crock to add just for testing. It'll also make the
> testing timing dependent...
>

The idea *was* to make testing timing dependent. We wanted to simulate
the case when a rollback is issued by another backend while the
decoding is still ongoing. This allows that test case to be tested.


>>  } TestDecodingData;
>
>>  void
>>  _PG_init(void)
>> @@ -85,9 +106,15 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
>>   cb->begin_cb = pg_decode_begin_txn;
>>   cb->change_cb = pg_decode_change;
>>   cb->commit_cb = pg_decode_commit_txn;
>> + cb->abort_cb = pg_decode_abort_txn;
>
>>   cb->filter_by_origin_cb = pg_decode_filter;
>>   cb->shutdown_cb = pg_decode_shutdown;
>>   cb->message_cb = pg_decode_message;
>> + cb->filter_prepare_cb = pg_filter_prepare;
>> + cb->filter_decode_txn_cb = pg_filter_decode_txn;
>> + cb->prepare_cb = pg_decode_prepare_txn;
>> + cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
>> + cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
>>  }
>
> Why does this introduce both abort_cb and abort_prepared_cb? That seems
> to conflate two separate features.
>

Consider the case when we have a bunch of change records to apply for
a transaction. We send a "BEGIN" and then start decoding each change
record one by one. Now a rollback was encountered while we were
decoding. In that case it doesn't make sense to keep on decoding and
sending the change records. We immediately send a regular ABORT. We
cannot send "ROLLBACK PREPARED" because the transaction was not
prepared on the subscriber and have to send a regular ABORT instead.
And we need the "ROLLBACK PREPARED" callback for the case when a
prepared transaction gets rolled back and is encountered during the
usual WAL processing.

Please take a look at "contrib/test_decoding/t/001_twophase.pl" where
this test case is enacted.
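The rule described above can be captured in a tiny self-contained sketch (names
here are illustrative, not the plugin API): which abort message goes downstream
depends only on whether the PREPARE already reached the subscriber.

```c
#include <stdbool.h>

/*
 * Which message goes downstream when the decoded transaction rolls back?
 * ROLLBACK PREPARED only makes sense once the subscriber has actually
 * prepared the transaction; if the rollback interrupted us mid-decode,
 * the PREPARE never went out and a plain ABORT is the only valid choice.
 */
typedef enum { SEND_ABORT, SEND_ROLLBACK_PREPARED } AbortMessage;

AbortMessage
choose_abort_message(bool prepare_sent_downstream)
{
    return prepare_sent_downstream ? SEND_ROLLBACK_PREPARED : SEND_ABORT;
}
```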

>
>> +/* Filter out unnecessary two-phase transactions */
>> +static bool
>> +pg_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>> + TransactionId xid, const char *gid)
>> +{
>> + TestDecodingData *data = ctx->output_plugin_private;
>> +
>> + /* treat all transactions as one-phase */
>> + if (!data->twophase_decoding)
>> + return true;
>> +
>> + if (txn && txn_has_catalog_changes(txn) &&
>> + !data->twophase_decode_with_catalog_changes)
>> + return true;
>
> What? I'm INCREDIBLY doubtful this is a sane thing to expose to output
> plugins. As in, unless I hear a very very convincing reason I'm strongly
> opposed.
>

These bools are specific to the test_decoding plugin.

Again, these are useful in testing decoding in various scenarios with
twophase decoding enabled/disabled. Testing decoding when catalog
changes are allowed/disallowed etc. Please take a look at
"contrib/test_decoding/sql/prepared.sql" for the various scenarios.


>
>> +/*
>> + * Check if we should continue to decode this transaction.
>> + *
>> + * If it has aborted in the meanwhile, then there's no sense
>> + * in decoding and sending the rest of the changes, we might
>> + * as well ask the subscribers to abort immediately.
>> + *
>> + * This should be called if we are streaming a transaction
>> + * before it's committed or if we are decoding a 2PC
>> + * transaction. Otherwise we always decode committed
>> + * transactions
>> + *
>> + * Additional checks can be

Re: [HACKERS] logical decoding of two-phase transactions

2018-02-09 Thread Nikhil Sontakke
Hi Stas,


> Reading through patch I’ve noticed that you deleted call to 
> SnapBuildCommitTxn()
> in DecodePrepare(). As you correctly spotted upthread there was unnecessary
> code that marked transaction as running after decoding of prepare. However 
> call
> marking it as committed before decoding of prepare IMHO is still needed as
> SnapBuildCommitTxn does some useful thing like setting base snapshot for 
> parent
> transactions which were skipped because of SnapBuildXactNeedsSkip().
>
> E.g. current code will crash in assert for following transaction:
>
> BEGIN;
> SAVEPOINT one;
> CREATE TABLE test_prepared_savepoints (a int);
> PREPARE TRANSACTION 'x';
> COMMIT PREPARED 'x';
> :get_with2pc_nofilter
> :get_with2pc_nofilter  <- second call will crash decoder
>

Thanks for taking a look!

The first ":get_with2pc_nofilter" call consumes the data appropriately.

The second ":get_with2pc_nofilter" sees that it has to skip and hence
enters the ReorderBufferForget() function in the skip code path,
causing the assert. My question is: if we have to skip anyway, why do
we need to set up SnapBuildCommitTxn() for such a transaction? I don't
see the need for doing that for skipped transactions.

Will continue to look at this and will add this scenario to the test
cases. Further comments/feedback appreciated.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-28 Thread Nikhil Sontakke
Hi,

> Having this as responsibility of plugin sounds interesting. It certainly
> narrows the scope for which we need to solve the abort issue. For 2PC
> that may be okay as we need to somehow interact with transaction manager
> as Simon noted. I am not sure if this helps streaming use-case though as
> there is not going to be any external transaction management involved there.
>
> In any case all this interlocking could potentially be made less
> impact-full by only doing it when we know the transaction did catalog
> changes prior to currently decoded change (which we do during decoding)
> since that's the only time we are interested in if it aborted or not.
>
> This all leads me to another idea. What if logical decoding provided API
> for "locking/unlocking" the currently decoded transaction against abort.
> This function would then be called by both decoding and output plugin
> before any catalog read. The function can be smart enough to be NOOP if
> the transaction is not running (ie we are not doing 2PC decoding or
> streaming) or when the transaction didn't do any catalog modifications
> (we already have that info easily accessible as bool).
>
> That would mean we'd never do any kind of heavy locking for prolonged
> periods of time (ie network calls) but only during catalog access and
> only when needed. It would also solve this for both 2PC and streaming
> and it would be easy to use by plugin authors. Just document that some
> call should be done before catalog access when in output plugin, can
> even be Asserted that the call was done probably.
>
> Thoughts?
>

Yeah, this might work. We already have SET_LOCKTAG_TRANSACTION() via
which every transaction takes an exclusive lock on its own
transactionid when it starts, for example. We ideally want a single
solution to handle 2PC and ongoing (streaming) transactions. We could
introduce a new SET_LOCKTAG_LOGICALTRANSACTION(). The logical decoding
process could take a SHARED lock on this, check if the XID is still ok
to decode, read the catalog and unlock. Abort/Commit transaction
processing could take this in EXCLUSIVE mode.

As mentioned above, the plugin API which takes this lock will be smart
enough to be a NOOP if the transaction is not running (i.e. we are not
doing 2PC decoding or streaming) or when the transaction didn't do any
catalog modifications.
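A minimal self-contained C sketch of that NOOP condition (the names below are
hypothetical, not the actual lock-manager API): the decoding side only needs the
shared per-XID lock when the transaction is still in progress and made catalog
changes; commit/abort processing would take the same tag in exclusive mode.

```c
#include <stdbool.h>

/* Lock modes mirroring the proposal: decoding backends take SHARED on the
 * per-XID tag, commit/abort processing takes EXCLUSIVE. */
typedef enum { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE } LogicalXactLockMode;

/*
 * Hypothetical helper: which lock, if any, should the decoding side take
 * before a catalog read?  A NOOP (LOCK_NONE) when the transaction is no
 * longer running (plain committed-transaction decoding) or when it made
 * no catalog modifications.
 */
LogicalXactLockMode
decode_side_lock_mode(bool xact_still_running, bool has_catalog_changes)
{
    if (!xact_still_running || !has_catalog_changes)
        return LOCK_NONE;
    return LOCK_SHARED;
}
```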

Thoughts?

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-23 Thread Nikhil Sontakke
> I must be missing something in this discussion, cos I don't see any
> problems with this "other option".
>
> Surely we prepare the 2PCxact and then it is persistent. Yes,
> potentially for an unbounded period of time. And it is (usually) up to
> the XA manager to resolve that. 2PC interacts with transaction
> management and yes, it can be slow. But the choice is slow and
> consistent, or not. This would only be used with the full choice of
> the user, just like synchronous_commit.
>
> In this case, we call the decoding plugin's precommit hook which would
> then prepare the 2PCxact and set a non-persistent flag saying it is
> being decoded. If decoding completes normally we release the lock and
> commit. If decoding fails or the DBA has another reason to do so, we
> provide a function that allows the flag to be unlocked. While it is
> locked the 2PCxact cannot be aborted or committed.
>
> There is no danger of accidental abort because the prepare has persistent 
> state.

Handling a concurrent abort while decoding is ongoing is turning out
to be a complex affair.

Thinking more about this, just to provide an example, we have a
decoding plugin hook to determine if a GID for a 2PC was decoded at
PREPARE time or COMMIT time as part of the 2PC logical decoding patch.
We need that to determine the *same* static answer every time we see a
specific GID while decoding across restarts; the plugin should know
what it had done the last time around and should tell us the same
later as well. It just occurred to me that as Simon also mentioned, it
could/should also be the decoding plugin's responsibility to indicate
if it's ok to go ahead with the abort of the transaction.

So, we could consider adding a preAbort hook. That preAbort hook gets
the GID, XID and other parameters as needed and tells us whether we
can go ahead with the abort or if we need to wait out (maybe we pass
in an ok_to_wait param as well). As an example, a plugin could lookup
some shmem structure which points to the current transaction being
decoded and does related processing to ensure that it stops decoding
at a clean juncture, thus keeping the response time bounded to a
maximum of one change record apply cycle. That passes the onus onto
the plugin writers and keeps the core code around this concurrent
abort handling clean.
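Sketched as a self-contained C fragment, the proposed hook could look roughly
like this (the callback name, signature, and the example policy are all
hypothetical, offered only to make the ok_to_wait idea concrete):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Hypothetical preAbort hook signature: the plugin answers whether the
 * abort of (xid, gid) may proceed now; if not, *ok_to_wait says whether
 * the caller should wait and retry (e.g. after one change-apply cycle).
 */
typedef bool (*PreAbortCB)(TransactionId xid, const char *gid,
                           bool *ok_to_wait);

/* Example plugin policy: block the abort while this XID is being decoded. */
TransactionId currently_decoding_xid = 0;

bool
example_pre_abort(TransactionId xid, const char *gid, bool *ok_to_wait)
{
    (void) gid;
    if (xid != 0 && xid == currently_decoding_xid)
    {
        *ok_to_wait = true;     /* bounded: at most one apply cycle */
        return false;
    }
    *ok_to_wait = false;
    return true;                /* safe to abort immediately */
}
```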

I will cook up something along the above lines unless there are any
objections to this approach.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2017-12-26 Thread Nikhil Sontakke
Hi,

As of today, in the existing code in HeapTupleSatisfiesVacuum, if a
row is inserted via an aborted transaction, then it's deemed to be
dead immediately and can be cleaned up for re-use by HOT or vacuum,
etc.

With logical decoding, there might arise a case that such a row, if it
belongs to a system catalog table or even a user catalog table
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=66abc2608c7c00fcd449e00a9e23f13f02e65d04),
then it might be being used for decoding at the very same moment that
the abort came in. If the row is re-claimed immediately, then it's
possible that the decoding happening alongside might face issues.

The main issue here is that HeapTupleSatisfiesVacuum *assumes* that
rows belonging to an aborted transaction are not visible to anyone
else. As mentioned above, that assumption needs to change in the case
when logical decoding is active and the row belongs to catalog tables.
This was not a problem before, but now there are at least two patches
out there which challenge these assumptions. One patch is for logical
decoding of 2PC, in which case a PREPARED transaction might be getting
decoded and a concurrent ROLLBACK PREPARED can happen alongside.
Another patch is about streaming logical changes before commit/abort.

The provided patch changes that assumption. We now pass in an
additional argument to HeapTupleSatisfiesVacuum() to indicate if the
current row belongs to a catalog table in the logical decoding
environment. We call the existing
RelationIsAccessibleInLogicalDecoding() function to ascertain if it's
a catalog table worth considering for this case. So if this is the
case, we return HEAPTUPLE_RECENTLY_DEAD if the xmin of the row is
newer than OldestXmin. So, for the following case, we now will return
HEAPTUPLE_RECENTLY_DEAD when earlier we would have returned
HEAPTUPLE_DEAD:

BEGIN;
ALTER subscribed_table ADD COLUMN;

ABORT;
VACUUM pg_attribute;

This is essentially the same logic that is used today for rows that
got deleted (via update or delete). We return HEAPTUPLE_RECENTLY_DEAD
for such deleted rows if the xmax of the row is newer than OldestXmin
because some open transactions can/could still see the tuple.
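The changed visibility rule can be boiled down to a small self-contained sketch
(simplified: the comparison ignores XID wraparound, and all names are stand-ins
for the real HeapTupleSatisfiesVacuum machinery):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef enum { HEAPTUPLE_DEAD, HEAPTUPLE_RECENTLY_DEAD } HTSV_Result;

/* Simplified TransactionIdPrecedes: plain comparison, ignoring wraparound. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    return a < b;
}

/*
 * Sketch of the changed rule for a tuple inserted by an ABORTED
 * transaction: normally it is immediately DEAD, but when the tuple lives
 * in a (user) catalog table and logical decoding is active, report it
 * RECENTLY_DEAD while its xmin is not yet older than OldestXmin, so a
 * concurrent decode of the aborted transaction can still read it.
 */
HTSV_Result
aborted_tuple_status(TransactionId xmin, TransactionId oldest_xmin,
                     bool catalog_in_logical_decoding)
{
    if (catalog_in_logical_decoding && !xid_precedes(xmin, oldest_xmin))
        return HEAPTUPLE_RECENTLY_DEAD;
    return HEAPTUPLE_DEAD;
}
```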

With changes to HeapTupleSatisfiesVacuum, we also had to tweak the
logic in heap_prune_chain(). That function again assumes that only
Updated tuples can return HEAPTUPLE_RECENTLY_DEAD. That assumption is
not true and we check explicitly for the HeapTupleHeaderGetUpdateXid
value to be valid before further processing.

The patch had to make changes in all locations where
HeapTupleSatisfiesVacuum gets called and I have done multiple "make
check-world" runs with cassert enabled and things run fine. Also,
since it only sets the flag for system/user catalog tables and that
too ONLY in the logical decoding environment, it does not cause any
performance issues in normal circumstances.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


HeapTupleSatisfiesVacuum_Logical_Decoding_26_12_17.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2017-12-19 Thread Nikhil Sontakke
>> Note that this patch does not contain the HeapTupleSatisfiesVacuum
>> changes. I believe we need changes to HeapTupleSatisfiesVacuum given
>> than logical decoding changes the assumption that catalog tuples
>> belonging to a transaction which never committed can be reclaimed
>> immediately. With 2PC logical decoding or streaming logical decoding,
>> we can always have a split time window in which the ongoing decode
>> cycle needs those tuples. The solution is that even for aborted
>> transactions, we do not return HEAPTUPLE_DEAD if the transaction id is
>> newer than the OldestXmin (same logic we use for deleted tuples of
>> committed transactions). We can do this only for catalog table rows
>> (both system and user defined) to limit the scope of impact. In any
>> case, this needs to be a separate patch along with a separate
>> discussion thread.
>
> Are you working on that as well?

Sure, I was planning to work on that after getting the documentation
for this patch out of the way.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2017-12-12 Thread Nikhil Sontakke
Hi,

Thanks for the warning fix, I will also look at the cassert case soon.

I have been adding more test cases to this patch. I added a TAP test
which now allows us to do a concurrent ROLLBACK PREPARED when the
walsender is in the midst of decoding this very prepared transaction.

Have added  a "decode-delay" parameter to test_decoding via which each
apply call sleeps for a few configurable number of seconds allowing us
to have deterministic rollback in parallel. This logic seems to work
ok.

However, I am battling an issue with invalidations now. Consider the
below test case:

CREATE TABLE test_prepared1(id integer primary key);
-- test prepared xact containing ddl
BEGIN; INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
COMMIT PREPARED 'test_prepared#3';
SELECT data FROM pg_logical_slot_get_changes(..) <-- this shows the
2PC being decoded appropriately
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
SELECT data FROM pg_logical_slot_get_changes(..)

The last pg_logical_slot_get_changes call, shows:

table public.test_prepared1: INSERT: id[integer]:8

whereas since the 2PC committed, it should have shown:

table public.test_prepared1: INSERT: id[integer]:8 data[text]:null

This is an issue because of the way we are handling invalidations. We
don't allow ReorderBufferAddInvalidations() at COMMIT PREPARED time
since we assume that handling them at PREPARE time is enough.
Apparently, it's not enough. I am trying to allow invalidations at
COMMIT PREPARED time as well, but maybe calling
ReorderBufferAddInvalidations() blindly again is not a good idea.
Also, if I do that, then I am getting some restart_lsn inconsistencies
which causes subsequent pg_logical_slot_get_changes() calls to
re-decode older records. I continue to investigate.

I am attaching the latest WIP patch. This contains the additional TAP
test changes.

Regards,
Nikhils

On 8 December 2017 at 01:15, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 12/7/17 08:31, Peter Eisentraut wrote:
>> On 12/4/17 10:15, Nikhil Sontakke wrote:
>>> PFA, latest patch for this functionality.
>>
>> This probably needs documentation updates for the logical decoding chapter.
>
> You need the attached patch to be able to compile without warnings.
>
> Also, the regression tests crash randomly for me at
>
> frame #4: 0x00010a6febdb
> postgres`heap_prune_record_prunable(prstate=0x7ffee5578990, xid=0)
> at pruneheap.c:625
>622   * This should exactly match the PageSetPrunable macro.  We
> can't store
>623   * directly into the page header yet, so we update working 
> state.
>624   */
> -> 625  Assert(TransactionIdIsNormal(xid));
>626  if (!TransactionIdIsValid(prstate->new_prune_xid) ||
>627  TransactionIdPrecedes(xid, prstate->new_prune_xid))
>628  prstate->new_prune_xid = xid;
>
> Did you build with --enable-cassert?
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


2pc_logical_12_12_17_wip.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2017-12-05 Thread Nikhil Sontakke
>> - Modified HeapTupleSatisfiesVacuum to return HEAPTUPLE_RECENTLY_DEAD
>> if the transaction id is newer than OldestXmin. Doing this only for
>> CATALOG tables (htup->t_tableOid < (Oid) FirstNormalObjectId).
>
>
> Because logical decoding supports user-catalog relations, we need to use the
> same sort of logical that GetOldestXmin uses instead of a simple oid-range
> check. See  RelationIsAccessibleInLogicalDecoding() and the
> user_catalog_table reloption.
>

Unfortunately, HeapTupleSatisfiesVacuum does not have the Relation
structure handily available to allow for these checks.

> Otherwise pseudo-catalogs used by logical decoding output plugins could
> still suffer issues with needed tuples getting vacuumed, though only if the
> txn being decoded made changes to those tables than ROLLBACKed. It's a
> pretty tiny corner case for decoding of 2pc but a bigger one when we're
> addressing streaming decoding.
>

We disallow rewrites on user_catalog_tables, so they cannot change
underneath. Yes, DML can be carried out on them inside a 2PC
transaction which then gets ROLLBACK'ed. But if it's getting aborted,
then we are not interested in that data anyways. Also, now that we
have the "filter_decode_txn_cb_wrapper()" function, we will stop
decoding by the next change record cycle because of the abort.

So, I am not sure if we need to track user_catalog_tables in
HeapTupleSatisfiesVacuum explicitly.

> Otherwise I'm really, really happy with how this is progressing and want to
> find time to play with it.

Yeah, I will do some more testing and add a few more test cases in the
test_decoding plugin. It might be handy to have a DELAY of a few
seconds after every change record processing, for example. That way,
we can have a TAP test which can do a few WAL activities and then we
introduce a concurrent rollback midway from another session in the
middle of that delayed processing. I have done debugger based testing
of this concurrent rollback functionality as of now.

Another test (actually, functionality) that might come in handy, is to
have a way for DDL to be actually carried out on the subscriber. We
will need something like pglogical.replicate_ddl_command to be added
to the core for this to work. We can add this functionality as a
follow-on separate patch after discussing how we want to implement
that in core.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2017-12-04 Thread Nikhil Sontakke
PFA, latest patch for this functionality.
This patch contains the following changes as compared to the earlier patch:

- Fixed a bunch of typos and comments

- Modified HeapTupleSatisfiesVacuum to return HEAPTUPLE_RECENTLY_DEAD
if the transaction id is newer than OldestXmin. Doing this only for
CATALOG tables (htup->t_tableOid < (Oid) FirstNormalObjectId).

- Added a filter callback filter_decode_txn_cb_wrapper() to decide if
it's ok to decode the NEXT change record. This filter as of now checks
if the XID that is involved got aborted. Additional checks can be
added here as needed.

- Added an ABORT callback in the decoding process. This was not
needed before because we always used to decode committed transactions.
With 2PC transactions, it is possible that while we are decoding one,
another backend issues a concurrent ROLLBACK PREPARED. When
filter_decode_txn_cb_wrapper() then gets called, it will tell us not
to decode the next change record. In that case we need to send an
ABORT to the subscriber (and not ROLLBACK PREPARED, because we are yet
to issue PREPARE to the subscriber).

- Added all functionality to read the abort command and apply it on
the remote subscriber as needed.

- Added functionality in ReorderBufferCommit() to abort midway based
on the feedback from filter_decode_txn_cb_wrapper()

- Modified LockGXact() and FinishPreparedTransaction() to allow a
missing GID in the case of "ROLLBACK PREPARED". Currently, this will
only happen in the logical apply code path. We still send the rollback
to the subscriber because it is difficult to tell on the provider
whether the transaction was aborted midway through decoding or is in
PREPARED state on the subscriber. All other cases will error out as
before.

- Totally removed the snapshot addition/deletion code from the
decoding path. That is not needed at all while decoding an ongoing
transaction: the entries in the snapshot exist so that future
transactions can decode older transactions, and for 2PC transactions
we do not need to decode them till COMMIT PREPARED gets called. This
has simplified all that unwanted snapshot push/pop code, which is
nice.
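
To make the control flow of the last few items concrete, here is a
self-contained sketch with hypothetical names; in the patch itself the
loop lives in ReorderBufferCommit() and the filter is
filter_decode_txn_cb_wrapper(). The filter is consulted before every
change record, and a concurrent abort stops decoding so the caller can
send a plain ABORT downstream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Callback type: returns true if decoding of xid may continue. */
typedef bool (*DecodeFilterCB)(TransactionId xid);

typedef enum { DECODE_COMPLETED, DECODE_ABORTED } DecodeResult;

/*
 * Decode a transaction's change records one at a time, consulting the
 * filter before each record.  If the filter reports that the
 * transaction has been concurrently rolled back, stop and report
 * DECODE_ABORTED so the caller can send an ABORT (not ROLLBACK
 * PREPARED, since no PREPARE has reached the subscriber yet).
 */
DecodeResult
decode_changes(TransactionId xid, int nchanges,
               DecodeFilterCB filter, int *ndecoded)
{
    *ndecoded = 0;
    for (int i = 0; i < nchanges; i++)
    {
        if (!filter(xid))
            return DECODE_ABORTED;   /* concurrent ROLLBACK PREPARED */
        (*ndecoded)++;               /* stand-in for decode-and-send */
    }
    return DECODE_COMPLETED;
}

/* Toy filter: pretend XIDs below 100 have already been aborted. */
bool
xid_still_alive(TransactionId xid)
{
    return xid >= 100;
}
```

A real filter would call TransactionIdDidAbort() or a similar check;
the toy one above merely lets both outcomes be exercised.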

Regards,
Nikhils

On 30 November 2017 at 16:08, Nikhil Sontakke <nikh...@2ndquadrant.com> wrote:
> Hi,
>
>
>>> So perhaps better approach would be to not return
>>> HEAPTUPLE_DEAD if the transaction id is newer than the OldestXmin (same
>>> logic we use for deleted tuples of committed transactions) in the
>>> HeapTupleSatisfiesVacuum() even for aborted transactions. I also briefly
>>> checked HOT pruning and AFAICS the normal HOT pruning (the one not
>>> called by vacuum) also uses the xmin as authoritative even for aborted
>>> txes so nothing needs to be done there probably.
>>>
>>> In case we are worried that this affects cleanups of for example large
>>> aborted COPY transactions and we think it's worth worrying about then we
>>> could limit the new OldestXmin based logic only to catalog tuples as
>>> those are the only ones we need available in decoding.
>>
>>
>> Yeah, if it's limited to catalog tuples only then that sounds good. I was
>> quite concerned about how it'd impact vacuuming otherwise, but if limited to
>> catalogs about the only impact should be on workloads that create lots of
>> TEMPORARY tables then ROLLBACK - and not much on those.
>>
>
> Based on these discussions, I think there are two separate issues here:
>
> 1) Make HeapTupleSatisfiesVacuum() behave differently for recently
> aborted catalog tuples.
>
> 2) Invent a mechanism to stop a specific logical decoding activity
> in the middle. The reason could be a concurrent abort, a global
> transaction manager deciding to roll back, or anything else.
>
> ISTM that for (2), if (1) is able to leave the recently aborted
> tuples around for a little while (we only really need them while the
> decode of the current change record is ongoing), then we could
> accomplish it via a callback. This callback should be called before
> commencing decode and network send of each change record. In case of
> in-core logical decoding, the callback for pgoutput could check for
> the transaction having aborted (a call to TransactionIdDidAbort() or
> similar such functions), additional logic can be added as needed for
> various scenarios. If it's aborted, we will abandon decoding and send
> an ABORT to the subscribers before returning.
>
> Regards,
> Nikhils



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


2pc_logical_04_12_17.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2017-11-30 Thread Nikhil Sontakke
Hi,


>> So perhaps better approach would be to not return
>> HEAPTUPLE_DEAD if the transaction id is newer than the OldestXmin (same
>> logic we use for deleted tuples of committed transactions) in the
>> HeapTupleSatisfiesVacuum() even for aborted transactions. I also briefly
>> checked HOT pruning and AFAICS the normal HOT pruning (the one not
>> called by vacuum) also uses the xmin as authoritative even for aborted
>> txes so nothing needs to be done there probably.
>>
>> In case we are worried that this affects cleanups of for example large
>> aborted COPY transactions and we think it's worth worrying about then we
>> could limit the new OldestXmin based logic only to catalog tuples as
>> those are the only ones we need available in decoding.
>
>
> Yeah, if it's limited to catalog tuples only then that sounds good. I was
> quite concerned about how it'd impact vacuuming otherwise, but if limited to
> catalogs about the only impact should be on workloads that create lots of
> TEMPORARY tables then ROLLBACK - and not much on those.
>

Based on these discussions, I think there are two separate issues here:

1) Make HeapTupleSatisfiesVacuum() behave differently for recently
aborted catalog tuples.

2) Invent a mechanism to stop a specific logical decoding activity in
the middle. The reason could be a concurrent abort, a global
transaction manager deciding to roll back, or anything else.

ISTM that for (2), if (1) is able to leave the recently aborted
tuples around for a little while (we only really need them while the
decode of the current change record is ongoing), then we could
accomplish it via a callback. This callback should be called before
commencing decode and network send of each change record. In case of
in-core logical decoding, the callback for pgoutput could check for
the transaction having aborted (a call to TransactionIdDidAbort() or
similar such functions), additional logic can be added as needed for
various scenarios. If it's aborted, we will abandon decoding and send
an ABORT to the subscribers before returning.

Regards,
Nikhils