On Wed, Dec 8, 2021 at 3:50 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Dec 8, 2021 at 11:48 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > > > On Wed, Dec 8, 2021 at 2:16 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > On Tue, Dec 7, 2021 at 5:06 PM Masahiko Sawada <sawada.m...@gmail.com> > > > wrote: > > > > > > > > On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit.kapil...@gmail.com> > > > > wrote: > > > > > > > > I'll submit the patch tomorrow. > > > > > > > > While updating the patch, I realized that skipping a transaction that > > > > is prepared on the publisher will be tricky a bit; > > > > > > > > First of all, since skip-xid is in pg_subscription catalog, we need to > > > > do a catalog update in a transaction and commit it to disable it. I > > > > think we need to set origin-lsn and timestamp of the transaction being > > > > skipped to the transaction that does the catalog update. That is, > > > > during skipping the (not prepared) transaction, we skip all > > > > data-modification changes coming from the publisher, do a catalog > > > > update, and commit the transaction. If we do the catalog update in the > > > > next transaction after skipping the whole transaction, skip_xid could > > > > be left in case of a server crash between them. > > > > > > > > > > But if we haven't updated origin_lsn/timestamp before the crash, won't > > > it request the same transaction again from the publisher? If so, it > > > will be again able to skip it because skip_xid is still not updated. > > > > Yes. I mean that if we update origin_lsn and origin_timestamp when > > committing the skipped transaction and then update the catalog in the > > next transaction it doesn't work in case of a crash. But it's not > > possible in the first place since the first transaction is empty and > > we cannot set origin_lsn and origin_timestamp to it. > > > > > > > > > Also, we cannot set > > > > origin-lsn and timestamp to an empty transaction. > > > > > > > > > > But won't we update the catalog for skip_xid in that case? > > > > Yes. Probably my explanation was not clear. Even if we skip all > > changes of the transaction, the transaction doesn't become empty since > > we update the catalog. > > > > > > > > Do we see any advantage of updating the skip_xid in the same > > > transaction vs. doing it in a separate transaction? If not then > > > probably we can choose either of those ways and add some comments to > > > indicate the possibility of doing it another way. > > > > I think that since the skipped transaction is always empty there is > > always one transaction. What we need to consider is when we update > > origin_lsn and origin_timestamp. In non-prepared transaction cases, > > the only option is when updating the catalog. > > > > Your last sentence is not completely clear to me but it seems you > agree that we can use one transaction instead of two to skip the > changes, perform a catalog update, and update origin_lsn/timestamp.
Yes. > > > > > > > > In prepared transaction cases, I think that when handling a prepare > > > > message, we need to commit the transaction to update the catalog, > > > > instead of preparing it. And at the commit prepared and rollback > > > > prepared time, we skip it since there is not the prepared transaction > > > > on the subscriber. > > > > > > > > > > Can't we think of just allowing prepare in this case and updating the > > > skip_xid only at commit time? I see that in this case, we would be > > > doing prepare for a transaction that has no changes but as such cases > > > won't be common, isn't that acceptable? > > > > In this case, we will end up committing both the prepared (empty) > > transaction and the transaction that updates the catalog, right? > > > > Can't we do this catalog update before committing the prepared > transaction? If so, both in prepared and non-prepared cases, our > implementation could be the same and we have a reason to accomplish > the catalog update in the same transaction for which we skipped the > changes. But in case of a crash between these two transactions, given that skip_xid is already cleared how do we know the prepared transaction that was supposed to be skipped? > > > If > > so, since these are separate transactions it can be a problem in case > > of a crash between these two commits. > > > > > > > > > Currently, handling rollback prepared already > > > > behaves so; it first checks whether we have prepared the transaction > > > > or not and skip it if haven’t. So I think we need to do that also for > > > > commit prepared case. With that, this requires protocol changes so > > > > that the subscriber can get prepare-lsn and prepare-time when handling > > > > commit prepared. > > > > > > > > So I’m writing a separate patch to add prepare-lsn and timestamp to > > > > commit_prepared message, which will be a building block for skipping > > > > prepared transactions. Actually, I think it’s beneficial even today; > > > > we can skip preparing the transaction if it’s an empty transaction. > > > > Although the comment it’s not a common case, I think that it could > > > > happen quite often in some cases: > > > > > > > > * XXX, We can optimize such that at commit prepared time, we first > > > > check > > > > * whether we have prepared the transaction or not but that doesn't > > > > seem > > > > * worthwhile because such cases shouldn't be common. > > > > */ > > > > > > > > For example, if the publisher has multiple subscriptions and there are > > > > many prepared transactions that modify the particular table subscribed > > > > by one publisher, many empty transactions are replicated to other > > > > subscribers. > > > > > > > > > > I think this is not clear to me. Why would one have multiple > > > subscriptions for the same publication? I thought it is possible when > > > say some publisher doesn't publish any data of prepared transaction > > > say because the corresponding action is not published or something > > > like that. I don't deny that someday we want to optimize this case but > > > it might be better if we don't need to do it along with this patch. > > > > I imagined that the publisher has two publications (say pub-A and > > pub-B) that publishes a diferent set of relations in the database and > > there are two subscribers that are subscribing to either one > > publication (e.g, subscriber-A subscribes to pub-A and subscriber-B > > subscribes to pub-B). If many prepared transactions happen on the > > publisher and these transactions modify only relations published by > > pub-A, both subscriber-A and subscriber-B would prepare the same > > number of transactions but all of them in subscriber-B is empty. > > > > Okay, I understand those cases but note always checking if the > prepared xact exists during commit prepared has a cost and that is why > we avoided it at the first place. There is a separate effort in > progress [1] where we want to avoid sending empty transactions at the > first place. So, it is better to avoid this cost via that effort > rather than adding additional cost at commit of each prepared > transaction. OTOH, if there are other strong reasons to do it then we > can probably consider it. > Thank you for the information. Agreed. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/