Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-03-01 Thread Artem Livshits
Hi Jun, > 32. ... metric name ... I've updated the metric name to be *kafka.coordinator.transaction:type=TransactionStateManager,name=ActiveTransactionOpenTimeMax.* Let me know if it works. -Artem

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-29 Thread Artem Livshits
Hi Jun, > So, it doesn't provide the same guarantees as 2PC either. I think the key point is that we don't claim 2PC guarantees in that case. Maybe it's splitting hairs from the technical perspective (at the end of the day if the operator doesn't let the user use 2PC, it's going to be a "works u

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-28 Thread Jun Rao
Hi, Artem, Thanks for the reply. I understand your concern on having a timeout breaking the 2PC guarantees. However, the fallback plan to disable 2PC with an independent keepPreparedTxn is subject to the timeout too. So, it doesn't provide the same guarantees as 2PC either. To me, if we provide

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-28 Thread Andrew Schofield
Hi Artem, I totally agree that a timeout for the 2PC case is a bad idea. It does abandon the 2PC guarantee. Thanks, Andrew

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-27 Thread Artem Livshits
Hi Jun, Thank you for the discussion. > For 3b, it would be useful to understand the reason why an admin doesn't authorize 2PC for self-hosted Flink I think the nuance here is that for cloud, there is a cloud admin (operator) and there is a cluster admin (who, for example, could manage ACLs on topi

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-23 Thread Jun Rao
Hi, Artem, Thanks for the reply. For 3b, it would be useful to understand the reason why an admin doesn't authorize 2PC for self-hosted Flink. Is the main reason that 2PC has unbounded timeout that could lead to unbounded outstanding transactions? If so, another way to address that is to allow th

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-21 Thread Artem Livshits
Hi Jun, > 20A. One option is to make the API initTransactions(boolean enable2PC). We could do that. I think there is a little bit of symmetry between the client and server that would get lost with this approach (server has enable2PC as config), but I don't really see a strong reason for enable2P

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-21 Thread Artem Livshits
Hi Rowland, > The Open Group DTP model and the XA interface requires that resource managers be able to report prepared transactions only, so a prepare RPC will be required. It's required in the XA protocol, but I'm not sure we have to build it into Kafka. Looks like we just need a catalog of p

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-21 Thread Jun Rao
Hi, Artem, Thanks for the reply. 20A. One option is to make the API initTransactions(boolean enable2PC). Then, it's clear from the code whether 2PC related logic should be added. 20B. But realistically, we want Flink (and other apps) to have a single implementation of the 2PC logic, not two diff

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-21 Thread Artem Livshits
Hi Jun, > 20A. This only takes care of the abort case. The application still needs to be changed to handle the commit case properly My point here is that looking at the initTransactions() call it's not clear what the semantics are. Say I'm doing code review, I cannot say if the code is correct o

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-20 Thread Jun Rao
Hi, Artem, Thanks for the reply. 20. "Say if an application currently uses initTransactions() to achieve the current semantics, it would need to be rewritten to use initTransactions() + abort to achieve the same semantics if the config is changed. " This only takes care of the abort case. The ap

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-19 Thread Rowland Smith
Hi Artem, I think that we both have the same understanding. An explicit prepare RPC does not eliminate any conditions, it just reduces the window for possible undesirable conditions like pending in-doubt transactions. So there is no right or wrong answer, a prepare RPC will reduce the number of oc

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-16 Thread Artem Livshits
Hi Rowland, > I am not sure what you mean by guarantee, A guarantee would be an elimination of complexity or a condition. E.g. if adding an explicit prepare RPC eliminated in-doubt transactions, or eliminated a significant complexity in implementation. > 1. Transactions that haven’t reached “pr

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-15 Thread Artem Livshits
Hi Jun, Thank you for your questions. > 20. So to abort a prepared transaction after the producer start, we could use ... I agree, initTransactions(true) + abort would accomplish the behavior of initTransactions(false), so we could technically have fewer ways to achieve the same thing, which is g

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-07 Thread Jun Rao
Hi, Artem, Thanks for the reply. 20. So to abort a prepared transaction after producer start, we could use either producer.initTransactions(false), or producer.initTransactions(true) followed by producer.abortTransaction. Could we just always use the latter API? If we do this, we could potentially elimin

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-06 Thread Rowland Smith
Hi Artem, I am not sure what you mean by guarantee, but I am referring to a better operational experience. You mentioned this as the first benefit of an explicit "prepare" RPC in the KIP. 1. Transactions that haven’t reached “prepared” state can be aborted via timeout. However, in explaining wh

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-06 Thread Artem Livshits
Hi Jun, > 20. For Flink usage, it seems that the APIs used to abort and commit a prepared txn are not symmetric. For Flink it is expected that Flink would call .commitTransaction or .abortTransaction directly, it wouldn't need to deal with PreparedTxnState, the outcome is actually determined by t

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-05 Thread Artem Livshits
Hi Rowland, Thank you for your reply. I think I understand what you're saying and just tried to provide a quick summary. The https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC actually go

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-05 Thread Rowland Smith
Hi Artem, I don't think that you understand what I am saying. In any transaction, there is work done before the call to prepareTransaction() and work done afterwards. Any work performed before the call to prepareTransaction() can be aborted after a relatively short timeout if the client fails. It

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-05 Thread Jun Rao
Hi, Artem, Thanks for the reply. 20. For Flink usage, it seems that the APIs used to abort and commit a prepared txn are not symmetric. To abort, the app will just call producer.initTransactions(false). To commit, the app needs to call producer.initTransactions(true), then producer.completeTransa

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-04 Thread Artem Livshits
Hi Rowland, Thank you for your feedback. Using an explicit prepare RPC was discussed and is listed in the rejected alternatives: https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC. Basical

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-04 Thread Rowland Smith
Hi Artem, It has been a while, but I have gotten back to this. I understand that when 2PC is used, the transaction timeout will be effectively infinite. I don't think that this behavior is desirable. A long-running transaction can be extremely disruptive since it blocks consumers on any partitions

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-02 Thread Artem Livshits
Hi Jun, > Then, should we change the following in the example to use InitProducerId(true) instead? We could. I just thought that it's good to make the example self-contained by starting from a clean state. > Also, could Flink just follow the dual-write recipe? I think it would bring some unnec

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-01-29 Thread Jun Rao
Hi, Artem, Thanks for the reply. 20. So for the dual-write recipe, we should always call InitProducerId(keepPreparedTxn=true) from the producer? Then, should we change the following in the example to use InitProducerId(true) instead? 1. InitProducerId(false); TC STATE: Empty, ProducerId=42, Produ

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-01-26 Thread Artem Livshits
Hi Jun, > 20. I am a bit confused by how we set keepPreparedTxn. ... keepPreparedTxn=true informs the transaction coordinator that it should keep the ongoing transaction, if any. If keepPreparedTxn=false, then any ongoing transaction is aborted (this is exactly the current behavior). enabl
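
The keepPreparedTxn behavior described in this message can be illustrated with a small toy model. This is only a sketch, not Kafka code: ToyCoordinator, init_producer_id and abort_transaction are hypothetical names invented for the example, and the epoch bump on every call is an assumption taken from other messages in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCoordinator:
    """Hypothetical stand-in for the coordinator's state for one transactional id."""
    epoch: int = 0
    ongoing_txn: Optional[str] = None  # label of the open transaction, if any

    def init_producer_id(self, keep_prepared_txn: bool) -> Optional[str]:
        # Assumption from elsewhere in the thread: the epoch is bumped in both cases.
        self.epoch += 1
        if not keep_prepared_txn:
            # keepPreparedTxn=false: any ongoing transaction is aborted
            # (described in the message as the current behavior).
            self.ongoing_txn = None
        # keepPreparedTxn=true: the ongoing transaction, if any, is kept.
        return self.ongoing_txn

    def abort_transaction(self) -> None:
        # Explicit abort of whatever transaction is still open.
        self.ongoing_txn = None


# Recovery that keeps the prepared transaction for a later commit/abort decision.
kept = ToyCoordinator(ongoing_txn="txn-1")
print(kept.init_producer_id(keep_prepared_txn=True))   # the kept transaction label

# Recovery that discards it, matching today's behavior.
dropped = ToyCoordinator(ongoing_txn="txn-1")
print(dropped.init_producer_id(keep_prepared_txn=False))  # nothing left open
```

In this model, calling init_producer_id(True) and then abort_transaction() leaves the same state as init_producer_id(False), which is the equivalence debated under item 20 in this thread.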

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-01-25 Thread Jun Rao
Hi, Artem, Thanks for the reply. A few more comments. 20. I am a bit confused by how we set keepPreparedTxn. From the KIP, I got the following (1) to start 2pc, we call InitProducerId(keepPreparedTxn=false); (2) when the producer fails and needs to do recovery, it calls InitProducerId(keepPrepare

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-01-05 Thread Artem Livshits
Hi Rowland, Thank you for the feedback. For the 2PC cases, the expectation is that the timeout on the client would be set to "effectively infinite", which would exceed all practical 2PC delays. I now think that this flexibility is confusing and can be misused, so I have updated the KIP to just say t

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-01-04 Thread Rowland Smith
It is probably me. I copied the original message subject into a new email. Perhaps that is not enough to link them. It was not my understanding from reading KIP-939 that we are doing away with any transactional timeout in the Kafka broker. As I understand it, we are allowing the application to set

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-01-04 Thread Justine Olshan
Hey Rowland, Not sure why this message showed up in a different thread from the other KIP-939 discussion (is it just me?) In KIP-939, we do away with having any transactional timeout on the Kafka side. The external coordinator is fully responsible for controlling whether the transaction completes

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-01-03 Thread Rowland Smith
Hi Artem, I saw your response in the thread I started discussing Kafka distributed transaction support and the XA interface. I would like to work with you to add XA support to Kafka on top of the excellent foundational work that you have started with KIP-939. I agree that explicit XA support shoul

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-12-18 Thread Artem Livshits
Hi Jun, Thank you for the comments. > 10. For the two new fields in Enable2Pc and KeepPreparedTxn ... I added a note that all combinations are valid. Enable2Pc=false & KeepPreparedTxn=true could be potentially useful for backward compatibility with Flink, when the new version of Flink that impl

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-12-18 Thread Artem Livshits
Hi Justine, I've updated the KIP based on the KIP-890 updates. Now KIP-939 only needs to add one tagged field NextProducerEpoch as the other required fields will be added as part of KIP-890. > But here we could call the InitProducerId multiple times and we only want the producer with the correct

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-12-08 Thread Jun Rao
Hi, Artem, Thanks for the KIP. A few comments below. 10. For the two new fields Enable2Pc and KeepPreparedTxn in InitProducerId, it would be useful to document a bit more detail on what values are set under what cases. For example, are all four combinations valid? 11. InitProducerIdResponse:

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-12-07 Thread Justine Olshan
Hey Artem, Thanks for the updates. I think what you say makes sense. I just updated my KIP so I want to reconcile some of the changes we made especially with respect to the TransactionLogValue. Firstly, I believe tagged fields require a default value so that if they are not filled, we return the

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-11-22 Thread Artem Livshits
Hi Justine, After thinking a bit about supporting atomic dual writes for Kafka + NoSQL database, I came to the conclusion that we do need to bump the epoch even with InitProducerId(keepPreparedTxn=true). As I described in my previous email, we wouldn't need to bump the epoch to protect from zombies

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-10-06 Thread Artem Livshits
Hi Raman, Thank you for the questions. Given that the primary effect of setting enable2pc flag is disabling timeout, it makes sense to make enable2pc have similar behavior w.r.t. when it can be set. One clarification about the Ongoing case -- the current (pre-KIP-939) behavior is to abort ongoin

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-10-06 Thread Artem Livshits
Hi Justine, Thank you for the questions. Currently (pre-KIP-939) we always bump the epoch on InitProducerId and abort an ongoing transaction (if any). I expect this behavior will continue with KIP-890 as well. With KIP-939 we need to support the case when the ongoing transaction needs to be pre

RE: [DISCUSS] KIP-939: Support Participation in 2PC

2023-10-04 Thread Raman Verma
Hello Artem, Now that `InitProducerIdRequest` will have an extra parameter (enable2PC), can the client change the value of this parameter during an ongoing transaction? Here is how the transaction coordinator responds to InitProducerId requests according to the current transaction's state. - Emp

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-10-03 Thread Justine Olshan
Hey Artem, Thanks for the KIP. I had a question about epoch bumping. Previously when we send an InitProducerId request on Producer startup, we bump the epoch and abort the transaction. Is it correct to assume that we will still bump the epoch, but just not abort the transaction? If we still bump

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-09-07 Thread Artem Livshits
Hi Alex, Thank you for your questions. > the purpose of having broker-level transaction.two.phase.commit.enable The thinking is that 2PC is a bit of an advanced construct so enabling 2PC in a Kafka cluster should be an explicit decision. If it is set to 'false' InitProducerId (and initTransact

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-09-05 Thread Alexander Sorokoumov
Hi Artem, Thanks for publishing this KIP! Can you please clarify the purpose of having broker-level transaction.two.phase.commit.enable config in addition to the new ACL? If the brokers are configured with transaction.two.phase.commit.enable=false, at what point will a client configured with tran

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-25 Thread Roger Hoover
Other than supporting multiplexing transactional streams on a single producer, I don't see how to improve it.

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-24 Thread Artem Livshits
Hi Roger, Thank you for summarizing the cons. I agree and I'm curious what would be the alternatives to solve these problems better and if they can be incorporated into this proposal (or built independently in addition to or on top of this proposal). E.g. one potential extension we discussed ear

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-24 Thread Artem Livshits
Hi Guy, You raise a very good point. Supporting XA sounds like a good way to integrate Kafka and it's something that I think we should support at some point in the future. For this KIP, though, we thought we would focus on more basic functionality, keeping the following in mind: 1. XA is not univers

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-23 Thread Roger Hoover
Thanks. I like that you're moving Kafka toward supporting this dual-write pattern. Each use case needs to consider the tradeoffs. You already summarized the pros very well in the KIP. I would summarize the cons as follows: - you sacrifice availability - each write requires both DB and Kafka to

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-23 Thread Artem Livshits
Hi Roger, Thank you for the feedback. You make a very good point that we also discussed internally. Adding support for multiple concurrent transactions in one producer could be valuable but it seems to be a fairly large and independent change that would deserve a separate KIP. If such support i

RE: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-23 Thread guy
Hi, Nice idea, but you could maximise compatibility if you adhere to XA standard APIs rather than Kafka internal APIs. We at Atomikos offer 2PC coordination and recovery and we are happy to help you design this, it's a service we usually offer for free to backend vendors / systems. Let me kno

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-22 Thread Roger Hoover
Artem, Thanks for the reply. If I understand correctly, Kafka does not support concurrent transactions from the same producer (transactional id). I think this means that applications that want to support in-process concurrency (say thread-level concurrency with row-level DB locking) would need t

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-22 Thread Artem Livshits
Hi Roger, Arjun, Thank you for the questions. > It looks like the application must have stable transactional ids over time? The transactional id should uniquely identify a producer instance and needs to be stable across the restarts. If the transactional id is not stable across restarts, then zo

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-22 Thread Arjun Satish
Hello Artem, Thanks for the KIP. I have the same question as Roger on concurrent writes, and an additional one on consumer behavior. Typically, transactions will time out if not committed within some time interval. With the proposed changes in this KIP, consumers cannot consume past the ongoing tr

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-21 Thread Roger Hoover
Hi Artem, Thanks for writing this KIP. Can you clarify the requirements a bit more for managing transaction state? It looks like the application must have stable transactional ids over time? What is the granularity of those ids and producers? Say the application is a multi-threaded Java web s