Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-13 Thread Apurva Mehta
The KIP has passed with three binding +1 votes (Guozhang, Ismael, Jason) and no -1 or +0 votes. Thanks to everyone for the feedback. Apurva On Mon, Sep 11, 2017 at 8:31 PM, Apurva Mehta wrote: > Hi Becket, > > You are right: the calculations are per partition produced to

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-11 Thread Apurva Mehta
Hi Becket, You are right: the calculations are per partition produced to by each idempotent producer. I actually think this makes the problem more acute when we actually end up enabling the idempotent producer by default. However, even the most optimized version will still result in an overhead

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-11 Thread Becket Qin
Hi Apurva, Thanks for the explanation. I think the STO will be per producer/partition, right? Am I missing something? You are right that the proposal does not strengthen the semantic. The goal is more about trying to bound the memory consumption to some reasonable number and use the memory more

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-11 Thread Apurva Mehta
Hi Becket, Regarding the current implementation: we opted for a simpler server side implementation where we _don't_ snapshot the metadata of the last 5 batches to disk. So if a broker fails, comes back online, and is the leader again, it will only have the last batch in memory. With max.in.flight

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-11 Thread Becket Qin
Hi Apurva, Sorry for being late on this thread. I am trying to understand the implementation of case that we will throw DuplicateSequenceException. My understanding is the following: 1. On the broker side, we will cache 5 most recent sequence/timestamp/offset (STO) for each of the producer ID. 2.

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-11 Thread Apurva Mehta
Thanks for the votes everyone. One of the proposals here was to raise a 'DuplicateSequenceException' to the user if the broker detected that one of the internal retries resulted in the duplicate, and the metadata for the original batch was no longer cached. However, when implementing this

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-07 Thread Apurva Mehta
Thanks for the comments Ismael. I have gone ahead and incorporated all your suggestions in the KIP document. You convinced me on adding max.message.bytes :) Apurva On Thu, Sep 7, 2017 at 6:12 PM, Ismael Juma wrote: > Thanks for the KIP. +1 (binding) from me. A few minor

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-07 Thread Ismael Juma
Thanks for the KIP. +1 (binding) from me. A few minor comments: 1. We should add a note to the backwards compatibility section explaining the impact of throwing DuplicateSequenceException (a new exception) from `send`. As I understand it, it's not an issue, but good to include it in the KIP. 2.

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-07 Thread Apurva Mehta
Thanks for the votes! Jason: I updated the KIP so that the messageFormatVersion field is int8. Guozhang: The type of the config field is a purely internal concept. The public API of the KafkaProducer takes a Map type or a Properties type (which is a map of String to String). So

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-07 Thread Guozhang Wang
+1. A quick clarification question regarding the compatibility plan as for "The legacy values for `enable.idempotence` will be interpreted as follows by the new producer: true will mean required, false will mean off." Right now "enable.idempotence" is defined as Type.BOOLEAN while we are likely

Re: [VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-07 Thread Jason Gustafson
+1. Thanks for the KIP. One nit: we could use int8 to represent the message format version. That is how it is represented in the messages themselves. -Jason On Thu, Sep 7, 2017 at 1:51 PM, Apurva Mehta wrote: > Hi, > > I'd like to start a vote for KIP-192: > >

[VOTE] KIP-192 : Provide cleaner semantics when idempotence is enabled

2017-09-07 Thread Apurva Mehta
Hi, I'd like to start a vote for KIP-192: https://cwiki.apache.org/confluence/display/KAFKA/KIP-192+%3A+Provide+cleaner+semantics+when+idempotence+is+enabled Thanks, Apurva