Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-08 Thread Ismael Juma
Thanks! On Fri, Sep 8, 2017 at 6:09 AM, Sumant Tambe wrote: > Just did :) > > On 7 September 2017 at 17:52, Ismael Juma wrote: > > > Can we please start the vote on this KIP? The KIP must be accepted by > next > > Wednesday in order to make the cut for

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-07 Thread Sumant Tambe
Just did :) On 7 September 2017 at 17:52, Ismael Juma wrote: > Can we please start the vote on this KIP? The KIP must be accepted by next > Wednesday in order to make the cut for 1.0.0. This issue keeps coming up > again and again, and I'd really like to include a fix for

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-07 Thread Ismael Juma
Can we please start the vote on this KIP? The KIP must be accepted by next Wednesday in order to make the cut for 1.0.0. This issue keeps coming up again and again, and I'd really like to include a fix for 1.0.0. Ismael On Thu, Sep 7, 2017 at 10:01 PM, Apurva Mehta wrote:

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-07 Thread Apurva Mehta
I agree with what Ismael said. Having both retries and delivery.timeout.ms is confusing, and thus the goal is to not have a retries option at all once idempotence is fully battle tested and has become the entrenched default. Until that time, it makes sense to expire batch earlier than

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-07 Thread Ismael Juma
Good question regarding retries Sumant. A few comments: 1. Defaulting to MAX_INT makes sense in the context of delivery.timeout.ms, but introduces the possibility of reordering with the default max.in.flight of 5. Personally, I think reordering is better than dropping the message altogether (if

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-06 Thread Sumant Tambe
I'm not sure whether it's a good idea to have two different ways to control expiration. One option as you suggested is to expire batches based on whichever happens first (exceed delivery.timeout.ms or exhaust retries). Second option is to effectively ignore retries even if it's a very low value.

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-06 Thread Becket Qin
Hey Sumant, I agree with Jun that we can set the default value of retries to MAX_INT. Initially I was also thinking that retries can be deprecated. But after a second thought, I feel it may not be necessary to deprecate retries. With the newly added delivery.timeout.ms, the producer will expire

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-06 Thread Jun Rao
Hi, Sumant, The diagram in the wiki seems to imply that delivery.timeout.ms doesn't include the batching time. For retries, probably we can just default it to MAX_INT? Thanks, Jun On Wed, Sep 6, 2017 at 10:26 AM, Sumant Tambe wrote: > 120 seconds default sounds good to

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-06 Thread Sumant Tambe
120 seconds default sounds good to me. Throwing ConfigException instead of WARN is fine. Added clarification that the producer waits the full request.timeout.ms for the in-flight request. This implies that user might be notified of batch expiry while a batch is still in-flight. I don't recall if

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-05 Thread Ismael Juma
Thanks for updating the KIP, Sumant. A couple of points: 1. I think the default for delivery.timeout.ms should be higher than 30 seconds given that we previously would reset the clock once the batch was sent. The value should be large enough that batches are not expired due to expected events

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-09-05 Thread Sumant Tambe
I've updated the kip-91 writeup to capture some of the discussion here. Please confirm if it's sufficiently accurate. Feel free to edit it if you think some explanation can be better and has

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-30 Thread Jun Rao
Hi, Jiangjie, I mis-understood Jason's approach earlier. It does seem to be a good one. We still need to calculate the selector timeout based on the remaining delivery.timeout.ms to call the callback on time, but we can always wait for an inflight request based on request.timeout.ms. Thanks,

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Becket Qin
Yeah, I think expiring a batch but still wait for the response is probably reasonable given the result is not guaranteed anyways. @Jun, I think the frequent PID reset may still be possible if we do not wait for the in-flight response to return. Consider two partitions p0 and p1, the deadline of

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Jun Rao
Hmm, I thought delivery.timeout.ms bounds the time from a message is in the accumulator (i.e., when send() returns) to the time when the callback is called. If we wait for request.timeout.ms for an inflight request and the remaining delivery.timeout.ms is less than request.timeout.ms, the callback

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Jason Gustafson
I think the semantics of delivery.timeout.ms need to allow for the possibility that the record was actually written. Unless we can keep on retrying indefinitely, there's really no way to know for sure whether the record was written or not. A delivery timeout just means that we cannot guarantee

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Becket Qin
Hi Jason, If we expire the batch from user's perspective but still waiting for the response, would that mean it is likely that the batch will be successfully appended but the users will receive a TimeoutException? That seems a little non-intuitive to the users. Arguably it maybe OK though because

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Jason Gustafson
I think I'm with Becket. We should wait for request.timeout.ms for each produce request we send. We can still await the response internally for PID/sequence maintenance even if we expire the batch from the user's perspective. New sequence numbers would be assigned based on the current PID until

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-29 Thread Sumant Tambe
I'm updating the kip-91 writeup. There seems to be some confusion about expiring an inflight request. An inflight request gets a full delivery.timeout.ms duration from creation, right? So it should be max(remaining delivery.timeout.ms, request.timeout.ms)? Jun, we do want to wait for an inflight

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-28 Thread Becket Qin
Hi Jun, I was thinking of the last try. In that case the remaining delivery.timeout.ms is smaller than request timeout.ms. I am thinking that we can always wait up to request timeout after sending that request to avoid potential unnecessary PID reset. The concern of use remaining

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-27 Thread Jun Rao
Hi, Jiangjie, If we want to enforce delivery.timeout.ms, we need to take the min right? Also, if a user sets a large delivery.timeout.ms, we probably don't want to wait for an inflight request longer than request.timeout.ms. Thanks, Jun On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-25 Thread Becket Qin
Hi Jason, I see what you mean. That makes sense. So in the above case after the producer resets PID, when it retry batch_0_tp1, the batch will still have the old PID even if the producer has already got a new PID. @Jun, do you mean max(remaining delivery.timeout.ms, request.timeout.ms) instead

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-25 Thread Jun Rao
Hi, Becket, Good point on expiring inflight requests. Perhaps we can expire an inflight request after min(remaining delivery.timeout.ms, request.timeout.ms). This way, if a user sets a high delivery.timeout.ms, we can still recover from broker power outage sooner. Thanks, Jun On Thu, Aug 24,

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-24 Thread Jason Gustafson
Hey Becket, Yeah, my point is that we should never retry a batch using a different PID/sequence. In the current implementation, the PID/sequence is assigned when the batch is dequeued and it is never changed after that. Even if we reset the PID due to an error on another partition, we will

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-24 Thread Becket Qin
Hi Jason, delivery.timeout.ms sounds good to me. I was referring to the case that we are resetting the PID/sequence after expire a batch. This is more about the sending the batches after the expired batch. The scenario being discussed is expiring one of the batches in a in-flight request and

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-24 Thread Jason Gustafson
@Becket Good point about unnecessarily resetting the PID in cases where we know the request has failed. Might be worth opening a JIRA to try and improve this. So if we expire the batch prematurely and resend all > the other batches in the same request, chances are there will be > duplicates. If

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-23 Thread Becket Qin
Hi Jun, If TCP timeout is longer than request.timeout.ms, the producer will always hit request.timeout.ms before hitting TCP timeout, right? That is why we added request.timeout.ms in the first place. You are right. Currently we are reset the PID and resend the batches to avoid

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-23 Thread Jun Rao
Hi, Becket, If a message expires while it's in an inflight produce request, the producer will get a new PID if idempotent is enabled. This will prevent subsequent messages from hitting OutOfOrderSequenceException. The issue of not expiring an inflight request is that if a broker server goes down

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-23 Thread Sumant Tambe
OK. Looks like starting the clock after closing the batch has quite a few pitfalls. I can't think of a way of to work around it without adding yet another config. So I won't discuss that here. Anyone? As I said earlier, I'm not hung up on super-accurate notification times. If we are going down

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-23 Thread Ismael Juma
Thanks Becket, that seems reasonable. Sumant, would you be willing to update the KIP based on the discussion or are you still not convinced? Ismael On Wed, Aug 23, 2017 at 6:04 AM, Becket Qin wrote: > In general max.message.delivery.wait.ms is a cleaner approach. That

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-22 Thread Becket Qin
In general max.message.delivery.wait.ms is a cleaner approach. That would make the guarantee clearer. That said, there seem subtleties in some scenarios: 1. I agree with Sumante that it is a little weird that a message could be expired immediately if it happens to enter a batch that is about to

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-22 Thread Ismael Juma
Hi all, The discussion has been going on for a while, would it help to have a call to discuss this? I'd like to start a vote soonish so that we can include this in 1.0.0. I personally prefer max.message.delivery.wait.ms. It seems like Jun, Apurva and Jason also prefer that. Sumant, it seems like

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-16 Thread Jun Rao
Hi, Sumant, The semantics of linger.ms is a bit subtle. The reasoning for the current implementation is the following. Let's say one sets linger.ms to 0 (our current default value). Creating a batch for every message will be bad for throughput. Instead, the current implementation only forms a

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-15 Thread Sumant Tambe
Question about "the closing of a batch can be delayed longer than linger.ms": Is it possible to cause an indefinite delay? At some point bytes limit might kick in. Also, why is closing of a batch coupled with availability of its destination? In this approach a batch chosen for eviction due to

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-11 Thread Jun Rao
Hi, Sumant, 1. Yes, it's probably reasonable to require max.message.delivery.wait.ms > linger.ms. As for retries, perhaps we can set the default retries to infinite or just ignore it. Then the latency will be bounded by max.message.delivery.wait.ms. request.timeout.ms is the max time the request

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-11 Thread Sumant Tambe
Hi Jun, Thanks for looking into it. Yes, we did consider this message-level timeout approach and expiring batches selectively in a request but rejected it due to the reasons of added complexity without a strong benefit to counter-weigh that. Your proposal is a slight variation so I'll mention

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-11 Thread Sumant Tambe
> Thanks for the KIP. Nice documentation on all current issues with the > timeout. For the KIP writeup, all credit goes to Joel Koshy. I'll follow up on your comments a little later. > > You also brought up a good use case for timing out a message. For > applications that collect and send

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-10 Thread Jun Rao
Hi, Sumant, Thanks for the KIP. Nice documentation on all current issues with the timeout. You also brought up a good use case for timing out a message. For applications that collect and send sensor data to Kafka, if the data can't be sent to Kafka for some reason, the application may prefer to

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-09 Thread Sumant Tambe
On Wed, Aug 9, 2017 at 1:28 PM Apurva Mehta wrote: > > > There seems to be no relationship with cluster metadata availability or > > > staleness. Expiry is just based on the time since the batch has been > > ready. > > > Please correct me if I am wrong. > > > > > > > I was

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-09 Thread Apurva Mehta
> > There seems to be no relationship with cluster metadata availability or > > staleness. Expiry is just based on the time since the batch has been > ready. > > Please correct me if I am wrong. > > > > I was not very specific about where we do expiration. I glossed over some > details because

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-09 Thread Sumant Tambe
Responses inline. > > However, one thing which has not come out of the JIRA discussion is the > > > actual use cases for batch expiry. > > > > There are two usecases I can think of for batch expiry mechanism > > irrespective of how we try to bound the time (batch.expiry.ms or > >

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-07 Thread Apurva Mehta
Responses inline: On Mon, Aug 7, 2017 at 9:37 AM, Sumant Tambe wrote: > > > > > However, one thing which has not come out of the JIRA discussion is the > > actual use cases for batch expiry. > > There are two usecases I can think of for batch expiry mechanism > irrespective

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-07 Thread Sumant Tambe
Please see the replies inline. If we are going to have a separate configuration for expiry, I prefer my > proposal of max.message.delivery.wait.ms and its semantics. > OK. I hope others will voice their preference too. > > However, one thing which has not come out of the JIRA discussion is the

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-04 Thread Apurva Mehta
If we are going to have a separate configuration for expiry, I prefer my proposal of max.message.delivery.wait.ms and its semantics. However, one thing which has not come out of the JIRA discussion is the actual use cases for batch expiry. Also, the KIP document states the following: *The per

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-03 Thread Sumant Tambe
I don't want to list the alternatives in the JIRA as rejected just yet because they are still being discussed. I would encourage the respective proposers to do that. It's a wiki after all. As per my current understanding, there are two alternatives being proposed. The original kip-91 approach #1

Re: [DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-03 Thread Jason Gustafson
Thanks for the KIP. Just a quick comment. Can you list the alternatives mentioned in the JIRA discussion in the rejected alternatives section? -Jason On Thu, Aug 3, 2017 at 3:09 PM, Sumant Tambe wrote: > Hi all, > > KIP-91 [1] is another attempt to get better control on

[DISCUSS] KIP-91 Provide Intuitive User Timeouts in The Producer

2017-08-03 Thread Sumant Tambe
Hi all, KIP-91 [1] is another attempt to get better control on producer side timeouts. In essence we're proposing a new config named batch.expiry.ms that will cause batches in the accumulator to expire after the configured timeout. Recently, the discussion on KAFKA-5621 [2] has shed new light