Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-22 Thread Guozhang Wang
Hello John, Thanks for the new PR, and sorry for zig-zagging this KIP design. I made a pass on the PR and it looks good to me overall. Left some minor comments. Guozhang On Mon, Feb 22, 2021 at 9:06 AM John Roesler wrote: > Hello all, > > I have updated the KIP now. The diff is visible here:

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-22 Thread John Roesler
Hello all, I have updated the KIP now. The diff is visible here: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=165225765=13=12 The PR https://github.com/apache/kafka/pull/10137 is now available for reviews. Thanks so much to you all for your help on this! John On

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-22 Thread John Roesler
Thanks, Bill, I was waiting for feedback on the "currentLag" proposal before updating the KIP. Since there haven't been any objections to my new proposal, I'm in the process of updating the KIP document right now. Personally, I still like the original proposal to expose the metadata from the

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-22 Thread Bill Bejeck
Thanks for the KIP update John, The KIP as it stands LGTM. One question, in your previous comment you stated that the KIP would introduce the `currentLag()` method, but the KIP seems to show the `metadata()` implementation. FWIW I like the metadata approach as it seems to provide information

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-22 Thread John Roesler
Hello all, Since there haven't been any big objections to my proposed design fix, I'm updating the KIP now and opening the PR for reviews. Hopefully, we can get this merged quickly and unblock the system tests. Thanks, John On Mon, 2021-02-22 at 09:49 -0600, John Roesler wrote: > Thanks for

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-22 Thread John Roesler
Thanks for that idea, Chia-Ping, I agree that it would be nice for users to have a single API to get all kinds of metadata, but I'd rather introduce it in a KIP that would expose more than one piece of metadata. At the moment, I am motivated to keep this as simple as possible, which means only

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-17 Thread Chia-Ping Tsai
That new solution is good to me as it brings fewer changes than either options or config. one minor question - Could we return a composite object rather than OptionalLong? For example: class Metadata { long lag; } There are a bunch of 'metadata' for a partition and it is possible we want to

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-16 Thread John Roesler
Hello again, all, Thank you for your feedback and patience. I am hopeful that I have been able to come up with a solution that will satisfy everyone. Under a previous design iteration of the desired task idling semantics in KIP-695, we did indeed require the behavior change, but while I was

Re: Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-11 Thread Chia-Ping Tsai
here is my two cents. If the behavior eventually gets changed (return on response), the config is more suitable as it is easier to be deprecated (less changes). For example, we can introduce the config in 2.8 and then deprecate it in 2.9. 3.0 removes the config and supports only

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-10 Thread Sophie Blee-Goldman
Hey John, I know I'm a bit late to this party but just for the record, I don't think it's *totally *unreasonable for a user to take up the "poll on max timeout and assume some records will be returned" approach. And I also can imagine plenty of manually-assigned consumers implemented doing exactly

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-10 Thread John Roesler
Hello again, all. I have submitted the PR: https://github.com/apache/kafka/pull/10096 Ismael chimed in on the PR review to indicate that the config approach may not be desirable. How strongly do we feel that the behavior change is unacceptable? It seems like most of the people involved felt the

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-05 Thread Guozhang Wang
Thanks everyone for chiming in here! I'd also prefer the config approach if compared with API changes. On Fri, Feb 5, 2021 at 3:18 PM Bill Bejeck wrote: > I meant to chime in earlier. > > I also like the `PollOptions` idea, but I have to agree that the config > option would be the least

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-05 Thread Bill Bejeck
I meant to chime in earlier. I also like the `PollOptions` idea, but I have to agree that the config option would be the least disruptive approach. Thanks, Bill On Fri, Feb 5, 2021 at 6:12 PM John Roesler wrote: > Thanks, all! > > It seems that the config I proposed is a solution that >

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-05 Thread John Roesler
Thanks, all! It seems that the config I proposed is a solution that everyone can be happy with, so I will go ahead with a PR to fix that. I'll update the KIP after a round of PR reviews, in case there are new concerns that arise. Thanks, -John On Fri, 2021-02-05 at 15:07 -0800, Matthias J. Sax

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-05 Thread Matthias J. Sax
Thanks for providing more details. Adding a config might be the way a least resistance... I am fine with that. -Matthias On 2/4/21 9:42 AM, Chia-Ping Tsai wrote: >> vvv >> long_poll.mode: return_on_records|return_on_response > > This idea LGTM. It not only makes minimum changes

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-04 Thread Chia-Ping Tsai
> vvv > long_poll.mode: return_on_records|return_on_response This idea LGTM. It not only makes minimum changes to current behavior but also works for KIP-695. On 2021/02/04 16:07:11, John Roesler wrote: > Hi Matthias, Chia-Ping, and Tom, > > Thanks for the thoughtful

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-04 Thread John Roesler
Hi Matthias, Chia-Ping, and Tom, Thanks for the thoughtful replies! Re: poll(~forever~) to block indefinitely on records: Thanks for your dilligence, Chia-Ping. While I wouldn't personally recommend for anyone to write code that blocks forever on I/O, I do agree this is something that "real

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-04 Thread Tom Bentley
Hi, The Javadoc for KafkaConsumer#poll() includes the following: * This method returns immediately if there are records available. *Otherwise, > it will await the passed timeout.* > * If the timeout expires, an empty record set will be returned. Note that > this method may block beyond the > *

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-03 Thread Chia-Ping Tsai
Thanks for your sharing Matthias. I agree that is indeed an anti-pattern to assume poll() returns data or not. However, I check all usages of poll() in code base. There is an interesting use case - poll(a bigger timeout) - it implies that callers want to block poll()(forever) unless there are

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-03 Thread Matthias J. Sax
Thanks for your email John. I agree that it seems to be an anti-pattern to write code that makes assumptions if poll() returns data or not. Thus, we should fix-forward the system test from my point of view. >From my understanding, the impact of KIP-695 is that we might return early from poll()

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-02-03 Thread John Roesler
Hello again all, I'm resurrecting this thread to discuss an issue that has come up after merging the code for this KIP. The issue is that some of the system tests need to be updated in the same way that this integration test needed to be updated:

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2021-01-06 Thread John Roesler
Hello all, Thanks to all who participated in the discussion and vote. I'm closing the vote now and marking KIP-695 as accepted: * 4 binding +1 (Guozhang, Bill, Matthias, and myself) * 2 non-binding +1 (Bruno and Walker) The PRs will follow shortly. Thanks, -John On Fri, 2020-12-18 at 11:53

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-18 Thread Matthias J. Sax
Sorry that I am late to the game. +1 (binding) We always knew about this gap when we did KIP-353, and thus, I am glad that we finally address it. -Matthias On 12/18/20 10:16 AM, John Roesler wrote: > Thanks for your question, Ismael, > > Are you concerned about the consumer performance,

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-18 Thread John Roesler
Thanks for your question, Ismael, Are you concerned about the consumer performance, streams performance or both? On the consumer side, this is only creating one extra struct for each response partition to represent the metadata that we already have access to internally. I don’t think this

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-18 Thread Ismael Juma
Hi John, It would be good to make sure these changes have no measurable performance impact for the use cases that don't need it. Have we given this some thought? And what would be the perf testing strategy to verify this? Ismael On Fri, Dec 18, 2020 at 8:39 AM John Roesler wrote: > Thanks for

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-18 Thread John Roesler
Thanks for the votes and reviews, all. I'll wait for a response from Jason before closing the vote, since he asked for clarification. The present count is: * 3 binding +1 (Guozhang, Bill, and myself) * 2 non-binding +1 (Bruno and Walker) I have updated the KIP document in response to the

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-17 Thread John Roesler
Thanks Jason, We would only return the metadata for the latest fetches. So, if someone wanted to use this to lazily maintain a client-side metadata map for all partitions, they'd have to store it separately and merge in new updates as they arrive. This way: 1. We don't need to increase the

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-17 Thread Bill Bejeck
Hi John, I've made a pass over the KIP and I think it will be a good addition. Modulo Jason's question, I'm a +1 (binding). Thanks, Bill On Wed, Dec 16, 2020 at 1:29 PM Jason Gustafson wrote: > Hi John, > > Just one question. It wasn't very clear to me exactly when the metadata > would be

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-16 Thread Jason Gustafson
Hi John, Just one question. It wasn't very clear to me exactly when the metadata would be returned in `ConsumerRecords`. Would we /always/ include the metadata for all partitions that are assigned, or would it be based on the latest fetches? Thanks, Jason On Fri, Dec 11, 2020 at 4:07 PM John

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-11 Thread John Roesler
Thanks, Guozhang! All of your feedback sounds good to me. I’ll update the KIP when I am able. 3) I believe it is the position after the fetch, but I will confirm. I think omitting position may render beginning and end offsets useless as well, which leaves only lag. That would be fine with me,

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-11 Thread Guozhang Wang
Hello John, Thanks for the updates! I've made a pass on the KIP and also the POC PR, here are some minor comments: 1) nit: "receivedTimestamp" -> it seems the metadata keep getting updated, and we do not create a new object but just update the values in-place, so maybe calling it

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-11 Thread Walker Carlson
Thanks for the KIP! +1 (non-binding) walker On Wed, Dec 9, 2020 at 11:40 AM Bruno Cadonna wrote: > Thanks for the KIP, John! > > +1 (non-binding) > > Best, > Bruno > > On 08.12.20 18:03, John Roesler wrote: > > Hello all, > > > > There hasn't been much discussion on KIP-695 so far, so I'd > >

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-09 Thread Bruno Cadonna
Thanks for the KIP, John! +1 (non-binding) Best, Bruno On 08.12.20 18:03, John Roesler wrote: Hello all, There hasn't been much discussion on KIP-695 so far, so I'd like to go ahead and call for a vote. As a reminder, the purpose of KIP-695 to improve on the "task idling" feature we

[VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-08 Thread John Roesler
Hello all, There hasn't been much discussion on KIP-695 so far, so I'd like to go ahead and call for a vote. As a reminder, the purpose of KIP-695 to improve on the "task idling" feature we introduced in KIP-353. This KIP will allow Streams to offer deterministic time semantics in join-type