Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1745

2023-04-07 Thread Apache Jenkins Server




[jira] [Reopened] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams

2023-04-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-14318:
-------------------------------------

> KIP-878: Autoscaling for Statically Partitioned Streams
> -------------------------------------------------------
>
>                 Key: KAFKA-14318
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14318
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>              Labels: kip
>
> [KIP-878: Autoscaling for Statically Partitioned 
> Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams

2023-04-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14318.
-------------------------------------
    Resolution: Fixed

> KIP-878: Autoscaling for Statically Partitioned Streams
> -------------------------------------------------------
>
>                 Key: KAFKA-14318
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14318
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>              Labels: kip
>
> [KIP-878: Autoscaling for Statically Partitioned 
> Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams]





Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1744

2023-04-07 Thread Apache Jenkins Server




Re: [DISCUSS] KIP-909: Allow clients to rebootstrap DNS lookup failure

2023-04-07 Thread Kirk True
Sounds good. I think it's ready to call a vote. Thanks Philip!

On Wed, Apr 5, 2023, at 11:24 AM, Philip Nee wrote:
> Hi all,
> 
> The KIP has been around for some time, and I've updated the document
> according to the previous comments. Here is the outline of the proposed
> changes:
> 1. Adding a timeout configuration.
> 2. Adding a new exception type
> 3. Adding a WARN level log message
> 
> The obvious changes on the client side are:
> 1. They won't attempt DNS resolution in the constructor.
> 2. They will try to bootstrap once the network client's poll is invoked.
> 
> In terms of client-side behavior:
> *Consumer*: Either the poll timer runs out and an empty record set is
> returned, or a BootstrapConnectionException is thrown.
> *Producer*: API calls will block on waitOnMetadata until a timer runs
> out, either max.block.ms or the bootstrap timer.
> *Admin*: The API call times out if the API timeout expires first; otherwise,
> it throws a BootstrapConnectionException.
> 
> Let me know your thoughts: I would like to start voting in a week or so.
> 
> Link:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-909%3A+DNS+Resolution+Failure+Should+Not+Fail+the+Clients
> 
> Thanks,
> P
> 
> On Thu, Mar 23, 2023 at 12:59 PM Philip Nee wrote:
> 
> > Hey Kirk,
> >
> > Sorry about omitting your response; it slipped through the cracks...
> >
> > *I’m not sure if producers and consumers likewise do DNS resolution in
> > their constructors?*
> > Yes, both producer and consumer bootstrap in the constructor.
> >
> > *I agree that, after moving the DNS resolution to poll(), it would be hard to
> > distinguish hard failures (host name resolution) from transient network
> > issues. After all, the DNS resolution issue we saw was technically a
> > transient issue.*
> > I'm adding a timeout configuration to my KIP because that should cover
> > any transient issue that doesn't persist for too long. If it is a hard
> > failure (like a misconfigured host name), it should be logged and
> > surfaced as an error once the timeout expires.
> >
> > P
> >
> > On Wed, Mar 8, 2023 at 8:12 AM Kirk True wrote:
> >
> >> Hi Philip,
> >>
> >> I’m understanding the options proposed as consisting of these questions:
> >>
> >> 1. Should we throw an Exception or not?
> >> 2. Where should the DNS resolution/bootstrapping occur: in the constructor,
> >>    poll, or somewhere else?
> >> 3. Should there be a timeout, and if so, what configuration drives it?
> >>
> >> We were seeing instances of this issue when constructing KafkaAdminClient
> >> instances. The constructor performs the DNS lookup in that client. I’m not
> >> sure if producers and consumers likewise do DNS resolution in their
> >> constructors?
> >>
> >> I agree that, after moving the DNS resolution to poll(), it would be hard to
> >> distinguish hard failures (host name resolution) from transient network
> >> issues. After all, the DNS resolution issue we saw was technically a
> >> transient issue.
> >>
> >> Is this an accurate summary of the current thinking:
> >>
> >> 1. Per Jason's suggestion, introduce a new BootstrapConnectionException.
> >> 2. Per Chris’ suggestion, introduce a new API, performInitialDnsResolution(),
> >>    that an application developer can call to perform a fast-fail check,
> >>    throwing BootstrapConnectionException.
> >> 3. Otherwise, move the DNS resolution to poll().
> >> 4. Handle BootstrapConnectionException failures in poll() similar to how we
> >>    handle NetworkException today, with retries, timeouts, etc., i.e. we don’t
> >>    introduce any new configuration.
> >> 5. Improve logging to distinguish the DNS resolution case.
> >>
> >> Thanks,
> >> Kirk
> >>
> >> > On Mar 6, 2023, at 9:15 AM, Philip Nee wrote:
> >> >
> >> > Cheers Kafka Community,
> >> >
> >> > I just want to give this thread a bump, as it has been a bit quiet for the
> >> > past week. I have not updated the KIP based on Chris and Jason's feedback,
> >> > as I would also like to know more about what people think.
> >> >
> >> > Jason - Thanks for the suggestion, I think your suggestion makes a lot of
> >> > sense.
> >> >
> >> > Thanks!
> >> > P
> >> >
> >> > On Tue, Feb 28, 2023 at 2:45 PM Jason Gustafson wrote:
> >> >
> >> >> Hi Philip,
> >> >>
> >> >>> Having an overall timeout also seems reasonable, but I wonder what should
> >> >>> the client do after running out of the time? Should we throw a
> >> >>> non-retriable exception (instead of TimeoutException to stop the client
> >> >>> from retrying) and alert the user to examine the config and the DNS server?
> >> >>
> >> >> Yeah, not sure exactly. I'd probably suggest a
> >> >> `BootstrapConnectionException` or something like that with a clear message
> >> >> indicating the problem. What the user does with it is up to them, but at
> >> >> least it gives them the option to fail their application if that is what
> >> >> they prefer to do in this case. If they catch it and ignore it, I would
> >> >> expect the client to just continue 
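
As a rough illustration of the consumer-side behavior outlined in this thread
(no DNS lookup in the constructor, a lazy bootstrap attempt on each poll, an
empty result while the bootstrap timer is still running, and an exception once
it expires), here is a minimal Java sketch. Everything in it is hypothetical:
`BootstrapConnectionException` is only the name proposed in the discussion, and
`LazyBootstrapClient`, `HostResolver`, and the injected clock are illustrative
stand-ins, not Kafka client classes and not the KIP's actual implementation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical: only the name comes from the thread; this is not a Kafka class.
class BootstrapConnectionException extends RuntimeException {
    BootstrapConnectionException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical seam so the DNS lookup can be replaced in tests.
interface HostResolver {
    InetAddress[] resolve(String host) throws UnknownHostException;
}

// Hypothetical client: the constructor does no DNS work; poll() bootstraps lazily.
class LazyBootstrapClient {
    private final String host;
    private final HostResolver resolver;
    private final long bootstrapTimeoutNanos;
    private final LongSupplier nanoClock;
    private boolean timerStarted = false;
    private long timerStartNanos;
    private boolean bootstrapped = false;

    LazyBootstrapClient(String host, HostResolver resolver,
                        Duration bootstrapTimeout, LongSupplier nanoClock) {
        this.host = host;
        this.resolver = resolver;
        this.bootstrapTimeoutNanos = bootstrapTimeout.toNanos();
        this.nanoClock = nanoClock;
    }

    boolean isBootstrapped() {
        return bootstrapped;
    }

    // Returns an empty list in this sketch; record fetching is out of scope.
    List<String> poll() {
        long now = nanoClock.getAsLong();
        if (!timerStarted) {
            // The bootstrap timer starts on the first poll, not in the constructor.
            timerStarted = true;
            timerStartNanos = now;
        }
        if (!bootstrapped) {
            try {
                resolver.resolve(host);
                bootstrapped = true;
            } catch (UnknownHostException e) {
                if (now - timerStartNanos >= bootstrapTimeoutNanos) {
                    throw new BootstrapConnectionException(
                        "Could not resolve " + host + " before the bootstrap timeout", e);
                }
                // Still within the timeout: treat the failure as transient and retry later.
                System.err.println("WARN: DNS resolution failed for " + host
                    + "; will retry on the next poll");
            }
        }
        return Collections.emptyList();
    }
}

class BootstrapSketch {
    public static void main(String[] args) {
        LazyBootstrapClient client = new LazyBootstrapClient(
            "localhost", InetAddress::getAllByName, Duration.ofSeconds(30), System::nanoTime);
        System.out.println("records: " + client.poll()
            + ", bootstrapped: " + client.isBootstrapped());
    }
}
```

Whether an application catches the exception and keeps polling or lets it fail
the process is left to the caller, which matches the intent Jason describes
above.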

[jira] [Created] (KAFKA-14884) Include check transaction is still ongoing right before append

2023-04-07 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14884:
-----------------------------------

             Summary: Include check transaction is still ongoing right before append
                 Key: KAFKA-14884
                 URL: https://issues.apache.org/jira/browse/KAFKA-14884
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Justine Olshan
            Assignee: Justine Olshan


Even after checking via AddPartitionsToTxn, the transaction could be aborted 
after the response. We can add one more check before appending.
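
The race described above can be sketched in a few lines of Java. This is only
an in-memory illustration under stated assumptions: `TxnAppendSketch` and all of
its methods are hypothetical names, `verifyOngoing` merely stands in for the
verification done via AddPartitionsToTxn, and none of this is broker code. The
point it shows is that the second check runs under the same lock as the append,
so an abort that lands between verification and append is caught.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a partition log plus transaction state.
class TxnAppendSketch {
    enum TxnState { ONGOING, ABORTED }

    private final Map<Long, TxnState> txnStates = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    synchronized void begin(long producerId) {
        txnStates.put(producerId, TxnState.ONGOING);
    }

    synchronized void abort(long producerId) {
        txnStates.put(producerId, TxnState.ABORTED);
    }

    // First check: stands in for the earlier verification step.
    synchronized boolean verifyOngoing(long producerId) {
        return txnStates.get(producerId) == TxnState.ONGOING;
    }

    // Second check: re-validate under the same lock that guards the append,
    // so no abort can slip in between the check and the write.
    synchronized boolean appendIfStillOngoing(long producerId, String record) {
        if (txnStates.get(producerId) != TxnState.ONGOING) {
            return false;
        }
        log.add(record);
        return true;
    }

    synchronized int logSize() {
        return log.size();
    }

    public static void main(String[] args) {
        TxnAppendSketch partition = new TxnAppendSketch();
        partition.begin(1L);
        boolean verified = partition.verifyOngoing(1L);
        partition.abort(1L); // the abort lands between verification and append
        boolean appended = partition.appendIfStillOngoing(1L, "record-1");
        System.out.println("verified=" + verified + ", appended=" + appended);
    }
}
```

In the `main` scenario the first check passes, but the append is rejected
because the transaction was aborted in between.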





[jira] [Resolved] (KAFKA-14617) Replicas with stale broker epoch should not be allowed to join the ISR

2023-04-07 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-14617.
-----------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Merged all PRs to trunk.

> Replicas with stale broker epoch should not be allowed to join the ISR
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-14617
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14617
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Calvin Liu
>            Assignee: Calvin Liu
>            Priority: Major
>             Fix For: 3.5.0
>
>






Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1743

2023-04-07 Thread Apache Jenkins Server




[jira] [Created] (KAFKA-14883) Broker state should be "observer" in KRaft quorum

2023-04-07 Thread Paolo Patierno (Jira)
Paolo Patierno created KAFKA-14883:
-----------------------------------

             Summary: Broker state should be "observer" in KRaft quorum
                 Key: KAFKA-14883
                 URL: https://issues.apache.org/jira/browse/KAFKA-14883
             Project: Kafka
          Issue Type: Improvement
          Components: kraft, metrics
    Affects Versions: 3.4.0
            Reporter: Paolo Patierno
            Assignee: Paolo Patierno


Currently, the `current-state` KRaft-related metric reports the `follower` state 
for a broker, while technically it should report `observer`, as the 
`kafka-metadata-quorum` tool does.
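
To make the distinction concrete, here is a small Java sketch of how a reported
state could be derived from quorum membership. It is an assumption-laden
simplification: `QuorumStateSketch` and `currentState` are hypothetical names,
the logic is not taken from the Kafka code base, and it ignores transitional
election states. It only shows the rule that a node outside the voter set (such
as a broker) is an observer rather than a follower.

```java
import java.util.OptionalInt;
import java.util.Set;

class QuorumStateSketch {
    // Hypothetical helper: derives the role a node would report from the
    // voter set and the current leader, if one is known.
    static String currentState(int nodeId, Set<Integer> voterIds, OptionalInt leaderId) {
        if (!voterIds.contains(nodeId)) {
            // Non-voters replicate the metadata log but never take part in elections.
            return "observer";
        }
        if (leaderId.isPresent() && leaderId.getAsInt() == nodeId) {
            return "leader";
        }
        return "follower";
    }

    public static void main(String[] args) {
        Set<Integer> voters = Set.of(1, 2, 3); // controller node ids (made up)
        OptionalInt leader = OptionalInt.of(2);
        System.out.println("node 2: " + currentState(2, voters, leader));
        System.out.println("node 3: " + currentState(3, voters, leader));
        System.out.println("node 10: " + currentState(10, voters, leader));
    }
}
```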





[jira] [Resolved] (KAFKA-12634) Should checkpoint after restore finished

2023-04-07 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-12634.
-----------------------------------
    Resolution: Fixed

> Should checkpoint after restore finished
> ----------------------------------------
>
>                 Key: KAFKA-12634
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12634
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.5.0
>            Reporter: Matthias J. Sax
>            Assignee: Philip Nee
>            Priority: Critical
>              Labels: new-streams-runtime-should-fix, newbie++
>             Fix For: 3.5.0
>
> For state stores, Kafka Streams maintains local checkpoint files to track the 
> offsets of the state store changelog topics. The checkpoint is updated on 
> commit or when a task is closed cleanly.
> However, after a successful restore, the checkpoint is not written. Thus, if 
> an instance crashes after restore but before committing, even if the state is 
> on local disk the checkpoint file is missing (indicating that there is no 
> state) and thus state would be restored from scratch.
> While for most cases, the time between restore end and next commit is small, 
> there are cases when this time could be large, for example if there is no new 
> input data to be processed (if there is no input data, the commit would be 
> skipped).
> Thus, we should write the checkpoint file after a successful restore to close 
> this gap (of course, only for at-least-once processing).
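
The gap described in this ticket can be illustrated with a short Java sketch.
It is not the Kafka Streams implementation: `CheckpointSketch`, the
`onRestoreCompleted` hook, and the one-line-per-changelog-partition file layout
are all assumptions made for illustration. It shows the proposed behavior:
write the restored offsets to the checkpoint file as soon as restoration
finishes (only under at-least-once), so that a crash before the next commit
does not force a restore from scratch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CheckpointSketch {
    // Writes one "<changelog-partition> <offset>" line per entry. The data goes
    // to a temp file first so a crash mid-write leaves no half-written checkpoint.
    static void writeCheckpoint(Path file, Map<String, Long> offsets) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(offsets).entrySet()) {
            lines.add(entry.getKey() + " " + entry.getValue());
        }
        try {
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            Files.write(tmp, lines);
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A missing file yields an empty map, i.e. "no known state, restore from scratch".
    static Map<String, Long> readCheckpoint(Path file) {
        Map<String, Long> offsets = new HashMap<>();
        if (!Files.exists(file)) {
            return offsets;
        }
        try {
            for (String line : Files.readAllLines(file)) {
                int sep = line.lastIndexOf(' ');
                offsets.put(line.substring(0, sep), Long.parseLong(line.substring(sep + 1)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    // Hypothetical hook invoked once restoration finishes.
    static void onRestoreCompleted(Path file, Map<String, Long> restoredOffsets,
                                   boolean atLeastOnce) {
        if (atLeastOnce) {
            writeCheckpoint(file, restoredOffsets);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkpoint-sketch");
        Path checkpoint = dir.resolve(".checkpoint");
        onRestoreCompleted(checkpoint, Map.of("app-store-changelog-0", 1200L), true);
        System.out.println("after restore: " + readCheckpoint(checkpoint));
    }
}
```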


