[jira] [Created] (KAFKA-16857) Zookeeper - Add new ZNodes

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16857:
-

 Summary: Zookeeper - Add new ZNodes
 Key: KAFKA-16857
 URL: https://issues.apache.org/jira/browse/KAFKA-16857
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Additional information needs to be stored in new ZNodes as part of tiered storage 
disablement. Ensure that this information makes it into Zookeeper.
{code:java}
/brokers/topics/{topic-name}/partitions
/tieredstorage/
    /tiered_epoch
    /state
{code}
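
A minimal sketch, assuming the plain Apache ZooKeeper client, of how such nodes could be created; the layout above is not final, so the paths and payloads here are placeholders.
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class TieredStorageZNodesSketch {
    // Hypothetical helper: persists the tiered epoch and state for a topic.
    public static void createNodes(ZooKeeper zk, String topicName, int tieredEpoch, String state)
            throws KeeperException, InterruptedException {
        String base = "/tieredstorage/" + topicName;
        create(zk, base, new byte[0]);
        create(zk, base + "/tiered_epoch",
                Integer.toString(tieredEpoch).getBytes(StandardCharsets.UTF_8));
        create(zk, base + "/state", state.getBytes(StandardCharsets.UTF_8));
    }

    private static void create(ZooKeeper zk, String path, byte[] data)
            throws KeeperException, InterruptedException {
        zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
}
{code}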



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16856) Zookeeper - Add new exception

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16856:
-

 Summary: Zookeeper - Add new exception
 Key: KAFKA-16856
 URL: https://issues.apache.org/jira/browse/KAFKA-16856
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Add a TIERED_STORAGE_DISABLEMENT_IN_PROGRESS exception.
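
A minimal sketch, assuming the new exception follows the existing pattern of Kafka API exceptions; the class name, package and message wording are placeholders, not decisions.
{code:java}
package org.apache.kafka.common.errors;

// Hypothetical exception surfaced while a tiered storage disablement is still in progress.
public class TieredStorageDisablementInProgressException extends ApiException {

    private static final long serialVersionUID = 1L;

    public TieredStorageDisablementInProgressException(String message) {
        super(message);
    }

    public TieredStorageDisablementInProgressException(String message, Throwable cause) {
        super(message, cause);
    }
}
{code}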



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16855) KRaft - Wire replaying a TopicRecord

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16855:
-

 Summary: KRaft - Wire replaying a TopicRecord
 Key: KAFKA-16855
 URL: https://issues.apache.org/jira/browse/KAFKA-16855
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Replaying a TopicRecord containing a new TieredEpoch and TieredState needs to 
interact with the two thread pools in the RemoteLogManager so that the correct 
tasks are added to or removed from each.
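
A minimal sketch, with hypothetical handler and pool names, of the interaction this describes; the real change would live in the broker's metadata replay path and the RemoteLogManager.
{code:java}
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Hypothetical replay hook: reacts to a replayed TopicRecord carrying a new
// TieredEpoch/TieredState by adjusting the RemoteLogManager's two thread pools.
public class TieredStateReplaySketch {
    private final ScheduledThreadPoolExecutor copierPool;      // copies segments to remote storage
    private final ScheduledThreadPoolExecutor expirationPool;  // expires/deletes remote segments

    public TieredStateReplaySketch(ScheduledThreadPoolExecutor copierPool,
                                   ScheduledThreadPoolExecutor expirationPool) {
        this.copierPool = copierPool;
        this.expirationPool = expirationPool;
    }

    public void onTopicRecordReplayed(String topic, int tieredEpoch, String tieredState) {
        if ("DISABLED".equals(tieredState)) {
            // Stop archiving for the topic; what happens to expiration tasks depends
            // on the disablement policy in the real implementation.
            cancelTasksFor(copierPool, topic);
            cancelTasksFor(expirationPool, topic);
        } else if ("ENABLED".equals(tieredState)) {
            scheduleTasksFor(copierPool, topic);
            scheduleTasksFor(expirationPool, topic);
        }
    }

    private void cancelTasksFor(ScheduledThreadPoolExecutor pool, String topic) {
        // Placeholder: the RemoteLogManager tracks per-partition futures and would cancel them here.
    }

    private void scheduleTasksFor(ScheduledThreadPoolExecutor pool, String topic) {
        // Placeholder: schedule fresh copy/expiration tasks for the topic's partitions.
    }
}
{code}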



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16854) Zookeeper - Add v5 of StopReplica

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16854:
-

 Summary: Zookeeper - Add v5 of StopReplica
 Key: KAFKA-16854
 URL: https://issues.apache.org/jira/browse/KAFKA-16854
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16853) Split RemoteLogManagerScheduledThreadPool

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16853:
-

 Summary: Split RemoteLogManagerScheduledThreadPool
 Key: KAFKA-16853
 URL: https://issues.apache.org/jira/browse/KAFKA-16853
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

To begin with, create just the RemoteDataExpirationThreadPool and move 
expiration to it. Keep all settings as if the only thread pool were the 
RemoteLogManagerScheduledThreadPool. Ensure that the new thread pool is wired 
correctly into the RemoteLogManager.
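
A minimal sketch, with assumed names, of carving out a dedicated expiration pool that initially reuses the existing RemoteLogManagerScheduledThreadPool sizing.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wiring: a RemoteDataExpirationThreadPool sized exactly like the
// existing RemoteLogManagerScheduledThreadPool until dedicated configs exist.
public class RemoteLogThreadPoolsSketch {
    public static ScheduledExecutorService newExpirationPool(int existingPoolSize) {
        return Executors.newScheduledThreadPool(existingPoolSize, runnable -> {
            Thread t = new Thread(runnable, "remote-data-expiration-thread");
            t.setDaemon(true);
            return t;
        });
    }

    public static void scheduleExpiration(ScheduledExecutorService expirationPool,
                                          Runnable expirationTask,
                                          long intervalMs) {
        // Expiration moves to the new pool; copying stays on the original pool.
        expirationPool.scheduleWithFixedDelay(expirationTask, 0, intervalMs, TimeUnit.MILLISECONDS);
    }
}
{code}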



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16851) Add remote.log.disable.policy

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16851:
-

 Summary: Add remote.log.disable.policy
 Key: KAFKA-16851
 URL: https://issues.apache.org/jira/browse/KAFKA-16851
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Add the configuration as internal-only to begin with. Do not wire it to 
anything yet; just ensure that it can be set dynamically.
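
A minimal sketch, assuming a ConfigDef-style definition (name and default as proposed, neither final), of registering the config as internal-only so it is excluded from public documentation and wired to nothing yet.
{code:java}
import org.apache.kafka.common.config.ConfigDef;

public class RemoteLogDisablementConfigSketch {
    public static final String REMOTE_LOG_DISABLE_POLICY_PROP = "remote.log.disable.policy";
    public static final String DEFAULT_REMOTE_LOG_DISABLE_POLICY = "retain";

    // Internal-only: not documented publicly and not wired to anything for now.
    // Validation against {retain, delete} and dynamic reconfiguration support come later.
    public static ConfigDef define(ConfigDef configDef) {
        return configDef.defineInternal(
                REMOTE_LOG_DISABLE_POLICY_PROP,
                ConfigDef.Type.STRING,
                DEFAULT_REMOTE_LOG_DISABLE_POLICY,
                ConfigDef.Importance.MEDIUM);
    }
}
{code}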



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16852) Add *.thread.pool.size

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16852:
-

 Summary: Add *.thread.pool.size
 Key: KAFKA-16852
 URL: https://issues.apache.org/jira/browse/KAFKA-16852
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


*Summary*

Add the remote.log.manager.copier.thread.pool.size and 
remote.log.manager.expiration.thread.pool.size configurations as internal-only.
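
A minimal sketch along the same lines, again assuming ConfigDef (including its internal-definition overload that takes a validator) and using illustrative defaults, with each size bounded to at least one thread.
{code:java}
import org.apache.kafka.common.config.ConfigDef;

public class RemoteLogThreadPoolConfigSketch {
    public static final String COPIER_THREAD_POOL_SIZE_PROP =
            "remote.log.manager.copier.thread.pool.size";
    public static final String EXPIRATION_THREAD_POOL_SIZE_PROP =
            "remote.log.manager.expiration.thread.pool.size";

    // Internal-only definitions; the defaults here are illustrative.
    public static ConfigDef define(ConfigDef configDef) {
        return configDef
                .defineInternal(COPIER_THREAD_POOL_SIZE_PROP, ConfigDef.Type.INT, 10,
                        ConfigDef.Range.atLeast(1), ConfigDef.Importance.MEDIUM,
                        "Size of the thread pool copying log segments to remote storage.")
                .defineInternal(EXPIRATION_THREAD_POOL_SIZE_PROP, ConfigDef.Type.INT, 10,
                        ConfigDef.Range.atLeast(1), ConfigDef.Importance.MEDIUM,
                        "Size of the thread pool expiring log segments from remote storage.");
    }
}
{code}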



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16850) KRaft - Add v2 of TopicRecord

2024-05-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16850:
-

 Summary: KRaft - Add v2 of TopicRecord
 Key: KAFKA-16850
 URL: https://issues.apache.org/jira/browse/KAFKA-16850
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-05-24 Thread Christo Lolov
Hello!

I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping and
Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews!

Realistically, I don't think I have the bandwidth to get this in 3.8.0.
Due to this, I will tentatively mark the Zookeeper part for 3.9, if the
community decides that they do in fact want one more 3.x release.
I will mark the KRaft part as ready to be started, aiming for either 4.0
or 3.9.

Best,
Christo


[jira] [Created] (KAFKA-16790) Calls to RemoteLogManager are made before it is configured

2024-05-17 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16790:
-

 Summary: Calls to RemoteLogManager are made before it is configured
 Key: KAFKA-16790
 URL: https://issues.apache.org/jira/browse/KAFKA-16790
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.8.0
Reporter: Christo Lolov


BrokerMetadataPublisher#onMetadataUpdate calls ReplicaManager#applyDelta (1) 
which in turn calls RemoteLogManager#onLeadershipChange (2); however, the 
RemoteLogManager is configured after the BrokerMetadataPublisher starts running 
(3, 4). This is incorrect: we either need to initialise the RemoteLogManager 
before we start the BrokerMetadataPublisher, or we need to skip calls to 
onLeadershipChange if the RemoteLogManager is not yet initialised.
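
A minimal illustrative sketch (in Java, with hypothetical names; the real code sits in the Scala ReplicaManager) of the second option, skipping onLeadershipChange until the RemoteLogManager has been configured.
{code:java}
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard around leadership-change notifications.
public class RemoteLogManagerGuardSketch {
    private final Optional<RemoteLogManagerFacade> remoteLogManager;
    private final AtomicBoolean configured = new AtomicBoolean(false);

    public RemoteLogManagerGuardSketch(Optional<RemoteLogManagerFacade> remoteLogManager) {
        this.remoteLogManager = remoteLogManager;
    }

    // Called once RemoteLogManager#configure has run during broker startup.
    public void markConfigured() {
        configured.set(true);
    }

    public void onLeadershipChange(Runnable notification) {
        // Skip the call until configuration has happened; otherwise the notification
        // races with the BrokerMetadataPublisher starting first.
        if (remoteLogManager.isPresent() && configured.get()) {
            notification.run();
        }
    }

    // Stand-in for the real RemoteLogManager type.
    public interface RemoteLogManagerFacade { }
}
{code}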

(1) 
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala#L151]

(2) 
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L2737]

(3) 
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerServer.scala#L432]

(4) 
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerServer.scala#L515]

The way to reproduce the problem is by looking at the following branch 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] KIP-950: Tiered Storage Disablement

2024-05-14 Thread Christo Lolov
Heya!

I would like to start a vote on KIP-950: Tiered Storage Disablement in
order to catch the last Kafka release targeting Zookeeper -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement

Best,
Christo


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-13 Thread Christo Lolov
Heya!

re Kamal - Okay, I believe I understand what you mean and I agree. I have
made the following change

```

During tiered storage disablement, when RemoteLogManager#stopPartition() is
called:

   - Tasks scheduled for the topic-partitions in the
   RemoteStorageCopierThreadPool will be canceled.
   - If the disablement policy is retain, scheduled tasks for the
   topic-partitions in the RemoteDataExpirationThreadPool will remain
   unchanged.
   - If the disablement policy is delete, we will first advance the log
   start offset and then let the tasks scheduled for the topic-partitions in
   the RemoteDataExpirationThreadPool delete all remote segments below the
   log start offset and then unregister themselves.

```
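
For illustration, a rough sketch of what the above could look like (hypothetical names; only RemoteLogManager#stopPartitions exists today in roughly this shape):

```
// Illustrative only.
public class DisablementSketch {
    interface CopierPool { void cancelTask(String topicPartition); }

    private final CopierPool copierThreadPool;

    public DisablementSketch(CopierPool copierThreadPool) {
        this.copierThreadPool = copierThreadPool;
    }

    public void stopPartitions(java.util.Set<String> topicPartitions, String disablePolicy) {
        for (String tp : topicPartitions) {
            copierThreadPool.cancelTask(tp);  // copying always stops
            if ("delete".equals(disablePolicy)) {
                // Advance the log start offset first; the tasks already scheduled in the
                // RemoteDataExpirationThreadPool then delete every remote segment below it
                // and unregister themselves.
            }
            // "retain": the RemoteDataExpirationThreadPool tasks are left untouched.
        }
    }
}
```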

re Luke - I checked once again. As far as I understand, when a broker goes
down, all replicas it hosts go to the OfflineReplica state in the state
machine the controller maintains. The moment the broker comes back up, the
state machine resends StopReplica based on
```

* OfflineReplica -> ReplicaDeletionStarted
* --send StopReplicaRequest to the replica (with deletion)

```
from ReplicaStateMachine.scala. So I was wrong and you are right: we do not
appear to be sending constant requests today. I believe it is safe for us
to follow a similar approach, i.e. if a replica comes online again we resend
the StopReplica.

If you don't notice any more problems I will aim to start a VOTE tomorrow
so we can get at least part of this KIP in 3.8.

Best,
Christo

On Fri, 10 May 2024 at 11:11, Luke Chen  wrote:

> Hi Christo,
>
> > 1. I am not certain I follow the question. From DISABLED you can only go
> to
> ENABLED regardless of whether your cluster is backed by Zookeeper or KRaft.
> Am I misunderstanding your point?
>
> Yes, you're right.
>
> > 4. I was thinking that if there is a mismatch we will just fail accepting
> the request for disablement. This should be the same in both Zookeeper and
> KRaft. Or am I misunderstanding your question?
>
> OK, sounds good.
>
> > 6. I think my current train of thought is that there will be unlimited
> retries until all brokers respond in a similar way to how deletion of a
> topic works today in ZK. In the meantime the state will continue to be
> DISABLING. Do you have a better suggestion?
>
> I don't think infinite retries is a good idea since if a broker is down
> forever, this request will never complete.
> You mentioned the existing topic deletion is using the similar pattern, how
> does it handle this issue?
>
> Thanks.
> Luke
>
> On Thu, May 9, 2024 at 9:21 PM Christo Lolov 
> wrote:
>
> > Heya!
> >
> > re: Luke
> >
> > 1. I am not certain I follow the question. From DISABLED you can only go
> to
> > ENABLED regardless of whether your cluster is backed by Zookeeper or
> KRaft.
> > Am I misunderstanding your point?
> >
> > 2. Apologies, this was a leftover from previous versions. I have updated
> > the Zookeeper section. The steps ought to be: controller receives change,
> > commits necessary data to Zookeeper, enqueues disablement and starts
> > sending StopReplicas request to brokers; brokers receive StopReplicas and
> > propagate them all the way to RemoteLogManager#stopPartitions which takes
> > care of the rest.
> >
> > 3. Correct, it should say DISABLED - this should now be corrected.
> >
> > 4. I was thinking that if there is a mismatch we will just fail accepting
> > the request for disablement. This should be the same in both Zookeeper
> and
> > KRaft. Or am I misunderstanding your question?
> >
> > 5. Yeah. I am now doing a second pass on all diagrams and will update
> them
> > by the end of the day!
> >
> > 6. I think my current train of thought is that there will be unlimited
> > retries until all brokers respond in a similar way to how deletion of a
> > topic works today in ZK. In the meantime the state will continue to be
> > DISABLING. Do you have a better suggestion?
> >
> > re: Kamal
> >
> > Yep, I will update all diagrams
> >
> > I am not certain I follow the reasoning for making retain and delete the
> > same. Deletion when the policy is retain happens asynchronously due to
> > expiration. I think that deletion when the policy is delete ought to (at
> > least for the initial implementation) happen synchronously. Should people
> > run into timeout problems we can always then have a follow-up KIP where
> we
> > make it asynchronous.
> >
> > Best,
> > Christo
> >
> > On Tue, 7 May 2024 at 10:04, Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi Christo,
> > >
> > > Thanks for 

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-09 Thread Christo Lolov
Luke Chen wrote:
> >>
> >>> Also, I think using `stopReplicas` request is a good idea because it
> >>> won't cause any problems while migrating to KRaft mode.
> >>> The stopReplicas request is one of the request that KRaft controller
> >>> will send to ZK brokers during migration.
> >>>
> >>> Thanks.
> >>> Luke
> >>>
> >>> On Fri, May 3, 2024 at 11:48 AM Luke Chen  wrote:
> >>>
> >>>> Hi Christo,
> >>>>
> >>>> Thanks for the update.
> >>>>
> >>>> Questions:
> >>>> 1. For this
> >>>> "The possible state transition from DISABLED state is to the ENABLED."
> >>>> I think it only applies for KRaft mode. In ZK mode, the possible state
> >>>> is "DISABLING", right?
> >>>>
> >>>> 2. For this:
> >>>> "If the cluster is using Zookeeper as the control plane, enabling
> >>>> remote storage for a topic triggers the controller to send this
> information
> >>>> to Zookeeper. Each broker listens for changes in Zookeeper, and when a
> >>>> change is detected, the broker triggers
> >>>> RemoteLogManager#onLeadershipChange()."
> >>>>
> >>>> I think the way ZK brokers knows the leadership change is by getting
> >>>> the LeaderAndISRRequeset from the controller, not listening for
> changes in
> >>>> ZK.
> >>>>
> >>>> 3. In the KRaft handler steps, you said:
> >>>> "The controller also updates the Topic metadata to increment the
> >>>> tiered_epoch and update the tiered_stateto DISABLING state."
> >>>>
> >>>> Should it be "DISABLED" state since it's KRaft mode?
> >>>>
> >>>> 4. I was thinking how we handle the tiered_epoch not match error.
> >>>> For ZK, I think the controller won't write any data into ZK Znode,
> >>>> For KRaft, either configRecord or updateTopicMetadata records won't be
> >>>> written.
> >>>> Is that right? Because the current workflow makes me think there will
> >>>> be partial data updated in ZK/KRaft when tiered_epoch error.
> >>>>
> >>>> 5. Since we changed to use stopReplicas (V5) request now, the diagram
> >>>> for ZK workflow might also need to update.
> >>>>
> >>>> 6. In ZK mode, what will the controller do if the "stopReplicas"
> >>>> responses not received from all brokers? Reverting the changes?
> >>>> This won't happen in KRaft mode because it's broker's responsibility
> to
> >>>> fetch metadata update from controller.
> >>>>
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>>
> >>>>
> >>>> On Fri, Apr 19, 2024 at 10:23 PM Christo Lolov <
> christolo...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Heya all!
> >>>>>
> >>>>> I have updated KIP-950. A list of what I have updated is:
> >>>>>
> >>>>> * Explicitly state that Zookeeper-backed clusters will have ENABLED
> ->
> >>>>> DISABLING -> DISABLED while KRaft-backed clusters will only have
> ENABLED ->
> >>>>> DISABLED
> >>>>> * Added two configurations for the new thread pools and explained
> >>>>> where values will be picked-up mid Kafka version upgrade
> >>>>> * Explained how leftover remote partitions will be scheduled for
> >>>>> deletion
> >>>>> * Updated the API to use StopReplica V5 rather than a whole new
> >>>>> controller-to-broker API
> >>>>> * Explained that the disablement procedure will be triggered by the
> >>>>> controller listening for an (Incremental)AlterConfig change
> >>>>> * Explained that we will first move log start offset and then issue a
> >>>>> deletion
> >>>>> * Went into more details that changing remote.log.disable.policy
> after
> >>>>> disablement won't do anything and that if a customer would like
> additional
> >>>>> data deleted they would have to use already existing methods
> >>>>>
> >>>>> Let me know if there are any new comments or I have missed something!
> >

Re: [VOTE] KIP-1018: Introduce max remote fetch timeout config

2024-05-09 Thread Christo Lolov
Heya Kamal,

Thanks for the KIP and the answers in the discussion!

+1 from me :)

Best,
Christo

On Thu, 9 May 2024 at 11:11, Federico Valeri  wrote:

> +1 non binding
>
> Thanks
>
> On Thu, May 9, 2024 at 12:05 PM Luke Chen  wrote:
> >
> > Hi Kamal,
> >
> > Thanks for the KIP!
> > +1 from me.
> >
> > Thanks.
> > Luke
> >
> > On Mon, May 6, 2024 at 5:03 PM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > We would like to start a voting thread for KIP-1018: Introduce
> > > max remote fetch timeout config for DelayedRemoteFetch requests.
> > >
> > > The KIP is available on
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
> > >
> > > If you have any suggestions, feel free to participate in the discussion
> > > thread:
> > > https://lists.apache.org/thread/9x21hzpxzmrt7xo4vozl17d70fkg3chk
> > >
> > > --
> > > Kamal
> > >
>


Re: [DISCUSS] KIP-1018: Introduce max remote fetch timeout config

2024-04-29 Thread Christo Lolov
Heya!

Is it difficult to instead add the metric at
kafka.network:type=RequestMetrics,name=TieredStorageMs (or some other
name=*)? Alternatively, if it is difficult to add it there, is it possible
to add 2 metrics, one at the RequestMetrics level (even if it is
total-time-ms - (all other times)) and one at what you are proposing? As an
operator, I would find it strange not to see the metric in the
RequestMetrics.
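
For reference, a small sketch of the kind of histogram I mean, using the Yammer metrics library the existing request metrics are built on; the group/type/name are only a suggestion.

```
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Histogram;
import com.yammer.metrics.core.MetricName;

public class TieredStorageRequestMetricSketch {
    // Mirrors the kafka.network:type=RequestMetrics layout; biased histogram like
    // the existing *TimeMs request metrics.
    private final Histogram tieredStorageMs = Metrics.defaultRegistry().newHistogram(
            new MetricName("kafka.network", "RequestMetrics", "TieredStorageMs"), true);

    public void record(long remoteReadTimeMs) {
        tieredStorageMs.update(remoteReadTimeMs);
    }
}
```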

Your thoughts?

Best,
Christo

On Sun, 28 Apr 2024 at 10:52, Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Christo,
>
> Updated the KIP with the remote fetch latency metric. Please take another
> look!
>
> --
> Kamal
>
> On Sun, Apr 28, 2024 at 12:23 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Federico,
> >
> > Thanks for the suggestion! Updated the config name to "
> > remote.fetch.max.wait.ms".
> >
> > Christo,
> >
> > Good point. We don't have the remote-read latency metrics to measure the
> > performance of the remote read requests. I'll update the KIP to emit this
> > metric.
> >
> > --
> > Kamal
> >
> >
> > On Sat, Apr 27, 2024 at 4:03 PM Federico Valeri 
> > wrote:
> >
> >> Hi Kamal, it looks like all TS configurations starts with "remote."
> >> prefix, so I was wondering if we should name it
> >> "remote.fetch.max.wait.ms".
> >>
> >> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash
> >>  wrote:
> >> >
> >> > Hi all,
> >> >
> >> > If there are no more comments, I'll start a vote thread by tomorrow.
> >> > Please review the KIP.
> >> >
> >> > Thanks,
> >> > Kamal
> >> >
> >> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash <
> >> > kamal.chandraprak...@gmail.com> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > Bumping the thread. Please review this KIP. Thanks!
> >> > >
> >> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash <
> >> > > kamal.chandraprak...@gmail.com> wrote:
> >> > >
> >> > >> Hi Jorge,
> >> > >>
> >> > >> Thanks for the review! Added your suggestions to the KIP. PTAL.
> >> > >>
> >> > >> The `fetch.max.wait.ms` config will be also applicable for topics
> >> > >> enabled with remote storage.
> >> > >> Updated the description to:
> >> > >>
> >> > >> ```
> >> > >> The maximum amount of time the server will block before answering
> the
> >> > >> fetch request
> >> > >> when it is reading near to the tail of the partition
> >> (high-watermark) and
> >> > >> there isn't
> >> > >> sufficient data to immediately satisfy the requirement given by
> >> > >> fetch.min.bytes.
> >> > >> ```
> >> > >>
> >> > >> --
> >> > >> Kamal
> >> > >>
> >> > >> On Thu, Feb 1, 2024 at 12:12 AM Jorge Esteban Quilcate Otoya <
> >> > >> quilcate.jo...@gmail.com> wrote:
> >> > >>
> >> > >>> Hi Kamal,
> >> > >>>
> >> > >>> Thanks for this KIP! It should help to solve one of the main
> issues
> >> with
> >> > >>> tiered storage at the moment that is dealing with individual
> >> consumer
> >> > >>> configurations to avoid flooding logs with interrupted exceptions.
> >> > >>>
> >> > >>> One of the topics discussed in [1][2] was on the semantics of `
> >> > >>> fetch.max.wait.ms` and how it's affected by remote storage.
> Should
> >> we
> >> > >>> consider within this KIP the update of `fetch.max.wail.ms` docs
> to
> >> > >>> clarify
> >> > >>> it only applies to local storage?
> >> > >>>
> >> > >>> Otherwise, LGTM -- looking forward to see this KIP adopted.
> >> > >>>
> >> > >>> [1] https://issues.apache.org/jira/browse/KAFKA-15776
> >> > >>> [2]
> >> https://github.com/apache/kafka/pull/14778#issuecomment-1820588080
> >> > >>>
> >> > >>> On Tue, 30 Jan 2024 at 01:01, Kamal Chandraprakash <
> >> > >>> kamal.chandraprak...@gmail.com> wrote:
> >> > >>>
> >> > >>> > Hi all,
> >> > >>> >
> >> > >>> > I have opened a KIP-1018
> >> > >>> > <
> >> > >>> >
> >> > >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
> >> > >>> > >
> >> > >>> > to introduce dynamic max-remote-fetch-timeout broker config to
> >> give
> >> > >>> more
> >> > >>> > control to the operator.
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
> >> > >>> >
> >> > >>> > Let me know if you have any feedback or suggestions.
> >> > >>> >
> >> > >>> > --
> >> > >>> > Kamal
> >> > >>> >
> >> > >>>
> >> > >>
> >>
> >
>


Re: [DISCUSS] KIP-1018: Introduce max remote fetch timeout config

2024-04-28 Thread Christo Lolov
Heya Kamal,

I quite like the proposal and would support it!

However, today I don't think we have a metric which shows the latency of
fetch requests that are served from remote. Am I wrong?
I looked at both
https://github.com/clolov/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L521-L527
and https://kafka.apache.org/documentation/#tiered_storage_monitoring.
If I am right, then I believe it would be very useful if this KIP also
introduces such a metric because the two are tightly coupled.

What do you think?

Best,
Christo

On Sat, 27 Apr 2024 at 11:33, Federico Valeri  wrote:

> Hi Kamal, it looks like all TS configurations starts with "remote."
> prefix, so I was wondering if we should name it
> "remote.fetch.max.wait.ms".
>
> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash
>  wrote:
> >
> > Hi all,
> >
> > If there are no more comments, I'll start a vote thread by tomorrow.
> > Please review the KIP.
> >
> > Thanks,
> > Kamal
> >
> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Bumping the thread. Please review this KIP. Thanks!
> > >
> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > >> Hi Jorge,
> > >>
> > >> Thanks for the review! Added your suggestions to the KIP. PTAL.
> > >>
> > >> The `fetch.max.wait.ms` config will be also applicable for topics
> > >> enabled with remote storage.
> > >> Updated the description to:
> > >>
> > >> ```
> > >> The maximum amount of time the server will block before answering the
> > >> fetch request
> > >> when it is reading near to the tail of the partition (high-watermark)
> and
> > >> there isn't
> > >> sufficient data to immediately satisfy the requirement given by
> > >> fetch.min.bytes.
> > >> ```
> > >>
> > >> --
> > >> Kamal
> > >>
> > >> On Thu, Feb 1, 2024 at 12:12 AM Jorge Esteban Quilcate Otoya <
> > >> quilcate.jo...@gmail.com> wrote:
> > >>
> > >>> Hi Kamal,
> > >>>
> > >>> Thanks for this KIP! It should help to solve one of the main issues
> with
> > >>> tiered storage at the moment that is dealing with individual consumer
> > >>> configurations to avoid flooding logs with interrupted exceptions.
> > >>>
> > >>> One of the topics discussed in [1][2] was on the semantics of `
> > >>> fetch.max.wait.ms` and how it's affected by remote storage. Should
> we
> > >>> consider within this KIP the update of `fetch.max.wail.ms` docs to
> > >>> clarify
> > >>> it only applies to local storage?
> > >>>
> > >>> Otherwise, LGTM -- looking forward to see this KIP adopted.
> > >>>
> > >>> [1] https://issues.apache.org/jira/browse/KAFKA-15776
> > >>> [2]
> https://github.com/apache/kafka/pull/14778#issuecomment-1820588080
> > >>>
> > >>> On Tue, 30 Jan 2024 at 01:01, Kamal Chandraprakash <
> > >>> kamal.chandraprak...@gmail.com> wrote:
> > >>>
> > >>> > Hi all,
> > >>> >
> > >>> > I have opened a KIP-1018
> > >>> > <
> > >>> >
> > >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
> > >>> > >
> > >>> > to introduce dynamic max-remote-fetch-timeout broker config to give
> > >>> more
> > >>> > control to the operator.
> > >>> >
> > >>> >
> > >>> >
> > >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
> > >>> >
> > >>> > Let me know if you have any feedback or suggestions.
> > >>> >
> > >>> > --
> > >>> > Kamal
> > >>> >
> > >>>
> > >>
>


Re: [VOTE] KIP-1023: Follower fetch from tiered offset

2024-04-26 Thread Christo Lolov
Heya Abhijeet,

Thanks a lot for pushing this forward, especially with the explanation of
EARLIEST_PENDING_UPLOAD_OFFSET_TIMESTAMP!
+1 from me :)

Best,
Christo

On Fri, 26 Apr 2024 at 12:50, Luke Chen  wrote:

> Hi Abhijeet,
>
> Thanks for the KIP.
> +1 from me.
>
> Thanks.
> Luke
>
> On Fri, Apr 26, 2024 at 5:41 PM Omnia Ibrahim 
> wrote:
>
> > Thanks for the KIP. +1 non-binding from me
> >
> > > On 26 Apr 2024, at 06:29, Abhijeet Kumar 
> > wrote:
> > >
> > > Hi All,
> > >
> > > I would like to start the vote for KIP-1023 - Follower fetch from
> tiered
> > > offset
> > >
> > > The KIP is here:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> > >
> > > Regards.
> > > Abhijeet.
> >
> >
>


Re: [ANNOUNCE] New committer: Igor Soarez

2024-04-25 Thread Christo Lolov
Congratulations Igor :) !

On Thu, 25 Apr 2024 at 17:07, Igor Soarez  wrote:

> Thanks everyone, I'm very honoured to join!
>
> --
> Igor
>


Re: [VOTE] KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission

2024-04-22 Thread Christo Lolov
Heya Nikhil,

Thanks for the proposal, as mentioned before it makes sense to me!

+1 (binding)

Best,
Christo

On Sat, 20 Apr 2024 at 00:25, Justine Olshan 
wrote:

> Hey Nikhil,
>
> I meant to comment on the discussion thread, but my draft took so long, you
> opened the vote.
>
> Regardless, I just wanted to say that it makes sense to me. +1 (binding)
>
> Justine
>
> On Fri, Apr 19, 2024 at 7:22 AM Nikhil Ramakrishnan <
> ramakrishnan.nik...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I would like to start a voting thread for KIP-1037: Allow
> > WriteTxnMarkers API with Alter Cluster Permission
> > (
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1037%3A+Allow+WriteTxnMarkers+API+with+Alter+Cluster+Permission
> > )
> > as there have been no objections on the discussion thread.
> >
> > For comments or feedback please check the discussion thread here:
> > https://lists.apache.org/thread/bbkyt8mrc8xp3jfyvhph7oqtjxl29xmn
> >
> > Thanks,
> > Nikhil
> >
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-04-19 Thread Christo Lolov
Heya all!

I have updated KIP-950. A list of what I have updated is:

* Explicitly state that Zookeeper-backed clusters will have ENABLED ->
DISABLING -> DISABLED while KRaft-backed clusters will only have ENABLED ->
DISABLED
* Added two configurations for the new thread pools and explained where
values will be picked-up mid Kafka version upgrade
* Explained how leftover remote partitions will be scheduled for deletion
* Updated the API to use StopReplica V5 rather than a whole new
controller-to-broker API
* Explained that the disablement procedure will be triggered by the
controller listening for an (Incremental)AlterConfig change
* Explained that we will first move log start offset and then issue a
deletion
* Went into more detail on how changing remote.log.disable.policy after
disablement won't do anything, and that if a customer would like additional
data deleted they would have to use already existing methods

Let me know if there are any new comments or I have missed something!

Best,
Christo

On Mon, 15 Apr 2024 at 12:40, Christo Lolov  wrote:

> Heya Doguscan,
>
> I believe that the state of the world after this KIP will be the following:
>
> For Zookeeper-backed clusters there will be 3 states: ENABLED, DISABLING
> and DISABLED. We want this because Zookeeper-backed clusters will await a
> confirmation from the brokers that they have indeed stopped tiered-related
> operations on the topic.
>
> For KRaft-backed clusters there will be only 2 states: ENABLED and
> DISABLED. KRaft takes a fire-and-forget approach for topic deletion. I
> believe the same approach ought to be taken for tiered topics. The
> mechanism which will ensure that leftover state in remote due to failures
> is cleaned up to me is the retention mechanism. In today's code, a leader
> deletes all segments it finds in remote with offsets below the log start
> offset. I believe this will be good enough for cleaning up leftover state
> in remote due to failures.
>
> I know that quite a few changes have been discussed so I will aim to put
> them on paper in the upcoming days and let everyone know!
>
> Best,
> Christo
>
> On Tue, 9 Apr 2024 at 14:49, Doğuşcan Namal 
> wrote:
>
>> +1 let's not introduce a new api and mark it immediately as deprecated :)
>>
>> On your second comment Luke, one thing we need to clarify is when do we
>> consider remote storage to be DISABLED for a topic?
>> Particularly, what is the state when the remote storage is being deleted
>> in case of disablement.policy=delete? Is it DISABLING or DISABLED?
>>
>> If we move directly to the DISABLED state,
>>
>> a) in case of failures, the leaders should continue remote storage
>> deletion even if the topic is moved to the DISABLED state, otherwise we
>> risk having stray data on remote storage.
>> b) on each restart, we should initiate the remote storage deletion
>> because although we replayed a record with a DISABLED state, we can not be
>> sure if the remote data is deleted or not.
>>
>> We could either consider keeping the remote topic in DISABLING state
>> until all of the remote storage data is deleted, or we need an additional
>> mechanism to handle the remote stray data.
>>
>> The existing topic deletion, for instance, handles stray logs on disk by
>> detecting them on KafkaBroker startup and deleting before the
>> ReplicaManager is started.
>> Maybe we need a similar mechanism here as well if we don't want a
>> DISABLING state. Otherwise, we need a callback from Brokers to validate
>> that remote storage data is deleted and now we could move to the DISABLED
>> state.
>>
>> Thanks.
>>
>> On Tue, 9 Apr 2024 at 12:45, Luke Chen  wrote:
>>
>>> Hi Christo,
>>>
>>> > I would then opt for moving information from DisableRemoteTopic
>>> within the StopReplicas API which will then disappear in KRaft world as
>>> it
>>> is already scheduled for deprecation. What do you think?
>>>
>>> Sounds good to me.
>>>
>>> Thanks.
>>> Luke
>>>
>>> On Tue, Apr 9, 2024 at 6:46 PM Christo Lolov 
>>> wrote:
>>>
>>> > Heya Luke!
>>> >
>>> > I thought a bit more about it and I reached the same conclusion as you
>>> for
>>> > 2 as a follow-up from 1. In other words, in KRaft world I don't think
>>> the
>>> > controller needs to wait for acknowledgements for the brokers. All we
>>> care
>>> > about is that the leader (who is responsible for archiving/deleting
>>> data in
>>> > tiered storage) knows about the change and applies it properly. If

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-04-15 Thread Christo Lolov
Heya Doguscan,

I believe that the state of the world after this KIP will be the following:

For Zookeeper-backed clusters there will be 3 states: ENABLED, DISABLING
and DISABLED. We want this because Zookeeper-backed clusters will await a
confirmation from the brokers that they have indeed stopped tiered-related
operations on the topic.

For KRaft-backed clusters there will be only 2 states: ENABLED and
DISABLED. KRaft takes a fire-and-forget approach for topic deletion. I
believe the same approach ought to be taken for tiered topics. To me, the
mechanism which will ensure that leftover state in remote storage due to
failures is cleaned up is the retention mechanism. In today's code, a leader
deletes all segments it finds in remote storage with offsets below the log
start offset. I believe this will be good enough for cleaning up leftover
state in remote storage due to failures.
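
As an illustrative sketch (not the actual code) of that retention pass, expressed against the plugin-facing interfaces and with error handling and metadata-state updates omitted:

```
import java.util.Iterator;

import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager;

public class LeftoverRemoteSegmentCleanupSketch {
    // Delete every remote segment that ends below the current log start offset.
    public static void deleteBelowLogStartOffset(RemoteLogMetadataManager metadataManager,
                                                 RemoteStorageManager storageManager,
                                                 TopicIdPartition partition,
                                                 long logStartOffset) throws RemoteStorageException {
        Iterator<RemoteLogSegmentMetadata> segments = metadataManager.listRemoteLogSegments(partition);
        while (segments.hasNext()) {
            RemoteLogSegmentMetadata segment = segments.next();
            if (segment.endOffset() < logStartOffset) {
                storageManager.deleteLogSegmentData(segment);
            }
        }
    }
}
```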

I know that quite a few changes have been discussed so I will aim to put
them on paper in the upcoming days and let everyone know!

Best,
Christo

On Tue, 9 Apr 2024 at 14:49, Doğuşcan Namal 
wrote:

> +1 let's not introduce a new api and mark it immediately as deprecated :)
>
> On your second comment Luke, one thing we need to clarify is when do we
> consider remote storage to be DISABLED for a topic?
> Particularly, what is the state when the remote storage is being deleted
> in case of disablement.policy=delete? Is it DISABLING or DISABLED?
>
> If we move directly to the DISABLED state,
>
> a) in case of failures, the leaders should continue remote storage
> deletion even if the topic is moved to the DISABLED state, otherwise we
> risk having stray data on remote storage.
> b) on each restart, we should initiate the remote storage deletion because
> although we replayed a record with a DISABLED state, we can not be sure if
> the remote data is deleted or not.
>
> We could either consider keeping the remote topic in DISABLING state until
> all of the remote storage data is deleted, or we need an additional
> mechanism to handle the remote stray data.
>
> The existing topic deletion, for instance, handles stray logs on disk by
> detecting them on KafkaBroker startup and deleting before the
> ReplicaManager is started.
> Maybe we need a similar mechanism here as well if we don't want a
> DISABLING state. Otherwise, we need a callback from Brokers to validate
> that remote storage data is deleted and now we could move to the DISABLED
> state.
>
> Thanks.
>
> On Tue, 9 Apr 2024 at 12:45, Luke Chen  wrote:
>
>> Hi Christo,
>>
>> > I would then opt for moving information from DisableRemoteTopic
>> within the StopReplicas API which will then disappear in KRaft world as it
>> is already scheduled for deprecation. What do you think?
>>
>> Sounds good to me.
>>
>> Thanks.
>> Luke
>>
>> On Tue, Apr 9, 2024 at 6:46 PM Christo Lolov 
>> wrote:
>>
>> > Heya Luke!
>> >
>> > I thought a bit more about it and I reached the same conclusion as you
>> for
>> > 2 as a follow-up from 1. In other words, in KRaft world I don't think
>> the
>> > controller needs to wait for acknowledgements for the brokers. All we
>> care
>> > about is that the leader (who is responsible for archiving/deleting
>> data in
>> > tiered storage) knows about the change and applies it properly. If
>> there is
>> > a leadership change halfway through the operation then the new leader
>> still
>> > needs to apply the message from the state topic and we know that a
>> > disable-message will be applied before a reenablement-message. I will
>> > change the KIP later today/tomorrow morning to reflect this reasoning.
>> >
>> > However, with this I believe that introducing a new API just for
>> > Zookeeper-based clusters (i.e. DisableRemoteTopic) becomes a bit of an
>> > overkill. I would then opt for moving information from
>> DisableRemoteTopic
>> > within the StopReplicas API which will then disappear in KRaft world as
>> it
>> > is already scheduled for deprecation. What do you think?
>> >
>> > Best,
>> > Christo
>> >
>> > On Wed, 3 Apr 2024 at 07:59, Luke Chen  wrote:
>> >
>> > > Hi Christo,
>> > >
>> > > 1. I agree with Doguscan that in KRaft mode, the controller won't send
>> > RPCs
>> > > to the brokers (except in the migration path).
>> > > So, I think we could adopt the similar way we did to
>> > `AlterReplicaLogDirs`
>> > > (
>> > > KIP-858
>> > > <
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%

Re: [DISCUSS] KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission

2024-04-15 Thread Christo Lolov
Heya Nikhil,

Thank you for raising this KIP!

Your proposal makes sense to me. In essence, you are saying that the
permission required by WriteTxnMarkers should be the same as for CreateAcls
and DeleteAcls, which is reasonable. If we trust an administrator to assign
the correct permissions, then we should also trust them to be able to abort
a hanging transaction.
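
For illustration (the principal and names are made up), granting that permission with the AdminClient would look roughly like:

```
import java.util.Collections;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class GrantAlterClusterSketch {
    // Illustrative only: allow a (made-up) admin principal the Alter operation on the
    // Cluster resource, which under this KIP would also cover WriteTxnMarkers.
    public static void grant(Admin admin) throws Exception {
        AclBinding binding = new AclBinding(
                new ResourcePattern(ResourceType.CLUSTER, "kafka-cluster", PatternType.LITERAL),
                new AccessControlEntry("User:txn-admin", "*",
                        AclOperation.ALTER, AclPermissionType.ALLOW));
        admin.createAcls(Collections.singleton(binding)).all().get();
    }
}
```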

I would support this KIP if it is put to the vote unless there are other
suggestions for improvements!

Best,
Christo

On Thu, 11 Apr 2024 at 16:48, Nikhil Ramakrishnan <
ramakrishnan.nik...@gmail.com> wrote:

> Hi everyone,
>
> I would like to start a discussion for
>
> KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1037%3A+Allow+WriteTxnMarkers+API+with+Alter+Cluster+Permission
>
> The WriteTxnMarkers API was originally used for inter-broker
> communication only. This required the ClusterAction permission on the
> Cluster resource to invoke.
>
> In KIP-664, we modified the WriteTxnMarkers API so that it could be
> invoked externally from the Kafka AdminClient to safely abort a
> hanging transaction. Such usage is more aligned with the Alter
> permission on the Cluster resource, which includes other
> administrative actions invoked from the Kafka AdminClient (i.e.
> CreateAcls and DeleteAcls). This KIP proposes allowing the
> WriteTxnMarkers API to be invoked with the Alter permission on the
> Cluster.
>
> I am looking forward to your thoughts and suggestions for improvement!
>
> Thanks,
> Nikhil
>


Re: [ANNOUNCE] New Kafka PMC Member: Greg Harris

2024-04-15 Thread Christo Lolov
Congratulations, Greg :)

On Mon, 15 Apr 2024 at 07:34, Zhisheng Zhang <31791909...@gmail.com> wrote:

> Congratulations Greg!
>
>
Manikumar wrote on Mon, 15 Apr 2024 at 13:49:
>
> > Congratulations, Greg.
> >
> > On Mon, Apr 15, 2024 at 11:18 AM Bruno Cadonna 
> wrote:
> > >
> > > Congratulations, Greg!
> > >
> > > Best,
> > > Bruno
> > >
> > > On 4/15/24 7:33 AM, Claude Warren wrote:
> > > > Congrats Greg!  All the hard work paid off.
> > > >
> > > > On Mon, Apr 15, 2024 at 6:58 AM Ivan Yurchenko 
> wrote:
> > > >
> > > >> Congrats Greg!
> > > >>
> > > >> On Sun, Apr 14, 2024, at 22:51, Sophie Blee-Goldman wrote:
> > > >>> Congrats Greg! Happy to have you
> > > >>>
> > > >>> On Sun, Apr 14, 2024 at 9:26 AM Jorge Esteban Quilcate Otoya <
> > > >>> quilcate.jo...@gmail.com> wrote:
> > > >>>
> > >  Congrats, Greg!!
> > > 
> > >  On Sun 14. Apr 2024 at 15.05, Josep Prat
> > 
> > >  wrote:
> > > 
> > > > Congrats Greg!!!
> > > >
> > > >
> > > > Best,
> > > >
> > > > Josep Prat
> > > > Open Source Engineering Director, aivenjosep.p...@aiven.io   |
> > > > +491715557497 | aiven.io
> > > > Aiven Deutschland GmbH
> > > > Alexanderufer 3-7, 10117 Berlin
> > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > >
> > > > On Sun, Apr 14, 2024, 12:30 Divij Vaidya <
> divijvaidy...@gmail.com>
> > >  wrote:
> > > >
> > > >> Congratulations Greg!
> > > >>
> > > >> --
> > > >> Divij Vaidya
> > > >>
> > > >>
> > > >>
> > > >> On Sun, Apr 14, 2024 at 6:39 AM Kamal Chandraprakash <
> > > >> kamal.chandraprak...@gmail.com> wrote:
> > > >>
> > > >>> Congratulations, Greg!
> > > >>>
> > > >>> On Sun, Apr 14, 2024 at 8:57 AM Yash Mayya <
> yash.ma...@gmail.com
> > > >>>
> > > > wrote:
> > > >>>
> > >  Congrats Greg!
> > > 
> > >  On Sun, 14 Apr, 2024, 05:56 Randall Hauch, 
> > >  wrote:
> > > 
> > > > Congratulations, Greg!
> > > >
> > > > On Sat, Apr 13, 2024 at 6:36 PM Luke Chen  > > >>>
> > > > wrote:
> > > >
> > > >> Congrats, Greg!
> > > >>
> > > >> On Sun, Apr 14, 2024 at 7:05 AM Viktor Somogyi-Vass
> > > >>  wrote:
> > > >>
> > > >>> Congrats Greg! :)
> > > >>>
> > > >>> On Sun, Apr 14, 2024, 00:35 Bill Bejeck <
> > > >> bbej...@gmail.com>
> > > >> wrote:
> > > >>>
> > >  Congrats Greg!
> > > 
> > >  -Bill
> > > 
> > >  On Sat, Apr 13, 2024 at 4:25 PM Boudjelda Mohamed Said
> > > >> <
> > > >>> bmsc...@gmail.com>
> > >  wrote:
> > > 
> > > > Congratulations Greg
> > > >
> > > > On Sat 13 Apr 2024 at 20:42, Chris Egerton <
> > > >>> ceger...@apache.org>
> > > >>> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Greg Harris has been a Kafka committer since July
> > > >> 2023.
> > > > He
> > > >>> has
> > > >>> remained
> > > >> very active and instructive in the community since
> > > >> becoming a
> > >  committer.
> > > >> It's my pleasure to announce that Greg is now a
> > > >> member
> > >  of
> > > >>> Kafka
> > > >> PMC.
> > > >>
> > > >> Congratulations, Greg!
> > > >>
> > > >> Chris, on behalf of the Apache Kafka PMC
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > > >
> > > >
> >
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-04-09 Thread Christo Lolov
Heya Luke!

I thought a bit more about it and I reached the same conclusion as you for
2 as a follow-up from 1. In other words, in KRaft world I don't think the
controller needs to wait for acknowledgements for the brokers. All we care
about is that the leader (who is responsible for archiving/deleting data in
tiered storage) knows about the change and applies it properly. If there is
a leadership change halfway through the operation then the new leader still
needs to apply the message from the state topic and we know that a
disable-message will be applied before a reenablement-message. I will
change the KIP later today/tomorrow morning to reflect this reasoning.

However, with this I believe that introducing a new API just for
Zookeeper-based clusters (i.e. DisableRemoteTopic) becomes a bit of an
overkill. I would then opt for moving the information from DisableRemoteTopic
into the StopReplicas API, which will then disappear in the KRaft world as it
is already scheduled for deprecation. What do you think?

Best,
Christo

On Wed, 3 Apr 2024 at 07:59, Luke Chen  wrote:

> Hi Christo,
>
> 1. I agree with Doguscan that in KRaft mode, the controller won't send RPCs
> to the brokers (except in the migration path).
> So, I think we could adopt the similar way we did to `AlterReplicaLogDirs`
> (
> KIP-858
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft#KIP858:HandleJBODbrokerdiskfailureinKRaft-Intra-brokerreplicamovement
> >)
> that let the broker notify controller any update, instead of controller to
> broker. And once the controller receives all the complete requests from
> brokers, it'll enter "Disabled" state. WDYT?
>
> 2. Why should we wait until all brokers to respond before moving to
> "Disabled" state in "KRaft mode"?
> Currently, only the leader node does the remote log upload/fetch tasks, so
> does that mean the controller only need to make sure the leader completes
> the stopPartition?
> If during the leader node stopPartition process triggered leadership
> change, then the new leader should receive and apply the configRecord
> update before the leadership change record based on the KRaft design, which
> means there will be no gap that the follower node becomes the leader and
> starting doing unexpected upload/fetch tasks, right?
> I agree we should make sure in ZK mode, all brokers are completed the
> stopPartitions before moving to "Disabled" state because ZK node watcher is
> working in a separate thread. But not sure about KRaft mode.
>
> Thanks.
> Luke
>
>
> On Fri, Mar 29, 2024 at 4:14 PM Christo Lolov 
> wrote:
>
> > Heya everyone!
> >
> > re: Doguscan
> >
> > I believe the answer to 101 needs a bit more discussion. As far as I
> know,
> > tiered storage today has methods to update a metadata of a segment to say
> > "hey, I would like this deleted", but actual deletion is left to plugin
> > implementations (or any background cleaners). In other words, there is no
> > "immediate" deletion. In this KIP, we would like to continue doing the
> same
> > if the retention policy is set to delete. So I believe the answer is
> > actually that a) we will update the metadata of the segments to mark them
> > as deleted and b) we will advance the log start offset. Any deletion of
> > actual files will still be delegated to plugin implementations. I believe
> > this is further supported by "*remote.log.disable.policy=delete:* Logs
> that
> > are archived in the remote storage will not be part of the contiguous
> > "active" log and will be deleted asynchronously as part of the
> disablement
> > process"
> >
> > Following from the above, I believe for 102 it is fine to allow setting
> of
> > remote.log.disable.policy on a disabled topic in much the same way we
> allow
> > other remote-related configurations to be set on a topic (i.e.
> > local.retention.*) - it just won't have an effect. Granted, I do believe
> we
> > should restrict the policy being changed while a disablement is ongoing.
> >
> > re: Satish and Kamal
> >
> > 104, 1 and 2 are fair asks, I will work with Doguscan to update the KIP
> > with the information!
> >
> > Best,
> > Christo
> >
> > On Thu, 28 Mar 2024 at 10:31, Doğuşcan Namal 
> > wrote:
> >
> > > Hi Satish, I will try to answer as much as I can and the others could
> > chime
> > > in with further details.
> > >
> > >
> > >
> > >
> > >
> > > *101. For remote.log.disable.policy=delete: Does it delete the remote
> log
> 

[jira] [Created] (KAFKA-16480) ListOffsets change should have an associated API/IBP version update

2024-04-06 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16480:
-

 Summary: ListOffsets change should have an associated API/IBP 
version update
 Key: KAFKA-16480
 URL: https://issues.apache.org/jira/browse/KAFKA-16480
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov


https://issues.apache.org/jira/browse/KAFKA-16154 introduced the changes to the 
ListOffsets API to accept latest-tiered-timestamp and return the corresponding 
offset.

Those changes should have a) increased the version of the ListOffsets API, b) 
increased the inter-broker protocol version, and c) hidden the latest version of 
the ListOffsets API behind the latestVersionUnstable flag.

The purpose of this task is to remedy that omission.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-05 Thread Christo Lolov
Hello Abhijeet and Jun,

I have been mulling this KIP over a bit more in recent days!

re: Jun

I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
retrospect it should have been fairly obvious. I would need to go and update
KIP-1005 myself then; thank you for giving the useful reference!

4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
fetch from latest-tiered-offset + 1 only for new replicas (or replicas
which experienced a disk failure) to decrease the time a partition spends
in an under-replicated state. In other words, a follower which has just fallen
out of the ISR but still has local data will continue using today's Tiered
Storage replication protocol (i.e. fetching from earliest-local). I further
believe he has taken this approach so that the local state of replicas which
have just fallen out of the ISR isn't forcefully wiped, which would lead to
situation 1.
Abhijeet, have I understood (and summarised) what you are proposing
correctly?

5. I think in today's Tiered Storage we know the leader's log-start-offset
from the FetchResponse and we can learn its local-log-start-offset from the
ListOffsets API by asking for the earliest-local timestamp (-4). But granted,
this ought to be added as an additional API call in the KIP.
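
As an example of what I mean (a sketch only; it assumes the earliestLocal() OffsetSpec that KIP-1005, still under implementation, adds for timestamp -4), the equivalent lookup through the Admin API would be roughly:

```
import java.util.Map;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class LocalLogStartOffsetLookupSketch {
    // The follower itself would issue the corresponding inter-broker ListOffsets request.
    public static long localLogStartOffset(Admin admin, TopicPartition tp) throws Exception {
        ListOffsetsResult result = admin.listOffsets(Map.of(tp, OffsetSpec.earliestLocal()));
        return result.partitionResult(tp).get().offset();
    }
}
```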

re: Abhijeet

101. I am still a bit confused as to why you want to include a new offset
(i.e. pending-upload-offset) when you yourself mention that you could use
an already existing offset (i.e. last-tiered-offset + 1). In essence, you
end your Motivation with "In this KIP, we will focus only on the follower
fetch protocol using the *last-tiered-offset*" and then in the following
sections you talk about pending-upload-offset. I understand this might be
classified as an implementation detail, but if you introduce a new offset
(i.e. pending-upload-offset) you have to make a change to the ListOffsets
API (i.e. introduce -6) and thus document it in this KIP as such. However,
the last-tiered-offset ought to already be exposed as part of KIP-1005
(under implementation). Am I misunderstanding something here?

Best,
Christo

On Thu, 4 Apr 2024 at 19:37, Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. Left a few comments.
>
> 1. "A drawback of using the last-tiered-offset is that this new follower
> would possess only a limited number of locally stored segments. Should it
> ascend to the role of leader, there is a risk of needing to fetch these
> segments from the remote storage, potentially impacting broker
> performance."
> Since we support consumers fetching from followers, this is a potential
> issue on the follower side too. In theory, it's possible for a segment to
> be tiered immediately after rolling. In that case, there could be very
> little data after last-tiered-offset. It would be useful to think through
> how to address this issue.
>
> 2. ListOffsetsRequest:
> 2.1 Typically, we need to bump up the version of the request if we add a
> new value for timestamp. See
>
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> .
> 2.2 Since this changes the inter broker request protocol, it would be
> useful to have a section on upgrade (e.g. new IBP/metadata.version).
>
> 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
> last-tiered-offset from the leader, and make a separate leader call to
> fetch leader epoch for the following offset."
> Why do we need to make a separate call for the leader epoch?
> ListOffsetsResponse include both the offset and the corresponding epoch.
>
> 4. "Check if the follower replica is empty and if the feature to use
> last-tiered-offset is enabled."
> Why do we need to check if the follower replica is empty?
>
> 5. It can be confirmed by checking if the leader's Log-Start-Offset is the
> same as the Leader's Local-Log-Start-Offset.
> How does the follower know Local-Log-Start-Offset?
>
> Jun
>
> On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar  >
> wrote:
>
> > Hi Christo,
> >
> > Thanks for reviewing the KIP.
> >
> > The follower needs the earliest-pending-upload-offset (and the
> > corresponding leader epoch) from the leader.
> > This is the first offset the follower will have locally.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
> > wrote:
> >
> > > Heya!
> > >
> > > First of all, thank you very much for the proposal, you have explained
> > the
> > > problem you want solved very well - I think a faster bootstrap of an
> > empty
> > > replica is definitely an improvement!
> > >
> > > For my understanding, which concrete offset do you want the leader t

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-03-29 Thread Christo Lolov
> > > is disabled on a topic?
> > >
> > > 103. Do we plan to add any metrics related to this feature?
> > >
> > > 104. Please add configuration details about copier thread pool,
> > > expiration thread pool and the migration of the existing
> > > RemoteLogManagerScheduledThreadPool.
> > >
> > > 105. How is the behaviour with topic or partition deletion request
> > > handled when tiered storage disablement request is still being
> > > processed on a topic?
> > >
> > > ~Satish.
> > >
> > > On Mon, 25 Mar 2024 at 13:34, Doğuşcan Namal  >
> > > wrote:
> > > >
> > > > Hi Christo and Luke,
> > > >
> > > > I think the KRaft section of the KIP requires slight improvement. The
> > > metadata propagation in KRaft is handled by the RAFT layer instead of
> > > sending Controller -> Broker RPCs. In fact, KIP-631 deprecated these
> > RPCs.
> > > >
> > > > I will come up with some recommendations on how we could improve that
> > > one but until then, @Luke please feel free to review the KIP.
> > > >
> > > > @Satish, if we want this to make it to Kafka 3.8 I believe we need to
> > > aim to get the KIP approved in the following weeks otherwise it will
> slip
> > > and we can not support it in Zookeeper mode.
> > > >
> > > > I also would like to better understand what is the community's stand
> > for
> > > adding a new feature for Zookeeper since it is marked as deprecated
> > already.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > > On Mon, 18 Mar 2024 at 13:42, Christo Lolov 
> > > wrote:
> > > >>
> > > >> Heya,
> > > >>
> > > >> I do have some time to put into this, but to be honest I am still
> > after
> > > >> reviews of the KIP itself :)
> > > >>
> > > >> After the latest changes it ought to be detailing both a Zookeeper
> > > approach
> > > >> and a KRaft approach.
> > > >>
> > > >> Do you have any thoughts on how it could be improved or should I
> > start a
> > > >> voting thread?
> > > >>
> > > >> Best,
> > > >> Christo
> > > >>
> > > >> On Thu, 14 Mar 2024 at 06:12, Luke Chen  wrote:
> > > >>
> > > >> > Hi Christo,
> > > >> >
> > > >> > Any update with this KIP?
> > > >> > If you don't have time to complete it, I can collaborate with you
> to
> > > work
> > > >> > on it.
> > > >> >
> > > >> > Thanks.
> > > >> > Luke
> > > >> >
> > > >> > On Wed, Jan 17, 2024 at 11:38 PM Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Christo,
> > > >> > > Thanks for volunteering to contribute to the KIP discussion. I
> > > suggest
> > > >> > > considering this KIP for both ZK and KRaft as it will be helpful
> > for
> > > >> > > this feature to be available in 3.8.0 running with ZK clusters.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Satish.
> > > >> > >
> > > >> > > On Wed, 17 Jan 2024 at 19:04, Christo Lolov <
> > christolo...@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > Hello!
> > > >> > > >
> > > >> > > > I volunteer to get this KIP moving forward and implemented in
> > > Apache
> > > >> > > Kafka
> > > >> > > > 3.8.
> > > >> > > >
> > > >> > > > I have caught up with Mehari offline and we have agreed that
> > given
> > > >> > Apache
> > > >> > > > Kafka 4.0 being around the corner we would like to propose
> this
> > > feature
> > > >> > > > only for KRaft clusters.
> > > >> > > >
> > > >> > > > Any and all reviews and comments are welcome!
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Christo
> > > >> > > >
> > > >> > > > On Tue, 9 Jan 2024 at 09:44, Doğuşcan Namal <
> > > namal.dogus...@gmail.com>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hi everyone, any progress on the status of this KIP? Overall
> > > looks
> > > >> > > good to
> > > >> > > > > me but I wonder whether we still need to support it for
> > > Zookeeper
> > > >> > mode
> > > >> > > > > given that it will be deprecated in the next 3 months.
> > > >> > > > >
> > > >> > > > > On 2023/07/21 20:16:46 "Beyene, Mehari" wrote:
> > > >> > > > > > Hi everyone,
> > > >> > > > > > I would like to start a discussion on KIP-950: Tiered
> > Storage
> > > >> > > Disablement
> > > >> > > > > (
> > > >> > > > >
> > > >> > > > >
> > > >> > >
> > > >> >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
> > > >> > > > > ).
> > > >> > > > > >
> > > >> > > > > > This KIP proposes adding the ability to disable and
> > re-enable
> > > >> > tiered
> > > >> > > > > storage on a topic.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Mehari
> > > >> > > > > >
> > > >> > > > >
> > > >> > >
> > > >> >
> > >
> >
>


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-29 Thread Christo Lolov
Heya!

First of all, thank you very much for the proposal, you have explained the
problem you want solved very well - I think a faster bootstrap of an empty
replica is definitely an improvement!

For my understanding, which concrete offset do you want the leader to give
back to a follower - earliest-pending-upload-offset or the
latest-tiered-offset? If it is the second, then I believe KIP-1005 ought to
already be exposing that offset as part of the ListOffsets API, no?

Best,
Christo

On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar 
wrote:

> Hi All,
>
> I have created KIP-1023 to introduce follower fetch from tiered offset.
> This feature will be helpful in significantly reducing Kafka
> rebalance/rebuild times when the cluster is enabled with tiered storage.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
>
> Feedback and suggestions are welcome.
>
> Regards,
> Abhijeet.
>


Re: [ANNOUNCE] New committer: Christo Lolov

2024-03-26 Thread Christo Lolov
Thank you everyone!

It wouldn't have been possible without quite a lot of reviews and extremely
helpful inputs from you and the rest of the community! I am looking forward
to working more closely with you going forward :)

On Tue, 26 Mar 2024 at 14:31, Kirk True  wrote:

> Congratulations Christo!
>
> > On Mar 26, 2024, at 7:27 AM, Satish Duggana 
> wrote:
> >
> > Congratulations Christo!
> >
> > On Tue, 26 Mar 2024 at 19:20, Ivan Yurchenko  wrote:
> >>
> >> Congrats!
> >>
> >> On Tue, Mar 26, 2024, at 14:48, Lucas Brutschy wrote:
> >>> Congrats!
> >>>
> >>> On Tue, Mar 26, 2024 at 2:44 PM Federico Valeri 
> wrote:
> 
>  Congrats!
> 
>  On Tue, Mar 26, 2024 at 2:27 PM Mickael Maison <
> mickael.mai...@gmail.com> wrote:
> >
> > Congratulations Christo!
> >
> > On Tue, Mar 26, 2024 at 2:26 PM Chia-Ping Tsai 
> wrote:
> >>
> >> Congrats Christo!
> >>
> >> Chia-Ping
> >>>
>
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-03-18 Thread Christo Lolov
Heya,

I do have some time to put into this, but to be honest I am still after
reviews of the KIP itself :)

After the latest changes it ought to be detailing both a Zookeeper approach
and a KRaft approach.

Do you have any thoughts on how it could be improved or should I start a
voting thread?

Best,
Christo

On Thu, 14 Mar 2024 at 06:12, Luke Chen  wrote:

> Hi Christo,
>
> Any update with this KIP?
> If you don't have time to complete it, I can collaborate with you to work
> on it.
>
> Thanks.
> Luke
>
> On Wed, Jan 17, 2024 at 11:38 PM Satish Duggana 
> wrote:
>
> > Hi Christo,
> > Thanks for volunteering to contribute to the KIP discussion. I suggest
> > considering this KIP for both ZK and KRaft as it will be helpful for
> > this feature to be available in 3.8.0 running with ZK clusters.
> >
> > Thanks,
> > Satish.
> >
> > On Wed, 17 Jan 2024 at 19:04, Christo Lolov 
> > wrote:
> > >
> > > Hello!
> > >
> > > I volunteer to get this KIP moving forward and implemented in Apache
> > Kafka
> > > 3.8.
> > >
> > > I have caught up with Mehari offline and we have agreed that given
> Apache
> > > Kafka 4.0 being around the corner we would like to propose this feature
> > > only for KRaft clusters.
> > >
> > > Any and all reviews and comments are welcome!
> > >
> > > Best,
> > > Christo
> > >
> > > On Tue, 9 Jan 2024 at 09:44, Doğuşcan Namal 
> > > wrote:
> > >
> > > > Hi everyone, any progress on the status of this KIP? Overall looks
> > good to
> > > > me but I wonder whether we still need to support it for Zookeeper
> mode
> > > > given that it will be deprecated in the next 3 months.
> > > >
> > > > On 2023/07/21 20:16:46 "Beyene, Mehari" wrote:
> > > > > Hi everyone,
> > > > > I would like to start a discussion on KIP-950: Tiered Storage
> > Disablement
> > > > (
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
> > > > ).
> > > > >
> > > > > This KIP proposes adding the ability to disable and re-enable
> tiered
> > > > storage on a topic.
> > > > >
> > > > > Thanks,
> > > > > Mehari
> > > > >
> > > >
> >
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-01-17 Thread Christo Lolov
Hello!

I volunteer to get this KIP moving forward and implemented in Apache Kafka
3.8.

I have caught up with Mehari offline and we have agreed that given Apache
Kafka 4.0 being around the corner we would like to propose this feature
only for KRaft clusters.

Any and all reviews and comments are welcome!

Best,
Christo

On Tue, 9 Jan 2024 at 09:44, Doğuşcan Namal 
wrote:

> Hi everyone, any progress on the status of this KIP? Overall looks good to
> me but I wonder whether we still need to support it for Zookeeper mode
> given that it will be deprecated in the next 3 months.
>
> On 2023/07/21 20:16:46 "Beyene, Mehari" wrote:
> > Hi everyone,
> > I would like to start a discussion on KIP-950: Tiered Storage Disablement
> (
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
> ).
> >
> > This KIP proposes adding the ability to disable and re-enable tiered
> storage on a topic.
> >
> > Thanks,
> > Mehari
> >
>


[jira] [Created] (KAFKA-16154) Make broker changes to return an offset for LATEST_TIERED_TIMESTAMP

2024-01-17 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16154:
-

 Summary: Make broker changes to return an offset for 
LATEST_TIERED_TIMESTAMP
 Key: KAFKA-16154
 URL: https://issues.apache.org/jira/browse/KAFKA-16154
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov
 Fix For: 3.8.0


A broker should start returning offsets when given a timestamp of -5, which 
signifies a LATEST_TIERED_TIMESTAMP.

There are 3 cases.

Tiered Storage is not enabled. In such a situation asking for 
LATEST_TIERED_TIMESTAMP should always return no offset.

Tiered Storage is enabled and there is nothing in remote storage. In such a 
situation the offset returned should be 0.

Tiered Storage is enabled and there is something in remote storage. In such a 
situation the offset returned should be the highest offset the broker is aware 
of.
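
For illustration, a minimal sketch of the three-way branching described above
(the LogView type and its methods are hypothetical stand-ins for this example,
not the actual broker code):
{code:java}
import java.util.Optional;

// Illustrative only - LogView is a made-up interface for this example.
public class LatestTieredTimestampSketch {

    interface LogView {
        boolean tieredStorageEnabled();
        Optional<Long> highestOffsetInRemoteStorage();
    }

    // Mirrors the three cases above for a ListOffsets request with timestamp -5.
    static Optional<Long> resolveLatestTieredOffset(LogView log) {
        if (!log.tieredStorageEnabled()) {
            return Optional.empty();          // case 1: no offset is returned
        }
        Optional<Long> highestTiered = log.highestOffsetInRemoteStorage();
        if (highestTiered.isEmpty()) {
            return Optional.of(0L);           // case 2: nothing tiered yet -> 0
        }
        return highestTiered;                 // case 3: highest tiered offset the broker knows about
    }
}
{code}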



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1005: Expose EarliestLocalOffset and TieredOffset

2024-01-17 Thread Christo Lolov
Thank you everyone for casting your votes!

KIP-1005 passes with 6 +1 votes (three binding and three non-binding) - I will
get down to implementing it :)

Best,
Christo

On Tue, 16 Jan 2024 at 01:35, Luke Chen  wrote:

> +1 binding from me.
>
> Thanks for the KIP!
> Luke
>
> On Fri, Jan 12, 2024 at 5:41 PM Federico Valeri 
> wrote:
>
> > +1 non binding
> >
> > Thanks
> >
> > On Fri, Jan 12, 2024 at 1:31 AM Boudjelda Mohamed Said
> >  wrote:
> > >
> > > +1 (binding)
> > >
> > >
> > > On Fri, Jan 12, 2024 at 1:21 AM Satish Duggana <
> satish.dugg...@gmail.com
> > >
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Thu, 11 Jan 2024 at 17:52, Divij Vaidya 
> > > > wrote:
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > Divij Vaidya
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 26, 2023 at 7:05 AM Kamal Chandraprakash <
> > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > >
> > > > > > +1 (non-binding). Thanks for the KIP!
> > > > > >
> > > > > > --
> > > > > > Kamal
> > > > > >
> > > > > > On Thu, Dec 21, 2023 at 2:23 PM Christo Lolov <
> > christolo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Heya all!
> > > > > > >
> > > > > > > KIP-1005 (
> > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Expose+EarliestLocalOffset+and+TieredOffset
> > > > > > > )
> > > > > > > has been open for around a month with no further comments - I
> > would
> > > > like
> > > > > > to
> > > > > > > start a voting round on it!
> > > > > > >
> > > > > > > Best,
> > > > > > > Christo
> > > > > > >
> > > > > >
> > > >
> >
>


[jira] [Resolved] (KAFKA-15734) KRaft support in BaseConsumerTest

2024-01-16 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15734.
---
Resolution: Fixed

> KRaft support in BaseConsumerTest
> -
>
> Key: KAFKA-15734
> URL: https://issues.apache.org/jira/browse/KAFKA-15734
> Project: Kafka
>  Issue Type: Task
>  Components: core
>Reporter: Sameer Tejani
>Assignee: Sushant Mahajan
>Priority: Minor
>  Labels: kraft, kraft-test, newbie
>
> The following tests in BaseConsumerTest in 
> core/src/test/scala/integration/kafka/api/BaseConsumerTest.scala need to be 
> updated to support KRaft
> 38 : def testSimpleConsumption(): Unit = {
> 57 : def testClusterResourceListener(): Unit = {
> 78 : def testCoordinatorFailover(): Unit = {
> Scanned 125 lines. Found 0 KRaft tests out of 3 tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1005: Add EarliestLocalOffset to GetOffsetShell

2024-01-15 Thread Christo Lolov
Heya!

Okay, your suggestion also makes sense to me!

I have updated the KIP.

Best,
Christo

On Mon, 15 Jan 2024 at 11:51, Luke Chen  wrote:

> Hi Christo,
>
> Thanks for the update.
> For "-4 or earliest-local" but tiered storage disabled, I agree it should
> work as requesting for as "-2 or earliest".
> For "-5 or latest-tiered" but tiered storage disabled, returning the
> earliest timestamp doesn't make sense to me.
> I'm thinking if we can return nothing, like what we did for this
> "[Note: No offset is returned, if the timestamp greater than recently
> committed record timestamp is given.]"
>
> WDYT?
>
> Thanks.
> Luke
>
> On Mon, Jan 15, 2024 at 6:46 PM Christo Lolov 
> wrote:
>
> > Heya Luke,
> >
> > Thanks for the question! I have expanded on this in the KIP - in my opinion if -5
> > (latest-tiered) is requested when tiered storage is disabled, Kafka should
> > return -2. My reasoning is that if there is no remote storage then we
> > should be returning an offset which is within the bounds of the log. Let
> me
> > know if you disagree!
> >
> > Best,
> > Christo
> >
> > On Fri, 12 Jan 2024 at 03:43, Luke Chen  wrote:
> >
> > > Hi Christo,
> > >
> > > Thanks for the KIP!
> > > One question:
> > >
> > > What offset will be returned if tiered storage is disabled?
> > > For "-4 or earliest-local", it should be the same as "-2 or earliest",
> > > right?
> > > For "-5 or latest-tiered", it will be...0?
> > >
> > > I think the result should be written in the KIP (or script help text)
> > > explicitly.
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Thu, Jan 11, 2024 at 6:54 PM Divij Vaidya 
> > > wrote:
> > >
> > > > Thank you for making the change Christo. It looks good to me.
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Thu, Jan 11, 2024 at 11:19 AM Christo Lolov <
> christolo...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Thank you Divij!
> > > > >
> > > > > I have updated the KIP to explicitly state that the broker will
> have
> > a
> > > > > different behaviour when a timestamp of -5 is requested as part of
> > > > > ListOffsets.
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > > > > On Tue, 2 Jan 2024 at 11:10, Divij Vaidya  >
> > > > wrote:
> > > > >
> > > > > > Thanks for the KIP Christo.
> > > > > >
> > > > > > The shell command that you mentioned calls ListOffsets API
> > > internally.
> > > > > > Hence, I believe that we would be making a public interface
> change
> > > > (and a
> > > > > > version bump) to ListOffsetsAPI as well to include -5? If yes,
> can
> > > you
> > > > > > please add that information to the change in public interfaces in
> > the
> > > > > KIP.
> > > > > >
> > > > > > --
> > > > > > Divij Vaidya
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Nov 21, 2023 at 2:19 PM Christo Lolov <
> > > christolo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Heya!
> > > > > > >
> > > > > > > Thanks a lot for this. I have updated the KIP to include
> exposing
> > > the
> > > > > > > tiered-offset as well. Let me know whether the Public
> Interfaces
> > > > > section
> > > > > > > needs more explanations regarding the changes needed to the
> > > > OffsetSpec
> > > > > or
> > > > > > > others.
> > > > > > >
> > > > > > > Best,
> > > > > > > Christo
> > > > > > >
> > > > > > > On Tue, 21 Nov 2023 at 04:20, Satish Duggana <
> > > > satish.dugg...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Christo for starting the discussion on the KIP.
> > > > > > > >

Re: [DISCUSS] KIP-1005: Add EarliestLocalOffset to GetOffsetShell

2024-01-15 Thread Christo Lolov
Heya Luke,

Thanks for the question! I have expanded on this in the KIP - in my opinion if -5
(latest-tiered) is requested when tiered storage is disabled, Kafka should
return -2. My reasoning is that if there is no remote storage then we
should be returning an offset which is within the bounds of the log. Let me
know if you disagree!
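
To make the client-visible behaviour concrete, here is a minimal AdminClient
sketch; the OffsetSpec factory name (latestTiered) is my assumption of what
KIP-1005 could add, not a finalised API:

```
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class TieredOffsetLookup {
    public static void main(String[] args) throws Exception {
        TopicPartition tp = new TopicPartition("payments", 0);
        try (Admin admin = Admin.create(
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // Hypothetical spec mapping to timestamp -5 (latest-tiered).
            ListOffsetsResult result = admin.listOffsets(Map.of(tp, OffsetSpec.latestTiered()));
            // On a topic without tiered storage the proposal is to behave as if -2
            // (earliest) had been requested, so a valid in-bounds offset comes back.
            System.out.println(result.partitionResult(tp).get().offset());
        }
    }
}
```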

Best,
Christo

On Fri, 12 Jan 2024 at 03:43, Luke Chen  wrote:

> Hi Christo,
>
> Thanks for the KIP!
> One question:
>
> What offset will be returned if tiered storage is disabled?
> For "-4 or earliest-local", it should be the same as "-2 or earliest",
> right?
> For "-5 or latest-tiered", it will be...0?
>
> I think the result should be written in the KIP (or script help text)
> explicitly.
>
> Thanks.
> Luke
>
> On Thu, Jan 11, 2024 at 6:54 PM Divij Vaidya 
> wrote:
>
> > Thank you for making the change Christo. It looks good to me.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Thu, Jan 11, 2024 at 11:19 AM Christo Lolov 
> > wrote:
> >
> > > Thank you Divij!
> > >
> > > I have updated the KIP to explicitly state that the broker will have a
> > > different behaviour when a timestamp of -5 is requested as part of
> > > ListOffsets.
> > >
> > > Best,
> > > Christo
> > >
> > > On Tue, 2 Jan 2024 at 11:10, Divij Vaidya 
> > wrote:
> > >
> > > > Thanks for the KIP Christo.
> > > >
> > > > The shell command that you mentioned calls ListOffsets API
> internally.
> > > > Hence, I believe that we would be making a public interface change
> > (and a
> > > > version bump) to ListOffsetsAPI as well to include -5? If yes, can
> you
> > > > please add that information to the change in public interfaces in the
> > > KIP.
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Tue, Nov 21, 2023 at 2:19 PM Christo Lolov <
> christolo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Heya!
> > > > >
> > > > > Thanks a lot for this. I have updated the KIP to include exposing
> the
> > > > > tiered-offset as well. Let me know whether the Public Interfaces
> > > section
> > > > > needs more explanations regarding the changes needed to the
> > OffsetSpec
> > > or
> > > > > others.
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > > > > On Tue, 21 Nov 2023 at 04:20, Satish Duggana <
> > satish.dugg...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Christo for starting the discussion on the KIP.
> > > > > >
> > > > > > As mentioned in KAFKA-15857[1], the goal is to add new entries
> for
> > > > > > local-log-start-offset and tiered-offset in OffsetSpec. This will
> be
> > > > > > used in AdminClient APIs and also to be added as part of
> > > > > > GetOffsetShell. This was also raised by Kamal in the earlier
> email.
> > > > > >
> > > > > > OffsetSpec related changes for these entries also need to be
> > > mentioned
> > > > > > as part of the PublicInterfaces section because these are exposed
> > to
> > > > > > users as public APIs through Admin#listOffsets() APIs[2, 3].
> > > > > >
> > > > > > Please update the KIP with the above details.
> > > > > >
> > > > > > 1. https://issues.apache.org/jira/browse/KAFKA-15857
> > > > > > 2.
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1238
> > > > > > 3.
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1226
> > > > > >
> > > > > > ~Satish.
> > > > > >
> > > > > > On Mon, 20 Nov 2023 at 18:35, Kamal Chandraprakash
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi Christo,
> > > > > > >
> > > > > > > Thanks for the KIP!
> > > > > > >
> > > > > > > Similar to the earliest-local-log offset, can we also expose
> the
> > > > > > > highest-copied-remote-offset via
> > > > > > > GetOffsetShell tool? This will be useful during the debugging
> > > > session.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 20, 2023 at 5:38 PM Christo Lolov <
> > > > christolo...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello all!
> > > > > > > >
> > > > > > > > I would like to start a discussion for
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Add+EarliestLocalOffset+to+GetOffsetShell
> > > > > > > > .
> > > > > > > >
> > > > > > > > A new offset called local log start offset was introduced as
> > part
> > > > of
> > > > > > > > KIP-405: Kafka Tiered Storage. KIP-1005 aims to expose this
> > > offset
> > > > by
> > > > > > > > changing the AdminClient and in particular the GetOffsetShell
> > > tool.
> > > > > > > >
> > > > > > > > I am looking forward to your suggestions for improvement!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Christo
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-1005: Add EarliestLocalOffset to GetOffsetShell

2024-01-11 Thread Christo Lolov
Thank you Divij!

I have updated the KIP to explicitly state that the broker will have a
different behaviour when a timestamp of -5 is requested as part of
ListOffsets.

Best,
Christo

On Tue, 2 Jan 2024 at 11:10, Divij Vaidya  wrote:

> Thanks for the KIP Christo.
>
> The shell command that you mentioned calls ListOffsets API internally.
> Hence, I believe that we would be making a public interface change (and a
> version bump) to ListOffsetsAPI as well to include -5? If yes, can you
> please add that information to the change in public interfaces in the KIP.
>
> --
> Divij Vaidya
>
>
>
> On Tue, Nov 21, 2023 at 2:19 PM Christo Lolov 
> wrote:
>
> > Heya!
> >
> > Thanks a lot for this. I have updated the KIP to include exposing the
> > tiered-offset as well. Let me know whether the Public Interfaces section
> > needs more explanations regarding the changes needed to the OffsetSpec or
> > others.
> >
> > Best,
> > Christo
> >
> > On Tue, 21 Nov 2023 at 04:20, Satish Duggana 
> > wrote:
> >
> > > Thanks Christo for starting the discussion on the KIP.
> > >
> > > As mentioned in KAFKA-15857[1], the goal is to add new entries for
> > > local-log-start-offset and tiered-offset in OffsetSpec. This will be
> > > used in AdminClient APIs and also to be added as part of
> > > GetOffsetShell. This was also raised by Kamal in the earlier email.
> > >
> > > OffsetSpec related changes for these entries also need to be mentioned
> > > as part of the PublicInterfaces section because these are exposed to
> > > users as public APIs through Admin#listOffsets() APIs[2, 3].
> > >
> > > Please update the KIP with the above details.
> > >
> > > 1. https://issues.apache.org/jira/browse/KAFKA-15857
> > > 2.
> > >
> >
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1238
> > > 3.
> > >
> >
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1226
> > >
> > > ~Satish.
> > >
> > > On Mon, 20 Nov 2023 at 18:35, Kamal Chandraprakash
> > >  wrote:
> > > >
> > > > Hi Christo,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > Similar to the earliest-local-log offset, can we also expose the
> > > > highest-copied-remote-offset via
> > > > GetOffsetShell tool? This will be useful during the debugging
> session.
> > > >
> > > >
> > > > On Mon, Nov 20, 2023 at 5:38 PM Christo Lolov <
> christolo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello all!
> > > > >
> > > > > I would like to start a discussion for
> > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Add+EarliestLocalOffset+to+GetOffsetShell
> > > > > .
> > > > >
> > > > > A new offset called local log start offset was introduced as part
> of
> > > > > KIP-405: Kafka Tiered Storage. KIP-1005 aims to expose this offset
> by
> > > > > changing the AdminClient and in particular the GetOffsetShell tool.
> > > > >
> > > > > I am looking forward to your suggestions for improvement!
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > >
> >
>


[VOTE] KIP-1005: Expose EarliestLocalOffset and TieredOffset

2023-12-21 Thread Christo Lolov
Heya all!

KIP-1005 (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Expose+EarliestLocalOffset+and+TieredOffset)
has been open for around a month with no further comments - I would like to
start a voting round on it!

Best,
Christo


Re: [VOTE] KIP-1007: Introduce Remote Storage Not Ready Exception

2023-12-21 Thread Christo Lolov
Heya Kamal,

The proposed change makes sense to me as it will be a more explicit
behaviour than what Kafka does today - I am happy with it!

+1 (non-binding) from me

Best,
Christo

On Tue, 12 Dec 2023 at 09:01, Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi,
>
> I would like to call a vote for KIP-1007
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
> >.
> This KIP aims to introduce a new error code for retriable remote storage
> errors. Thanks to everyone who reviewed the KIP!
>
> --
> Kamal
>


[jira] [Created] (KAFKA-16002) Implement RemoteCopyLagSegments, RemoteDeleteLagBytes and RemoteDeleteLagSegments

2023-12-13 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-16002:
-

 Summary: Implement RemoteCopyLagSegments, RemoteDeleteLagBytes and 
RemoteDeleteLagSegments
 Key: KAFKA-16002
 URL: https://issues.apache.org/jira/browse/KAFKA-16002
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov
 Fix For: 3.7.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-963: Additional metrics in Tiered Storage

2023-11-23 Thread Christo Lolov
Hello all,

With 3 binding and 1 non-binding +1 votes, KIP-963 is adopted!
I will get down to implementing it.

Best,
Christo

On Tue, 21 Nov 2023 at 07:22, Luke Chen  wrote:

> +1 (binding) from me.
> Thanks for the KIP.
>
> Luke
>
> On Tue, Nov 21, 2023 at 11:53 AM Satish Duggana 
> wrote:
>
> > +1 (binding)
> > Thanks for the KIP and the discussion.
> >
> > Discussion mail thread for the KIP:
> > https://lists.apache.org/thread/40vsyc240hyody37mf2f0pn90shkzb45
> >
> >
> >
> > On Tue, 21 Nov 2023 at 05:21, Kamal Chandraprakash
> >  wrote:
> > >
> > > +1 (non-binding). Thanks for the KIP!
> > >
> > > On Tue, Nov 21, 2023, 03:04 Divij Vaidya 
> > wrote:
> > >
> > > > + 1 (binding)
> > > >
> > > > This Kip will greatly improve Tiered Storage troubleshooting. Thank
> you
> > > > Christo.
> > > >
> > > > On Mon 20. Nov 2023 at 17:21, Christo Lolov 
> > > > wrote:
> > > >
> > > > > Hello all!
> > > > >
> > > > > Now that the discussion for KIP-963 has wound down, I would like
> to
> > open
> > > > > it for a vote targeting 3.7.0 as the release. You can find the
> > current
> > > > > version of the KIP at
> > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Additional+metrics+in+Tiered+Storage
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > > >
> >
>


[jira] [Created] (KAFKA-15883) Implement RemoteCopyLagBytes

2023-11-22 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-15883:
-

 Summary: Implement RemoteCopyLagBytes
 Key: KAFKA-15883
 URL: https://issues.apache.org/jira/browse/KAFKA-15883
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1005: Add EarliestLocalOffset to GetOffsetShell

2023-11-21 Thread Christo Lolov
Heya!

Thanks a lot for this. I have updated the KIP to include exposing the
tiered-offset as well. Let me know whether the Public Interfaces section
needs more explanations regarding the changes needed to the OffsetSpec or
others.

Best,
Christo

On Tue, 21 Nov 2023 at 04:20, Satish Duggana 
wrote:

> Thanks Christo for starting the discussion on the KIP.
>
> As mentioned in KAFKA-15857[1], the goal is to add new entries for
> local-log-start-offset and tiered-offset in OffsetSpec. This will be
> used in AdminClient APIs and also to be added as part of
> GetOffsetShell. This was also raised by Kamal in the earlier email.
>
> OffsetSpec related changes for these entries also need to be mentioned
> as part of the PublicInterfaces section because these are exposed to
> users as public APIs through Admin#listOffsets() APIs[2, 3].
>
> Please update the KIP with the above details.
>
> 1. https://issues.apache.org/jira/browse/KAFKA-15857
> 2.
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1238
> 3.
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1226
>
> ~Satish.
>
> On Mon, 20 Nov 2023 at 18:35, Kamal Chandraprakash
>  wrote:
> >
> > Hi Christo,
> >
> > Thanks for the KIP!
> >
> > Similar to the earliest-local-log offset, can we also expose the
> > highest-copied-remote-offset via
> > GetOffsetShell tool? This will be useful during the debugging session.
> >
> >
> > On Mon, Nov 20, 2023 at 5:38 PM Christo Lolov 
> > wrote:
> >
> > > Hello all!
> > >
> > > I would like to start a discussion for
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Add+EarliestLocalOffset+to+GetOffsetShell
> > > .
> > >
> > > A new offset called local log start offset was introduced as part of
> > > KIP-405: Kafka Tiered Storage. KIP-1005 aims to expose this offset by
> > > changing the AdminClient and in particular the GetOffsetShell tool.
> > >
> > > I am looking forward to your suggestions for improvement!
> > >
> > > Best,
> > > Christo
> > >
>


[DISCUSS] KIP-1005: Add EarliestLocalOffset to GetOffsetShell

2023-11-20 Thread Christo Lolov
Hello all!

I would like to start a discussion for
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Add+EarliestLocalOffset+to+GetOffsetShell
.

A new offset called local log start offset was introduced as part of
KIP-405: Kafka Tiered Storage. KIP-1005 aims to expose this offset by
changing the AdminClient and in particular the GetOffsetShell tool.

I am looking forward to your suggestions for improvement!

Best,
Christo


[VOTE] KIP-963: Additional metrics in Tiered Storage

2023-11-20 Thread Christo Lolov
Hello all!

Now that the discussion for KIP-963 has wound down, I would like to open
it for a vote targeting 3.7.0 as the release. You can find the current
version of the KIP at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Additional+metrics+in+Tiered+Storage

Best,
Christo


Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-17 Thread Christo Lolov
Heya all!

I have updated the KIP so please have another read through when you have
the time. I know we are cutting it a bit close, but I would be grateful if
I could start a vote early next week in order to get this in 3.7.

re: Satish

104. I envision that ideally we will compare this metric with
*RemoteCopyLagSegments*, yes.

re: Jorge

I have been thinking about your suggestion for RemoteDeleteBytesPerSec. The
*BytesPerSec metrics make sense on the Copy and Read paths because there we
actually write or read a whole segment, but on the delete path we tend to
just mark a segment for deletion, so we don't really have deleted bytes per
second. Or am I misunderstanding why you want this metric added?

I have also thought a bit more about the LocalDeleteLag and your
description. If I understand correctly, you propose this metric to monitor
segments which have expired due to local retention and carry the .deleted
suffix, but haven't yet been actually deleted by the LogCleaner. This would
serve as a proxy for how much data we should be serving from remote but are
still serving from local? However, I believe that the moment we add the
.deleted suffix we stop serving traffic from those segments, hence we will
be serving requests for those offsets from remote. Am I wrong?
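
To make the terminology concrete, here is a rough, self-contained sketch of
how I picture the copy lag and the proposed local delete lag being computed
from segment metadata (the SegmentInfo type is made up for the example; this
is not the RemoteLogManager code):

```
import java.util.List;

public class LagSketch {
    // Made-up descriptor purely for illustration.
    record SegmentInfo(long baseOffset, long endOffset, long sizeInBytes) { }

    // Copy lag: bytes in local segments ending above the highest offset
    // already present in remote storage.
    static long remoteCopyLagBytes(List<SegmentInfo> localSegments, long highestTieredOffset) {
        return localSegments.stream()
                .filter(s -> s.endOffset() > highestTieredOffset)
                .mapToLong(SegmentInfo::sizeInBytes)
                .sum();
    }

    // Local delete lag: segments entirely below the local-log-start-offset
    // (expired locally, possibly carrying the .deleted suffix) but still on disk.
    static long localDeleteLagSegments(List<SegmentInfo> localSegments, long localLogStartOffset) {
        return localSegments.stream()
                .filter(s -> s.endOffset() < localLogStartOffset)
                .count();
    }
}
```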

Best,
Christo

On Thu, 16 Nov 2023 at 08:45, Satish Duggana 
wrote:

> Thanks Christo for your reply.
>
> 101 and 102 We have conclusion on them.
>
> 103. I am not strongly opinionated on this. I am fine if it is helpful
> for your scenarios.
>
> 104. It seems you want to compare this metric with the number of
> segments that are copied. Do you have such a metric now?
>
> Kamal and Luke,
> I agree some of the metrics are needed outside of RSM layer in remote
> fetch path. Can we take those fine grained remote fetch flow sequence
> metrics separately later?
>
> Thanks,
> Satish.
>
> On Tue, 14 Nov 2023 at 22:07, Christo Lolov 
> wrote:
> >
> > Heya everyone,
> >
> > Apologies for the delay in my response and thank you very much for all
> your
> > comments! I will start answering in reverse:
> >
> > *re: Satish*
> >
> > 101. I am happy to scope down this KIP and start off by emitting those
> > metrics on a topic level. I had a preference to emit them on a partition
> > level because I have run into situations where data wasn't evenly spread
> > across partitions and not having that granularity made it harder to
> > discover.
> >
> > 102. Fair enough, others have expressed the same preference. I will scope
> > down the KIP to only bytes-based and segment-based metrics.
> >
> > 103. I agree that we could do this, but I personally prefer this to be a
> > metric. At the very least a newcomer might not know to look for the log
> > line, while most metric-collection systems allow you to explore the whole
> > namespace. For example, I really dislike that while log loading happens
> > Kafka emits log lines of "X/Y logs loaded" rather than just showing me the
> > progress via a metric. If you are strongly against this, however, I am
> > happy to scope down on this as well.
> >
> > 104. Ideally we have only one metadata in remote storage for every
> segment
> > of the correct lineage. Due to leadership changes, however, this is not
> > always the case. I envisioned that exposing such a metric will showcase
> if
> > there are problems with too many metadata records not part of the correct
> > lineage of a log.
> >
> > *re: Luke*
> >
> > 1. I am a bit conflicted on this one. As discussed earlier with Jorge, in
> > my head such metrics are better left to plugin implementations. If you
> and
> > Kamal feel strongly about this being included I will add it to the KIP.
> >
> > 2. After running tiered storage in production for a while I ran into
> > problems where a partition-level metric would have allowed me to zone in
> on
> > the problem sooner. I tried balancing this with not exposing everything
> on
> > a partition level so as not to explode the cardinality too much (point 101
> > from Satish). I haven't ran into a situation where knowing the
> > RemoteLogSizeComputationTime on a partition level helped me, but this
> > doesn't mean there isn't one.
> >
> > 3. I was thinking that the metric can be emitted while reading of those
> > records is happening i.e. if it takes a long time then it will just
> > gradually increase as we read. What do you think?
> >
> > *re: Jorge*
> >
> > 3.5. Sure, I will aim to add my thoughts to the KIP
> >
> > 4. Let me check and come back to you on this one. I have a vague memory
> > this wasn't as easy to calculate, but if it is, I will include

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-14 Thread Christo Lolov
> > delete at a given point in time
> >   - Ideally, this lag is zero -- grows when retention condition changes
> but
> > RLM task is not able to schedule deletion yet.
> >
> > Is my understanding of these lags correct?
> >
> > I'd like to also consider an additional lag:
> > - LocalDeleteLag: difference between: latest local candidate segment to
> > keep based on local retention - oldest local segment
> >   - Represents how many segments are still available locally when they
> are
> > candidate for deletion. This usually happens when log cleaner has not
> been
> > scheduled yet. It's important because it represents how much data is
> stored
> > locally when it could be removed, and it represents how much data that
> can
> > be sourced from remote tier is still been sourced from local tier.
> >   - Ideally, this lag returns to zero when log cleaner runs; but could be
> > growing if there are issues uploading data (other lag) or removing data
> > internally.
> >
> > Thanks,
> > Jorge.
> >
> > On Thu, 9 Nov 2023 at 10:51, Luke Chen  wrote:
> >
> > > Hi Christo,
> > >
> > > Thanks for the KIP!
> > >
> > > Some comments:
> > > 1. I agree with Kamal that a metric to cover the time taken to read
> data
> > > from remote storage is helpful.
> > >
> > > 2. I can see there are some metrics are only on topic level, but some
> are
> > > on partition level.
> > > Could you explain why some of them are only on topic level?
> > > Like RemoteLogSizeComputationTime, it's different from partition to
> > > partition, will it be better to be exposed as partition metric?
> > >
> > > 3. `RemoteLogSizeBytes` metric hanging.
> > > To compute the RemoteLogSizeBytes, we need to wait until all records
> in the
> > > metadata topic loaded.
> > > What will happen if it takes long to load the data from metadata topic?
> > > Should we instead return -1 or something to indicate it's still
> loading?
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Fri, Nov 3, 2023 at 1:53 AM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Christo,
> > > >
> > > > Thanks for expanding the scope of the KIP!  We should also cover the
> time
> > > > taken to
> > > > read data from remote storage. This will give our users a fair idea
> about
> > > > the P99, P95,
> > > > and P50 Fetch latency to read data from remote storage.
> > > >
> > > > The Fetch API request metrics currently provides a breakdown of the
> time
> > > > spent on each item:
> > > >
> > > >
> > >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L517
> > > > Should we also provide `RemoteStorageTimeMs` item (only for FETCH
> API) so
> > > > that users can
> > > > understand the overall and per-step time taken?
> > > >
> > > > Regarding the Remote deletion metrics, should we also emit a metric
> to
> > > > expose the oldest segment time?
> > > > Users can configure the topic retention either by size (or) time. If
> time
> > > > is configured, then emitting
> > > > the oldest segment time allows the user to configure an alert on top
> of
> > > it
> > > > and act accordingly.
> > > >
> > > > On Wed, Nov 1, 2023 at 7:07 PM Jorge Esteban Quilcate Otoya <
> > > > quilcate.jo...@gmail.com> wrote:
> > > >
> > > > > Thanks, Christo!
> > > > >
> > > > > 1. Agree. Having a further look into how many latency metrics are
> > > > included
> > > > > on the broker side there are only a few of them (e.g. request
> > > lifecycle)
> > > > —
> > > > > but seems mostly delegated to clients, or plugin in this case, to
> > > measure
> > > > > this.
> > > > >
> > > > > 3.2. Personally, I find the record-based lag less useful as records
> > > can't
> > > > > be relied as a stable unit of measure. So, if we can keep bytes-
> and
> > > > > segment-based lag, LGTM.
> > > > > 3.4.  Agree, these metrics should be on the broker side. Though if
> > > plugin
> > > > > decides to take deletion as a background process, then it should
> have
> > > > i

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-30 Thread Christo Lolov
ll (e.g. 1), it
> may be ok; but if the number of segments is high, then it can be more
> relevant to operators.
> 3.3. Could we consider having the same metrics for Delete Lag as there are
> for Copy Lag? i.e. including RemoteDeleteLagBytes, and (potentially)
> RemoteDeleteLag for segments.
> 3.4. The description of delete lag is unclear to me: I though it was about
> the remote segments to be deleted (because of total retention) but not
> deleted yet; however from description it seems that it's related to local
> segments that are marked for deletion. Is this correct?
>
> 4. On Remote Delete metrics:
> - Could we also include bytes-based metric as with Copy and Fetch? t would
> be useful to know how many bytes are being deleted. If aggregated and
> compared with copied bytes, we can get a sense of the amount of data stored
> remotely, even if not exact (only considers segment size, not indexes)
>
> 5. On RemoteLogAuxState metrics: could you elaborate a bit more on the
> purpose of this component and why the metrics proposed are needed?
>
> 6. On Total Remote Log Size metrics: similarly, could you elaborate on why
> this metric is needed? I'm missing what makes this operation as relevant
> (compared to other internals) to have some metrics attached -- maybe if you
> could shared scenarios where this metrics would be useful would be helpful.
>
> 7. On the metrics naming: not sure the `Total*` prefix is really needed or
> adds meaning. When I found it useful is when there are related metric that
> are a subset, then the total prefix helps: e.g.
> `TotalProduceRequestsPerSec` and `FailedProduceRequestsPerSec`
>
> Cheers,
> Jorge.
>
>
> On Tue, 24 Oct 2023 at 12:24, Christo Lolov 
> wrote:
>
> > Hello all,
> >
> > Now that 3.6 has been released, I would like to bring back attention to
> the
> > following KIP for adding metrics to tiered storage targeting 3.7 -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Add+more+metrics+to+Tiered+Storage
> > .
> >
> > Let me know your thoughts about the list of metrics and their
> granularity!
> >
> > Best,
> > Christo
> >
> > On Fri, 13 Oct 2023 at 10:14, Christo Lolov 
> > wrote:
> >
> > > Heya Gantigmaa,
> > >
> > > Apologies for the (very) late reply!
> > >
> > > Now that 3.6 has been released and reviewers have a bit more time I
> will
> > > be picking up this KIP again. I am more than happy to add useful new
> > > metrics to the KIP, I would just ask for a couple of days to review
> your
> > > pull request and I will come back to you.
> > >
> > > Best,
> > > Christo
> > >
> > > On Mon, 25 Sept 2023 at 10:49, Gantigmaa Selenge 
> > > wrote:
> > >
> > >> Hi Christo,
> > >>
> > >> Thank you for writing the KIP.
> > >>
> > >> I recently raised a PR to add metrics for tracking remote segment
> > >> deletions
> > >> (https://github.com/apache/kafka/pull/14375) but realised those
> metrics
> > >> were not mentioned in the original KIP-405 or KIP-930. Do you think
> > these
> > >> would make sense to be added to this KIP and get included in the
> > >> discussion?
> > >>
> > >> Regards,
> > >> Gantigmaa
> > >>
> > >> On Wed, Aug 9, 2023 at 1:53 PM Christo Lolov 
> > >> wrote:
> > >>
> > >> > Heya Kamal,
> > >> >
> > >> > Thank you for going through the KIP and for the question!
> > >> >
> > >> > I have been thinking about this and as an operator I might find it
> the
> > >> most
> > >> > useful to know all three of them actually.
> > >> >
> > >> > I would find knowing the size in bytes useful to determine how much
> > >> disk I
> > >> > might need to add temporarily to compensate for the slowdown.
> > >> > I would find knowing the number of records useful, because using the
> > >> > MessagesInPerSec metric I would be able to determine how old the
> > records
> > >> > which are facing problems are.
> > >> > I would find knowing the number of segments useful because I would
> be
> > >> able
> > >> > to correlate this with whether I need to change
> > >> > *remote.log.manager.task.interval.ms* to a lower or higher
> > >> value.
> > >

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-24 Thread Christo Lolov
Hello all,

Now that 3.6 has been released, I would like to bring back attention to the
following KIP for adding metrics to tiered storage targeting 3.7 -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Add+more+metrics+to+Tiered+Storage
.

Let me know your thoughts about the list of metrics and their granularity!

Best,
Christo

On Fri, 13 Oct 2023 at 10:14, Christo Lolov  wrote:

> Heya Gantigmaa,
>
> Apologies for the (very) late reply!
>
> Now that 3.6 has been released and reviewers have a bit more time I will
> be picking up this KIP again. I am more than happy to add useful new
> metrics to the KIP, I would just ask for a couple of days to review your
> pull request and I will come back to you.
>
> Best,
> Christo
>
> On Mon, 25 Sept 2023 at 10:49, Gantigmaa Selenge 
> wrote:
>
>> Hi Christo,
>>
>> Thank you for writing the KIP.
>>
>> I recently raised a PR to add metrics for tracking remote segment
>> deletions
>> (https://github.com/apache/kafka/pull/14375) but realised those metrics
>> were not mentioned in the original KIP-405 or KIP-930. Do you think these
>> would make sense to be added to this KIP and get included in the
>> discussion?
>>
>> Regards,
>> Gantigmaa
>>
>> On Wed, Aug 9, 2023 at 1:53 PM Christo Lolov 
>> wrote:
>>
>> > Heya Kamal,
>> >
>> > Thank you for going through the KIP and for the question!
>> >
>> > I have been thinking about this and as an operator I might find it the
>> most
>> > useful to know all three of them actually.
>> >
>> > I would find knowing the size in bytes useful to determine how much
>> disk I
>> > might need to add temporarily to compensate for the slowdown.
>> > I would find knowing the number of records useful, because using the
>> > MessagesInPerSec metric I would be able to determine how old the records
>> > which are facing problems are.
>> > I would find knowing the number of segments useful because I would be
>> able
>> > to correlate this with whether I need to change
>> > *remote.log.manager.task.interval.ms* to a lower or higher
>> value.
>> >
>> > What are your thoughts on the above? Would you find some of them more
>> > useful than others?
>> >
>> > Best,
>> > Christo
>> >
>> > On Tue, 8 Aug 2023 at 16:43, Kamal Chandraprakash <
>> > kamal.chandraprak...@gmail.com> wrote:
>> >
>> > > Hi Christo,
>> > >
>> > > Thanks for the KIP!
>> > >
>> > > The proposed tiered storage metrics are useful. The unit mentioned in
>> the
>> > > KIP is the number of records.
>> > > Each topic can have varying amounts of records in a segment depending
>> on
>> > > the record size.
>> > >
>> > > Do you think having the tier-lag by number of segments (or) size of
>> > > segments in bytes will be useful
>> > > to the operator?
>> > >
>> > > Thanks,
>> > > Kamal
>> > >
>> > > On Tue, Aug 8, 2023 at 8:56 PM Christo Lolov 
>> > > wrote:
>> > >
>> > > > Hello all!
>> > > >
>> > > > I would like to start a discussion for KIP-963: Upload and delete
>> lag
>> > > > metrics in Tiered Storage (
>> > https://cwiki.apache.org/confluence/x/sZGzDw
>> > > ).
>> > > >
>> > > > The purpose of this KIP is to introduce a couple of metrics to track
>> > lag
>> > > > with respect to remote storage from the point of view of Kafka.
>> > > >
>> > > > Thanks in advance for leaving a review!
>> > > >
>> > > > Best,
>> > > > Christo
>> > > >
>> > >
>> >
>>
>


[jira] [Created] (KAFKA-15660) File-based Tiered Storage should delete folders upon topic deletion

2023-10-20 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-15660:
-

 Summary: File-based Tiered Storage should delete folders upon 
topic deletion
 Key: KAFKA-15660
 URL: https://issues.apache.org/jira/browse/KAFKA-15660
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Christo Lolov


We have added a quick-start guide for Tiered Storage as part of Apache Kafka 
3.6 - [https://kafka.apache.org/documentation/#tiered_storage_config_ex].

When interacting with it, however, it appears that when topics are deleted,
the remote segments and their indices are deleted but the folders are not:
{code:java}
> ls /tmp/kafka-remote-storage/kafka-tiered-storage 
A-0-ApBdPOE1SOOw-Ie8RQLuAA  B-0-2omLZKw1Tiu2-EUKsIzj9Q  
C-0-FXdccGWXQJCj-RQynsOK3Q  D-0-vqfdYtYLSlWEyXp6cwwmpg

> ls /tmp/kafka-remote-storage/kafka-tiered-storage/A-0-ApBdPOE1SOOw-Ie8RQLuAA

{code}
I think that the file-based implementation shipping with Kafka should delete 
the folders as well.
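
For illustration, a minimal sketch of the kind of cleanup the file-based plugin
could run when a topic-partition is deleted (assuming the directory layout shown
above; this is not the actual LocalTieredStorage code):
{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class PartitionDirCleanup {
    // Recursively removes a per-partition directory such as
    // /tmp/kafka-remote-storage/kafka-tiered-storage/A-0-ApBdPOE1SOOw-Ie8RQLuAA
    static void deletePartitionDir(Path partitionDir) throws IOException {
        if (!Files.exists(partitionDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(partitionDir)) {
            paths.sorted(Comparator.reverseOrder())   // delete children before the directory itself
                 .forEach(p -> {
                     try {
                         Files.delete(p);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }
}
{code}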



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-13 Thread Christo Lolov
Heya Gantigmaa,

Apologies for the (very) late reply!

Now that 3.6 has been released and reviewers have a bit more time I will be
picking up this KIP again. I am more than happy to add useful new metrics
to the KIP, I would just ask for a couple of days to review your pull
request and I will come back to you.

Best,
Christo

On Mon, 25 Sept 2023 at 10:49, Gantigmaa Selenge 
wrote:

> Hi Christo,
>
> Thank you for writing the KIP.
>
> I recently raised a PR to add metrics for tracking remote segment deletions
> (https://github.com/apache/kafka/pull/14375) but realised those metrics
> were not mentioned in the original KIP-405 or KIP-930. Do you think these
> would make sense to be added to this KIP and get included in the
> discussion?
>
> Regards,
> Gantigmaa
>
> On Wed, Aug 9, 2023 at 1:53 PM Christo Lolov 
> wrote:
>
> > Heya Kamal,
> >
> > Thank you for going through the KIP and for the question!
> >
> > I have been thinking about this and as an operator I might find it the
> most
> > useful to know all three of them actually.
> >
> > I would find knowing the size in bytes useful to determine how much disk
> I
> > might need to add temporarily to compensate for the slowdown.
> > I would find knowing the number of records useful, because using the
> > MessagesInPerSec metric I would be able to determine how old the records
> > which are facing problems are.
> > I would find knowing the number of segments useful because I would be
> able
> > to correlate this with whether I need to change
> > *remote.log.manager.task.interval.ms* to a lower or higher
> value.
> >
> > What are your thoughts on the above? Would you find some of them more
> > useful than others?
> >
> > Best,
> > Christo
> >
> > On Tue, 8 Aug 2023 at 16:43, Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi Christo,
> > >
> > > Thanks for the KIP!
> > >
> > > The proposed tiered storage metrics are useful. The unit mentioned in
> the
> > > KIP is the number of records.
> > > Each topic can have varying amounts of records in a segment depending
> on
> > > the record size.
> > >
> > > Do you think having the tier-lag by number of segments (or) size of
> > > segments in bytes will be useful
> > > to the operator?
> > >
> > > Thanks,
> > > Kamal
> > >
> > > On Tue, Aug 8, 2023 at 8:56 PM Christo Lolov 
> > > wrote:
> > >
> > > > Hello all!
> > > >
> > > > I would like to start a discussion for KIP-963: Upload and delete lag
> > > > metrics in Tiered Storage (
> > https://cwiki.apache.org/confluence/x/sZGzDw
> > > ).
> > > >
> > > > The purpose of this KIP is to introduce a couple of metrics to track
> > lag
> > > > with respect to remote storage from the point of view of Kafka.
> > > >
> > > > Thanks in advance for leaving a review!
> > > >
> > > > Best,
> > > > Christo
> > > >
> > >
> >
>


[jira] [Resolved] (KAFKA-15385) Replace EasyMock with Mockito for AbstractStreamTest

2023-10-10 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15385.
---
Resolution: Duplicate

Closing this in favour of https://issues.apache.org/jira/browse/KAFKA-14133

> Replace EasyMock with Mockito for AbstractStreamTest
> 
>
> Key: KAFKA-15385
> URL: https://issues.apache.org/jira/browse/KAFKA-15385
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: Fei Xie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15382) Replace EasyMock with Mockito for KStreamTransformValuesTest

2023-10-10 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15382.
---
Resolution: Duplicate

Closing this ticket in favour of 
https://issues.apache.org/jira/browse/KAFKA-14133

> Replace EasyMock with Mockito for KStreamTransformValuesTest
> 
>
> Key: KAFKA-15382
> URL: https://issues.apache.org/jira/browse/KAFKA-15382
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: Fei Xie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15383) Replace EasyMock with Mockito for KTableImplTest

2023-10-10 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15383.
---
Resolution: Duplicate

Closing this ticket in favour of 
https://issues.apache.org/jira/browse/KAFKA-14133

> Replace EasyMock with Mockito for KTableImplTest
> 
>
> Key: KAFKA-15383
> URL: https://issues.apache.org/jira/browse/KAFKA-15383
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: Fei Xie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15384) Replace EasyMock with Mockito for KTableTransformValuesTest

2023-10-10 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15384.
---
Resolution: Fixed

Closing this ticket in favour of 
https://issues.apache.org/jira/browse/KAFKA-14133

> Replace EasyMock with Mockito for KTableTransformValuesTest
> ---
>
> Key: KAFKA-15384
> URL: https://issues.apache.org/jira/browse/KAFKA-15384
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: Fei Xie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.6.0 RC0

2023-09-19 Thread Christo Lolov
Heya,

I have compiled and run the test target successfully for the 3.6.0-rc0
branch on:

Java 11, Scala 2.13 - ARM
Java 17, Scala 2.13 - ARM
Java 20, Scala 2.13 - ARM
Java 11, Scala 2.13 - Intel x86
Java 17, Scala 2.13 - Intel x86
Java 20, Scala 2.13 - Intel x86

I will update the Zookeeper KIP title in the KIPs page, that's my miss

Best,
Christo

On Tue, 19 Sept 2023 at 13:21, Divij Vaidya  wrote:

> Hey Satish
>
> Thank you for managing this release. I have a few comments:
>
> Documentation
>
> 1. Section: Zookeeper/Stable Version - The documentation states "The
> current stable branch is 3.5. Kafka is regularly updated to include
> the latest release in the 3.5 series." in the ZooKeeper section. That
> needs an update since we are running Zk 3.8 now.
>
> 2. Section: Zookeeper/Migration - The documentation states "Migration
> of an existing ZooKeeper based Kafka cluster to KRaft is currently
> Preview and we expect it to be ready for production usage in version
> 3.6.". This probably needs an update on whether it is production ready
> or not in 3.6
>
> 3. Section: Kraft/missing features
> (https://kafka.apache.org/36/documentation.html#kraft_missing) - I
> believe that delegation token is now part of 3.6? I think this
> probably needs an update.
>
> 4. Section: Configuration/rack.aware.assignment.strategy - there seems
> to be a formatting problem starting from here
> (
> https://kafka.apache.org/36/documentation.html#streamsconfigs_rack.aware.assignment.strategy
> )
>
> 5. Section: KRaft Monitoring - Newly added metrics in
> https://issues.apache.org/jira/browse/KAFKA-15183 are missing from the
> documentation here.
>
> Release notes
>
> 1. I found a bunch of tickets which have not been marked with a
> release version but have been resolved in last 6 months using the
> query
> https://issues.apache.org/jira/browse/KAFKA-15380?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%20EMPTY%20AND%20resolved%20%3E%3D%20-24w%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> . Are some of them targeted for 3.6 release?
>
> 2. The KIP "KIP-902: Upgrade Zookeeper to 3.8.1" should probably be
> renamed to include 3.8.2 since code uses version 3.8.2 of Zookeeper.
>
>
> Additionally, I have verified the following:
> 1. release tag is correctly made after the latest commit on the 3.6
> branch at
> https://github.com/apache/kafka/commit/193d8c5be8d79b64c6c19d281322f09e3c5fe7de
>
> 2. protocol documentation contains the newly introduced error code as
> part of tiered storage
>
> 3. verified that public keys for RM are available at
> https://keys.openpgp.org/
>
> 4. verified that public keys for RM are available at
> https://people.apache.org/keys/committer/
>
> --
> Divij Vaidya
>
> On Tue, Sep 19, 2023 at 12:41 PM Sagar  wrote:
> >
> > Hey Satish,
> >
> > I have commented on KAFKA-15473. I think the changes in the PR look
> fine. I
> > also feel this need not be a release blocker given there are other
> > possibilities in which duplicates can manifest on the response of the end
> > point in question (albeit we can potentially see more in number due to
> > this).
> >
> > Would like to hear others' thoughts as well.
> >
> > Thanks!
> > Sagar.
> >
> >
> > On Tue, Sep 19, 2023 at 3:14 PM Satish Duggana  >
> > wrote:
> >
> > > Hi Greg,
> > > Thanks for reporting the KafkaConnect issue. I replied to this issue
> > > on "Apache Kafka 3.6.0 release" email thread and on
> > > https://issues.apache.org/jira/browse/KAFKA-15473.
> > >
> > > I would like to hear other KafkaConnect experts' opinions on whether
> > > this issue is really a release blocker.
> > >
> > > Thanks,
> > > Satish.
> > >
> > >
> > >
> > >
> > > On Tue, 19 Sept 2023 at 00:27, Greg Harris
> 
> > > wrote:
> > > >
> > > > Hey all,
> > > >
> > > > I noticed this regression in RC0:
> > > > https://issues.apache.org/jira/browse/KAFKA-15473
> > > > I've mentioned it in the release thread, and I'm working on a fix.
> > > >
> > > > I'm -1 (non-binding) until we determine if this regression is a
> blocker.
> > > >
> > > > Thanks!
> > > >
> > > > On Mon, Sep 18, 2023 at 10:56 AM Josep Prat
> 
> > > wrote:
> > > > >
> > > > > Hi Satish,
> > > > > Thanks for running the release.
> > > > >
> > > > > I ran the following validation steps:
> > > > > - Built from source with Java 11 and Scala 2.13
> > > > > - Verified Signatures and hashes of the artifacts generated
> > > > > - Navigated through Javadoc including links to JDK classes
> > > > > - Run the unit tests
> > > > > - Run integration tests
> > > > > - Run the quickstart in KRaft and Zookeeper mode
> > > > > - Checked License-binary against libs and matched them
> > > > >
> > > > > I +1 this release (non-binding)
> > > > >
> > > > > Best,
> > > > >
> > > > > On Mon, Sep 18, 2023 at 6:02 PM David Arthur 
> wrote:
> > > > >
> > > > > > Hey Satish, thanks for getting the RC underway!
> > > > > >
> > > > > > I noticed that the PR for the 3.6 

[jira] [Resolved] (KAFKA-15399) Enable OffloadAndConsumeFromLeader test

2023-09-01 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15399.
---
Resolution: Fixed

> Enable OffloadAndConsumeFromLeader test
> ---
>
> Key: KAFKA-15399
> URL: https://issues.apache.org/jira/browse/KAFKA-15399
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.6.0
>Reporter: Kamal Chandraprakash
>Assignee: Kamal Chandraprakash
>Priority: Blocker
> Fix For: 3.6.0
>
>
> Build / JDK 17 and Scala 2.13 / initializationError – 
> org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15427) Integration tests in TS test harness detect resource leaks

2023-09-01 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-15427:
-

 Summary: Integration tests in TS test harness detect resource leaks
 Key: KAFKA-15427
 URL: https://issues.apache.org/jira/browse/KAFKA-15427
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov
 Fix For: 3.6.0


The pull request (https://github.com/apache/kafka/pull/14116) for adding the 
Tiered Storage test harness uncovered resource leaks as part of the build 
([https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14116/28/testReport/junit/org.apache.kafka.tiered.storage.integration/OffloadAndConsumeFromLeaderTest/Build___JDK_20_and_Scala_2_13___initializationError/)]

 

This can be reproduced locally by running the following command

```
./gradlew --no-parallel --max-workers 1 -PmaxParallelForks=1 storage:test \
  --tests org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManagerTest \
  --tests org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest \
  --rerun
```

 

The point of this Jira ticket is to find the resource leak and fix it



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15298) Disable DeleteRecords on Tiered Storage topics

2023-08-09 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15298.
---
Resolution: Won't Fix

> Disable DeleteRecords on Tiered Storage topics
> --
>
> Key: KAFKA-15298
> URL: https://issues.apache.org/jira/browse/KAFKA-15298
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Christo Lolov
>    Assignee: Christo Lolov
>Priority: Major
>  Labels: tiered-storage
>
> Currently the DeleteRecords API does not work with Tiered Storage. We should 
> ensure that this is reflected in the responses that clients get when trying 
> to use the API with tiered topics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-08-09 Thread Christo Lolov
Heya Kamal,

Thank you for going through the KIP and for the question!

I have been thinking about this and as an operator I might find it the most
useful to know all three of them actually.

I would find knowing the size in bytes useful to determine how much disk I
might need to add temporarily to compensate for the slowdown.
I would find knowing the number of records useful, because using the
MessagesInPerSec metric I would be able to determine how old the records
which are facing problems are.
I would find knowing the number of segments useful because I would be able
to correlate this with whether I need to change
*remote.log.manager.task.interval.ms* to a lower or higher value.

What are your thoughts on the above? Would you find some of them more
useful than others?
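
As a purely illustrative sketch (the object name below is a placeholder and
not one of the final metric names from the KIP; the JMX port is also an
assumption), such gauges could be polled with the stock JmxTool once they
exist:

```
# Poll a hypothetical tier-lag gauge over JMX, assuming the broker exposes JMX on port 9999.
bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name 'kafka.server:type=BrokerTopicMetrics,name=RemoteCopyLagSegments,topic=my-topic' \
  --reporting-interval 30000
```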

Best,
Christo

On Tue, 8 Aug 2023 at 16:43, Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Christo,
>
> Thanks for the KIP!
>
> The proposed tiered storage metrics are useful. The unit mentioned in the
> KIP is the number of records.
> Each topic can have varying amounts of records in a segment depending on
> the record size.
>
> Do you think having the tier-lag by number of segments (or) size of
> segments in bytes will be useful
> to the operator?
>
> Thanks,
> Kamal
>
> On Tue, Aug 8, 2023 at 8:56 PM Christo Lolov 
> wrote:
>
> > Hello all!
> >
> > I would like to start a discussion for KIP-963: Upload and delete lag
> > metrics in Tiered Storage (https://cwiki.apache.org/confluence/x/sZGzDw
> ).
> >
> > The purpose of this KIP is to introduce a couple of metrics to track lag
> > with respect to remote storage from the point of view of Kafka.
> >
> > Thanks in advance for leaving a review!
> >
> > Best,
> > Christo
> >
>


[DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-08-08 Thread Christo Lolov
Hello all!

I would like to start a discussion for KIP-963: Upload and delete lag
metrics in Tiered Storage (https://cwiki.apache.org/confluence/x/sZGzDw).

The purpose of this KIP is to introduce a couple of metrics to track lag
with respect to remote storage from the point of view of Kafka.

Thanks in advance for leaving a review!

Best,
Christo


[DISCUSS] Cluster-wide disablement of Tiered Storage

2023-08-04 Thread Christo Lolov
Hello all!

I wanted to gather more opinions for
https://issues.apache.org/jira/browse/KAFKA-15267

In summary, the problem which I would like to solve is disabling TS (and
freeing the resources used by RemoteLog*Manager) because I have decided I
no longer want to use it without having to provision a whole new cluster
which just doesn't have it enabled.

My preference would be for option 4.1 without a KIP followed by option 4.2
in the future with a KIP once KIP-950 makes it in.

Please let me know your thoughts!

Best,
Christo


[jira] [Created] (KAFKA-15298) Disable DeleteRecords on Tiered Storage topics

2023-08-02 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-15298:
-

 Summary: Disable DeleteRecords on Tiered Storage topics
 Key: KAFKA-15298
 URL: https://issues.apache.org/jira/browse/KAFKA-15298
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov


Currently the DeleteRecords API does not work with Tiered Storage. We should 
ensure that this is reflected in the responses that clients get when trying to 
use the API with tiered topics.
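
For context, a minimal sketch of how a client would invoke this API from the
command line today (topic name, partition and offset are made up for
illustration):

```
# Describe which records to delete, then call the DeleteRecords API.
cat > /tmp/delete-records.json <<'EOF'
{"partitions": [{"topic": "tiered-topic", "partition": 0, "offset": 100}], "version": 1}
EOF

bin/kafka-delete-records.sh --bootstrap-server localhost:9092 \
  --offset-json-file /tmp/delete-records.json
```

The intent of the ticket is that such a call against a tiered topic returns a
clear error rather than behaving in an undefined way.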



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15267) Cluster-wide disablement of Tiered Storage

2023-07-28 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-15267:
-

 Summary: Cluster-wide disablement of Tiered Storage
 Key: KAFKA-15267
 URL: https://issues.apache.org/jira/browse/KAFKA-15267
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov


h2. Summary

KIP-405 defines the configuration {{remote.log.storage.system.enable}} which 
controls whether all resources needed for Tiered Storage to function are 
instantiated properly in Kafka. However, the behaviour of Kafka with respect to 
remote data is undefined if that configuration is set to false while there are 
still topics with {{remote.storage.enable}} set. *We would like to give customers 
the ability to switch off Tiered Storage on a cluster level and as such need to 
define that behaviour.*

{{remote.log.storage.system.enable}} is a read-only configuration. This means 
that it can only be changed by *modifying the server.properties* and restarting 
brokers. As such, the *validity of values contained in it is only checked at 
broker startup*.
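
As a rough illustration of that read-only behaviour (paths and the broker id
are placeholders), the setting can only be toggled via server.properties plus
a restart, and an attempt to alter it dynamically would be expected to be
rejected:

```
# Set in server.properties and picked up only on broker restart:
grep remote.log.storage.system.enable config/server.properties
# remote.log.storage.system.enable=true

# Flipping it at runtime is expected to fail, since it is not a dynamic broker config:
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 0 \
  --alter --add-config remote.log.storage.system.enable=false
```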

This JIRA proposes a few behaviours and a recommendation on a way forward.
h2. Option 1: Change nothing

Pros:
 * No operation.

Cons:
 * We do not solve the problem of moving back to older (or newer) Kafka 
versions not supporting TS.

h2. Option 2: Remove the configuration, enable Tiered Storage on a cluster 
level and do not allow it to be disabled

Always instantiate all resources for tiered storage. If no special ones are 
selected use the default ones which come with Kafka.

Pros:
 * We solve the problem for moving between versions not allowing TS to be 
disabled.

Cons:
 * We do not solve the problem of moving back to older (or newer) Kafka 
versions not supporting TS.
 * We haven’t quantified how much compute (CPU, memory) idle TS components 
occupy.
 * TS is a feature not required for running Kafka. As such, while it is still 
under development we shouldn’t put it on the critical path of starting a 
broker. In this way, a stray memory leak won’t impact anything on the critical 
path of a broker.
 * We are potentially swapping one problem for another. How does TS behave if 
one decides to swap the TS plugin classes when data has already been written?

h2. Option 3: Hide topics with tiering enabled

Customers cannot interact with topics which have tiering enabled. They cannot 
create new topics with the same names. Retention (and compaction?) do not take 
effect on files already in local storage.

Pros:
 * We do not force data-deletion.

Cons:
 * This will be quite involved - the controller will need to know when a 
broker’s server.properties have been altered; the broker will need to not 
proceed to delete logs it is not the leader or follower for.

h2. Option 4: Do not start the broker if there are topics with tiering 
enabled - Recommended

This option has 2 different sub-options. The first one is that TS cannot be 
disabled on cluster-level if there are *any* tiering topics - in other words 
all tiered topics need to be deleted. The second one is that TS cannot be 
disabled on a cluster-level if there are *any* topics with *tiering enabled* - 
they can have tiering disabled, but with a retention policy set to delete or 
retain (as per 
[KIP-950|https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement]).
 A topic can have tiering disabled and remain on the cluster as long as there 
is no *remote* data when TS is disabled cluster-wide.

Pros:
 * We force the customer to be very explicit in disabling tiering of topics 
prior to disabling TS on the whole cluster.

Cons:
 * You have to make certain that all data in remote is deleted (just a 
disablement of a tiered topic is not enough). How do you determine whether all 
remote data has expired if the policy is retain? If the retain policy in KIP-950 
knows that there is data in remote then this should also be able to figure it out.

The common denominator is that there needs to be no *remote* data at the point 
of disabling TS. As such, the most straightforward option is to refuse to start 
brokers if there are topics with {{remote.storage.enable}} set. This 
in essence requires customers to clean any tiered topics before switching off 
TS, which is a fair ask. Should we wish to revise this later it should be 
possible.
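
As a sketch of what the explicit clean-up in this option implies for an
operator (the topic name is a placeholder), each tiered topic would need its
override checked and dealt with before the cluster-wide switch-off:

```
# Check whether the topic still carries a tiering override:
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-tiered-topic --describe

# If it does, the topic (and its remote data) has to be handled before
# remote.log.storage.system.enable can safely be set to false cluster-wide.
```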
h2. Option 5: Make Kafka forget about all remote information

Pros:
 * Clean cut

Cons:
 * Data is lost the moment TS is disabled regardless of whether it is reenabled 
later on, which might not be the behaviour expected by customers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Apache Kafka 3.6.0 release

2023-07-20 Thread Christo Lolov
Hello!

A couple of days ago I opened a new KIP for discussion - KIP-952 [1]. I
believe it might be a blocker for the release of 3.6.0, but I wanted to
bring it up here for a decision on its urgency with the current set of
people who are looking at Tiered Storage (Satish, Luke, Ivan, Divij) given
that the date for KIP freeze is fast approaching.
What are your thoughts on the matter?

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-952%3A+Regenerate+segment-aligned+producer+snapshots+when+upgrading+to+a+Kafka+version+supporting+Tiered+Storage

Best,
Christo

On Sat, 8 Jul 2023 at 13:06, Satish Duggana 
wrote:

> Hi Yash,
> Thanks for the update. Added KIP-793 to the release plan. Please feel
> free to update the release wiki with any other updates on the KIP.
>
> ~Satish.
>
> On Fri, 7 Jul 2023 at 10:52, Yash Mayya  wrote:
> >
> > Hi Satish,
> >
> > KIP-793 [1] just passed voting and we should be able to wrap up the
> > implementation in time for the 3.6.0 feature freeze. Could we add it to
> the
> > release plan?
> >
> > [1] -
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs
> >
> > Thanks,
> > Yash
> >
> > On Mon, Jun 12, 2023 at 3:52 PM Satish Duggana  >
> > wrote:
> >
> > > Hi,
> > > I have created a release plan for Apache Kafka version 3.6.0 on the
> > > wiki. You can access the release plan and all related information by
> > > following this link:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.6.0
> > >
> > > The release plan outlines the key milestones and important dates for
> > > version 3.6.0. Currently, the following dates have been set for the
> > > release:
> > >
> > > KIP Freeze: 26th July 23
> > > Feature Freeze : 16th Aug 23
> > > Code Freeze : 30th Aug 23
> > >
> > > Please review the release plan and provide any additional information
> > > or updates regarding KIPs targeting version 3.6.0. If you have
> > > authored any KIPs that are missing a status or if there are incorrect
> > > status details, please make the necessary updates and inform me so
> > > that I can keep the plan accurate and up to date.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Mon, 17 Apr 2023 at 21:17, Luke Chen  wrote:
> > > >
> > > > Thanks for volunteering!
> > > >
> > > > +1
> > > >
> > > > Luke
> > > >
> > > > On Mon, Apr 17, 2023 at 2:03 AM Ismael Juma 
> wrote:
> > > >
> > > > > Thanks for volunteering Satish. +1.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Sun, Apr 16, 2023 at 10:08 AM Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > > I would like to volunteer as release manager for the next
> release,
> > > > > > which will be Apache Kafka 3.6.0.
> > > > > >
> > > > > > If there are no objections, I will start a release plan a week
> after
> > > > > > 3.5.0 release(around early May).
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > >
> > >
>


[DISCUSS] KIP-952: Regenerate segment-aligned producer snapshots when upgrading to a Kafka version supporting Tiered Storage

2023-07-17 Thread Christo Lolov
Hello!

A customer upgrading from Kafka < 2.8 to the future release 3.6 and wanting
to enable tiered storage would have to take responsibility for ensuring
that all segments lacking a producer snapshot file have expired and are
deleted before enabling the feature.

In our experience customers are not aware of this limitation and expect to
be able to enable the feature as soon as their upgrade is complete. If they
do this today, however, this results in NPEs. As such, one could argue this
is a blocker for 3.6 due to the non-direct upgrade path from versions < 2.8.

I would like to start a discussion on KIP-952: Regenerate segment-aligned
producer snapshots when upgrading to a Kafka version supporting Tiered
Storage (https://cwiki.apache.org/confluence/x/dIuzDw) which aims to solve
this issue.

Best,
Christo


[jira] [Created] (KAFKA-15195) Regenerate segment-aligned producer snapshots when upgrading to a Kafka version supporting Tiered Storage

2023-07-17 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-15195:
-

 Summary: Regenerate segment-aligned producer snapshots when 
upgrading to a Kafka version supporting Tiered Storage
 Key: KAFKA-15195
 URL: https://issues.apache.org/jira/browse/KAFKA-15195
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 3.6.0
Reporter: Christo Lolov
Assignee: Christo Lolov


As mentioned in KIP-405: Kafka Tiered Storage#Upgrade a customer wishing to 
upgrade from a Kafka version < 2.8.0 to 3.6 and turn Tiered Storage on will 
have to wait for retention to clean up segments without an associated producer 
snapshot.

However, in our experience, customers of Kafka expect to be able to immediately 
enable tiering on a topic once their cluster upgrade is complete. Once they do 
this, however, they start seeing NPEs and no data is uploaded to Tiered Storage 
(https://github.com/apache/kafka/blob/9e50f7cdd37f923cfef4711cf11c1c5271a0a6c7/storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/LogSegmentData.java#L61).

To achieve this, we propose changing Kafka to retroactively create producer 
snapshot files on upload whenever a segment is due to be archived and lacks one.
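
For illustration (directory and offsets are made up), the gap shows up as
segments rolled on a pre-2.8 broker having no accompanying .snapshot file,
which is what the upload path expects to find:

```
ls /var/kafka-logs/my-topic-0/
# 00000000000000000000.log  00000000000000000000.index  00000000000000000000.timeindex
# 00000000000000120000.log  00000000000000120000.index  00000000000000120000.timeindex  00000000000000120000.snapshot
# The first segment above was rolled before the upgrade and has no producer snapshot.
```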



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Apache Kafka 3.6.0 release

2023-06-30 Thread Christo Lolov
Hello!

I will add KIP-902 to the release plan. I would appreciate a few more
reviews on the pull request (https://github.com/apache/kafka/pull/13260)
for that KIP as the longer we have it in trunk with tests running against
it the more confidence we will have before the release.

Best,
Christo

On Sat, 24 Jun 2023 at 17:14, Chris Egerton  wrote:

> Thanks Satish!
>
> On Sat, Jun 24, 2023 at 7:34 AM Satish Duggana 
> wrote:
>
> > Thanks Chris for the update. I added KIP-875 to the 3.6.0 release plan
> > wiki. Please feel free to update it.
> >
> > ~Satish.
> >
> > On Fri, 23 Jun 2023 at 23:10, Chris Egerton 
> > wrote:
> > >
> > > Hi Satish,
> > >
> > > Could we add KIP-875 [1] to the release plan? It was partially released
> > in
> > > 3.5.0 and mentioned in the release plan [2], and since the rest (APIs
> to
> > > reset and alter offsets for connectors) has now been merged to trunk,
> we
> > > can let people know that the remainder should be available in the next
> > > release.
> > >
> > > [1] -
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect
> > > [2] -
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Jun 20, 2023 at 12:10 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Ivan for the update. Added KIP-917 to 3.6.0 Release Plan wiki.
> > > > Please feel free to update the status in the wiki.
> > > >
> > > > On Mon, 19 Jun 2023 at 18:35, Ivan Yurchenko <
> ivan0yurche...@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > Thank you. If by closing you mean summing up and announcing the
> > result,
> > > > > then already did.
> > > > >
> > > > > Ivan
> > > > >
> > > > >
> > > > > On Mon, 19 Jun 2023 at 15:28, Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ivan,
> > > > > > Sure, KIP freeze date is 26th July 23 for 3.6.0. Please close the
> > > > > > voting for KIP acceptance before that.
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > > > On Mon, 19 Jun 2023 at 16:03, Ivan Yurchenko <
> > ivan0yurche...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I would like to propose to include the newly accepted "KIP-917:
> > > > > > Additional
> > > > > > > custom metadata for remote log segment" [1] in the release
> plan.
> > > > Would it
> > > > > > > be possible?
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Best,
> > > > > > > Ivan
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
> > > > > > >
> > > > > > > On Mon, 12 Jun 2023 at 13:22, Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > I have created a release plan for Apache Kafka version 3.6.0
> > on the
> > > > > > > > wiki. You can access the release plan and all related
> > information
> > > > by
> > > > > > > > following this link:
> > > > > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.6.0
> > > > > > > >
> > > > > > > > The release plan outlines the key milestones and important
> > dates
> > > > for
> > > > > > > > version 3.6.0. Currently, the following dates have been set
> > for the
> > > > > > > > release:
> > > > > > > >
> > > > > > > > KIP Freeze: 26th July 23
> > > > > > > > Feature Freeze : 16th Aug 23
> > > > > > > > Code Freeze : 30th Aug 23
> > > > > > > >
> > > > > > > > Please review the release plan and provide any additional
> > > > information
> > > > > > > > or updates regarding KIPs targeting version 3.6.0. If you
> have
> > > > > > > > authored any KIPs that are missing a status or if there are
> > > > incorrect
> > > > > > > > status details, please make the necessary updates and inform
> > me so
> > > > > > > > that I can keep the plan accurate and up to date.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Satish.
> > > > > > > >
> > > > > > > > On Mon, 17 Apr 2023 at 21:17, Luke Chen 
> > wrote:
> > > > > > > > >
> > > > > > > > > Thanks for volunteering!
> > > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > Luke
> > > > > > > > >
> > > > > > > > > On Mon, Apr 17, 2023 at 2:03 AM Ismael Juma <
> > ism...@juma.me.uk>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for volunteering Satish. +1.
> > > > > > > > > >
> > > > > > > > > > Ismael
> > > > > > > > > >
> > > > > > > > > > On Sun, Apr 16, 2023 at 10:08 AM Satish Duggana <
> > > > > > > > satish.dugg...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > > I would like to volunteer as release manager for the
> next
> > > > > > release,
> > > > > > > > > > > which will be Apache Kafka 3.6.0.
> > > > > > > > > > >
> > > > > > > > > > > 

Re: [VOTE] KIP-937: Improve Message Timestamp Validation

2023-06-27 Thread Christo Lolov
+1 (non-binding) from me as well! This is the type of problem which is
difficult to become aware of so the more guardrails we put into place the
better.

On Wed, 21 Jun 2023 at 23:30, Beyene, Mehari 
wrote:

> Thank you, Justin. That makes sense.
> I have updated the KIP to remove the concept of ahead/behind. Instead, we
> will use the existing error message that utilizes the acceptable range for
> the timestamps.
>
> Thanks,
> Mehari
>
>


Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

2023-06-07 Thread Christo Lolov
Hey Colin,

I tried the following setup:

* Create 3 EC2 machines.
* EC2 machine named A acts as a KRaft Controller.
* EC2 machine named B acts as a KRaft Broker. (The only configurations
different to the default values: log.retention.ms=3,
log.segment.bytes=1048576, log.retention.check.interval.ms=3,
leader.imbalance.check.interval.seconds=30)
* EC2 machine named C acts as a Producer.
* I attached 1 GB EBS volume to the EC2 machine B (Broker) and configured
the log.dirs to point to it.
* I filled 995 MB of that EBS volume using fallocate (a sketch of this step
follows the list).
* I created a topic with 6 partitions and a replication factor of 1.
* From the Producer machine I used `~/kafka/bin/kafka-producer-perf-test.sh
--producer.config ~/kafka/config/client.properties --topic batman
--record-size 524288 --throughput 5 --num-records 150`. The disk on EC2
machine B filled up and the broker shut down. I stopped the producer.
* I stopped the controller on EC2 machine A. I started the controller to
both be a controller and a broker (I need this because I cannot communicate
directly with a controller -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-919%3A+Allow+AdminClient+to+Talk+Directly+with+the+KRaft+Controller+Quorum
).
* I deleted the topic to which I had been writing by using kafka-topics.sh .
* I started the broker on EC2 machine B and it failed due to no space left
on disk during its recovery process. The topic was not deleted from the
disk.
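
For completeness, the disk-filling step mentioned above amounts to something
like the following (mount point, file name and exact size are illustrative):

```
# Pre-fill the 1 GB data volume so only a few MB remain for the broker:
fallocate -l 995M /mnt/kafka-data/filler.bin
df -h /mnt/kafka-data
```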

As such, I am not convinced that KRaft addresses the problem of deleting
topics on startup if there is no space left on the disk - is there
something wrong with my setup that you disagree with? I think this will
continue to be the case even when JBOD + KRaft is implemented.

Let me know your thoughts!

Best,
Christo

On Mon, 5 Jun 2023 at 11:03, Christo Lolov  wrote:

> Hey Colin,
>
> Thanks for the review!
>
> I am also skeptical that much space can be reclaimed via compaction as
> detailed in the limitations section of the KIP.
>
> In my head there are two ways to get out of the saturated state -
> configure more aggressive retention and delete topics. I wasn't aware that
> KRaft deletes topics marked for deletion on startup if the disks occupied
> by those partitions are full - I will check it out, thank you for the
> information! On the retention side, I believe there is still a benefit in
> keeping the broker up and responsive - in my experience, people first try
> to reduce the data they have and only when that also does not work they are
> okay with sacrificing all of the data.
>
> Let me know your thoughts!
>
> Best,
> Christo
>
> On Fri, 2 Jun 2023 at 20:09, Colin McCabe  wrote:
>
>> Hi Christo,
>>
>> We're not adding new stuff to ZK at this point (it's deprecated), so it
>> would be good to drop that from the design.
>>
>> With regard to the "saturated" state: I'm skeptical that compaction could
>> really move the needle much in terms of freeing up space -- in most
>> workloads I've seen, it wouldn't. Compaction also requires free space to
>> function as well.
>>
>> So the main benefit of the "satured" state seems to be enabling deletion
>> on full disks. But KRaft mode already has most of that benefit. Full disks
>> (or, indeed, downed brokers) don't block deletion on KRaft. If you delete a
>> topic and then bounce the broker that had the disk full, it will delete the
>> topic directory on startup as part of its snapshot load process.
>>
>> So I'm not sure if we really need this. Maybe we should re-evaluate once
>> we have JBOD + KRaft.
>>
>> best,
>> Colin
>>
>>
>> On Mon, May 22, 2023, at 02:23, Christo Lolov wrote:
>> > Hello all!
>> >
>> > I would like to start a discussion on KIP-928: Making Kafka resilient to
>> > log directories becoming full which can be found at
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
>> > .
>> >
>> > In summary, I frequently run into problems where Kafka becomes
>> unresponsive
>> > when the disks backing its log directories become full. Such
>> > unresponsiveness generally requires intervention outside of Kafka. I
>> have
>> > found it to be significantly nicer of an experience when Kafka maintains
>> > control plane operations and allows you to free up space.
>> >
>> > I am interested in your thoughts and any suggestions for improving the
>> > proposal!
>> >
>> > Best,
>> > Christo
>>
>


Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

2023-06-05 Thread Christo Lolov
Hey Colin,

Thanks for the review!

I am also skeptical that much space can be reclaimed via compaction as
detailed in the limitations section of the KIP.

In my head there are two ways to get out of the saturated state - configure
more aggressive retention and delete topics. I wasn't aware that KRaft
deletes topics marked for deletion on startup if the disks occupied by
those partitions are full - I will check it out, thank you for the
information! On the retention side, I believe there is still a benefit in
keeping the broker up and responsive - in my experience, people first try
to reduce the data they have and only when that also does not work they are
okay with sacrificing all of the data.

Let me know your thoughts!

Best,
Christo

On Fri, 2 Jun 2023 at 20:09, Colin McCabe  wrote:

> Hi Christo,
>
> We're not adding new stuff to ZK at this point (it's deprecated), so it
> would be good to drop that from the design.
>
> With regard to the "saturated" state: I'm skeptical that compaction could
> really move the needle much in terms of freeing up space -- in most
> workloads I've seen, it wouldn't. Compaction also requires free space to
> function as well.
>
> So the main benefit of the "satured" state seems to be enabling deletion
> on full disks. But KRaft mode already has most of that benefit. Full disks
> (or, indeed, downed brokers) don't block deletion on KRaft. If you delete a
> topic and then bounce the broker that had the disk full, it will delete the
> topic directory on startup as part of its snapshot load process.
>
> So I'm not sure if we really need this. Maybe we should re-evaluate once
> we have JBOD + KRaft.
>
> best,
> Colin
>
>
> On Mon, May 22, 2023, at 02:23, Christo Lolov wrote:
> > Hello all!
> >
> > I would like to start a discussion on KIP-928: Making Kafka resilient to
> > log directories becoming full which can be found at
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
> > .
> >
> > In summary, I frequently run into problems where Kafka becomes
> unresponsive
> > when the disks backing its log directories become full. Such
> > unresponsiveness generally requires intervention outside of Kafka. I have
> > found it to be significantly nicer of an experience when Kafka maintains
> > control plane operations and allows you to free up space.
> >
> > I am interested in your thoughts and any suggestions for improving the
> > proposal!
> >
> > Best,
> > Christo
>


Re: [DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

2023-06-05 Thread Christo Lolov
Heya Igor,

Thank you for reading through the KIP and providing feedback!

11. Good question. I will check whether a change is needed in the
processing of the metadata records and come back. My hunch says no as long
as the Kafka broker is still alive to process the metadata records. This
being said, deleting topics is one of the two things I want to achieve. The
other one is to allow retention to be changed and continue to take effect.
As an example, if a person does not want to lose all data, but has realised
that they are storing 7 days of data while they only need the last 1 day,
they should be able to make the retention more aggressive and recover space
without deleting the topic. In my opinion, the change to the controller for
ZK mode isn't big - where previously requests were sent only to online
replicas they are now sent to all replicas. I have a preference for it to
make it in, but if reviewers don't find it necessary I am happy to target
just KRaft.

12. Great question! Since the KIP aims to be as non-invasive as possible,
the controller has no knowledge of the saturated state - the brokers do not
propagate any new information. As such they will be reported as having
thrown a KafkaStorageException whenever DescribeReplicaLogDirs is called.
Again, this decision came from me wanting the change to be as least
invasive as possible - the new state could be propagated.

13. Yes, I forgot to add this to the KIP and will amend it in the upcoming
days. I was planning on proposing a metric similar to
kafka.log:type=LogManager,name=OfflineLogDirectoryCount, except that it
will show the count of SaturatedLogDirectory.

14. Great question and I will clarify this in the KIP! No, similarly to
getting out the offline state getting out of the saturated state once space
has been reclaimed would require a bounce of the broker. I have a want
should the KIP be accepted to build upon the proposal to allow
auto-recovery without the need of a restart.

Best,
Christo

On Fri, 2 Jun 2023 at 17:02, Igor Soarez  wrote:

> Hi Christo,
>
> Thank you for the KIP. Kafka is very sensitive to filesystem errors,
> and at the first IO error the whole log directory is permanently
> considered offline. It seems your proposal aims to increase the
> robustness of Kafka, and that's a positive improvement.
>
> I have some questions:
>
> 11. "Instead of sending a delete topic request only to replicas we
> know to be online, we will allow a delete topic request to be sent
> to all replicas regardless of their state. Previously a controller
> did not send delete topic requests to brokers because it knew they
> would fail. In the future, topic deletions for saturated topics will
> succeed, but topic deletions for the offline scenario will continue
> to fail." It seems you're describing ZK mode behavior? In KRaft
> mode the Controller does not send requests to Brokers. Instead
> the Controller persists new metadata records which all online Brokers
> then fetch. Since it's too late to be proposing design changes for
> ZK mode, is this change necessary? Is there a difference in how the
> metadata records should be processed by Brokers?
>
> 12. "We will add a new state to the broker state machines of a log
> directory (saturated) and a partition replica (saturated)."
> How are log directories and partitions replicas in these states
> represented in the Admin API? e.g. `DescribeReplicaLogDirs`
>
> 13. Should there be any metrics indicating the new saturated state for
> log directories and replicas?
>
> 14. "If an IOException due to No space left on device is raised (we
> will check the remaining space at that point in time rather than the
> exception message) the broker will stop all operations on logs
> located in that directory, remove all fetchers and stop compaction.
> Retention will continue to be respected. The same node as the
> current state will be written to in Zookeeper. All other
> IOExceptions will continue to be treated the same way they are
> treated now and will result in a log directory going offline."
> Does a log directory in this "saturated" state transition back to
> online if more storage space becomes available, e.g. due to
> retention policy enforcement or due to topic deletion, or does the
> Broker still require a restart to bring the log directory back to
> full operation?
>
> Best,
>
> --
> Igor
>
>
>


Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-25 Thread Christo Lolov
Heya!

5th of June 16:30 - 17:00 UTC works for me.

Best,
Christo

On Thu, 25 May 2023 at 15:14, Igor Soarez  wrote:

> Hi Divij, Christo,
>
> Thank you for pointing that out.
>
> Let's aim instead for Monday 5th of June, at the same time – 16:30-17:00
> UTC.
>
> Please let me know if this doesn't work either.
>
> Best,
>
> --
> Igor
>
>


Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-25 Thread Christo Lolov
Heya Igor!

I don't have any concerns or suggestions for improvements at this stage -
the overall approach makes sense to me!

I would be quite interested in attending a call, but as Divij has pointed
out the 29th of May is a public holiday, so I won't be able to make that
date. If there is another time I will do my best to appear.

Best,
Christo

On Tue, 23 May 2023 at 17:59, Igor Soarez  wrote:

> Hi everyone,
>
> Someone suggested at the recent Kafka Summit that it may be useful
> to have a video call to discuss remaining concerns.
>
> I'm proposing we have a video call Monday 29th May 16:30-17:00 UTC.
>
> If you'd like to join, please reply to the thread or to me directly so
> I can send you a link.
>
> Please also do let me know if you'd like to attend but the proposed
> time does not work for you.
>
> Thanks,
>
> --
> Igor
>
>


Re: [VOTE] 3.5.0 RC0

2023-05-23 Thread Christo Lolov
Hey Mickael!

I am giving a +1 (non-binding) for this candidate release.

* Built from the binary tar.gz source with Java 17 and Scala 2.13 on Intel
(m5.4xlarge) and ARM (m6g.4xlarge) machines.
* Ran unit and integration tests on Intel and ARM machines.
* Ran the Quickstart in both Zookeeper and KRaft modes on Intel and ARM
machines.

Question:
* I went through https://kafka.apache.org/35/protocol.html and there are
quite a few repetitive __tagged_fields fields within the same structures -
is this expected?

On Tue, 23 May 2023 at 12:01, Josep Prat 
wrote:

> Hi Mickael,
> I just wanted to point out that I think the documentation you recently
> merged on Kafka site regarding the 3.5.0 version has a problem when it
> states the version number and the sub-menu that links to previous versions.
> Left a comment here:
> https://github.com/apache/kafka-site/pull/513#pullrequestreview-1438927939
>
> Best,
>
> On Tue, May 23, 2023 at 9:29 AM Josep Prat  wrote:
>
> > Hi Mickael,
> >
> > I can +1 this candidate. I verified the following:
> > - Built from source with Java 17 and Scala 2.13
> > - Signatures and hashes of the artifacts generated
> > - Navigated through Javadoc including links to JDK classes
> > - Run the unit tests
> > - Run integration tests
> > - Run the quickstart in KRaft and Zookeeper mode
> >
> > Best,
> >
> > On Mon, May 22, 2023 at 5:30 PM Mickael Maison 
> > wrote:
> >
> >> Hello Kafka users, developers and client-developers,
> >>
> >> This is the first candidate for release of Apache Kafka 3.5.0. Some of
> the
> >> major features include:
> >> - KIP-710: Full support for distributed mode in dedicated MirrorMaker
> >> 2.0 clusters
> >> - KIP-881: Rack-aware Partition Assignment for Kafka Consumers
> >> - KIP-887: Add ConfigProvider to make use of environment variables
> >> - KIP-889: Versioned State Stores
> >> - KIP-894: Use incrementalAlterConfig for syncing topic configurations
> >> - KIP-900: KRaft kafka-storage.sh API additions to support SCRAM for
> >> Kafka Brokers
> >>
> >> Release notes for the 3.5.0 release:
> >> https://home.apache.org/~mimaison/kafka-3.5.0-rc0/RELEASE_NOTES.html
> >>
> >> *** Please download, test and vote by Friday, May 26, 5pm PT
> >>
> >> Kafka's KEYS file containing PGP keys we use to sign the release:
> >> https://kafka.apache.org/KEYS
> >>
> >> * Release artifacts to be voted upon (source and binary):
> >> https://home.apache.org/~mimaison/kafka-3.5.0-rc0/
> >>
> >> * Maven artifacts to be voted upon:
> >> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >>
> >> * Javadoc:
> >> https://home.apache.org/~mimaison/kafka-3.5.0-rc0/javadoc/
> >>
> >> * Tag to be voted upon (off 3.5 branch) is the 3.5.0 tag:
> >> https://github.com/apache/kafka/releases/tag/3.5.0-rc0
> >>
> >> The PR adding the 35 documentation is not merged yet
> >> (https://github.com/apache/kafka-site/pull/513)
> >> * Documentation:
> >> https://kafka.apache.org/35/documentation.html
> >> * Protocol:
> >> https://kafka.apache.org/35/protocol.html
> >>
> >> * Successful Jenkins builds for the 3.5 branch:
> >> Unit/integration tests: Jenkins is not detecting the 3.5 branch,
> >> working with INFRA to sort it out:
> >> https://issues.apache.org/jira/browse/INFRA-24577
> >> System tests: The build is still running, I'll send an update once I
> >> have the results
> >>
> >> Thanks,
> >> Mickael
> >>
> >
> >
> > --
> > *Josep Prat*
> > Open Source Engineering Director, *Aiven*
> > josep.p...@aiven.io   |   +491715557497
> > aiven.io   |   https://twitter.com/aiven_io
> > *Aiven Deutschland GmbH*
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
>
>
> --
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io   |   https://twitter.com/aiven_io
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


Re: [DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-05-23 Thread Christo Lolov
Hello!

This proposal will address problems with configuration dependencies which I
run into very frequently, so I am fully supporting the development of this
feature!

Best,
Christo

On Mon, 22 May 2023 at 17:18, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi everyone,
>
> I'd like to start a discussion for KIP-935 <
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations
> >
> which proposes extend AlterConfigPolicy with existing configuration to
> enable more complex policies.
>
> There have been related KIPs in the past that haven't been accepted and
> seem retired/abandoned as outlined in the motivation.
> The scope of this KIP intends to be more specific to get most of the
> benefits from previous discussions; and if previous KIPs are resurrected,
> should still be possible to do it if this one is adopted.
>
> Looking forward to your feedback!
>
> Thanks,
> Jorge.
>


Re: [DISCUSS] KIP-934: Add DeleteTopicPolicy

2023-05-23 Thread Christo Lolov
Heya Jorge,

Thank you for the KIP!

This feature sounds great to me since I have encountered problems with
this, so I am supporting it. Do you have any idea why the previous KIPs
were abandoned - I went through the email conversations and pull requests,
but I didn't find a good reason?

Best,
Christo

On Mon, 22 May 2023 at 17:19, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi everyone,
>
> I'd like to start a discussion for KIP-934 <
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-934%3A+Add+DeleteTopicPolicy
> >
> which proposes adding a new policy for when deleting topics.
>
> There have been related KIPs in the past that haven't been accepted and
> seem retired/abandoned as outlined in the motivation.
> The scope of this KIP intends to be more specific to get most of the
> benefits from previous discussions; and if previous KIPs are resurrected,
> should still be possible to do it if this one is adopted.
>
> Looking forward to your feedback!
>
> Thanks,
> Jorge.
>


RE: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-22 Thread Christo Lolov
Hello Igor!

I have been working on a KIP to extend the functionality of JBOD broker disk 
failures 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full)
 and was wondering what is the state of this KIP - were you planning on 
starting a vote soon?

Best,
Christo

On 2022/07/27 10:15:22 Igor Soarez wrote:
> Hi all,
> 
> I have proposal to handle JBOD disk failures in KRaft mode.
> 
> With KIP-833 KRaft is being marked production ready and ZK mode is being 
> deprecated but support for JBOD is still a big feature that's missing. 
> 
> Please have a look and share your thoughts:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft
> 
> Thanks,
> 
> --
> Igor
> 

Re: [DISCUSS] KIP-905: Broker interceptors

2023-05-22 Thread Christo Lolov
Hello David,

Thank you for the proposal - it is an interesting read!

I have a few questions about it.

1. Can you take a stance on whether you are proposing the feature just for
producers or for consumers as well? If it is just for producers can we
remove references to consumers? If it is for both can you explicitly call
out the properties used to configure the consumers?

2. What about extending this to altering configurations? One shortcoming of
Kafka today is that a range of values of one configuration affect the range
of values another configuration can have, but the validation framework
within Kafka does not have the capability to make such checks.

3. Does it not make sense for the pattern which determines whether an
interceptor is to be applied or not to be a configuration? Otherwise if
there is a problem with the pattern I have to carry out a whole new
deployment since I need to change the code. In the same line of reasoning,
will there be a way I can query the cluster to understand what interceptors
are currently running and what patterns they are using? Otherwise how would
I know what is the cluster's current configuration?

4. Will there be any new metrics emitted by said interceptors (i.e. number
of records dropped, number of records processed per unit time)? If there
aren't how will I be able to determine the performance of my interceptors?
If I have multiple interceptors how will I be able to determine which one
is the bottleneck?

5. Will records pass through interceptors in the same order as the
interceptors are specified in the list or will there be another way to
specify the ordering?

6. What happens if you have a pipeline of interceptors and some of them are
the same, how will you handle loading the interceptors then? For example, I
can imagine someone saying filter everything above a value X, do some more
complex operation, filter everything above a value X, do another more
complex operation, filter everything above a value X etc.

Let me know your thoughts!

Best,
Christo



On Thu, 9 Feb 2023 at 19:28, David Mariassy 
wrote:

> Hi everyone,
>
> I'd like to get a discussion going for KIP-905
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-905%3A+Broker+interceptors
> >,
> which proposes the addition of broker interceptors to the stack.
>
> The KIP contains the motivation, and lists the new public interfaces that
> this change would entail. Since my company had its quarterly hack days this
> week, I also took the liberty to throw together a first prototype of the
> proposed new feature here: https://github.com/apache/kafka/pull/13224.
>
> Looking forward to the group's feedback!
>
> Thanks,
> David
>


[DISCUSS] KIP-928: Making Kafka resilient to log directories becoming full

2023-05-22 Thread Christo Lolov
Hello all!

I would like to start a discussion on KIP-928: Making Kafka resilient to
log directories becoming full which can be found at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full
.

In summary, I frequently run into problems where Kafka becomes unresponsive
when the disks backing its log directories become full. Such
unresponsiveness generally requires intervention outside of Kafka. I have
found it to be a significantly nicer experience when Kafka maintains
control plane operations and allows you to free up space.

I am interested in your thoughts and any suggestions for improving the
proposal!

Best,
Christo


Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

2023-04-18 Thread Christo Lolov
Hello all and thank you for the insightful comments!

As pointed out in the third rejected alternative I think the ideal solution is 
what a few of you have also mentioned - if partitions need to be expanded 
consumer groups should not lose their offsets and be able to find them. I was 
aware that this could only be picked up once the implementation of KIP-848 is 
done, which led to the idea presented in this KIP - there are situations 
involving the current logic which take a lot of time to recover from which can 
be recovered from quicker via a fairly small change.

As far as I understand people are of the general opinion that if we are to 
touch this we should solve it in such a way that nothing gets lost and 
transactions are also taken into account. I am okay with putting this KIP on 
hold until KIP-848 is completed, because even if a new KIP version is agreed 
upon no code can be accepted until that’s done (since this change will be 
fairly intrusive).

Do let me know if I have misunderstood the general sentiment.

Best,
Christo

> On Dec 29, 2022, at 2:18 PM, Christo  wrote:
> 
> Hello!
> 
> I would like to start this discussion thread on KIP-895: Dynamically refresh 
> partition count of __consumer_offsets.
> 
> The KIP proposes to alter brokers so that they refresh the partition count of 
> __consumer_offsets used to determine group coordinators without requiring a 
> rolling restart of the cluster.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> 
> Let me know your thoughts on the matter!
> 
> Best,
> Christo



Re: [VOTE] KIP-902: Upgrade Zookeeper to 3.8.1

2023-04-18 Thread Christo Lolov
Thank you all for the suggestions for improvements on the KIP and the
resulting discussions!

Since the voting has been opened for ~2 months and I have received 3
binding +1 (Colin, Ismael, Mickael) and 1 non-binding +1 (Divij) I will
move the KIP to accepted and update the associated pull request so that it
is ready to merge in trunk as soon as 3.5 has been made public.

Re: Colin,

What I meant to demonstrate by putting the Kafka clients is the upgrade
path for people using the Kafka command line tools and the --zookeeper flag
which was marked as deprecated in Kafka 2.4. I am happy to remove it if you
think it is unnecessary or amend it so that it becomes clearer. Please let
me know what you think!
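
To make the tooling angle concrete, this is the kind of invocation change
affected users face (illustrative only; the --zookeeper form is available
only on releases that still ship it):

```
# Deprecated since Kafka 2.4 (and removed in later releases):
bin/kafka-topics.sh --zookeeper zk1:2181 --list

# Replacement:
bin/kafka-topics.sh --bootstrap-server broker1:9092 --list
```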

Best,
Christo

On Mon, 17 Apr 2023 at 16:30, Mickael Maison 
wrote:

> Hi Christo,
>
> +1 (binding)
>
> Thanks for the KIP
>
> On Fri, Apr 14, 2023 at 7:32 PM Colin McCabe  wrote:
> >
> > On Sun, Apr 9, 2023, at 19:17, Ismael Juma wrote:
> > >
> > > On Sun, Apr 9, 2023 at 4:53 PM Colin McCabe 
> wrote:
> > >
> > >> We are going to deprecate ZK mode soon. So if this is indeed a
> requirement
> > >> (no deprecated software in prod), perhaps those users will have to
> move to
> > >> KRaft mode. (Independently of what we decide here)
> > >>
> > >
> > > Not sure where "no deprecated software in prod" is coming from. The
> concern
> > > is regarding end-of-life software - i.e. software that no longer
> receives
> > > security fixes. If we don't upgrade beyond 3.6.x, we'll be in a tough
> > > position when a CVE is fixed only in ZooKeeper 3.7.x, 3.8.x, etc. If
> it's a
> > > serious security problem, then it's likely that an additional release
> of
> > > ZooKeeper 3.6.x might be released. But the more likely case is that a
> > > library dependency will have a CVE that will trigger the compliance
> checks
> > > from enterprise users, but not warrant another ZooKeeper 3.6.x release.
> >
> > Hi Ismael,
> >
> > Fair enough. There is a difference between deprecated and unsupported.
> ZK 3.6.x is unsupported which is worse than deprecated, since it means it
> will not be updated.
> >
> > Overall, I agree with you that we're going to have to move to the new
> version of ZK. This fits in with the overall timeline of one more year of
> Kafka releases supporting ZK. If Apache Kafka 4.0 is April 2024, we'll need
> to be getting security updates for ZK during this time.
> >
> > On Wed, Apr 12, 2023, at 08:45, Christo Lolov wrote:
> > > Hello Colin,
> > >
> > > Thank you for the response!
> > >
> > > 1. I have attached the compatibility matrix in the KIP under the
> section
> > > Compatibility, Deprecation, and Migration Plan.
> >
> > Hi Christo,
> >
> > Thanks for attaching the matrix to the KIP.
> >
> > I don't understand why Kafka clients are part of this matrix. The Kafka
> client doesn't use ZK directly. (Well, certain very ancient pre-1.0 Kafka
> clients did, but that was a long time ago). So let's remove this.
> >
> > If I understand this correctly, the main documentation that will be
> needed is for pre-2.4 Kafka releases. Assuming they keep everything "stock"
> (which in my experience most users do), the net-net is that pre-2.4
> releases need to make an extra hop through a post-2.4, pre-3.6 release. We
> will have to document that as prominently as we can.
> >
> > I am +1 for this with the proviso that we do it in 3.6. We should update
> the version as soon as we can post-3.5 so that any bugs shake out as soon
> as possible.
> >
> > best,
> > Colin
>


Re: [VOTE] KIP-902: Upgrade Zookeeper to 3.8.1

2023-04-12 Thread Christo Lolov
Hello Colin,

Thank you for the response!

1. I have attached the compatibility matrix in the KIP under the section
Compatibility, Deprecation, and Migration Plan.
2. I believe the answer to how many bridge releases (for Kafka) will be
needed to upgrade from 2.0 to 4.0 based on the compatibility matrix is 2 -
one from 2.0 to any of 2.4.x to 3.5.x (now that we are no longer
considering this KIP for 3.5.x) and one from that version to the bridge
release mentioned in KIP-500 and KIP-866 (assuming that bridge release has
a dependency on Zookeeper 3.8.1).
3. What determines whether you need your Zookeeper cluster to first be
upgraded to 3.4.6 is "Upgrading a running ZooKeeper ensemble to 3.5.0
should be done only after upgrading your ensemble to the 3.4.6 release."
(source:
https://zookeeper.apache.org/doc/r3.8.1/zookeeperReconfig.html#ch_reconfig_upgrade).
Continuing from the example in point 2, since Kafka 2.0 had Zookeeper
3.4.13 no second bridge upgrade for Zookeeper is needed. To clarify, you
would go from Zookeeper 3.4.13 to any between 3.5.x and 3.6.x and then to
3.8.1.
4. Ideally, if users make an error since they will be carrying out a
rolling restart of their Zookeeper cluster the errors should start
appearing with the first Zookeeper instance which is rebooted, so if they
have sufficient monitoring in place they should be able to catch it before
it takes down their whole Kafka cluster. To be honest, I have never had to
downgrade a Zookeeper cluster, but I suspect the procedure is the same as
upgrading but in reverse i.e. stop the new binary, remove the new binary,
put the old binary, start the old binary.
5. Fair point, I meant to say that Zookeeper will no longer be a thing when
Kafka 4.0 arrives.
6. I believe Ismael's response answers your last concern better than I
could.

Best,
Christo

On Mon, 10 Apr 2023 at 00:53, Colin McCabe  wrote:

> On Wed, Mar 15, 2023, at 04:58, Christo Lolov wrote:
> > Hello Colin,
> >
> > Thank you for taking the time to review the proposal!
> >
> > I have attached a compatibility matrix to aid the explanation below - if
> the mailing system rejects it I will find another way to share it.
>
> Hi Christo,
>
> The mailing list doesn't take attachments. So perhaps you could share this
> in textual form?
>
> > For the avoidance of doubt, I am not proposing to drop support for
> rolling upgrade from old Kafka versions to new ones. What I am saying is
> that additional care will need to be taken when upgrading to the latest
> Kafka versions depending on the version of the accompanying Zookeeper
> cluster. This additional care means one might have to upgrade to a Kafka
> version which falls in the intersection of the two sets in the accompanying
> diagram before upgrading the accompanying Zookeeper cluster.
>
> I think we are talking about the same thing, just using different
> terminology. If you have to go through multiple upgrades to get from
> version X to version Y, I would not say there is "support for rolling
> upgrade from X to Y." In particular if you have to go through some other
> version B, I would say that B is the "bridge release."
>
> This is different than having an "upgrade path" -- I think everyone agrees
> that there should be an upgrade path between any two kafka versions (well,
> ones that are 0.8 or newer, at least).
>
> So I'd like to understand what the bridge release would be for this kind
> of change, and how many "hops" would be required to get from, say, 2.0 to
> 4.0. Keeping in mind that 4.0 won't have ZK at all.
>
> > As a concrete example let's say you want to upgrade to Kafka 3.5 from
> Kafka 2.3 and Zookeeper 3.4. You will have to:
> > 1. Carry out a rolling upgrade of your Kafka cluster to a version
> between 2.4 and 3.4.
> > 2. Carry out a rolling upgrade of your Zookeeper cluster to 3.8.1 (with
> a possible stop at 3.4.6 due to
> https://zookeeper.apache.org/doc/r3.8.1/zookeeperReconfig.html#ch_reconfig_upgrade
> ).
>
> Hmm, what determines whether I have to make the stop or not?
>
> One thing we haven't discussed in this thread is that a lot of users don't
> upgrade ZK when they do a Kafka upgrade. So I'd also like to understand in
> what situations ZK upgrades would be required as part of Kafka upgrades, if
> we bump this version. Also, what will happen if they forget? I assume the
> cluster would be down for a while. Does ZK have a downgrade procedure?
>
> > 3. Carry out a rolling upgrade of your Kafka cluster from 3.4 to 3.5.
> >
> > It is true that Zookeeper is to be deprecated in Kafka 4.0, but as far
> as I looked there is no concrete release date for that version yet.
>
> ZK is not going to be deprecated in AK 4.0, but removed in 4.0.
>
> >
> > Until this is th

Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-03-17 Thread Christo Lolov
Hello!

What would you suggest as the best way to get more eyes on KIP-902 as I would 
like it to be included it in 3.5.0?

Best,
Christo

> On 16 Mar 2023, at 10:33, Mickael Maison  wrote:
> 
> Hi,
> 
> This is a reminder that KIP freeze is less than a week away (22 Mar).
> For a KIP to be considered for this release, it must be voted and
> accepted by that date.
> 
> Feature freeze will be 3 weeks after this, so if you want KIPs or
> other significant changes in the release, please get them ready soon.
> 
> Thanks,
> Mickael
> 
>> On Tue, Feb 14, 2023 at 10:44 PM Ismael Juma  wrote:
>> 
>> Thanks!
>> 
>> Ismael
>> 
>> On Tue, Feb 14, 2023 at 1:07 PM Mickael Maison 
>> wrote:
>> 
>>> Hi Ismael,
>>> 
>>> Good call. I shifted all dates by 2 weeks and moved them to Wednesdays.
>>> 
>>> Thanks,
>>> Mickael
>>> 
>>> On Tue, Feb 14, 2023 at 6:01 PM Ismael Juma  wrote:
 
 Thanks Mickael. A couple of notes:
 
 1. We typically choose a Wednesday for the various freeze dates - there
>>> are
 often 1-2 day slips and it's better if that doesn't require people
 working through the weekend.
 2. Looks like we're over a month later compared to the equivalent release
 last year (
 https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.2.0). I
 understand that some of it is due to 3.4.0 slipping, but I wonder if we
 could perhaps aim for the KIP freeze to be one or two weeks earlier.
 
 Ismael
 
 On Tue, Feb 14, 2023 at 8:00 AM Mickael Maison >>> 
 wrote:
 
> Hi,
> 
> I've created a release plan for 3.5.0 in the wiki:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
> 
> Current dates are:
> 1) KIP Freeze: 07 Apr 2023
> 2) Feature Freeze: 27 Apr 2023
> 3) Code Freeze: 11 May 2023
> 
> Please take a look at the plan. Let me know if there are other KIPs
> targeting 3.5.0.
> Also if you are the author of one of the KIPs that's missing a status
> (or the status is incorrect) please update it and let me know.
> 
> Thanks,
> Mickael
> 
> 
> On Thu, Feb 9, 2023 at 9:23 AM Bruno Cadonna 
>>> wrote:
>> 
>> Thanks, Mickael!
>> 
>> Best,
>> Bruno
>> 
>> On 09.02.23 03:15, Luke Chen wrote:
>>> Hi Mickael,
>>> Thanks for volunteering!
>>> 
>>> Luke
>>> 
>>> On Thu, Feb 9, 2023 at 6:23 AM Chris Egerton <
>>> fearthecel...@gmail.com>
>>> wrote:
>>> 
 Thanks for volunteering, Mickael!
 
 On Wed, Feb 8, 2023 at 1:12 PM José Armando García Sancio
  wrote:
 
> Thanks for volunteering Mickael.
> 
> --
> -José
> 
 
>>> 
> 
>>> 


Re: [VOTE] KIP-902: Upgrade Zookeeper to 3.8.1

2023-03-15 Thread Christo Lolov
Hello Colin,

Thank you for taking the time to review the proposal!

I have attached a compatibility matrix to aid the explanation below - if
the mailing system rejects it I will find another way to share it.

For the avoidance of doubt, I am not proposing to drop support for rolling
upgrade from old Kafka versions to new ones. What I am saying is that
additional care will need to be taken when upgrading to the latest Kafka
versions depending on the version of the accompanying Zookeeper cluster.
This additional care means one might have to upgrade to a Kafka version
which falls in the intersection of the two sets in the accompanying diagram
before upgrading the accompanying Zookeeper cluster.

As a concrete example let's say you want to upgrade to Kafka 3.5 from Kafka
2.3 and Zookeeper 3.4. You will have to:
1. Carry out a rolling upgrade of your Kafka cluster to a version between
2.4 and 3.4.
2. Carry out a rolling upgrade of your Zookeeper cluster to 3.8.1 (with a
possible stop at 3.4.6 due to
https://zookeeper.apache.org/doc/r3.8.1/zookeeperReconfig.html#ch_reconfig_upgrade);
a per-node sketch of this step follows the list.
3. Carry out a rolling upgrade of your Kafka cluster from 3.4 to 3.5.
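
If it helps to see that rule written down, below is a minimal, purely
illustrative sketch of the check (the class name and version list are made up
for this email; they are not part of the KIP or of the codebase):

{code:java}
import java.util.List;

// Purely illustrative: encodes the "intersection" rule from the example above.
// A cluster must first be on a Kafka version that both talks to the old
// ZooKeeper 3.4 ensemble and bundles a 3.5+ ZooKeeper client (2.4 - 3.4 here)
// before the ensemble itself is moved to 3.8.1 and Kafka is taken to 3.5.
public class ZkUpgradeBridgeCheck {

    // Assumed bridge range, for this sketch only.
    private static final List<String> BRIDGE_KAFKA_VERSIONS = List.of(
            "2.4", "2.5", "2.6", "2.7", "2.8",
            "3.0", "3.1", "3.2", "3.3", "3.4");

    static boolean safeToUpgradeZookeeper(String currentKafkaVersion) {
        return BRIDGE_KAFKA_VERSIONS.contains(currentKafkaVersion);
    }

    public static void main(String[] args) {
        System.out.println(safeToUpgradeZookeeper("2.3")); // false: upgrade Kafka first
        System.out.println(safeToUpgradeZookeeper("3.4")); // true: ensemble can go to 3.8.1
    }
}
{code}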

It is true that Zookeeper is to be deprecated in Kafka 4.0, but as far as I
can tell there is no concrete release date for that version yet. Until that
happens, unless we carry out a Zookeeper version upgrade, we leave users
running on an end-of-life version with unpatched CVEs that have been addressed
in later versions. Some users have compliance requirements to run only on
stable versions of software and its dependencies, and not keeping the
dependencies up to date renders them unable to use Kafka.

Please, let me know your thoughts on the matter!

Best,
Christo

On Thu, 9 Mar 2023 at 21:56, Colin McCabe  wrote:

> Hi,
>
> I'm struggling a bit with this KIP, because dropping support for rolling
> upgrades from old Kafka versions doesn't seem like something we should do
> in a minor release. But on the other hand, the next Kafka release won't
> have ZK at all. Maybe we should punt on this until and unless there is a
> security issue that requires some action from us.
>
> I would also add, that a major ZK version bump is pretty risky. Last time
> we did this we hit several bugs. I remember we hit one where there was an
> incompatible change with regard to formatting (sorry, I can't seem to find
> the JIRA right now).
>
> Sorry, but for now I have to vote -1 until I can understand this better
>
> best,
> Colin
>
>
> On Thu, Feb 23, 2023, at 06:48, Divij Vaidya wrote:
> > Thanks for the KIP Christo.
> >
> > Having Zk 3.6 reach EOL in Dec 2022 is a good enough reason to upgrade,
> > hence I completely agree with the motivation. Your experiments have
> > demonstrated that the new version of Zk is stable at scale and the
> backward
> > compatibility risks are acceptable since Apache Kafka 2.4.x is an EOL
> > version.
> >
> > Vote +1 (non binding)
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Thu, Feb 23, 2023 at 3:32 PM Christo Lolov 
> > wrote:
> >
> >> Hello!
> >>
> >> I would like to start the vote for KIP-902, which upgrades Zookeeper to
> >> version 3.8.1.
> >>
> >> The KIP can be found at
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240882784
> >>
> >> The discussion thread is
> >> https://lists.apache.org/thread/5jbn2x0rtmqz5scyoygbdbj4vo0mpbw1
> >>
> >> Thanks
> >> Christo
>


Re: [DISCUSS] KIP-902: Upgrade Zookeeper to 3.8.1

2023-03-06 Thread Christo Lolov
Hello Ismael,

Thank you for the valid points. I have updated the KIP, but do let me know if 
you believe it is still not clear enough.

Best,
Christo

> On 2 Mar 2023, at 15:21, Ismael Juma  wrote:
> 
> Thanks for the KIP. I think the following is a little confusing since it
> doesn't make it clear that the ZooKeeper deployment is separate from Kafka,
> Kafka only includes the ZooKeeper libraries. I think it would be useful to
> explain the upgrade process for someone running Apache Kafka 2.3 and
> ZooKeeper 3.4 (the hardest case) and the same for someone running Apache
> Kafka 2.4 and ZooKeeper 3.5.
> 
> Also, it's worth clarifying that we actually still test direct kafka
> upgrades from 0.8.2 to 3.4. In practice, we have distinguished "providing
> updates" versus "allowing direct upgrades from". Apache Kafka 4.0 will
> change this since you will have to upgrade to a bridge release before
> upgrading to 4.0, but that's a new development.
> 
> "Users who use Kafka clusters with Zookeeper clients older than 3.5.x won't
> be able to communicate with a Zookeeper cluster using 3.8.1. As mentioned
> in the accompanying JIRA ticket Apache Kafka has been using Zookeeper 3.5.x
> since version 2.4 so versions above and including it should be safe for
> this upgrade. It is acceptable to break compatibility with Apache Kafka
> versions prior to 2.4 as they are considered beyond their end of life and
> are not maintained. (source: Time Based Release Plan#WhatIsOurEOLPolicy)."
> 
> Ismael
> 
>> On Wed, Feb 15, 2023 at 1:47 AM Christo Lolov 
>> wrote:
>> 
>> Hello!
>> 
>> I would like to start a discussion for KIP-902: Upgrade Zookeeper to
>> 3.8.1. The Zookeeper version currently used in Kafka reached its end of
>> life in December 2022. You can find the KIP at
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240882784
>> 
>> Thanks in advance for the reviews.
>> 
>> Christo


Re: [DISCUSS] KIP-902: Upgrade Zookeeper to 3.8.1

2023-03-06 Thread Christo Lolov
Hey Luke,

Thank you for the review! My reasoning for going to ZK 3.8.1 directly is that 
if there isn’t a set date for the Kafka 4.0 release it gives us more time 
before we have to (potentially) upgrade ZK again (similar to what is mentioned 
here https://github.com/apache/kafka/pull/12620#issuecomment-1245590870). ZK 
3.7.1 also does not support some of the earlier client/server combinations that 
3.8.1 does.

Best,
Christo

> On 2 Mar 2023, at 07:07, Luke Chen  wrote:
> 
> Hi Christo,
> 
> Thanks for the KIP.
> The motivation of upgrading ZK makes sense.
> And thanks for the great analysis for the ZK upgrading.
> 
> One question:
> Since we are going to remove ZK in v4.0, and we don't need the latest
> feature in the "current release" ZK 3.8.1, why can't we choose the "stable
> release" (3.7.1)?
> 
> Thank you.
> Luke
> 
>> On Wed, Feb 15, 2023 at 5:47 PM Christo Lolov 
>> wrote:
>> 
>> Hello!
>> 
>> I would like to start a discussion for KIP-902: Upgrade Zookeeper to
>> 3.8.1. The Zookeeper version currently used in Kafka reached its end of
>> life in December 2022. You can find the KIP at
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240882784
>> 
>> Thanks in advance for the reviews.
>> 
>> Christo


[VOTE] KIP-902: Upgrade Zookeeper to 3.8.1

2023-02-23 Thread Christo Lolov
Hello!

I would like to start the vote for KIP-902, which upgrades Zookeeper to version 
3.8.1.

The KIP can be found at 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240882784

The discussion thread is 
https://lists.apache.org/thread/5jbn2x0rtmqz5scyoygbdbj4vo0mpbw1

Thanks
Christo

[jira] [Resolved] (KAFKA-13690) Flaky test EosIntegrationTest.shouldWriteLatestOffsetsToCheckpointOnShutdown[at_least_once]

2023-02-23 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-13690.
---
Resolution: Fixed

> Flaky test 
> EosIntegrationTest.shouldWriteLatestOffsetsToCheckpointOnShutdown[at_least_once]
> ---
>
> Key: KAFKA-13690
> URL: https://issues.apache.org/jira/browse/KAFKA-13690
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Priority: Major
>
> The _at_least_once_ version of the 
> "{*}EosIntegrationTest.shouldWriteLatestOffsetsToCheckpointOnShutdown"{*} 
> test is occasionally failing with
> h3. Error Message
> java.lang.AssertionError: The committed records do not match what expected 
> Expected: <[KeyValue(0, 0), KeyValue(0, 1), KeyValue(0, 3), KeyValue(0, 6), 
> KeyValue(0, 10), KeyValue(0, 15), KeyValue(0, 21), KeyValue(0, 28), 
> KeyValue(0, 36), KeyValue(0, 45)]> but: was <[KeyValue(0, 0), KeyValue(0, 1), 
> KeyValue(0, 3), KeyValue(0, 6), KeyValue(0, 10), KeyValue(0, 10), KeyValue(0, 
> 11), KeyValue(0, 13), KeyValue(0, 16), KeyValue(0, 20), KeyValue(0, 25), 
> KeyValue(0, 31), KeyValue(0, 38)]>
>  
> Seems we are receiving more than the expected records.
> ...of course, this is an ALOS flavor of the {*}EOS{*}IntegrationTest, so 
> perhaps we shouldn't be running this variant at all? Not sure if this 
> explains the exact output we receive but it certainly seems suspicious
>  
> Added at_least_once in [https://github.com/apache/kafka/pull/11283]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13966) Flaky test `QuorumControllerTest.testUnregisterBroker`

2023-02-23 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-13966.
---
Resolution: Fixed

> Flaky test `QuorumControllerTest.testUnregisterBroker`
> --
>
> Key: KAFKA-13966
> URL: https://issues.apache.org/jira/browse/KAFKA-13966
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: David Arthur
>Priority: Major
>
> We have seen the following assertion failure in 
> `QuorumControllerTest.testUnregisterBroker`:
> {code:java}
> org.opentest4j.AssertionFailedError: expected: <2> but was: <0>
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>   at 
> org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
>   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
>   at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628)
>   at 
> org.apache.kafka.controller.QuorumControllerTest.testUnregisterBroker(QuorumControllerTest.java:494)
>  {code}
> I reproduced it by running the test in a loop. It looks like what happens is 
> that the BrokerRegistration request is able to get interleaved between the 
> leader change event and the write of the bootstrap metadata. Something like 
> this:
>  # handleLeaderChange() start
>  # appendWriteEvent(registerBroker)
>  # appendWriteEvent(bootstrapMetadata)
>  # handleLeaderChange() finish
>  # registerBroker() -> writes broker registration to log
>  # bootstrapMetadata() -> writes bootstrap metadata to log



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] KIP-902: Upgrade Zookeeper to 3.8.1

2023-02-15 Thread Christo Lolov
Hello!

I would like to start a discussion for KIP-902: Upgrade Zookeeper to 3.8.1. The 
Zookeeper version currently used in Kafka reached its end of life in December 
2022. You can find the KIP at 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240882784

Thanks in advance for the reviews.

Christo

Re: [DISCUSS] KIP-899: Allow clients to rebootstrap

2023-01-23 Thread Christo Lolov
Hello!

Thank you for the KIP. I would like to summarise my understanding of the 
problem in case I am wrong.

Currently a long-running client refreshes its metadata from a set of brokers 
obtained when first contacting the cluster. If it has been “away” for too long, 
those brokers might all have changed, and upon trying to refresh the metadata 
the client will fail because it cannot find an available broker. What you 
propose is that whenever such a situation is encountered, the client should 
try to get the new set of brokers by communicating with the bootstrap-servers 
again.
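
To make sure we are talking about the same thing, here is a very rough sketch
of that fallback (class and helper names below are mine for illustration; this
is neither the KIP's proposal as written nor the actual client internals):

{code:java}
import java.util.List;

// Rough sketch only: the connectivity check is a stand-in, not real client code.
class RebootstrapSketch {

    private final List<String> bootstrapServers; // the original bootstrap.servers list
    private List<String> knownBrokers;           // brokers discovered via metadata

    RebootstrapSketch(List<String> bootstrapServers) {
        this.bootstrapServers = List.copyOf(bootstrapServers);
        this.knownBrokers = List.copyOf(bootstrapServers);
    }

    List<String> brokersForMetadataRefresh() {
        if (noneReachable(knownBrokers)) {
            // Proposed behaviour: instead of failing, repeat the bootstrap step
            // so the client can rediscover the cluster's current brokers.
            knownBrokers = List.copyOf(bootstrapServers);
        }
        return knownBrokers;
    }

    // Placeholder for a real connectivity check against the known brokers.
    private boolean noneReachable(List<String> brokers) {
        return brokers.isEmpty();
    }
}
{code}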

If I have understood this correctly, then I agree with what is proposed as a 
solution in this KIP. To answer your question, in my opinion this behaviour 
should not be guarded by a configuration and should be the default once 
implemented. As a customer of Kafka, I cannot think of a situation where I 
would prefer my clients to give up if they have stale data without even trying 
to get the latest information. As far as I understand, the new behaviour will 
be entirely constrained to the client code, which makes this change easier.

As a starting point, can we confirm that this is indeed the current behaviour 
either by a reproducible manual test or by a branch with a failing 
unit/integration test? 

Best,
Christo

> On 18 Jan 2023, at 12:07, Ivan Yurchenko  wrote:
> 
> Hello!
> I would like to start the discussion thread on KIP-899: Allow clients to
> rebootstrap.
> This KIP proposes to allow Kafka clients to repeat the bootstrap process
> when fetching metadata if none of the known nodes are available.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-899%3A+Allow+clients+to+rebootstrap
> 
> A question right away: should we eventually change the default behavior or
> it can remain configurable "forever"? The latter is proposed in the KIP.
> 
> Thank you!
> 
> Ivan


RE: Last sprint to finish line: Replace EasyMock/Powermock with Mockito

2023-01-23 Thread Christo Lolov
Hello!

Below you will find the latest state of the Mockito migration.

81% or 39/48 of streams-related tests have been migrated.
The last 5 pull requests which are in need of reviews are:
* https://github.com/apache/kafka/pull/12449
* https://github.com/apache/kafka/pull/12524
* https://github.com/apache/kafka/pull/12777
* https://github.com/apache/kafka/pull/12607
* https://github.com/apache/kafka/pull/12739 (waiting on author)
Once they are merged we can move Streams to JUnit 5!

60% or 9/15 of connect-related tests have been migrated.
The 2 pull requests which are in need of reviews are:
* https://github.com/apache/kafka/pull/12728
* https://github.com/apache/kafka/pull/12781

Best,
Christo


Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

2023-01-18 Thread Christo Lolov
Greetings,

I am bumping the below DISCUSSion thread for KIP-895. The KIP presents a
situation where consumer groups are in an undefined state until a rolling
restart of the cluster is performed. While I have demonstrated the behaviour
on a Zookeeper-based cluster, I believe the same problem can be shown in
a KRaft cluster. Please let me know your opinions on the problem and the
presented solution.
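
For anyone who has not read the KIP yet, the crux is that a broker uses the
partition count of __consumer_offsets to map group ids to coordinator
partitions, roughly as in the simplified sketch below (the class name is made
up and this is not the exact broker code), so brokers holding different cached
counts can disagree on which partition, and therefore which coordinator, owns
a group:

{code:java}
// Simplified illustration of mapping a group id to a __consumer_offsets
// partition. Two brokers caching different partition counts will map the same
// group to different partitions, i.e. to different coordinators.
class CoordinatorPartitionSketch {

    static int partitionFor(String groupId, int offsetsTopicPartitionCount) {
        return Math.abs(groupId.hashCode() % offsetsTopicPartitionCount);
    }

    public static void main(String[] args) {
        String groupId = "my-consumer-group";
        System.out.println(partitionFor(groupId, 50));  // broker with the old count
        System.out.println(partitionFor(groupId, 100)); // broker with a refreshed count
    }
}
{code}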

Best,
Christo

On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
>  wrote:
>
>
> Hello!
> I would like to start this discussion thread on KIP-895: Dynamically
> refresh partition count of __consumer_offsets.
> The KIP proposes to alter brokers so that they refresh the partition count
> of __consumer_offsets used to determine group coordinators without
> requiring a rolling restart of the cluster.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
>
> Let me know your thoughts on the matter!
> Best, Christo
>


[jira] [Resolved] (KAFKA-14199) Installed kafka in ubuntu and not able to access in browser. org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1195725856 larger than 10

2023-01-12 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-14199.
---
Resolution: Fixed

> Installed kafka in ubuntu and not able to access in browser.  
> org.apache.kafka.common.network.InvalidReceiveException: Invalid receive 
> (size = 1195725856 larger than 104857600)
> 
>
> Key: KAFKA-14199
> URL: https://issues.apache.org/jira/browse/KAFKA-14199
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Gops
>Priority: Blocker
>
> I am new to Kafka. I have installed Zookeeper and Kafka on my local 
> Ubuntu machine. When I try to access Kafka in my browser 
> [http://ip:9092|http://ip:9092/] I am facing this error.
> +++
> [SocketServer listenerType=ZK_BROKER, nodeId=0] Unexpected error from 
> /127.0.0.1; closing connection (org.apache.kafka.common.network.Selector)
> org.apache.kafka.common.network.InvalidReceiveException: Invalid receive 
> (size = 1195725856 larger than 104857600)
>     at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:105)
>     at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)
>     at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)
>     at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
>     at kafka.network.Processor.poll(SocketServer.scala:989)
>     at kafka.network.Processor.run(SocketServer.scala:892)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> +++
> Also I have checked by updating the socket.request.max.bytes=5 in 
> ~/kafka/config/server.properties file still getting same error
>  
> pls figure it out. Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13805) Upgrade vulnerable dependencies march 2022

2023-01-12 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-13805.
---
Resolution: Fixed

> Upgrade vulnerable dependencies march 2022
> --
>
> Key: KAFKA-13805
> URL: https://issues.apache.org/jira/browse/KAFKA-13805
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.8.1, 3.0.1
>Reporter: Shivakumar
>Priority: Blocker
>  Labels: secutiry
>
> https://nvd.nist.gov/vuln/detail/CVE-2020-36518
> |Packages|Package Version|CVSS|Fix Status|
> |com.fasterxml.jackson.core_jackson-databind| 2.10.5.1| 7.5|fixed in 2.13.2.1|
> |com.fasterxml.jackson.core_jackson-databind|2.13.1|7.5|fixed in 2.13.2.1|
> Our security scan detected the above vulnerabilities
> upgrade to correct versions for fixing vulnerabilities



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14324) [CVE-2018-25032] introduced by rocksdbjni:6.29.4.1

2022-11-15 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-14324.
---
Resolution: Fixed

> [CVE-2018-25032] introduced by rocksdbjni:6.29.4.1
> --
>
> Key: KAFKA-14324
> URL: https://issues.apache.org/jira/browse/KAFKA-14324
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.2, 3.2.3, 3.3.1
>Reporter: VZhang
>    Assignee: Christo Lolov
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: 6.29.4.1_to_7.1.2_compat_report.html, 
> 6.29.4.1_to_7.7.3_compat_report.html
>
>
> Hi, Team
> There is an old CVE introduced by rocksdbjni-6.29.4.1, which has already been 
> fixed by 
> [https://github.com/facebook/rocksdb/commit/5dbdb197f19644d3f53f75781a3ef56e4387134b]
> [https://nvd.nist.gov/vuln/detail/cve-2018-25032]
> *Current Description:* 
> zlib before 1.2.12 allows memory corruption when deflating (i.e., when 
> compressing) if the input has many distant matches.
> CVE-2018-25032 - CVSS Score:{*}7.5{*} (v3.0) (zlib-1.2.11)
> Please help to upgrade the rocksdb.
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: Last sprint to finish line: Replace EasyMock/Powermock with Mockito

2022-11-08 Thread Christo Lolov
Hello!

This email summarises the current state of Kafka's Mockito migration.

The JIRA tickets used to track the progress are 
https://issues.apache.org/jira/browse/KAFKA-14132 and 
https://issues.apache.org/jira/browse/KAFKA-14133.

—

Breakdown of https://issues.apache.org/jira/browse/KAFKA-14133 

19/46 ~ 41% are merged
27/46 ~ 59% are in review

A list of pull requests awaiting a review from a committer:
https://github.com/apache/kafka/pull/12739
https://github.com/apache/kafka/pull/12505
https://github.com/apache/kafka/pull/12524
https://github.com/apache/kafka/pull/12818

—

Breakdown of https://issues.apache.org/jira/browse/KAFKA-14132 

7/17 ~ 41% are merged
6/17 ~ 35% are in review
4/17 ~ 24% are in progress

A list of pull requests awaiting a review from a committer:
https://github.com/apache/kafka/pull/12728
https://github.com/apache/kafka/pull/12821

—

A list of pull requests which have been merged since the last update:
https://github.com/apache/kafka/pull/12527
https://github.com/apache/kafka/pull/12725
https://github.com/apache/kafka/pull/12823

A big thank you to Shekhar Prasad Rajak (who recently joined our effort), 
Matthew de Detrich, Dalibor Plavcic, and everyone who has provided reviews over 
the last month!

Best,
Christo



Re: Last sprint to finish line: Replace EasyMock/Powermock with Mockito

2022-09-24 Thread Christo Lolov
Hello,

I have not been able to make a lot of progress on the Mockito migration myself, 
but Yash and Divij opened and merged a PR each.

The following PRs have made it into trunk:

https://github.com/apache/kafka/pull/12615
https://github.com/apache/kafka/pull/12677
https://github.com/apache/kafka/pull/12492

Thank you to Yash and Divij for authoring them and to Chris and Bruno for the 
reviews!

The following PRs are in progress:

In need of reviewer attention - RA
In need of author attention - AA

https://github.com/apache/kafka/pull/12409 (RA)
https://github.com/apache/kafka/pull/12418 (AA)
https://github.com/apache/kafka/pull/12465 (RA)
https://github.com/apache/kafka/pull/12505 (AA)
https://github.com/apache/kafka/pull/12524 (RA)
https://github.com/apache/kafka/pull/12527 (RA)
https://github.com/apache/kafka/pull/12607 (AA)

A summary of the Mockito migration:

https://issues.apache.org/jira/browse/KAFKA-14133 - a little under 1/3 of all 
classes using EasyMock have been moved and merged; a bit over 1/3 are in PRs 
and around 1/3 are remaining.

https://issues.apache.org/jira/browse/KAFKA-14132 - a little under 1/3 of all 
classes using PowerMock have been moved and merged; a bit over 2/3 are 
remaining.

Best,
Christo



RE: Last sprint to finish line: Replace EasyMock/Powermock with Mockito

2022-09-07 Thread Christo Lolov
Hello!

This is the (roughly) bi-weekly update on the Mockito migration.

Firstly, the following PRs have been merged since the last email so thank you 
to the writers (Yash and Divij) and reviewers (Dalibor, Mickael, Yash, Bruno 
and Chris):

https://github.com/apache/kafka/pull/12459
https://github.com/apache/kafka/pull/12473
https://github.com/apache/kafka/pull/12509

Secondly, this is the latest list of PRs that are in need of a review to get 
them over the line:

https://github.com/apache/kafka/pull/12409
https://github.com/apache/kafka/pull/12418 (I need to respond to the comments 
on this one, so the first action is on me)
https://github.com/apache/kafka/pull/12465
https://github.com/apache/kafka/pull/12492
https://github.com/apache/kafka/pull/12505 (I need to respond to Dalibor’s 
comment on this one, but the overall PR could use some more eyes)
https://github.com/apache/kafka/pull/12524
https://github.com/apache/kafka/pull/12527

Lastly, I am keeping https://issues.apache.org/jira/browse/KAFKA-14133 and 
https://issues.apache.org/jira/browse/KAFKA-14132 up to date, so if anyone 
has spare bandwidth and would like to assign themselves some of the unassigned 
tests, their contributions would be welcome :)

Best,
Christo
