Hi David,
Thanks for the summary. It seems to me that the Flow-like option is best because
it can easily handle cancellations and exceptions, return the topic partition
information, and signal when the last of the results has been returned. I think
it's also equally applicable to any of th
Andrew/Jose, I like the suggested Flow API. It's also similar to the stream
observers in gRPC. I'm not sure we should expose something as complex as
the Flow API directly in KafkaAdminClient, but certainly we can provide a
similar interface.
---
Cancellations:
Another thing not yet discussed is h
Hi Calvin
On Fri, Feb 23, 2024 at 9:23 AM Calvin Liu wrote:
> As we agreed to implement the pagination for the new API
> DescribeTopicPartitions, the client side must also add a proper interface
> to handle the pagination.
> The current KafkaAdminClient.describeTopics returns
> the DescribeTopics
This is an interesting problem. While it would be possible to use a Consumer,
it can't really handle exceptions. A java.util.Stream has a similar problem.
I wonder whether an interface which looks like
java.util.concurrent.Flow.Subscriber
would be good. Something like:
public interface AdminResu
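As a rough illustration of the idea above, a Flow-style subscriber could look
something like the following (all names here are hypothetical sketches, not the
actual KIP interface):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Flow.Subscriber-style callback for paginated
// admin results, modeled on java.util.concurrent.Flow.Subscriber.
interface AdminResultsSubscriber<T> {
    void onNext(T result);       // one result (or one page) at a time
    void onError(Throwable e);   // terminal: the stream failed
    void onComplete();           // terminal: the last result has been delivered
}

// A trivial subscriber that just collects results, for illustration.
class CollectingSubscriber<T> implements AdminResultsSubscriber<T> {
    final List<T> collected = new ArrayList<>();
    boolean terminated = false;

    @Override public void onNext(T result) { collected.add(result); }
    @Override public void onError(Throwable e) { terminated = true; }
    @Override public void onComplete() { terminated = true; }
}
```

Pagination then stays invisible to the caller: the admin client can invoke
onNext for each page it fetches and onComplete once the last page arrives, and
exceptions flow through onError instead of being lost.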
Thanks for raising this here, Calvin. Since this is the first "streaming
results" type API in KafkaAdminClient (as far as I know), we're treading
new ground here.
As you mentioned, we can either accept a consumer or return some iterable
result. Returning a java.util.Stream is also an option, and a
Hey,
As we agreed to implement the pagination for the new API
DescribeTopicPartitions, the client side must also add a proper interface
to handle the pagination.
The current KafkaAdminClient.describeTopics returns
the DescribeTopicsResult which is the future for querying all the topics.
It is awkwa
Hi David,
Thanks for the comment.
Yes, we can separate the ELR enablement from the metadata version. It is
also helpful to avoid blocking subsequent MV releases if the user is not
ready for ELR.
One thing to correct is that unclean recovery is controlled
by unclean.recovery.manager.enabled,
One thing we should consider is a static config to totally enable/disable
the ELR feature. If I understand the KIP correctly, we can effectively
disable the unclean recovery by setting the recovery strategy config to
"none".
This would make development and rollout of this feature a bit smoother.
C
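The suggestion above might end up looking something like the following broker
configuration fragment (the property names here are purely illustrative
guesses; the KIP defines the actual keys):

```
# Hypothetical static switch for the ELR feature, decoupled from
# metadata.version (key name is an illustration, not from the KIP)
eligible.leader.replicas.enable=false

# Unclean recovery can already be disabled via the strategy config
unclean.recovery.strategy=none
```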
Hi Jun
- Good catch, yes, we don't need the -1 in the DescribeTopicRequest.
- No new value is added. The LeaderRecoveryState will still be set to 1 if
we have an unclean leader election. The unclean leader election includes
the old random way and the unclean recovery. During the unclean recovery,
the
Hi, Calvin,
Another thing. Currently, when there is an unclean leader election, we set
the LeaderRecoveryState in PartitionRecord and PartitionChangeRecord to 1.
With the KIP, will there be new values for LeaderRecoveryState? If not,
when will LeaderRecoveryState be set to 1?
Thanks,
Jun
On Tue
Hi, Calvin,
One more comment.
"The first partition to fetch details for. -1 means to fetch all
partitions." It seems that FirstPartitionId of 0 naturally means fetching
all partitions?
Thanks,
Jun
On Tue, Oct 10, 2023 at 12:40 PM Calvin Liu
wrote:
> Hi Jun,
> Yeah, with the current Metadata
Hi Jun,
Yeah, with the current Metadata request handling, we only return errors on
the Topic level, like topic not found. It seems that querying a specific
partition is not a valid use case. Will update.
Thanks
On Tue, Oct 10, 2023 at 11:55 AM Jun Rao wrote:
> Hi, Calvin,
>
> 60. If the range q
Hi, Calvin,
60. If the range query has errors for some of the partitions, do we expect
different responses when querying particular partitions?
Thanks,
Jun
On Tue, Oct 10, 2023 at 10:50 AM Calvin Liu
wrote:
> Hi Jun
> 60. Yes, it is a good question. I was thinking the API could be flexible t
Hi Jun
60. Yes, it is a good question. I was thinking the API could be flexible to
query the particular partitions if the range query has errors for some of
the partitions. Not sure whether it is a valid assumption, what do you
think?
61. Good point, I will update them to partition level with the
Hi, Calvin,
A few more minor comments on your latest update.
60. DescribeTopicRequest: When will the Partitions field be used? It seems
that the FirstPartitionId field is enough for AdminClient usage.
61. Could we make the limit for DescribeTopicRequest, ElectLeadersRequest,
GetReplicaLogInfo co
Hi David,
Thanks for the comments.
I thought that a new snapshot with the downgraded MV is created in this
case. Isn’t it the case?
Yes, you are right, a metadata delta will be generated after the MV
downgrade. Then the user can start the software downgrade.
-
Could you also elaborate a bi
Hi Calvin,
I thought that a new snapshot with the downgraded MV is created in this
case. Isn’t it the case?
Could you also elaborate a bit more on the reasoning behind adding the
limits to the admin RPCs? This is a new pattern in Kafka so it would be
good to be clear on the motivation.
Could you al
Hi Jun,
After the MV downgrade, the controller will write in the old version of the
PartitionRecord/PartitionChangeRecord. If I understand correctly, it is
possible to downgrade the software version if the controller only has to
handle old version records.
However, the controller will not automatic
Hi, Calvin and Justine,
Historically, when we change the record format in the log, we don't support
software version downgrading.
For the record format change in the metadata log, have we thought about
forcing the write of the latest metadata records with the old version
during MV downgrading? Th
Sorry -- not MV but software version.
On Wed, Oct 4, 2023 at 9:51 AM Justine Olshan wrote:
> Catching up with this discussion.
>
> I was just curious -- have we had other instances where downgrading MV is
> not supported? I think Kafka typically tries to support downgrades, and I
> couldn't thin
Catching up with this discussion.
I was just curious -- have we had other instances where downgrading MV is
not supported? I think Kafka typically tries to support downgrades, and I
couldn't think of other examples.
Thanks,
Justine
On Wed, Oct 4, 2023 at 9:40 AM Calvin Liu
wrote:
> Hi Jun,
> 5
Hi Jun,
54. Marked that software downgrading is not supported, as the old controller
will not understand the new PartitionRecord and PartitionChangeRecord.
Thanks!
On Wed, Oct 4, 2023 at 9:12 AM Jun Rao wrote:
> Hi, Calvin,
>
> Thanks for the reply. Just one more comment.
>
> 54. It seems that do
Hi, Calvin,
Thanks for the reply. Just one more comment.
54. It seems that downgrading MV is supported. Is downgrading the software
version supported? It would be useful to document that.
Thanks,
Jun
On Tue, Oct 3, 2023 at 4:55 PM Artem Livshits
wrote:
> Hi Colin,
>
> I think in your example
Hi Colin,
I think in your example "do_unclean_recovery" would need to do different
things depending on the strategy.
do_unclean_recovery() {
  if (unclean.recovery.manager.enabled) {
    if (strategy == Aggressive)
      use UncleanRecoveryManager(waitLastKnownERL=false) // just inspect logs fr
Hi Jun,
Thanks for the comments. And thanks to Colin's explanation.
41. The unclean recovery manager may need time to be ready for production,
so we add a flag to disable the new feature in case it has any problems.
The config may be deprecated or removed when we are ready.
50. Sorry for the c
Calvin, thanks for the KIP!
I'm getting up to speed on the discussion. I had a few questions
57. When is the CleanShutdownFile removed? I think it probably happens
after registering with the controller, but it would be good to clarify this.
58. Since the broker epoch comes from the controller, w
On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote:
> Hi, Calvin,
>
> Thanks for the updated KIP. A few more comments.
>
> 41. Why would a user choose the option to select a random replica as the
> leader instead of using unclean.recovery.strategy=Aggressive? It seems that
> the latter is strictly better?
Hi, Calvin,
Thanks for the update KIP. A few more comments.
41. Why would a user choose the option to select a random replica as the
leader instead of using unclean.recovery.strategy=Aggressive? It seems that
the latter is strictly better? If that's not the case, could we fold this
option under un
Hi Jun
Thanks for the comments.
40. If we change the mapping to None, there is still no guarantee against data
loss. For users who are not able to validate the data with external resources,
manual intervention does not give a better result but a loss of availability.
So practically speaking, the Balanced mode would be
On Mon, Sep 25, 2023, at 15:48, Jun Rao wrote:
> Hi, Calvin,
>
> Thanks for the updated KIP. Made another pass. A few more comments below.
>
> 40. unclean.leader.election.enable.false ->
> unclean.recovery.strategy.Balanced: The Balanced mode could still lead to
> data loss. So, I am wondering if u
Hi, Calvin,
Thanks for the updated KIP. Made another pass. A few more comments below.
40. unclean.leader.election.enable.false ->
unclean.recovery.strategy.Balanced: The Balanced mode could still lead to
data loss. So, I am wondering if unclean.leader.election.enable.false
should map to None?
41
Hi Calvin,
Thanks for the explanations. I like the idea of using none, balanced,
aggressive. We also had an offline discussion about why it is good to use a new
config key (basically, so that we can deprecate the old one, which had only
false/true values, in 4.0). With these changes, I am +1.
bes
Hi Colin,
Also, can we deprecate unclean.leader.election.enable in 4.0? Before that,
we can have both configs, unclean.recovery.strategy and
unclean.leader.election.enable,
and use unclean.recovery.Enabled to determine which config to use
during the unclean leader election.
On Mon, Sep 18,
Hi Colin,
For the unclean.recovery.strategy config name, how about we use the
following
None. It basically means no unclean recovery will be performed.
Aggressive. It means availability goes first. Whenever the partition can't
elect a durable replica, the controller will try the unclean recovery.
B
Hi Colin,
> So, the proposal is that if someone sets "unclean.leader.election.enable
= true"...
The idea is to use one of unclean.leader.election.enable and
unclean.recovery.strategy, depending on unclean.recovery.Enabled. A possible
version could be:
If unclean.recovery.Enabled:
{
Check uncl
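A rough sketch of this fallback logic, under the assumption (taken from the
discussion, not the final KIP) that unclean.recovery.Enabled gates which config
is consulted, and that the legacy false value maps to Balanced as in the draft
under discussion:

```java
// Illustrative sketch only; names and the false -> BALANCED mapping follow
// the draft under discussion (whether false should map to NONE is debated
// elsewhere in this thread).
enum UncleanRecoveryStrategy { NONE, BALANCED, AGGRESSIVE }

class UncleanRecoveryConfig {
    static UncleanRecoveryStrategy effectiveStrategy(
            boolean uncleanRecoveryEnabled,          // unclean.recovery.Enabled
            UncleanRecoveryStrategy strategy,        // unclean.recovery.strategy
            boolean uncleanLeaderElectionEnable) {   // unclean.leader.election.enable
        if (uncleanRecoveryEnabled) {
            return strategy;                         // the new config takes effect
        }
        // Otherwise fall back to the legacy true/false config.
        return uncleanLeaderElectionEnable
                ? UncleanRecoveryStrategy.AGGRESSIVE
                : UncleanRecoveryStrategy.BALANCED;
    }
}
```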
On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote:
> Hi Colin
> 1. I think using the new config name is more clear.
>a. The unclean leader election is actually removed if unclean
> recovery is in use.
>b. Using multiple values in unclean.leader.election.enable is
> confusing and it w
Hi Colin
1. I think using the new config name is more clear.
a. The unclean leader election is actually removed if unclean
recovery is in use.
b. Using multiple values in unclean.leader.election.enable is
confusing and it will be more confusing after people forget about this
discussio
Hi Calvin,
Thanks for the changes.
1. Earlier I commented that creating "unclean.recovery.strategy" is not
necessary, and we can just reuse the existing "unclean.leader.election.enable"
configuration key. Let's discuss that.
2. I also don't understand why you didn't add a configuration to enab
Hi Colin
Thanks for the comments!
I did the following changes
1. Simplified the API spec section to only include the diff.
2. Reordered the HWM requirement section.
3. Removed the URM implementation details to keep the necessary
characteristics to perform the unclean recov
On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote:
> Hi Colin
> Thanks for the comments!
>
Hi Calvin,
Thanks again for the KIP.
One meta-comment: it's usually better to just do a diff on a message spec file
or java file if you're including changes to it in the KIP. This is easier to
read than l
Hi Colin
Thanks for the comments!
Rewrite the Additional High Watermark advancement requirement
There was feedback on this section that some readers may not be familiar
with HWM and Ack=0,1,all requests. This can help them understand the
proposal. I will rewrite this part for more readability.
Un
Hi Jun,
I have a second thought on the kafka.replication.electable_replicas_count
metric. I don't think it is useful, and it seems wasteful to have a
per-partition metric to track sum(ISR, ELR). I will remove this metric.
On Tue, Sep 12, 2023 at 3:35 PM Calvin Liu wrote:
> Hi Jun,
> Thanks
Hi Jun,
Thanks for the follow-up. Adjusted the KIP as follows.
16. The controller does not know whether a broker is live or not when it is
fenced. Updated the behavior. In Balanced mode, the URM will start recovery
when all the members in the LastKnownELR are unfenced. In the Proactive
mode,
Also, we should have metrics that show what is going on with regard to the
eligible replica set. I'm not sure exactly what to suggest, but something that
could identify when things are going wrong in the cluster.
For example, maybe a metric for partitions containing replicas that are
ineligible
Hi Calvin,
Thanks for the KIP. I think this is a great improvement.
> Additional High Watermark advance requirement
Typo: change "advance" to "advancement"
> A bit recap of some key concepts.
Typo: change "bit" to "quick"
> Ack=1/all produce request. It defines when the Kafka server should re
Hi, Calvin,
Thanks for the reply and the updated KIP. A few more comments.
16. "The URM will query all the replicas including the fenced
replicas." Currently, could we tell whether a fenced broker is alive or
not? Ideally, we only want to query the live brokers. Otherwise URM will be
forced to wa
Hi Artem
Thanks so much for the comments!
1. Yes, you are right, when the leader gets fenced, it will be put into
ELR. The unclean recovery can only be triggered if the mode is Proactive.
Let me clarify the trigger requirement in the KIP.
2. Good point, the controller should wait for all the Last
Hi Jun
Thanks for the comments!
10. Updated
11. It is mentioned in the ELR invariants section. It is mainly to save
metadata space.
12. Good catch, let me update the graph.
13. The thing we did not change is the original HWM advance requirement
“replicate to maximum ISR”. The proposal adds anot
Hi Calvin,
Thanks for the KIP. The new ELR protocol looks good to me. I have some
questions about unclean recovery, specifically in "balanced" mode:
1. The KIP mentions that the controller would trigger unclean recovery when
the leader is fenced, but my understanding is that when a leader is fe
Hi, Calvin,
Thanks for the KIP. A few comments below.
10. "The High Watermark forwarding still requires a quorum within the ISR."
Should it say that it requires the full ISR?
11. "remove the duplicate member in both ISR and ELR from ELR." Hmm, not
sure that I follow since the KIP says that ELR d
Hi Jack
Thanks for the comment.
I have updated the reassignment part. Now the reassignment can only be
completed/canceled if the final ISR size is larger than min ISR.
Thanks for your efforts on the TLA+ model! It has been a great help to the KIP!
On Wed, Sep 6, 2023 at 6:32 AM Jack Vanlightly
wrote:
>
Hi Calvin,
Regarding partition reassignment, I have two comments.
I notice the KIP says "The AlterPartitionReassignments will not change the ELR"
however, when a reassignment completes (or reverts) any replicas removed from
the replica set would be removed from the ELR. Sounds obvious but I fig
Hi Justine
1. With the new proposal, in order to let the consumer consume a message
when only 1 replica commits it to its log, the min ISR has to be set to 1.
2. Yes, we will need an unclean recovery if the leader has an unclean
shutdown.
3. If the min ISR config is changed to a larger value, the I
Hey Calvin,
Thanks for the responses. I think I understood most of it, but had a few
follow up questions
1. For the acks=1 case, I was wondering if there is any way to continue
with the current behavior (ie -- we only need one ack to produce to the log
and consider the request complete.) My under
Hi Justine
Thanks for the questions!
*a. For my understanding, will we block replication? Or just the high
watermark advancement?*
- The replication will not be blocked. The followers are free to
replicate messages above the HWM. Only HWM advancement is blocked.
b. *Also in the acks=1 case,
Hey Calvin,
Thanks for the KIP. This will close some of the gaps in leader election! I
have a few questions:
*>* *High Watermark can only advance if the ISR size is larger or equal
to min.insync.replicas*.
For my understanding, will we block replication? Or just the high watermark
advancement?
Al
1. Yes, the new protocol requires 2 things to advance the HWM. a) The
messages have been replicated to the controller-committed ISR members. b)
The number of ISR members should be at least the min ISR.
2. With the current protocol, we are not able to select broker 1 as the
leader. If
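The two conditions in point 1 can be sketched as a simple predicate (a
hypothetical helper for illustration only; the real broker logic is more
involved):

```java
import java.util.Map;

// Illustrative sketch of the proposed HWM advancement rule: the high
// watermark may advance to an offset only if (a) every controller-committed
// ISR member has replicated up to that offset, and (b) the ISR size is at
// least min.insync.replicas.
class HighWatermarkCheck {
    static boolean canAdvance(long candidateOffset,
                              Map<Integer, Long> isrLogEndOffsets, // brokerId -> LEO
                              int minIsr) {
        if (isrLogEndOffsets.size() < minIsr) {
            return false;                                  // condition (b)
        }
        return isrLogEndOffsets.values().stream()
                .allMatch(leo -> leo >= candidateOffset);  // condition (a)
    }
}
```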
Hi Calvin,
Thanks for the KIP! I'm still digesting it but I have two questions:
> In the scenario raised in the motivation section, the server may receive
ack=1 messages during T1 and advance High Watermark when the leader
is the only one in ISR.
To confirm, the current protocol allows advancing
Hi everyone,
I'd like to discuss a series of enhancements to the replication protocol.
A partition replica can experience local data loss in unclean shutdown
scenarios where unflushed data in the OS page cache is lost - such as an
availability zone power outage or a server error. The Kafka replicat