Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-06-05 Thread Ron Dagostino
> How to attract more feedback from committers for this proposal

Hi Igor. I'll review the KIP later this week.

Ron

On Mon, Jun 5, 2023 at 1:37 PM Igor Soarez wrote:
>
> Hi all,
>
> We just had a video call to discuss this KIP and I just wanted
> to update this thread with a note on the meeting.
>

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-06-05 Thread Igor Soarez
Hi all,

We just had a video call to discuss this KIP and I just wanted to update this thread with a note on the meeting.

Attendees:
- Igor
- Christo
- Divij
- Colt

Items discussed:
- Context, motivation and overview of the proposal.
- How log directories are identified by each Broker.
- How old

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-31 Thread Igor Soarez
Hi all, I have created a TLA+ specification for this KIP, available here: https://github.com/soarez/kafka/blob/kip-858-tla-plus/tla/Kip858.tla If there are no further comments I'll start a voting thread next week. -- Igor

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-30 Thread Igor Soarez
Hi Alexandre, Thank you for having a look at this KIP, and thank you for pointing this out. I like the idea of expanding the health status of a log directory beyond just online/offline status. This KIP currently proposes a single logdir state transition, from online to offline, conveyed in a

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-25 Thread Christo Lolov
Heya!

5th of June 16:30 - 17:00 UTC works for me.

Best,
Christo

On Thu, 25 May 2023 at 15:14, Igor Soarez wrote:
> Hi Divij, Christo,
>
> Thank you for pointing that out.
>
> Let's aim instead for Monday 5th of June, at the same time – 16:30-17:00
> UTC.
>
> Please let me know if this doesn't

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-25 Thread Igor Soarez
Hi Divij, Christo, Thank you for pointing that out. Let's aim instead for Monday 5th of June, at the same time – 16:30-17:00 UTC. Please let me know if this doesn't work either. Best, -- Igor

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-25 Thread Alexandre Dupriez
Hi, Igor, Thanks for the excellent, thorough and very comprehensive KIP. Although not directly in scope of the KIP, but related to it, I would have the following question about a potential future work on disk degradation. Today, what characterises as a disk failure in Kafka is an I/O exception

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-25 Thread Christo Lolov
Heya Igor! I don't have any concerns or suggestions for improvements at this stage - the overall approach makes sense to me! I would be quite interested in attending a call, but as Divij has pointed out the 29th of May is a public holiday, so I won't be able to make that date. If there is

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-24 Thread Divij Vaidya
Hey Igor,

Just calling out that Monday, 29th May is a bank holiday in lots of EU countries (including the UK) which may restrict attendance?!

--
Divij Vaidya

On Tue, May 23, 2023 at 6:59 PM Igor Soarez wrote:
> Hi everyone,
>
> Someone suggested at the recent Kafka Summit that it may be

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-23 Thread Igor Soarez
Hi everyone, Someone suggested at the recent Kafka Summit that it may be useful to have a video call to discuss remaining concerns. I'm proposing we have a video call Monday 29th May 16:30-17:00 UTC. If you'd like to join, please reply to the thread or to me directly so I can send you a link.

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-23 Thread Igor Soarez
Hi Christo, Thank you for your interest in this KIP. Indeed, I'd like to open up voting ASAP. I'm hoping there will still be a bit more feedback, but if not I'll probably request a vote next week or so. Do you have any concerns or suggestions regarding this KIP? I'll have a look at your KIP

RE: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-22 Thread Christo Lolov
Hello Igor! I have been working on a KIP to extend the functionality of JBOD broker disk failures (https://cwiki.apache.org/confluence/display/KAFKA/KIP-928%3A+Making+Kafka+resilient+to+log+directories+becoming+full) and was wondering what is the state of this KIP - were you planning on

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-04-26 Thread Igor Soarez
Thank you for another review Ziming, much appreciated! 1. and 2. You are correct, it would be a big and perhaps strange difference. Since our last exchange of emails, the proposal has changed and now it does follow your suggestion to bump metadata.version. The KIP mentions it under

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-04-24 Thread ziming deng
Thank you for the continuous work, I have some small problems related to the implementation details: 1. We have decided to add a new metadata.version, and you have said that "We should also avoid gating on metadata.version to include the `OnlineLogDirectories` field in the broker

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-04-17 Thread Igor Soarez
Hi Jun, Thank you for sharing your questions, please find my answers below. 41. There can only be user partitions on `metadata.log.dir` if that log dir is also listed in `log.dirs`. `LogManager` does not specifically load contents from `metadata.log.dir`. The broker will communicate UUIDs to

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-24 Thread Igor Soarez
Hi all, I’ve had to step away from work for personal reasons for a couple of months – until mid April 2023. I don’t think I’ll be able to continue to address feedback or update this KIP before then. -- Igor

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-17 Thread Jun Rao
Hi, Igor, Thanks for the reply. A few more replies and comments. 31. Thanks for the explanation. This looks good to me then. 35. Yes, you are right. 36. Yes, this seems fine since the KRaft controller allows broker requests before it's being unfenced. 37. Yes, it's probably simpler without

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-06 Thread Igor Soarez
Hi David, Thank you for your suggestions and for having a look at this KIP. 1. Yes, that should be OK. I have updated the section "Migrating a cluster in ZK mode running with JBOD" to reflect this. 2. I've updated the motivation section to state that. Best, -- Igor

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-03 Thread Igor Soarez
Hi Jun, Thank you for your comments and questions. 30. Thank you for pointing this out. The isNew flag is not available in KRaft mode. The broker can consider the metadata records: If, and only if, the logdir assigned is Uuid.ZERO then the replica can be considered new. Being able to determine
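The rule described in point 30 above (a replica counts as new if, and only if, its assigned log directory is `Uuid.ZERO`) can be sketched roughly as follows. This is only an illustration under stated assumptions: `UUID_ZERO` is a stand-in for Kafka's `Uuid.ZERO` built with Python's standard `uuid` module, and `replica_is_new` is a hypothetical helper name, not part of Kafka.

```python
import uuid

# Stand-in for Kafka's Uuid.ZERO: an all-zero UUID (assumption for illustration).
UUID_ZERO = uuid.UUID(int=0)


def replica_is_new(assigned_log_dir: uuid.UUID) -> bool:
    """Hypothetical helper: treat a replica as new if, and only if, the
    log directory assigned to it in the metadata is the zero UUID."""
    return assigned_log_dir == UUID_ZERO


# A replica with no directory assigned yet is considered new.
print(replica_is_new(UUID_ZERO))
# A replica already assigned to some directory is not.
print(replica_is_new(uuid.uuid4()))
```

The point of the sketch is that no separate flag needs to be carried: under this reading, the assignment itself encodes what `isNew` conveyed in ZK mode.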

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-03 Thread Igor Soarez
Hi Tom, Thank you for having another look. 20. That is a good point. Thinking about your suggestion: What would this look like in a non-JBOD KRaft cluster upgrade to JBOD mode? Upgrading to a version that includes the JBOD support patch would automatically update meta.properties to include the

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-01 Thread David Arthur
Igor, thanks for the KIP! 1. For the ZK migration part, I wonder if we could avoid monitoring the directory failure ZNode while in dual-write mode. So far in the migration design, we have avoided reading anything from ZK after the initial metadata migration. We have modified the ZK brokers to use

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-01-25 Thread Jun Rao
Hi, Igor, Thanks for the reply. A few more comments/questions. 30. In ZK mode, LeaderAndIsr has a field isNew that indicates whether a replica is newly created or not. How do we convey the same thing with metadata records in KRaft mode? 31. If one replaces a log dir with a disk, I guess we need

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-01-23 Thread Tom Bentley
Hi Igor,

Thanks for your replies on points 21-25. Those all LGTM. See below for a further thought on the meta.properties upgrade.

Kind regards,
Tom

On Fri, 13 Jan 2023 at 18:09, Igor Soarez wrote:
> Hi Tom,
>
> Thank you for having another look.
>
> 20. Upon a downgrade to a Kafka version

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-01-13 Thread Igor Soarez
Hi Tom, Thank you for having another look. 20. Upon a downgrade to a Kafka version that runs the current "version == 1" assertion, then yes — a downgrade would not be possible without first updating (manually) the meta.properties files back to the previous version. We could prevent this issue

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-01-10 Thread Tom Bentley
Hi Igor, 20. The description of the changes to meta.properties says "If there any meta.properties file is missing directory.id a new UUID is generated, and assigned to that log directory by updating the file", and the upgrade/migration section says "As the upgraded brokers come up, the existing
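The quoted behaviour (a `meta.properties` file that is missing `directory.id` gets a newly generated UUID) might look roughly like the sketch below. It works on in-memory text rather than real files, the helper names `parse_properties` and `ensure_directory_id` are hypothetical, and `str(uuid.uuid4())` is only a stand-in: it is not claimed to match the UUID encoding Kafka actually writes.

```python
import uuid


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def ensure_directory_id(props: dict) -> dict:
    """Hypothetical helper: return a copy of props that is guaranteed to
    contain directory.id, generating a fresh UUID only when it is missing."""
    updated = dict(props)
    if "directory.id" not in updated:
        # Stand-in encoding; Kafka's own UUID string format is not assumed here.
        updated["directory.id"] = str(uuid.uuid4())
    return updated


# Illustrative file contents for a log directory without directory.id.
existing = parse_properties("version=1\nnode.id=1\n")
upgraded = ensure_directory_id(existing)
print(sorted(upgraded))
```

Applying the helper twice leaves an already assigned ID untouched, which is the property that matters for the downgrade question raised here: once the file has been rewritten, the directory keeps the same identity across restarts.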

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-01-03 Thread Igor Soarez
Hi Jun, Thank you for having another look. 11. That is correct. I have updated the KIP in an attempt to make this clearer. I think the goal should be to try to minimize the chance that a log directory failure may happen while the metadata is incorrect about the log directory assignment, but also have a

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-12-21 Thread Jun Rao
Hi, Igor, Thanks for the reply. 11. Yes, your proposal could work. Once the broker receives confirmation of the metadata change, I guess it needs to briefly block appends to the old replica, make sure the future log fully catches up and then make the switch? 13 (b). The kafka-storage.sh is only

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-12-01 Thread Igor Soarez
Hi Jun, Thank you for reviewing the KIP. Please find my replies to your comments below. 10. Thanks for pointing out this typo; it has been corrected. 11. I agree that the additional delay in switching to the future replica is undesirable, however I see a couple of issues if we forward the

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-11-16 Thread Jun Rao
Hi, Igor, Thanks for the KIP. A few comments below. 10. In the example section, there were two /mnt/d1/meta.properties. One of them should be d2. 11. "When replicas are moved between directories, using the existing AlterReplicaLogDirs RPC, when the future replica has caught up with the current

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-11-02 Thread Igor Soarez
Hi Tom, Thank you for having a look. 0. Thanks for pointing this out. I think it's worth having a new sub-command in `kafka-storage.sh` — `update-directories` — more suitable for situations where `metadata.properties` already exists. I've updated the section with further detail. When upgrading

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-11-01 Thread Tom Bentley
Hi Igor, Thanks for the KIP, I've finally managed to take an initial look. 0. You mention the command line tools (which one?) in the public interfaces section, but don't spell out how it changes -- what options are added. Reading the proposed changes it suggests that there are no changes to the

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-10-25 Thread Igor Soarez
Hello, There’s now a proposal to address ZK to KRaft migration — KIP-866 — but currently it doesn't address JBOD so I've decided to update this proposal to address that migration scenario. So given that: - When migrating from a ZK cluster running JBOD to KRaft, brokers registering in KRaft

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-09-27 Thread Igor Soarez
Hi Ziming, Thank you for having another look at this KIP, and please accept my apologies for my delay in replying. The migration introduces JBOD support, so before the migration there should only be one log directory configured per broker. This assumption simplifies how the controller

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-09-01 Thread deng ziming
Hi Igor, I think this KIP can solve the current problems, I have some problems relating to the migration section. Since we have bumped broker RPC version and metadata record version, there will be some problems between brokers/controllers of different versions. In ZK mode we use IBP as a flag

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-08-18 Thread Igor Soarez
Hi Ziming, I'm sorry it took me a while to reply. Thank you for having a look at this KIP and providing feedback. > 1. We have a version field in meta.properties, currently it’s 1, and we can > set it to 2 in this KIP, and we can give an example of server.properties and > its corresponding

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-08-18 Thread Igor Soarez
Hi Jason, Apologies for the delay in this reply. Thank you for having a look at this KIP and sharing your suggestions. > 1. (nit): Instead of "storage id," maybe we could call it "directory id"? > It seems a little clearer since each log dir gets a unique id. I agree, "directory id" is a

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-08-09 Thread deng ziming
Hi, Igor, Thanks for this great work. I've left some questions: 1. We have a version field in meta.properties, currently it’s 1, and we can set it to 2 in this KIP, and we can give an example of server.properties and its corresponding meta.properties generated by the storage command tool. 2. When

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-08-08 Thread Jason Gustafson
Hi Igor, Thanks for the KIP. It looks like it's on a good track. I have a few suggestions to throw into the mix: 1. (nit): Instead of "storage id," maybe we could call it "directory id"? It seems a little clearer since each log dir gets a unique id. 2. Rather than introducing a new RPC to

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-08-02 Thread Igor Soarez
Hi José, Thanks for having a look at this KIP and thanks for pointing this out. I've had a look at KIP-856. It's good to see there's some overlap in our proposals. We're both proposing: - Identifying log directories with a UUID - Extending the storage tool to ensure each log directory has a

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-07-27 Thread José Armando García Sancio
Hi Igor, Thanks for the KIP. Looking forward to this improvement. I'll review your KIP. I should mention that I started a discussion thread on KIP-856: KRaft Disk Failure Recovery at https://lists.apache.org/thread/ytv0t18cplwwwqcp77h6vry7on378jzj Both KIPs introduce similar concepts. For