Hi Bart,

Before changing anything, I would first verify whether the affected broker is actually trying to catch up. Have you looked at the broker’s log? Do you see any errors? Check your metrics or the partition directories themselves to see whether data is flowing into the broker.
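A rough shell sketch of the "is data flowing in?" check, assuming you have shell access to the broker host. It samples the on-disk size of a directory twice and reports whether it grew. The function name dir_is_growing and the 60-second default interval are my own choices for illustration, not anything that ships with Kafka.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): sample a directory's on-disk size
# twice and succeed only if it grew in between.
# Usage: dir_is_growing DIR [SECONDS]   (interval defaults to 60)
dir_is_growing() {
  dir="$1"
  interval="${2:-60}"
  # du -sk prints "<size in KB><TAB><path>"; keep only the size.
  before=$(du -sk "$dir" | cut -f1)
  sleep "$interval"
  after=$(du -sk "$dir" | cut -f1)
  echo "$dir: ${before} KB -> ${after} KB after ${interval}s"
  # Exit status 0 only if the directory got bigger.
  [ "$after" -gt "$before" ]
}

# Example (placeholder path; use your log.dir/log.dirs value):
# dir_is_growing /path/to/kafka-logs 60 && echo "data is arriving"
```

Treat this as a coarse signal only: retention deleting old segments can offset new writes, so the broker log and your replication metrics remain the better evidence.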
If you do want to reset the broker so that it starts a fresh resync, stop the Kafka broker service/process, run 'rm -rf /path/to/kafka-logs' (check the value of the log.dir or log.dirs property in your server.properties file for the path; if log.dirs lists several comma-separated directories, wipe each of them), and then start the service again. It should check in with ZooKeeper and then start following the partition leaders for all of the topic-partition replicas assigned to it.

-- Peter

> On Oct 18, 2019, at 12:16 AM, Bart van Deenen wrote:
>
> Hi all
>
> We had a Kafka broker failure (too many open files, stupid), and now the
> partitions on that broker will no longer become part of the ISR set. It's
> been a few days (organizational issues), and we have significant amounts of
> data on the ISR partitions.
>
> In order to make the partitions on the broker become part of the ISR set
> again, should I:
>
> * increase `replica.lag.time.max.ms` on the broker to the number of ms that
> the partitions are behind. I can guesstimate the value to about 7 days, or
> should I measure it somehow?
> * stop the broker and wipe files (which ones?) and then restart it. Should I
> also do stuff on zookeeper ?
>
> Is there any _official_ information on how to deal with this situation?
>
> Thanks for helping!
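For the reset route, a rough shell sketch of working out which directories to wipe. It assumes server.properties uses plain key=value lines. The function name kafka_log_dirs, the config path, and the systemd unit name kafka in the commented usage lines are placeholders for illustration, not anything Kafka ships. When neither property is set, Kafka falls back to its default log.dir of /tmp/kafka-logs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): print the broker's data
# directories from a server.properties file, one per line.
# Assumes simple "key=value" lines; log.dirs (comma-separated) takes
# precedence over log.dir, and /tmp/kafka-logs is Kafka's default log.dir.
kafka_log_dirs() {
  props="$1"
  # The last uncommented log.dirs entry wins, as when a properties file is loaded.
  dirs=$(sed -n 's/^[[:space:]]*log\.dirs[[:space:]]*=[[:space:]]*//p' "$props" | tail -n 1)
  if [ -z "$dirs" ]; then
    dirs=$(sed -n 's/^[[:space:]]*log\.dir[[:space:]]*=[[:space:]]*//p' "$props" | tail -n 1)
  fi
  if [ -z "$dirs" ]; then
    dirs=/tmp/kafka-logs
  fi
  # Split on commas and trim surrounding whitespace.
  printf '%s\n' "$dirs" | tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Usage sketch (commented out on purpose; path and unit name are placeholders):
# systemctl stop kafka
# kafka_log_dirs /path/to/server.properties   # review this list first
# kafka_log_dirs /path/to/server.properties | while read -r d; do rm -rf "$d"; done
# systemctl start kafka
```

Printing the list before deleting anything is deliberate: rm -rf on a mistyped path is not something you want to discover afterwards.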
