I would have thought you'd want ZK up before Kafka started, but I don't have any strong data to back that up.

On Sat, 8 Aug 2015 at 7:59 AM Steve Miller <[email protected]> wrote:
> So... we had an extensive recabling exercise, during which we had to
> shut down and derack and rerack a whole Kafka cluster. Then when we
> brought it back up, we discovered the hard way that two hosts had their
> "rebuild on reboot" flag set in Cobbler.
>
> Everything on those hosts is gone as a result, of course. And a total
> of four partitions had their primary and their replica on the two hosts
> that were nuked.
>
> This isn't the end of the world, in some sense: it's annoying, but
> that's why we did this now before we brought the cluster into "real"
> production rather than being in a pre-production state. The data is all
> transient anyway (well, except for _schemas, of course, which in accordance
> to Murphy's law was one of the topics affected, but we have that mirrored
> elsewhere).
>
> Still, if there's an obvious way to recover from this, I couldn't find
> it googling around for a while.
>
> What's the recommended approach here? Do we need to delete these
> topics and start over? Do we need to delete *everything* and start over?
>
> (Also, other than "don't do that!" what's the recommended way to deal
> with the situation where you need to take a whole cluster down all at
> once? Any order of operations related to how you shut down all the Kafka
> nodes, especially WRT how you shut down Zookeeper? We deliberately brought
> Kafka up first *without* ZK, then brought up ZK, so that the brokers
> wouldn't go nuts with leader election and the like, which seemed to make
> sense, FWIW.)
>
> -Steve
> --

--
Daniel
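
For anyone trying to scope a similar loss, here is a minimal sketch of how one might list the partitions whose replicas all sat on the wiped hosts. It is not a recovery procedure. The function name, the shape of the `assignment` mapping, the broker IDs, and the `events` topic are all hypothetical; in practice the mapping would have to be built from whatever the cluster reports about its replica assignments.

```python
def partitions_without_survivors(assignment, lost_brokers):
    """Return the (topic, partition) pairs whose replicas were all lost.

    assignment:   dict mapping (topic, partition) -> list of broker IDs
                  holding a replica (hypothetical input format).
    lost_brokers: iterable of broker IDs whose disks were wiped.
    """
    lost = set(lost_brokers)
    return sorted(
        tp
        for tp, replicas in assignment.items()
        # A partition is unrecoverable from the cluster itself only if
        # every one of its replicas lived on a lost broker.
        if replicas and set(replicas) <= lost
    )


if __name__ == "__main__":
    # Made-up example: brokers 3 and 4 were rebuilt on reboot.
    example = {
        ("_schemas", 0): [3, 4],
        ("events", 0): [1, 3],
        ("events", 1): [4, 3],
        ("events", 2): [1, 2],
    }
    for topic, partition in partitions_without_survivors(example, [3, 4]):
        print(f"{topic}-{partition} has no surviving replica")
```

Partitions that still have one replica on a surviving broker (such as `events-0` in the made-up data) are deliberately left out, since they are a re-replication problem rather than a data-loss problem.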
