Hi Helix team,

I'm curious whether there are any recommendations for managing Helix operationally. We periodically get into a state where Helix shards are not being assigned, and it's really hard to figure out *why*. Sometimes the cause is something simple, such as the cluster being in maintenance mode without our realizing it, or a node being added with HELIX_ENABLED:false in its InstanceConfig. Other times (like the issue we're debugging right now), we genuinely have no idea.
Is there a good way to find out what the Helix controller thinks it is doing, or what it is trying to do and can't? Unfortunately, the controller's default logging (we're running it in STANDALONE mode) is so noisy that it's almost unusable for us. Is there a particular logging configuration you use, or a particular set of metrics you monitor? More generally, how do you go about diagnosing cases where shards are not being assigned?

Thanks!
Brent
