Hi Helix team,

I'm curious whether there are any recommendations on how to manage Helix
operationally.  We periodically get ourselves into a state where Helix
shards are not getting assigned, and it's really hard to figure out *why*
that's happening.  Sometimes it's something silly, like the cluster being
in maintenance mode without our realizing it, or a node being added with
HELIX_ENABLED:false in its InstanceConfig.  Other times (like right now,
with an issue we're debugging) we genuinely have no idea.

Is there a good way to figure out what the Helix controller thinks it is
doing, or what it is trying to do and can't?  Unfortunately, the default
logging from the controller (we're running it in STANDALONE mode) is so
noisy that it's almost unusable for us.  Is there a particular logging
configuration you use, or a particular set of metrics to monitor?

How do you go about diagnosing scenarios where shards are not getting
assigned?

Thanks!

Brent
