I have a 5-VM cluster of 16 GB, 8-core machines, 3 of which are worker nodes.
Can anyone give input on how many topologies should/can be run on a cluster
this size?  We are currently running 40 topologies in this dev cluster and
having tons of stability and topology startup issues.  A few of the topologies
run bursty workloads, but mostly they are doing nothing.
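
For context on the numbers: if our supervisor.slots.ports is still at the
Storm default of 4 slots per node, the slot math is

    3 worker nodes x 4 slots/node = 12 worker slots

and since each worker JVM belongs to exactly one topology, 40 topologies need
at least 40 slots.  So either most of our topologies can never be assigned a
worker, or we have raised the slot count and are packing 13+ worker JVMs into
each 16 GB node.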

I'm looking for a sanity check, as we are having severe stability issues:
Nimbus crashing, supervisors crashing, topologies failing to start up.  Our
topologies fail to start because the 3 workers launch with too big a time lag
between them (~3 minutes), and by the time the 2nd and 3rd come up, the 1st
has already given up making Netty connections to the others.  Once the workers
fail to connect, they give up for good.

We have tuned the retry params the workers use when connecting so that they
retry more slowly, and now they are connecting, but they still hang after
connecting.  By hanging I mean the PIDs are still alive, but the logs stop,
and zero Kafka messages flow even into the one topology whose worker is still
logging.  We are wondering whether the retries are filling up stdout/stderr
and whether that is causing the threads to block.
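
For reference, the retry tuning we applied looks roughly like the sketch
below.  The numbers are illustrative, not our exact values, and we actually
set the equivalent keys cluster-wide in storm.yaml rather than per topology:

    import backtype.storm.Config;  // org.apache.storm.Config on Storm 1.x+

    // Sketch of the Netty retry tuning: back off more slowly, for longer.
    // storm.yaml equivalents: storm.messaging.netty.max_retries,
    // storm.messaging.netty.min_wait_ms, storm.messaging.netty.max_wait_ms.
    public class NettyRetryTuning {
        public static Config slowRetries() {
            Config conf = new Config();
            conf.put(Config.STORM_MESSAGING_NETTY_MAX_RETRIES, 60);     // keep trying longer
            conf.put(Config.STORM_MESSAGING_NETTY_MIN_SLEEP_MS, 1000);  // start the backoff at 1 s
            conf.put(Config.STORM_MESSAGING_NETTY_MAX_SLEEP_MS, 15000); // cap the backoff at 15 s
            return conf;
        }
    }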
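
On the stdout/stderr theory: nothing Storm-specific here, but the generic
mechanism we suspect is easy to reproduce in isolation.  Spawn a child process
that writes heavily to stdout, never drain the pipe from the parent, and the
child blocks once the OS pipe buffer (roughly 64 KB on Linux) fills up, so its
PID stays alive while its output simply stops:

    import java.io.IOException;

    // Stand-alone demo of a child process hanging on a full stdout pipe.
    public class PipeBlockDemo {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Child: a shell loop that prints lines forever.
            Process child = new ProcessBuilder("sh", "-c",
                    "i=0; while true; do echo \"retry log line $i\"; i=$((i+1)); done")
                    .start();  // stdout goes to a pipe that we never read

            Thread.sleep(5000);

            // The PID is alive, but the child froze almost immediately:
            // its write() calls are blocked on the full pipe.
            System.out.println("child alive: " + child.isAlive());
            child.destroyForcibly();
        }
    }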

Any input and help would be appreciated.
