Hi all,

we have a recurring problem with our virtual routers. Judging by the log messages, com.cloud.agent.api.CheckRouterCommand runs into a timeout and the router state therefore switches to UNKNOWN.
All network traffic through the routers is still working. They can be accessed by their link-local IP addresses, and the configuration looks good at first glance. But configuration changes made through the CloudStack API no longer reach the routers. A reboot fixes the problem.

I would like to investigate a little further, but I lack an understanding of how the checkRouter command tries to access the virtual router. Could someone point me to some relevant documentation or give a short overview of how the connection from CS-Management is made and where such a timeout could occur?

As background information, the sequence from the management log looks roughly like this:

---
x Every few seconds the com.cloud.agent.api.CheckRouterCommand correctly returns a state of BACKUP or MASTER
x When the problem occurs, the log messages change. Some snippets below:
x ... Waiting some more time because this is the current command
x ... Waiting some more time because this is the current command
x Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
x Timed out on Seq 28-2352567855348137104
x Seq 28-2352567855348137104: Cancelling.
x Operation timed out: Commands 2352567855348137104 to Host 28 timed out after 60
x Unable to update router r-2594-VM's status
x Redundant virtual router (name: r-2594-VM, id: 2594) just switch from MASTER to UNKNOWN
x Those error messages are now repeated for each following CheckRouterCommand until the virtual router is rebooted

Greetings,
Melanie

--
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de
Tel: 030 / 40 50 51 - 0
Fax: 030 / 40 50 51 - 19
Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein
Registered office: Berlin