I'm sure I'm missing something simple here, but dang if I can find it. One of the servers in our FOG died and was out of service for a month. We recently returned the machine to service in the FOG with a fresh OS and a Sun Ray 3.1 install. Load balancing is working normally, but utfwsync is failing on the replaced server.

Whenever I run utfwsync from the primary server, auth_log on the replaced server shows token communication errors to all other servers in the FOG. Likewise, the primary server shows the same errors to the replaced server during utfwsync.

Ping works to and from all interfaces on the dedicated interconnect between all servers, so I know they can see each other. All servers are at SRSS 3.1 with the same patch level. Firmware download does work from the replaced server, because DTUs that reboot and connect to it are getting firmware updates. Running utgstatus on all FOG members gives identical, correct output. utpolicy output is also identical, and utreplica -l looks okay. I ran a snoop against the interconnect and can see the multicast traffic between the servers coming and going.

The only difference between the replaced server before and after the rebuild is that we used a different physical interface for the LAN connection, so the MAC address of the LAN connection is different from before.

We reboot these machines once a week and run utfwsync to balance the pseudo-session load after all servers have rebooted. The effect now is that the sync runs on 3 of the 4 servers, so all DTUs connect to the one where the sync fails, which defeats the purpose of the whole thing.

Thanks in advance.
