Willow Garage is designing a robotic research platform and a completely open-source robotic software framework. We are attempting to use NTP to keep the clocks within our system synchronized. Unfortunately, we are having an extremely difficult time finding an appropriate configuration. We are looking for someone to help us figure out the correct NTP configuration for our use case, or to determine whether NTP is even capable of doing what we want.

Our configuration is 4 machines connected by a local gigabit network on a mobile robotic base. These machines are frequently powered down or restarted. In order to use the robot, the clocks on these machines must be synchronized with each other to better than 1 millisecond. Ping times between machines on this local network vary between 100 microseconds and 1 ms, depending on how saturated the network is with sensor data streams.

The 4 machines are connected to the rest of the world through a wireless link. The delay on the wireless link is much more variable: in the range of 2 ms to 300 ms, depending on the quality of the link and the amount of data going over it. We don't care nearly as much about synchronization between the robot and the outside world, though it would be nice to avoid unbounded drift. Synchronization to within tens of ms would be acceptable.

Our present configuration is made up of 1 machine syncing to an external server over the wireless link and acting as a local server for the robot. The remaining 3 machines then sync to this local server. Under "stable" conditions, this configuration seems to work well and eventually converges to our sub-millisecond criterion. However, we have 2 large problems.

1) When the operating conditions suddenly change, the system diverges dramatically and sometimes becomes unstable. In particular, a pathological case we have seen is when the wireless link is near saturation for an extended period of time, such as when copying multi-gigabyte log files over the course of several hours. Once the transfer completes and the wireless link opens up again, the delay across the wireless link plummets, and the local server immediately diverges from the external server by around 30 ms. After this initial divergence, the local server stops qualifying as a good source of time, and the remaining 3 machines start drifting apart in independent directions.

2) When the system is in a non-converged state, such as after diverging in case 1 or on boot, the time it takes for the system to converge is unacceptably long. If I disable NTP and run ntpdate on each of the client machines, I can synchronize them to within 1 ms, but as soon as I start NTP again, all of the clocks begin to diverge, often taking hours to re-converge to steady state.

We are looking for a way to configure the system so that it is robust to sudden changes in otherwise stable network latency, and additionally for a way to get the local system to converge to sub-ms offsets on the order of minutes instead of hours.

Does anyone have suggestions for best practices in configuring an NTP network for these conditions?

Thanks,
--Jeremy Leibs
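
P.S. In case it helps frame problem 1: the standard NTP on-wire calculation estimates the offset as ((t2 - t1) + (t3 - t4)) / 2, which assumes the delay is the same in both directions. The Python sketch below is only an illustration of that formula. The function names and the delay values are made up for the example and are not measurements from our robot. It shows that if one direction of the link is queued behind a large transfer, the computed offset is biased by half the asymmetry even when the clocks actually agree, which may or may not be the mechanism behind what we are seeing.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire estimates (all times in seconds).

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, uplink_delay, downlink_delay, server_processing=0.0):
    """Hypothetical helper: build the four timestamps for one exchange.

    true_offset is (server clock - client clock); the delays are one-way.
    """
    t1 = 0.0
    t2 = t1 + uplink_delay + true_offset
    t3 = t2 + server_processing
    t4 = t1 + uplink_delay + server_processing + downlink_delay
    return ntp_offset_and_delay(t1, t2, t3, t4)


# Clocks agree (true offset 0) in both cases; only the link delays differ.
# Illustrative values: 2 ms each way when idle, 62 ms uplink when saturated.
for label, up, down in [("idle link", 0.002, 0.002),
                        ("saturated uplink", 0.062, 0.002)]:
    offset, delay = simulate_exchange(0.0, up, down)
    print(f"{label}: measured offset {offset * 1e3:.1f} ms, "
          f"round-trip delay {delay * 1e3:.1f} ms")
```

In this toy model the measured offset equals the true offset plus (uplink - downlink) / 2, so the bias appears and disappears with the one-sided queueing delay.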
