> On Jan. 30, 2016, 9:53 p.m., Joris Van Remoortere wrote:
> > src/tests/group_tests.cpp, lines 451-452
> > <https://reviews.apache.org/r/42988/diff/4/?file=1226926#file1226926line451>
> >
> >     Maybe a comment explaining that we're triggering the timeout? Or is this too self-explanatory?
Done.


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42988/#review117118
-----------------------------------------------------------


On Jan. 30, 2016, 10:20 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42988/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2016, 10:20 p.m.)
> 
> 
> Review request for mesos and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-4546
>     https://issues.apache.org/jira/browse/MESOS-4546
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The previous implementation of `GroupProcess` tried to establish a single
> ZooKeeper connection on startup, but didn't attempt to retry. ZooKeeper will
> retry internally, but it only retries by attempting to reconnect to a list of
> previously resolved IPs; it doesn't attempt to re-resolve the hostnames to
> pick up updates to DNS configuration. Because DNS configuration can be quite
> dynamic, we now close the current ZooKeeper handle and open a new one if we've
> seen a successful `zookeeper_init` but haven't been connected within the
> ZooKeeper session timeout.
> 
> 
> Diffs
> -----
> 
>   src/tests/group_tests.cpp 77349465e0163c8aa6bed6deefe3f98efb442f3d 
>   src/zookeeper/group.hpp cf82fec290a2fa9bec122539c2eb0f12b45c2fb2 
>   src/zookeeper/group.cpp 2ae3193e0e138c90b205d45400d80e80853e1b99 
>   src/zookeeper/zookeeper.cpp 3c4fdad972dcd1728c52a05970646c713dcf98c8 
> 
> Diff: https://reviews.apache.org/r/42988/diff/
> 
> 
> Testing
> -------
> 
> `make check`, on both OS X and Arch Linux. Manually configured a situation in
> which the Mesos agent uses stale DNS information in a loop: validated that
> without the patch, we don't pick up DNS changes, whereas with the patch, we do.
> 
> Also added a new unit test. Verified that the test fails without this patch
> applied and passes deterministically (`gtest_repeat=100`) with the patch
> applied.
> 
> 
> Thanks,
> 
> Neil Conway
>
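The retry policy in the description above (close the handle and open a new one when `zookeeper_init` succeeded but no connection was established within the session timeout) can be illustrated with a minimal, self-contained C++ sketch. It assumes a simulated clock and a fake in-memory DNS table; `RetryingGroup`, `Handle`, `Dns`, `tryConnect`, and `tick` are hypothetical names, not the actual `GroupProcess` API, and no real ZooKeeper client calls are made.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <optional>
#include <string>
#include <utility>

using Seconds = std::chrono::seconds;

// Hypothetical DNS table (hostname -> IP). Callers mutate it to simulate a
// DNS change, e.g. a ZooKeeper server moving to a new address.
using Dns = std::map<std::string, std::string>;

// Hypothetical stand-in for a ZooKeeper handle. As described in the review,
// the client resolves the hostname once at init time and afterwards only
// retries the IP it resolved then.
struct Handle {
  std::string resolvedIp;
  Seconds initTime;
  bool connected = false;
};

class RetryingGroup {
 public:
  RetryingGroup(const Dns& dns, std::string host, Seconds sessionTimeout)
      : dns_(dns), host_(std::move(host)), sessionTimeout_(sessionTimeout) {}

  // Opens a fresh handle, resolving the hostname again from scratch.
  void init(Seconds now) {
    assert(!handle_ && "close the old handle before opening a new one");
    handle_ = Handle{dns_.at(host_), now};
    ++inits_;
  }

  // Simulated connection attempt: succeeds only if the IP captured at init
  // time is the one the server is actually listening on.
  void tryConnect(const std::string& liveIp) {
    if (handle_ && handle_->resolvedIp == liveIp) {
      handle_->connected = true;
    }
  }

  // Called periodically. If init succeeded but no connection was made within
  // the session timeout, drop the handle and open a new one so that the
  // hostname is re-resolved.
  void tick(Seconds now) {
    if (handle_ && !handle_->connected &&
        now - handle_->initTime >= sessionTimeout_) {
      handle_.reset();  // "Close" the stale handle.
      init(now);
    }
  }

  bool connected() const { return handle_ && handle_->connected; }
  int inits() const { return inits_; }

 private:
  const Dns& dns_;
  std::string host_;
  Seconds sessionTimeout_;
  std::optional<Handle> handle_;
  int inits_ = 0;
};
```

In use, the DNS entry changes after the first `init`, so `tryConnect` keeps failing against the stale IP until `tick` passes the session timeout, drops the handle, and re-resolves the hostname. A real implementation would presumably drive this check from a timer; explicit `tick` calls only keep the simulated clock deterministic.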