> On June 3, 2015, 2:24 p.m., Ben Mahler wrote:
> > src/tests/partition_tests.cpp, lines 647-650
> > <https://reviews.apache.org/r/29507/diff/7/?file=973888#file973888line647>
> >
> >     This sounds like testing two particular behaviors in a single test:
> >     
> >     (1) A registered slave that never receives any Ping messages will 
> > attempt to reregister after the master's ping timeout.
> >     
> >     (2) Master will consider it shutdown after the same timeout.
> >     
> >     I believe we already have tests for both of these. For (1) we could 
> > just add the shutdown message expectation to 
> > `PartitionTest.PartitionedSlave`. For (2), we have 
> > SlaveTest.PingTimeoutNoPings, SlaveTest.PingTimeoutNoPings.
> 
> Adam B wrote:
>     Yes, I see now that SlaveTest.PingTimeoutNoPings already satisfies (1), 
> and PartitionTest.PartitionedSlave could be modified to check for the 
> ShutdownMessage() to satisfy (2).
>     I originally started trying to write unit tests for 
> registered/unregistered slaves that encountered each type of a one-way 
> partition, with both shorter and longer-than-default ping timeouts, largely 
> for my own understanding of the process. Testing longer timeouts proved 
> unnecessary, and testing unregistered slaves didn't actually test my feature. 
> I posted these two since they worked and explicitly tested the one-way 
> partitions I was trying to test. But this one can almost certainly be covered 
> by the tests you mentioned. I will update in a follow-up patch.

It turns out that the shutdown message expectation is already covered in 
SlaveRecoveryTest.PartitionedSlave, which is similar to, but in many ways a 
superset of PartitionTest.PartitionedSlave. I have removed my two new tests 
completely and relied on SlaveRecoveryTest.PartitionedSlave and 
SlaveTest.PingTimeoutNoPings, with the latter updated to use custom ping 
timeout values.
For the updated test, see new RR: https://reviews.apache.org/r/35958


> On June 3, 2015, 2:24 p.m., Ben Mahler wrote:
> > src/tests/partition_tests.cpp, lines 600-602
> > <https://reviews.apache.org/r/29507/diff/7/?file=973888#file973888line600>
> >
> >     I'm a bit confused by this test, why are you expecting the first ping? 
> > Why are you dropping the pongs?
> >     
> >     I would expect the pings to be dropped entirely here so that advancing 
> > the clock after a slave is registered leads to a re-registration.
> >     
> >     Also, did you know about SlaveTest.PingTimeoutNoPings and 
> > SlaveTest.PingTimeoutSomePings tests? Is there something being tested here 
> > that is not captured in these?
> 
> Adam B wrote:
>     This test was intended to simulate a one-way partition where the slave 
> can no longer reach the master, after the slave initially successfully 
> registered. Hence the PONGs are dropped, but not the PINGs.

Even though we're dropping PONGs instead of PINGs here, it's still only testing 
that the Slave reregisters after the timeout. If anything, it should be testing 
that a master that never receives PONGs will shutdown the slave. This behavior 
is already tested by SlaveRecoveryTest.PartitionedSlave. No new test needed.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29507/#review86461
-----------------------------------------------------------


On June 26, 2015, 3:12 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29507/
> -----------------------------------------------------------
> 
> (Updated June 26, 2015, 3:12 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2110
>     https://issues.apache.org/jira/browse/MESOS-2110
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added new --slave_ping_timeout and --max_slave_ping_timeouts flags
> to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs)
> and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5).
>    
> These can be extended if slaves are expected/allowed to be down for
> longer than a minute or two.
> 
> Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage.
>   
> Beware that this affects recovery from network timeouts as well as
> actual slave node/process failover.
> 
> Also fixed the log message in recoveredSlavesTimeout() to correctly
> reference flags.slave_reregister_timeout instead of the unrelated
> ping timeouts.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md aaf65bf 
>   docs/upgrades.md 73e503c 
>   src/master/constants.hpp 072d59c 
>   src/master/constants.cpp 997b792 
>   src/master/flags.hpp 55ed3a9 
>   src/master/flags.cpp 4377715 
>   src/master/master.cpp 0782b54 
>   src/messages/messages.proto 1c8d79e 
>   src/slave/constants.hpp 84927e5 
>   src/slave/constants.cpp d8d2f98 
>   src/slave/slave.hpp f1cf3b8 
>   src/slave/slave.cpp b3e1ccc 
>   src/tests/partition_tests.cpp f7ee3ab 
>   src/tests/slave_recovery_tests.cpp c036e9c 
>   src/tests/slave_tests.cpp e9002e8 
> 
> Diff: https://reviews.apache.org/r/29507/diff/
> 
> 
> Testing
> -------
> 
> Manually tested slave failover/shutdown with master using different 
> --slave_ping_timeout and --max_slave_ping_timeouts.
> Ran unit tests with shorter non-default values for ping timeouts.
> `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and 
> ShortPingTimeoutUnreachableSlave
> 
> 
> Thanks,
> 
> Adam B
> 
>

Reply via email to