Re: Review Request 53411: Use a random TCP port for framework tests.
> On Nov. 3, 2016, 8:16 p.m., Vinod Kone wrote: > > src/tests/balloon_framework_test.sh, line 73 > > <https://reviews.apache.org/r/53411/diff/1/?file=1552377#file1552377line73> > > > > why did you pick these particular numbers? Chosen at random; we could use anything between 1024 and 65535. - David --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53411/#review154795 --- On Nov. 2, 2016, 11:43 p.m., David Robinson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/53411/ > --- > > (Updated Nov. 2, 2016, 11:43 p.m.) > > > Review request for mesos. > > > Bugs: MESOS-6537 > https://issues.apache.org/jira/browse/MESOS-6537 > > > Repository: mesos > > > Description > --- > > Use a random TCP port for framework tests. > > > Diffs > - > > src/tests/balloon_framework_test.sh > 7d0a4fee200577a5f7b2edbb3b310a07678879dd > src/tests/disk_full_framework_test.sh > 83ae0947d4f5ec0a960e8cefa925cacbe6c6808e > > Diff: https://reviews.apache.org/r/53411/diff/ > > > Testing > --- > > Ran tests, ensured the test succeeded and the port wasn't 5432. > > DEBUG: [ RUN ] ExamplesTest.DiskFullFramework > ... > DEBUG: I1102 22:39:12.339356 46828 master.cpp:380] Master > 550c3c82-063a-427d-b130-60044fbd4898 (localhost) started on 127.0.0.1:24269 > ... > DEBUG: I1102 22:39:14.312568 46937 slave.cpp:915] New master detected at > master@127.0.0.1:24269 > ... > DEBUG: [ OK ] ExamplesTest.DiskFullFramework (17988 ms) > > > Thanks, > > David Robinson > >
Review Request 53411: Use a random TCP port for framework tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53411/ --- Review request for mesos. Bugs: MESOS-6537 https://issues.apache.org/jira/browse/MESOS-6537 Repository: mesos Description --- Use a random TCP port for framework tests. Diffs - src/tests/balloon_framework_test.sh 7d0a4fee200577a5f7b2edbb3b310a07678879dd src/tests/disk_full_framework_test.sh 83ae0947d4f5ec0a960e8cefa925cacbe6c6808e Diff: https://reviews.apache.org/r/53411/diff/ Testing --- Ran tests, ensured the test succeeded and the port wasn't 5432. DEBUG: [ RUN ] ExamplesTest.DiskFullFramework ... DEBUG: I1102 22:39:12.339356 46828 master.cpp:380] Master 550c3c82-063a-427d-b130-60044fbd4898 (localhost) started on 127.0.0.1:24269 ... DEBUG: I1102 22:39:14.312568 46937 slave.cpp:915] New master detected at master@127.0.0.1:24269 ... DEBUG: [ OK ] ExamplesTest.DiskFullFramework (17988 ms) Thanks, David Robinson
Re: Review Request 50177: Add systemd watchdog support.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50177/#review142968 --- Overall it LGTM. configure.ac (line 615) <https://reviews.apache.org/r/50177/#comment208618> This doesn't read correctly and it sort of states the obvious, consider simplying it: # Check for systemd and link it in if present. src/linux/systemd.cpp (lines 198 - 200) <https://reviews.apache.org/r/50177/#comment208621> The terminology here is a little confusing. The 'watchdog daemon' is the process running this function, and we're not pinging the watchdog we're sending a message to systemd. Consider rewording, eg: // If the WATCHDOG_USEC environment variable is set systemd expects a watchdog message every WATCHDOG_USEC / 2 microseconds. The message indicates liveliness to systemd. - David Robinson On July 19, 2016, 1:14 a.m., Lawrence Wu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50177/ > --- > > (Updated July 19, 2016, 1:14 a.m.) > > > Review request for mesos, David Robinson and Ian Downes. > > > Repository: mesos > > > Description > --- > > Adds systemd watchdog support (see > http://0pointer.de/blog/projects/watchdog.html for context). > > > Diffs > - > > configure.ac d2136909b7305498ae901a5ea00133142b77f9e6 > src/linux/systemd.hpp 91134f1d4b100759e45931bd09ca4e1e1aeaaf8a > src/linux/systemd.cpp 619aa2778da5f99d3a078a8e1208bdaa9dc77581 > src/slave/main.cpp 4624392d30cf391015dcd63f447fe2414a47a16a > > Diff: https://reviews.apache.org/r/50177/diff/ > > > Testing > --- > > Tested by sending SIGSTOP to running mesos and verifying via journalctl that > it was killed by the watchdog. > > TODO: write actual test by using and forking a mock binary that initializes > systemd and sending SIGSTOP to that binary. test also needs a unit file for > that binary and for systemd to run so we can verify that systemd killed it. > > > File Attachments > > > mesos gets owned by watchdog > > https://reviews.apache.org/media/uploaded/files/2016/07/19/3c73b595-d28e-45d4-adfe-697801ba02cc__Screen_Shot_2016-07-18_at_2.09.31_PM.png > > > Thanks, > > Lawrence Wu > >
Re: Review Request 47209: Establish TCP connection after backing off.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47209/ --- (Updated May 13, 2016, 3:37 a.m.) Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang. Bugs: MESOS-5330 https://issues.apache.org/jira/browse/MESOS-5330 Repository: mesos Description --- Establish TCP connection after backing off. Diffs (updated) - src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 Diff: https://reviews.apache.org/r/47209/diff/ Testing --- make check Also started a master and agent and captured a tcpdump to verify the connection is only established _after_ the registration backoff. Thanks, David Robinson
Re: Review Request 47209: Establish TCP connection after backing off.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47209/ --- (Updated May 12, 2016, 6:18 p.m.) Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang. Bugs: MESOS-5330 https://issues.apache.org/jira/browse/MESOS-5330 Repository: mesos Description --- Establish TCP connection after backing off. Diffs (updated) - src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 Diff: https://reviews.apache.org/r/47209/diff/ Testing --- make check Also started a master and agent and captured a tcpdump to verify the connection is only established _after_ the registration backoff. Thanks, David Robinson
Review Request 47209: Establish TCP connection after backing off.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47209/ --- Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang. Bugs: MESOS-5330 https://issues.apache.org/jira/browse/MESOS-5330 Repository: mesos Description --- Establish TCP connection after backing off. Diffs - src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 Diff: https://reviews.apache.org/r/47209/diff/ Testing --- make check Also started a master and agent and captured a tcpdump to verify the connection is only established _after_ the registration backoff. Thanks, David Robinson
Review Request 47154: Establish TCP connection after backing off.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47154/ --- Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang. Bugs: MESOS-5330 https://issues.apache.org/jira/browse/MESOS-5330 Repository: mesos Description --- Establish TCP connection after backing off. Diffs - src/slave/slave.hpp b72438033708de473046d321c493d9fbcd7a9b43 src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 Diff: https://reviews.apache.org/r/47154/diff/ Testing --- make check Also started a master and agent and captured a tcpdump to verify the connection is only established _after_ the registration backoff. Thanks, David Robinson
Re: Review Request 47080: Agent should backoff when establishing a socket.
> On May 9, 2016, 5:20 a.m., Cong Wang wrote: > > src/slave/slave.cpp, line 954 > > <https://reviews.apache.org/r/47080/diff/1/?file=1375586#file1375586line954> > > > > One thing to note is the authentication could need a connect with > > master, so this could impact it. But essentially we need a backoff for > > authentication too. Yeah, perhaps the call to authenticate() should be moved within doReliableRegistration() too. - David --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47080/#review132205 --- On May 6, 2016, 10:51 p.m., David Robinson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/47080/ > --- > > (Updated May 6, 2016, 10:51 p.m.) > > > Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang. > > > Bugs: MESOS-5330 > https://issues.apache.org/jira/browse/MESOS-5330 > > > Repository: mesos > > > Description > --- > > Agent should backoff when establishing a socket. > > > Diffs > - > > src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 > > Diff: https://reviews.apache.org/r/47080/diff/ > > > Testing > --- > > make tests > make check > > Also started a master and agent and captured a tcpdump to verify the > connection is only established _after_ the registration backoff. > > > Thanks, > > David Robinson > >
Re: Review Request 47080: Agent should backoff when establishing a socket.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47080/ --- (Updated May 6, 2016, 10:51 p.m.) Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang. Bugs: MESOS-5330 https://issues.apache.org/jira/browse/MESOS-5330 Repository: mesos Description --- Agent should backoff when establishing a socket. Diffs - src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 Diff: https://reviews.apache.org/r/47080/diff/ Testing (updated) --- make tests make check Also started a master and agent and captured a tcpdump to verify the connection is only established _after_ the registration backoff. Thanks, David Robinson
Review Request 47080: Agent should backoff when establishing a socket.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47080/ --- Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang. Bugs: MESOS-5330 https://issues.apache.org/jira/browse/MESOS-5330 Repository: mesos Description --- Agent should backoff when establishing a socket. Diffs - src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 Diff: https://reviews.apache.org/r/47080/diff/ Testing --- make tests Also started a master and agent and captured a tcpdump to verify the connection is only established _after_ the registration backoff. Thanks, David Robinson
Re: Review Request 44230: Added FS_DEFAULT case in rmdir.
> On March 15, 2016, 8:02 p.m., Neil Conway wrote: > > Is it feasible/portable to have a test case for this change? > > Cong Wang wrote: > Yes, like in our case, you can create some socket or device file and try > to remove the directory contains it, it would fail without this patch. AFAICT tests have already been added: https://github.com/apache/mesos/commit/c55ca2941b0119c13764bccdebcea46119e69e4e - David --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44230/#review123733 --- On March 1, 2016, 10:01 p.m., Jojy Varghese wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/44230/ > --- > > (Updated March 1, 2016, 10:01 p.m.) > > > Review request for mesos and Jie Yu. > > > Repository: mesos > > > Description > --- > > We currently dont handle special files like device files in rmdir. This change > adds FS_DEFAULT as one of the cases where we try to unlink a file. > > Reference: http://man7.org/linux/man-pages/man3/fts.3.html > > > Diffs > - > > 3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/rmdir.hpp > bc420c9c10d93ddd619a9eb2c5f4db67f31d722f > > Diff: https://reviews.apache.org/r/44230/diff/ > > > Testing > --- > > make check. > > > Thanks, > > Jojy Varghese > >
Re: Review Request 42980: Increased the default registry_store_timeout.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/42980/#review117040 --- Ship it! Ship It! - David Robinson On Jan. 29, 2016, 10:46 p.m., Ben Mahler wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/42980/ > --- > > (Updated Jan. 29, 2016, 10:46 p.m.) > > > Review request for mesos, David Robinson and Vinod Kone. > > > Repository: mesos > > > Description > --- > > In order to be more tolerant of disk bandwidth interference > on the master machines, we should increase the default > registry store timeout. As it stands, the master is prone > to induce a failover when the master is temporarily starved > of disk bandwidth. > > > Diffs > - > > docs/configuration.md 2f0628a71539ad66fbd96c6a7ff2c7c8eb518e32 > src/master/flags.cpp 4a0e21092bfad8baee0279d473e6056842139be7 > > Diff: https://reviews.apache.org/r/42980/diff/ > > > Testing > --- > > N/A > > > Thanks, > > Ben Mahler > >
Re: Review Request 41245: Use ethtool -k lo to check ethtool command
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41245/#review109965 --- Ship it! Ship It! - David Robinson On Dec. 11, 2015, 5:20 a.m., Cong Wang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/41245/ > --- > > (Updated Dec. 11, 2015, 5:20 a.m.) > > > Review request for mesos, David Robinson, Ian Downes, and Jie Yu. > > > Repository: mesos > > > Description > --- > > David reported that on CentOS5 part of ethtool --version output sends to > stderr instead of stdout, which leads this output in our log. > > > Diffs > - > > src/slave/containerizer/mesos/isolators/network/port_mapping.cpp > 7c1724a4ca5c45bb214e5129743e0644c9ca9661 > > Diff: https://reviews.apache.org/r/41245/diff/ > > > Testing > --- > > Manually start mesos slave > > > Thanks, > > Cong Wang > >
Re: Review Request 41158: Turn off rx checksum offloading for veth in container
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41158/#review109903 --- src/slave/containerizer/mesos/isolators/network/port_mapping.cpp (lines 1097 - 1098) <https://reviews.apache.org/r/41158/#comment169582> This writes to stderr, which can end up in the logs. [root@server ~]# ethtool --version 1> /dev/null ethtool version 6 Usage: ethtool DEVNAME Display standard information about device Log snippet: I1211 01:05:13.215730 10885 main.cpp:190] Build: 2015-12-10 22:54:33 by mockbuild I1211 01:05:13.215859 10885 main.cpp:192] Version: 0.26.0-tw5 I1211 01:05:13.215996 10885 containerizer.cpp:142] Using isolation: cgroups/cpu,cgroups/mem,network/port_mapping,posix/disk,cgroups/perf_event,filesystem/posix ethtool version 6 Usage: ethtool DEVNAME Display standard information about device I1211 01:05:13.251729 10885 port_mapping.cpp:1255] Using eth0 as the public interface I1211 01:05:13.252707 10885 port_mapping.cpp:1280] Using lo as the loopback interface - David Robinson On Dec. 9, 2015, 11:15 p.m., Cong Wang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/41158/ > --- > > (Updated Dec. 9, 2015, 11:15 p.m.) > > > Review request for mesos, Ian Downes and Jie Yu. > > > Bugs: MESOS-4105 > https://issues.apache.org/jira/browse/MESOS-4105 > > > Repository: mesos > > > Description > --- > > We noticed that in some cases we delivered some corrupt packets to > applications running in our containers. This is clearly wrong. > > Here is what happens: > > 1) We receive a corrupt packet externally > 2) The hardware driver is able to checksum it and notices it has a bad > checksum > 3) The driver delivers this packet anyway to wait for TCP layer to checksum > it again and then drop it > 4) This packet is moved to a veth interface because it is for a container > 5) Both sides of the veth pair have RX checksum offloading by default > 6) The veth_xmit() marks the packet's checksum as UNNECESSARY since its peer > device has rx checksum offloading > 7) Packet is moved into the container TCP/IP stack > 8) TCP layer is not going to checksum it since it is not necessary > 9) The packet gets delivered to application layer > > > Diffs > - > > src/slave/containerizer/mesos/isolators/network/port_mapping.cpp > 89bb36f936417de8169a2442729fbd7c9d60acb7 > > Diff: https://reviews.apache.org/r/41158/diff/ > > > Testing > --- > > 1) Turn rx checksum off manually and the bug is gone > 2) Test this patch and verify rx checksum is turned off as expected. > 3) I don't see any noticable performance issue after turning this off > > > Thanks, > > Cong Wang > >