Re: Review Request 53411: Use a random TCP port for framework tests.

2016-11-03 Thread David Robinson


> On Nov. 3, 2016, 8:16 p.m., Vinod Kone wrote:
> > src/tests/balloon_framework_test.sh, line 73
> > <https://reviews.apache.org/r/53411/diff/1/?file=1552377#file1552377line73>
> >
> > why did you pick these particular numbers?

Chosen at random; we could use anything between 1024 and 65535.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53411/#review154795
---


On Nov. 2, 2016, 11:43 p.m., David Robinson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53411/
> ---
> 
> (Updated Nov. 2, 2016, 11:43 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Bugs: MESOS-6537
> https://issues.apache.org/jira/browse/MESOS-6537
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Use a random TCP port for framework tests.
> 
> 
> Diffs
> -
> 
>   src/tests/balloon_framework_test.sh 
> 7d0a4fee200577a5f7b2edbb3b310a07678879dd 
>   src/tests/disk_full_framework_test.sh 
> 83ae0947d4f5ec0a960e8cefa925cacbe6c6808e 
> 
> Diff: https://reviews.apache.org/r/53411/diff/
> 
> 
> Testing
> ---
> 
> Ran tests, ensured the test succeeded and the port wasn't 5432.
> 
> DEBUG: [ RUN  ] ExamplesTest.DiskFullFramework
> ...
> DEBUG: I1102 22:39:12.339356 46828 master.cpp:380] Master 
> 550c3c82-063a-427d-b130-60044fbd4898 (localhost) started on 127.0.0.1:24269
> ...
> DEBUG: I1102 22:39:14.312568 46937 slave.cpp:915] New master detected at 
> master@127.0.0.1:24269
> ...
> DEBUG: [   OK ] ExamplesTest.DiskFullFramework (17988 ms)
> 
> 
> Thanks,
> 
> David Robinson
> 
>



Review Request 53411: Use a random TCP port for framework tests.

2016-11-02 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53411/
---

Review request for mesos.


Bugs: MESOS-6537
https://issues.apache.org/jira/browse/MESOS-6537


Repository: mesos


Description
---

Use a random TCP port for framework tests.


Diffs
-

  src/tests/balloon_framework_test.sh 7d0a4fee200577a5f7b2edbb3b310a07678879dd 
  src/tests/disk_full_framework_test.sh 
83ae0947d4f5ec0a960e8cefa925cacbe6c6808e 

Diff: https://reviews.apache.org/r/53411/diff/


Testing
---

Ran tests, ensured the test succeeded and the port wasn't 5432.

DEBUG: [ RUN  ] ExamplesTest.DiskFullFramework
...
DEBUG: I1102 22:39:12.339356 46828 master.cpp:380] Master 
550c3c82-063a-427d-b130-60044fbd4898 (localhost) started on 127.0.0.1:24269
...
DEBUG: I1102 22:39:14.312568 46937 slave.cpp:915] New master detected at 
master@127.0.0.1:24269
...
DEBUG: [   OK ] ExamplesTest.DiskFullFramework (17988 ms)


Thanks,

David Robinson



Re: Review Request 50177: Add systemd watchdog support.

2016-07-20 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50177/#review142968
---



Overall it LGTM.


configure.ac (line 615)
<https://reviews.apache.org/r/50177/#comment208618>

This doesn't read correctly and it sort of states the obvious, consider 
simplying it:

# Check for systemd and link it in if present.



src/linux/systemd.cpp (lines 198 - 200)
<https://reviews.apache.org/r/50177/#comment208621>

The terminology here is a little confusing. The 'watchdog daemon' is the 
process running this function, and we're not pinging the watchdog we're sending 
a message to systemd. Consider rewording, eg:

// If the WATCHDOG_USEC environment variable is set systemd expects a 
watchdog message every WATCHDOG_USEC / 2 microseconds. The message indicates 
liveliness to systemd.


- David Robinson


On July 19, 2016, 1:14 a.m., Lawrence Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50177/
> ---
> 
> (Updated July 19, 2016, 1:14 a.m.)
> 
> 
> Review request for mesos, David Robinson and Ian Downes.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Adds systemd watchdog support (see 
> http://0pointer.de/blog/projects/watchdog.html for context).
> 
> 
> Diffs
> -
> 
>   configure.ac d2136909b7305498ae901a5ea00133142b77f9e6 
>   src/linux/systemd.hpp 91134f1d4b100759e45931bd09ca4e1e1aeaaf8a 
>   src/linux/systemd.cpp 619aa2778da5f99d3a078a8e1208bdaa9dc77581 
>   src/slave/main.cpp 4624392d30cf391015dcd63f447fe2414a47a16a 
> 
> Diff: https://reviews.apache.org/r/50177/diff/
> 
> 
> Testing
> ---
> 
> Tested by sending SIGSTOP to running mesos and verifying via journalctl that 
> it was killed by the watchdog.
> 
> TODO: write actual test by using and forking a mock binary that initializes 
> systemd and sending SIGSTOP to that binary. test also needs a unit file for 
> that binary and for systemd to run so we can verify that systemd killed it.
> 
> 
> File Attachments
> 
> 
> mesos gets owned by watchdog
>   
> https://reviews.apache.org/media/uploaded/files/2016/07/19/3c73b595-d28e-45d4-adfe-697801ba02cc__Screen_Shot_2016-07-18_at_2.09.31_PM.png
> 
> 
> Thanks,
> 
> Lawrence Wu
> 
>



Re: Review Request 47209: Establish TCP connection after backing off.

2016-05-12 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47209/
---

(Updated May 13, 2016, 3:37 a.m.)


Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang.


Bugs: MESOS-5330
https://issues.apache.org/jira/browse/MESOS-5330


Repository: mesos


Description
---

Establish TCP connection after backing off.


Diffs (updated)
-

  src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 

Diff: https://reviews.apache.org/r/47209/diff/


Testing
---

make check

Also started a master and agent and captured a tcpdump to verify the connection 
is only established _after_ the registration backoff.


Thanks,

David Robinson



Re: Review Request 47209: Establish TCP connection after backing off.

2016-05-12 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47209/
---

(Updated May 12, 2016, 6:18 p.m.)


Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang.


Bugs: MESOS-5330
https://issues.apache.org/jira/browse/MESOS-5330


Repository: mesos


Description
---

Establish TCP connection after backing off.


Diffs (updated)
-

  src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 

Diff: https://reviews.apache.org/r/47209/diff/


Testing
---

make check

Also started a master and agent and captured a tcpdump to verify the connection 
is only established _after_ the registration backoff.


Thanks,

David Robinson



Review Request 47209: Establish TCP connection after backing off.

2016-05-10 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47209/
---

Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang.


Bugs: MESOS-5330
https://issues.apache.org/jira/browse/MESOS-5330


Repository: mesos


Description
---

Establish TCP connection after backing off.


Diffs
-

  src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 

Diff: https://reviews.apache.org/r/47209/diff/


Testing
---

make check

Also started a master and agent and captured a tcpdump to verify the connection 
is only established _after_ the registration backoff.


Thanks,

David Robinson



Review Request 47154: Establish TCP connection after backing off.

2016-05-09 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47154/
---

Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang.


Bugs: MESOS-5330
https://issues.apache.org/jira/browse/MESOS-5330


Repository: mesos


Description
---

Establish TCP connection after backing off.


Diffs
-

  src/slave/slave.hpp b72438033708de473046d321c493d9fbcd7a9b43 
  src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 

Diff: https://reviews.apache.org/r/47154/diff/


Testing
---

make check

Also started a master and agent and captured a tcpdump to verify the connection 
is only established _after_ the registration backoff.


Thanks,

David Robinson



Re: Review Request 47080: Agent should backoff when establishing a socket.

2016-05-09 Thread David Robinson


> On May 9, 2016, 5:20 a.m., Cong Wang wrote:
> > src/slave/slave.cpp, line 954
> > <https://reviews.apache.org/r/47080/diff/1/?file=1375586#file1375586line954>
> >
> > One thing to note is the authentication could need a connect with 
> > master, so this could impact it. But essentially we need a backoff for 
> > authentication too.

Yeah, perhaps the call to authenticate() should be moved within 
doReliableRegistration() too.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47080/#review132205
---


On May 6, 2016, 10:51 p.m., David Robinson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47080/
> ---
> 
> (Updated May 6, 2016, 10:51 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang.
> 
> 
> Bugs: MESOS-5330
> https://issues.apache.org/jira/browse/MESOS-5330
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Agent should backoff when establishing a socket.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 
> 
> Diff: https://reviews.apache.org/r/47080/diff/
> 
> 
> Testing
> ---
> 
> make tests
> make check
> 
> Also started a master and agent and captured a tcpdump to verify the 
> connection is only established _after_ the registration backoff.
> 
> 
> Thanks,
> 
> David Robinson
> 
>



Re: Review Request 47080: Agent should backoff when establishing a socket.

2016-05-06 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47080/
---

(Updated May 6, 2016, 10:51 p.m.)


Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang.


Bugs: MESOS-5330
https://issues.apache.org/jira/browse/MESOS-5330


Repository: mesos


Description
---

Agent should backoff when establishing a socket.


Diffs
-

  src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 

Diff: https://reviews.apache.org/r/47080/diff/


Testing (updated)
---

make tests
make check

Also started a master and agent and captured a tcpdump to verify the connection 
is only established _after_ the registration backoff.


Thanks,

David Robinson



Review Request 47080: Agent should backoff when establishing a socket.

2016-05-06 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47080/
---

Review request for mesos, Ben Mahler, Ian Downes, and Cong Wang.


Bugs: MESOS-5330
https://issues.apache.org/jira/browse/MESOS-5330


Repository: mesos


Description
---

Agent should backoff when establishing a socket.


Diffs
-

  src/slave/slave.cpp 116ea59b72950db4a7cd721b7ba5bfbb2e1c1454 

Diff: https://reviews.apache.org/r/47080/diff/


Testing
---

make tests

Also started a master and agent and captured a tcpdump to verify the connection 
is only established _after_ the registration backoff.


Thanks,

David Robinson



Re: Review Request 44230: Added FS_DEFAULT case in rmdir.

2016-03-15 Thread David Robinson


> On March 15, 2016, 8:02 p.m., Neil Conway wrote:
> > Is it feasible/portable to have a test case for this change?
> 
> Cong Wang wrote:
> Yes, like in our case, you can create some socket or device file and try 
> to remove the directory contains it, it would fail without this patch.

AFAICT tests have already been added:

https://github.com/apache/mesos/commit/c55ca2941b0119c13764bccdebcea46119e69e4e


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44230/#review123733
---


On March 1, 2016, 10:01 p.m., Jojy Varghese wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44230/
> ---
> 
> (Updated March 1, 2016, 10:01 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We currently dont handle special files like device files in rmdir. This change
> adds FS_DEFAULT as one of the cases where we try to unlink a file.
> 
> Reference: http://man7.org/linux/man-pages/man3/fts.3.html
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/rmdir.hpp 
> bc420c9c10d93ddd619a9eb2c5f4db67f31d722f 
> 
> Diff: https://reviews.apache.org/r/44230/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jojy Varghese
> 
>



Re: Review Request 42980: Increased the default registry_store_timeout.

2016-01-29 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42980/#review117040
---


Ship it!




Ship It!

- David Robinson


On Jan. 29, 2016, 10:46 p.m., Ben Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42980/
> ---
> 
> (Updated Jan. 29, 2016, 10:46 p.m.)
> 
> 
> Review request for mesos, David Robinson and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In order to be more tolerant of disk bandwidth interference
> on the master machines, we should increase the default
> registry store timeout. As it stands, the master is prone
> to induce a failover when the master is temporarily starved
> of disk bandwidth.
> 
> 
> Diffs
> -
> 
>   docs/configuration.md 2f0628a71539ad66fbd96c6a7ff2c7c8eb518e32 
>   src/master/flags.cpp 4a0e21092bfad8baee0279d473e6056842139be7 
> 
> Diff: https://reviews.apache.org/r/42980/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>



Re: Review Request 41245: Use ethtool -k lo to check ethtool command

2015-12-11 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41245/#review109965
---

Ship it!


Ship It!

- David Robinson


On Dec. 11, 2015, 5:20 a.m., Cong Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41245/
> ---
> 
> (Updated Dec. 11, 2015, 5:20 a.m.)
> 
> 
> Review request for mesos, David Robinson, Ian Downes, and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> David reported that on CentOS5 part of ethtool --version output sends to 
> stderr instead of stdout, which leads this output in our log.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/network/port_mapping.cpp 
> 7c1724a4ca5c45bb214e5129743e0644c9ca9661 
> 
> Diff: https://reviews.apache.org/r/41245/diff/
> 
> 
> Testing
> ---
> 
> Manually start mesos slave
> 
> 
> Thanks,
> 
> Cong Wang
> 
>



Re: Review Request 41158: Turn off rx checksum offloading for veth in container

2015-12-10 Thread David Robinson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109903
---



src/slave/containerizer/mesos/isolators/network/port_mapping.cpp (lines 1097 - 
1098)
<https://reviews.apache.org/r/41158/#comment169582>

This writes to stderr, which can end up in the logs.

[root@server ~]# ethtool --version 1> /dev/null
ethtool version 6
Usage:
ethtool DEVNAME Display standard information about device

Log snippet:

I1211 01:05:13.215730 10885 main.cpp:190] Build: 2015-12-10 22:54:33 by 
mockbuild
I1211 01:05:13.215859 10885 main.cpp:192] Version: 0.26.0-tw5
I1211 01:05:13.215996 10885 containerizer.cpp:142] Using isolation: 
cgroups/cpu,cgroups/mem,network/port_mapping,posix/disk,cgroups/perf_event,filesystem/posix
ethtool version 6
Usage:
ethtool DEVNAME Display standard information about device
I1211 01:05:13.251729 10885 port_mapping.cpp:1255] Using eth0 as the public 
interface
I1211 01:05:13.252707 10885 port_mapping.cpp:1280] Using lo as the loopback 
interface


- David Robinson


On Dec. 9, 2015, 11:15 p.m., Cong Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41158/
> ---
> 
> (Updated Dec. 9, 2015, 11:15 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Jie Yu.
> 
> 
> Bugs: MESOS-4105
> https://issues.apache.org/jira/browse/MESOS-4105
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We noticed that in some cases we delivered some corrupt packets to 
> applications running in our containers. This is clearly wrong. 
> 
> Here is what happens:
> 
> 1) We receive a corrupt packet externally
> 2) The hardware driver is able to checksum it and notices it has a bad 
> checksum
> 3) The driver delivers this packet anyway to wait for TCP layer to checksum 
> it again and then drop it
> 4) This packet is moved to a veth interface because it is for a container
> 5) Both sides of the veth pair have RX checksum offloading by default
> 6) The veth_xmit() marks the packet's checksum as UNNECESSARY since its peer 
> device has rx checksum offloading
> 7) Packet is moved into the container TCP/IP stack
> 8) TCP layer is not going to checksum it since it is not necessary
> 9) The packet gets delivered to application layer
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/network/port_mapping.cpp 
> 89bb36f936417de8169a2442729fbd7c9d60acb7 
> 
> Diff: https://reviews.apache.org/r/41158/diff/
> 
> 
> Testing
> ---
> 
> 1) Turn rx checksum off manually and the bug is gone
> 2) Test this patch and verify rx checksum is turned off as expected.
> 3) I don't see any noticable performance issue after turning this off
> 
> 
> Thanks,
> 
> Cong Wang
> 
>