-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/75057/#review226576
-----------------------------------------------------------


Ship it!




Thanks! Edited the description for clarity and applied the following diff 
locally:

```
diff --git a/src/linux/routing/link/link.cpp b/src/linux/routing/link/link.cpp
index 8942d92b6..bb97e3281 100644
--- a/src/linux/routing/link/link.cpp
+++ b/src/linux/routing/link/link.cpp
@@ -249,48 +249,53 @@ Try<bool> setUp(const string& link)
 
 // There seems to be a bug that is occassionally preventing ioctl and libnl
 // from setting correct MAC addresses, despite them not returning errors.
-// A workaround for this behavior is to check that the MAC address is set
-// correctly after the ioctl call, and retry the address setting if necessary.
-// In our test, this workaround appears to reduce the frequency of this
-// issue, but does not seem to prevent all such failures.
+//
 // Observed scenarios with incorrectly assigned MAC addresses:
-// 1. ioctl returns the correct MAC address, but not net::mac
-// 2. both net::mac and ioctl return the same MAC address, but are both wrong
-// 3. There are no cases where ioctl/net::mac come back with the same MAC
-//    address as before setting. i.e. there is no no-op observed.
-// 4. There is a possibility that ioctl/net::mac results disagree with each
-//    other even before attempting to set our desired MAC address. As such, we 
-//    check that the results agree before we set, and log a warning if we find
-//    a mismatch
+//
+// 1. After setting the mac address: ioctl returns the correct MAC address, but
+//    net::mac returns an incorrect MAC address (different from the original!)
+// 2. After setting the mac address: both ioctl and net::mac return the same 
MAC
+//    address, but are both wrong (and different from the original one!)
+// 3. After setting the mac address: there are no cases where ioctl or net::mac
+//    come back with the same MAC address as before we set the address.
+// 4. Before we set the mac address: there is a possibility that ioctl and
+//    net::mac results disagree with each other!
 // 5. There is a possibility that the MAC address we set ends up overwritten by
 //    a garbage value after setMAC has already completed and checked that the
 //    mac address was set correctly. Since this error happens after this
-//    function has finished, we cannot log nor detect it in this function.
+//    function has finished, we cannot log nor detect it in setMAC because we
+//    have not yet studied at what point this occurs.
+//
 // Notes:
+//
 // 1. We have observed this behavior only on CentOS 9 systems at the moment,
-//    We have tried kernels 5.15.147, 5.15.160, 5.15.161, which all have this 
-//    issue.
-//    CentOS 7 systems do not seem to have this issue with setMAC. 
-
+//    CentOS 7 systems under various kernels do not seem to have the issue
+//    (which is quite strange if this was purely a kernel bug).
+// 2. We have tried kernels 5.15.147, 5.15.160, 5.15.161, all of these have
+//    this issue on CentOS 9.
+//
+// To workaround this bug, we check that the MAC address is set correctly
+// after the ioctl call, and retry the address setting if necessary. In our
+// testing, this workaround appears to workaround scenarios (1), (2), (3),
+// and (4) above, but it does not address scenario (5).
+//
+// See MESOS-10243 for additional details, follow-ups.
 Try<Nothing> setMAC(const string& link, const net::MAC& mac)
 {
-  auto getAddressViaIoCtl = [](const int& fd, const string& link) -> 
Try<struct ifreq> {
-    struct ifreq ifr;
+  auto getMacViaIoctl = [](int fd, const string& link) -> Try<ifreq> {
+    ifreq ifr;
     memset(&ifr, 0, sizeof(ifr));
     strncpy(ifr.ifr_name, link.c_str(), IFNAMSIZ);
-
-    // Since loopback interface has sa_family ARPHRD_LOOPBACK, we need
-    // to get the MAC address of the link first to decide what value the
-    // sa_family should be.
     if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
-      // Save the error string as os::close may overwrite errno.
-      string message = os::strerror(errno);
-      return Error(std::move(message));
+      return ErrnoError();
     }
-
     return ifr;
   };
 
+  // Since loopback interface has sa_family ARPHRD_LOOPBACK, we need
+  // to get the MAC address of the link first to decide what value the
+  // sa_family should be.
+  //
   // TODO(jieyu): We use ioctl to set the MAC address because the
   // interfaces in libnl have some issues with virtual devices.
   int fd = ::socket(AF_INET, SOCK_STREAM, 0);
@@ -298,70 +303,67 @@ Try<Nothing> setMAC(const string& link, const net::MAC& 
mac)
     return ErrnoError();
   }
 
-  Try<struct ifreq> ifr_try = getAddressViaIoCtl(fd, link);
-
-  if (ifr_try.isError()) {
+  Try<ifreq> if_request = getMacViaIoctl(fd, link);
+  if (if_request.isError()) {
     os::close(fd);
-    return Error(ifr_try.error());
+    return Error(if_request.error());
   }
 
+  // See comment above this function on why we check this and use a loop below.
   Result<net::MAC> net_mac = net::mac(link);
-  net::MAC ioctl_mac = net::MAC(ifr_try->ifr_hwaddr.sa_data);
+  net::MAC ioctl_mac = net::MAC(if_request->ifr_hwaddr.sa_data);
 
   if (!(net_mac.isSome() && *net_mac == ioctl_mac)) {
-    LOG(WARNING) << "MAC mismatch between net::mac = "
-                 << (net_mac.isSome()
-                       ? stringify(*net_mac)
-                       : (net_mac.isError() ? net_mac.error() : "None"))
-                 << " and ioctl = " << stringify(ioctl_mac) << " for link "
-                 << stringify(link)
-                 << " before setting to target mac address " << stringify(mac);
+    LOG(WARNING)
+      << "MAC mismatch between net::mac = " << (net_mac.isSome()
+         ? stringify(*net_mac) : (net_mac.isError() ? net_mac.error() : 
"None"))
+      << " and ioctl = " << stringify(ioctl_mac)
+      << " for link " << stringify(link)
+      << " before setting to target mac address " << stringify(mac);
   }
 
   for (int i = 0; i < 10; ++i) {
-    ifr_try->ifr_hwaddr.sa_data[0] = mac[0];
-    ifr_try->ifr_hwaddr.sa_data[1] = mac[1];
-    ifr_try->ifr_hwaddr.sa_data[2] = mac[2];
-    ifr_try->ifr_hwaddr.sa_data[3] = mac[3];
-    ifr_try->ifr_hwaddr.sa_data[4] = mac[4];
-    ifr_try->ifr_hwaddr.sa_data[5] = mac[5];
-
-    if (ioctl(fd, SIOCSIFHWADDR, &ifr_try.get()) == -1) {
-      // Save the error string as os::close may overwrite errno.
+    if_request->ifr_hwaddr.sa_data[0] = mac[0];
+    if_request->ifr_hwaddr.sa_data[1] = mac[1];
+    if_request->ifr_hwaddr.sa_data[2] = mac[2];
+    if_request->ifr_hwaddr.sa_data[3] = mac[3];
+    if_request->ifr_hwaddr.sa_data[4] = mac[4];
+    if_request->ifr_hwaddr.sa_data[5] = mac[5];
+
+    if (ioctl(fd, SIOCSIFHWADDR, &(*if_request)) == -1) {
       const string message = os::strerror(errno);
       os::close(fd);
       return Error(message);
     }
 
-    // Prepare another read to check if the MAC address was set correctly
-    ifr_try = getAddressViaIoCtl(fd, link);
-
-    if (ifr_try.isError()) {
+    // Prepare another read to check if the MAC address was set correctly.
+    if_request = getMacViaIoctl(fd, link);
+    if (if_request.isError()) {
       os::close(fd);
-      return Error(ifr_try.error());
+      return Error(if_request.error());
     }
-    
+
     net_mac = net::mac(link);
-    ioctl_mac = net::MAC(ifr_try->ifr_hwaddr.sa_data);
+    ioctl_mac = net::MAC(if_request->ifr_hwaddr.sa_data);
 
     if (net_mac.isSome() && *net_mac == mac && ioctl_mac == mac) {
       break;
     }
 
-    LOG(WARNING) << "MAC mismatch between target mac: " << stringify(mac)
-              << ", net::mac: "
-              << (net_mac.isSome()
-                    ? stringify(*net_mac)
-                    : (net_mac.isError() ? net_mac.error() : "None"))
-              << " & ioctl: " << stringify(ioctl_mac) << " for link "
-              << stringify(link) << ". Retrying...";
+    LOG(WARNING)
+      << "MAC mismatch between target mac = " << stringify(mac)
+      << " and net::mac = " << (net_mac.isSome()
+         ? stringify(*net_mac) : (net_mac.isError() ? net_mac.error() : 
"None"))
+      << " and ioctl: " << stringify(ioctl_mac)
+      << " for link " << stringify(link) << "; retrying set operation...";
   }
 
   os::close(fd);
+
   if (!(net_mac.isSome() && *net_mac == mac && ioctl_mac == mac)) {
     return Error(
-      "net::mac and ioctl did not report the correct address despite multiple "
-      "attempts to set target MAC address");
+      "net::mac and ioctl did not report the correct address despite multiple"
+      " attempts to set target MAC address");
   }
   return Nothing();
 }

```


src/linux/routing/link/link.cpp
Lines 273 (patched)
<https://reviews.apache.org/r/75057/#comment314857>

    looks like your editor is allowing trailing whitespace



src/linux/routing/link/link.cpp
Lines 263-265 (original), 282-284 (patched)
<https://reviews.apache.org/r/75057/#comment314855>

    looks like this comment needs to move now



src/linux/routing/link/link.cpp
Line 267 (original), 286 (patched)
<https://reviews.apache.org/r/75057/#comment314856>

    comment is now stale
    
    because there's no close anymore we can just return ErrnoError()


- Benjamin Mahler


On June 19, 2024, 8:13 p.m., Jason Zhou wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/75057/
> -----------------------------------------------------------
> 
> (Updated June 19, 2024, 8:13 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-10243
>     https://issues.apache.org/jira/browse/MESOS-10243
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It seems that there are scenarios where mesos containers cannot communicate 
> with agents as the MAC addresses are set incorrectly, leading to dropped 
> packets. A workaround for this behavior is to check that the MAC address is 
> set correctly after the ioctl call, and retry the address setting if 
> necessary.
> In our test, this workaround appears to reduce the frequency of this issue, 
> but does not seem to prevent all such failures.
> 
> Observed scenarios with incorrectly assigned MAC addresses:
> 1. ioctl returns the correct MAC address, but not net::mac
> 2. both net::mac and ioctl return the same MAC address, but are both wrong
> 3. There are no cases where ioctl/net::mac come back with the same MAC
>    address as before setting. i.e. there is no no-op observed.
> 4. There is a possibility that ioctl/net::mac results disagree with each
>    other even before attempting to set our desired MAC address. As such, we
>    check that the results agree before we set, and log a warning if we find
>    a mismatch
> 5. There is a possibility that the MAC address we set ends up overwritten by
>    a garbage value after setMAC has already completed and checked that the
>    mac address was set correctly. Since this error happens after this
>    function has finished, we cannot log nor detect it in setMAC.
> Notes:
> 1. We have observed this behavior only on CentOS 9 systems at the moment,
>    We have tried kernels 5.15.147, 5.15.160, 5.15.161, which all have this
>    issue.
>    CentOS 7 systems do not seem to have this issue with setMAC.
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/link/link.cpp bff172dea63f77a2c7bafd7e3fbee5dca1dfbd51 
> 
> 
> Diff: https://reviews.apache.org/r/75057/diff/7/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jason Zhou
> 
>

Reply via email to