-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/75057/#review226576
-----------------------------------------------------------
Ship it!
Thanks! Edited the description for clarity and applied the following diff
locally:
```
diff --git a/src/linux/routing/link/link.cpp b/src/linux/routing/link/link.cpp
index 8942d92b6..bb97e3281 100644
--- a/src/linux/routing/link/link.cpp
+++ b/src/linux/routing/link/link.cpp
@@ -249,48 +249,53 @@ Try<bool> setUp(const string& link)
// There seems to be a bug that is occassionally preventing ioctl and libnl
// from setting correct MAC addresses, despite them not returning errors.
-// A workaround for this behavior is to check that the MAC address is set
-// correctly after the ioctl call, and retry the address setting if necessary.
-// In our test, this workaround appears to reduce the frequency of this
-// issue, but does not seem to prevent all such failures.
+//
// Observed scenarios with incorrectly assigned MAC addresses:
-// 1. ioctl returns the correct MAC address, but not net::mac
-// 2. both net::mac and ioctl return the same MAC address, but are both wrong
-// 3. There are no cases where ioctl/net::mac come back with the same MAC
-// address as before setting. i.e. there is no no-op observed.
-// 4. There is a possibility that ioctl/net::mac results disagree with each
-// other even before attempting to set our desired MAC address. As such, we
-// check that the results agree before we set, and log a warning if we find
-// a mismatch
+//
+// 1. After setting the mac address: ioctl returns the correct MAC address, but
+// net::mac returns an incorrect MAC address (different from the original!)
+// 2. After setting the mac address: both ioctl and net::mac return the same
MAC
+// address, but are both wrong (and different from the original one!)
+// 3. After setting the mac address: there are no cases where ioctl or net::mac
+// come back with the same MAC address as before we set the address.
+// 4. Before we set the mac address: there is a possibility that ioctl and
+// net::mac results disagree with each other!
// 5. There is a possibility that the MAC address we set ends up overwritten by
// a garbage value after setMAC has already completed and checked that the
// mac address was set correctly. Since this error happens after this
-// function has finished, we cannot log nor detect it in this function.
+// function has finished, we cannot log nor detect it in setMAC because we
+// have not yet studied at what point this occurs.
+//
// Notes:
+//
// 1. We have observed this behavior only on CentOS 9 systems at the moment,
-// We have tried kernels 5.15.147, 5.15.160, 5.15.161, which all have this
-// issue.
-// CentOS 7 systems do not seem to have this issue with setMAC.
-
+// CentOS 7 systems under various kernels do not seem to have the issue
+// (which is quite strange if this was purely a kernel bug).
+// 2. We have tried kernels 5.15.147, 5.15.160, 5.15.161, all of these have
+// this issue on CentOS 9.
+//
+// To workaround this bug, we check that the MAC address is set correctly
+// after the ioctl call, and retry the address setting if necessary. In our
+// testing, this workaround appears to workaround scenarios (1), (2), (3),
+// and (4) above, but it does not address scenario (5).
+//
+// See MESOS-10243 for additional details, follow-ups.
Try<Nothing> setMAC(const string& link, const net::MAC& mac)
{
- auto getAddressViaIoCtl = [](const int& fd, const string& link) ->
Try<struct ifreq> {
- struct ifreq ifr;
+ auto getMacViaIoctl = [](int fd, const string& link) -> Try<ifreq> {
+ ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, link.c_str(), IFNAMSIZ);
-
- // Since loopback interface has sa_family ARPHRD_LOOPBACK, we need
- // to get the MAC address of the link first to decide what value the
- // sa_family should be.
if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
- // Save the error string as os::close may overwrite errno.
- string message = os::strerror(errno);
- return Error(std::move(message));
+ return ErrnoError();
}
-
return ifr;
};
+ // Since loopback interface has sa_family ARPHRD_LOOPBACK, we need
+ // to get the MAC address of the link first to decide what value the
+ // sa_family should be.
+ //
// TODO(jieyu): We use ioctl to set the MAC address because the
// interfaces in libnl have some issues with virtual devices.
int fd = ::socket(AF_INET, SOCK_STREAM, 0);
@@ -298,70 +303,67 @@ Try<Nothing> setMAC(const string& link, const net::MAC&
mac)
return ErrnoError();
}
- Try<struct ifreq> ifr_try = getAddressViaIoCtl(fd, link);
-
- if (ifr_try.isError()) {
+ Try<ifreq> if_request = getMacViaIoctl(fd, link);
+ if (if_request.isError()) {
os::close(fd);
- return Error(ifr_try.error());
+ return Error(if_request.error());
}
+ // See comment above this function on why we check this and use a loop below.
Result<net::MAC> net_mac = net::mac(link);
- net::MAC ioctl_mac = net::MAC(ifr_try->ifr_hwaddr.sa_data);
+ net::MAC ioctl_mac = net::MAC(if_request->ifr_hwaddr.sa_data);
if (!(net_mac.isSome() && *net_mac == ioctl_mac)) {
- LOG(WARNING) << "MAC mismatch between net::mac = "
- << (net_mac.isSome()
- ? stringify(*net_mac)
- : (net_mac.isError() ? net_mac.error() : "None"))
- << " and ioctl = " << stringify(ioctl_mac) << " for link "
- << stringify(link)
- << " before setting to target mac address " << stringify(mac);
+ LOG(WARNING)
+ << "MAC mismatch between net::mac = " << (net_mac.isSome()
+ ? stringify(*net_mac) : (net_mac.isError() ? net_mac.error() :
"None"))
+ << " and ioctl = " << stringify(ioctl_mac)
+ << " for link " << stringify(link)
+ << " before setting to target mac address " << stringify(mac);
}
for (int i = 0; i < 10; ++i) {
- ifr_try->ifr_hwaddr.sa_data[0] = mac[0];
- ifr_try->ifr_hwaddr.sa_data[1] = mac[1];
- ifr_try->ifr_hwaddr.sa_data[2] = mac[2];
- ifr_try->ifr_hwaddr.sa_data[3] = mac[3];
- ifr_try->ifr_hwaddr.sa_data[4] = mac[4];
- ifr_try->ifr_hwaddr.sa_data[5] = mac[5];
-
- if (ioctl(fd, SIOCSIFHWADDR, &ifr_try.get()) == -1) {
- // Save the error string as os::close may overwrite errno.
+ if_request->ifr_hwaddr.sa_data[0] = mac[0];
+ if_request->ifr_hwaddr.sa_data[1] = mac[1];
+ if_request->ifr_hwaddr.sa_data[2] = mac[2];
+ if_request->ifr_hwaddr.sa_data[3] = mac[3];
+ if_request->ifr_hwaddr.sa_data[4] = mac[4];
+ if_request->ifr_hwaddr.sa_data[5] = mac[5];
+
+ if (ioctl(fd, SIOCSIFHWADDR, &(*if_request)) == -1) {
const string message = os::strerror(errno);
os::close(fd);
return Error(message);
}
- // Prepare another read to check if the MAC address was set correctly
- ifr_try = getAddressViaIoCtl(fd, link);
-
- if (ifr_try.isError()) {
+ // Prepare another read to check if the MAC address was set correctly.
+ if_request = getMacViaIoctl(fd, link);
+ if (if_request.isError()) {
os::close(fd);
- return Error(ifr_try.error());
+ return Error(if_request.error());
}
-
+
net_mac = net::mac(link);
- ioctl_mac = net::MAC(ifr_try->ifr_hwaddr.sa_data);
+ ioctl_mac = net::MAC(if_request->ifr_hwaddr.sa_data);
if (net_mac.isSome() && *net_mac == mac && ioctl_mac == mac) {
break;
}
- LOG(WARNING) << "MAC mismatch between target mac: " << stringify(mac)
- << ", net::mac: "
- << (net_mac.isSome()
- ? stringify(*net_mac)
- : (net_mac.isError() ? net_mac.error() : "None"))
- << " & ioctl: " << stringify(ioctl_mac) << " for link "
- << stringify(link) << ". Retrying...";
+ LOG(WARNING)
+ << "MAC mismatch between target mac = " << stringify(mac)
+ << " and net::mac = " << (net_mac.isSome()
+ ? stringify(*net_mac) : (net_mac.isError() ? net_mac.error() :
"None"))
+ << " and ioctl: " << stringify(ioctl_mac)
+ << " for link " << stringify(link) << "; retrying set operation...";
}
os::close(fd);
+
if (!(net_mac.isSome() && *net_mac == mac && ioctl_mac == mac)) {
return Error(
- "net::mac and ioctl did not report the correct address despite multiple "
- "attempts to set target MAC address");
+ "net::mac and ioctl did not report the correct address despite multiple"
+ " attempts to set target MAC address");
}
return Nothing();
}
```
src/linux/routing/link/link.cpp
Lines 273 (patched)
<https://reviews.apache.org/r/75057/#comment314857>
looks like your editor is allowing trailing whitespace
src/linux/routing/link/link.cpp
Lines 263-265 (original), 282-284 (patched)
<https://reviews.apache.org/r/75057/#comment314855>
looks like this comment needs to move now
src/linux/routing/link/link.cpp
Line 267 (original), 286 (patched)
<https://reviews.apache.org/r/75057/#comment314856>
comment is now stale
because there's no close anymore we can just return ErrnoError()
- Benjamin Mahler
On June 19, 2024, 8:13 p.m., Jason Zhou wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/75057/
> -----------------------------------------------------------
>
> (Updated June 19, 2024, 8:13 p.m.)
>
>
> Review request for mesos and Benjamin Mahler.
>
>
> Bugs: MESOS-10243
> https://issues.apache.org/jira/browse/MESOS-10243
>
>
> Repository: mesos
>
>
> Description
> -------
>
> It seems that there are scenarios where mesos containers cannot communicate
> with agents as the MAC addresses are set incorrectly, leading to dropped
> packets. A workaround for this behavior is to check that the MAC address is
> set correctly after the ioctl call, and retry the address setting if
> necessary.
> In our test, this workaround appears to reduce the frequency of this issue,
> but does not seem to prevent all such failures.
>
> Observed scenarios with incorrectly assigned MAC addresses:
> 1. ioctl returns the correct MAC address, but not net::mac
> 2. both net::mac and ioctl return the same MAC address, but are both wrong
> 3. There are no cases where ioctl/net::mac come back with the same MAC
> address as before setting. i.e. there is no no-op observed.
> 4. There is a possibility that ioctl/net::mac results disagree with each
> other even before attempting to set our desired MAC address. As such, we
> check that the results agree before we set, and log a warning if we find
> a mismatch
> 5. There is a possibility that the MAC address we set ends up overwritten by
> a garbage value after setMAC has already completed and checked that the
> mac address was set correctly. Since this error happens after this
> function has finished, we cannot log nor detect it in setMAC.
> Notes:
> 1. We have observed this behavior only on CentOS 9 systems at the moment,
> We have tried kernels 5.15.147, 5.15.160, 5.15.161, which all have this
> issue.
> CentOS 7 systems do not seem to have this issue with setMAC.
>
>
> Diffs
> -----
>
> src/linux/routing/link/link.cpp bff172dea63f77a2c7bafd7e3fbee5dca1dfbd51
>
>
> Diff: https://reviews.apache.org/r/75057/diff/7/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jason Zhou
>
>