[jira] [Created] (MESOS-10244) [MSVC] Mesos build failed: cannot open file 'libevent_pthreads.lib' and 'libevent_openssl.lib'
QuellaZhang created MESOS-10244:
-----------------------------------

             Summary: [MSVC] Mesos build failed: cannot open file 'libevent_pthreads.lib' and 'libevent_openssl.lib'
                 Key: MESOS-10244
                 URL: https://issues.apache.org/jira/browse/MESOS-10244
             Project: Mesos
          Issue Type: Bug
         Environment: VS2022 17.11.3 + Windows
            Reporter: QuellaZhang

Hi All,

We tried to build the latest Mesos source code with VS2022. The build failed because the linker cannot open 'libevent_pthreads.lib' and 'libevent_openssl.lib'. This can be reproduced at the latest commit c1b42f7 on the master branch. Could you please take a look at this issue? Thanks a lot!

*Reproduce steps:*
# git clone https://github.com/apache/mesos F:\gitP\apache\mesos
# Open a VS 2022 x64 command prompt as admin and browse to F:\gitP\apache\mesos
# mkdir build_amd64 && pushd build_amd64
# set OPENSSL_ROOT_DIR=F:\Microsoft\vcpkg\installed\x64-windows
# cmake -G "Visual Studio 17 2022" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.22621.0 -DENABLE_LIBEVENT=1 -DENABLE_SSL=1 -DHAS_AUTHENTICATION=0 -DPATCHEXE_PATH="C:\Program Files\Git\usr\bin" -T host=x64 ..
# msbuild /m /p:Platform=x64 /p:Configuration=Debug Mesos.sln /t:Rebuild

*Error:*
{noformat}
"F:\gitP\apache\mesos\build_amd64\src\tests\mesos-tests.vcxproj.metaproj" (Rebuild target) (53) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj.metaproj" (Rebuild target) (54) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj" (Rebuild target) (104) ->
  LINK : fatal error LNK1104: cannot open file 'libevent_pthreads.lib' [F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]
  LINK : fatal error LNK1104: cannot open file 'libevent_pthreads.lib' [F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]

"F:\gitP\apache\mesos\build_amd64\Mesos.sln" (Rebuild target) (1) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\mesos-tests.vcxproj.metaproj" (Rebuild target) (52) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj.metaproj" (Rebuild target) (53) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj" (Rebuild target) (103) ->
  LINK : fatal error LNK1104: cannot open file 'libevent_openssl.lib' [C:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]
  LINK : fatal error LNK1104: cannot open file 'libevent_openssl.lib' [C:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Comment Edited] (MESOS-10243) MAC Address changes from link::setMAC may not stick, leading to container launch failure with port mapping isolator.
[ https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866189#comment-17866189 ]

Jason Zhou edited comment on MESOS-10243 at 7/15/24 10:26 PM:
--------------------------------------------------------------

Landed fix for peer link MAC setting on creation:
{code:java}
commit 70d9da223a204dc89431aa7b26db7beeb53a9f3c
Author: Jason Zhou
Date:   Mon Jul 15 18:02:54 2024 -0400

    [veth] Provide the ability to set veth peer link MAC address on creation.

    This addresses the previous todo where we want to set the MAC address
    of the peer link when we are creating a veth pair, so that we can avoid
    the race condition: we are racing against udev to see who will set the
    MAC address of the interface last.

    See: https://reviews.apache.org/r/75087/
    See: https://issues.apache.org/jira/browse/MESOS-10243
    Review: https://reviews.apache.org/r/75090/
{code}

> MAC Address changes from link::setMAC may not stick, leading to container
> launch failure with port mapping isolator.
>
>                 Key: MESOS-10243
>                 URL: https://issues.apache.org/jira/browse/MESOS-10243
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Jason Zhou
>            Assignee: Jason Zhou
>            Priority: Major
>
> It seems that there are scenarios where mesos containers cannot communicate
> with agents as the MAC addresses are set incorrectly, leading to dropped
> packets. A workaround for this behavior is to check that the MAC address is
> set correctly after the ioctl call, and retry the address setting if
> necessary.
> In our test, this workaround appears to reduce the frequency of this issue,
> but does not seem to prevent all such failures.
> Reviewboard ticket for the workaround: [https://reviews.apache.org/r/75057/]
> Observed scenarios with incorrectly assigned MAC addresses:
> 1. ioctl returns the correct MAC address, but not net::mac
> 2. both net::mac and ioctl return the same MAC address, but are both wrong
> 3. There are no cases where ioctl/net::mac come back with the same MAC
>    address as before setting, i.e. there is no no-op observed.
> 4. There is a possibility that ioctl/net::mac results disagree with each
>    other even before attempting to set our desired MAC address. As such, we
>    check that the results agree before we set, and log a warning if we find
>    a mismatch
> 5. There is a possibility that the MAC address we set ends up overwritten by
>    a garbage value after setMAC has already completed and checked that the
>    mac address was set correctly. Since this error happens after this
>    function has finished, we cannot log nor detect it in setMAC. Our
>    workaround cannot deal with this scenario as it occurs outside setMAC
> Notes:
> 1. We have observed this behavior only on CentOS 9 systems at the moment.
>    We have tried kernels 5.15.147, 5.15.160, 5.15.161, which all have this
>    issue. CentOS 7 systems do not seem to have this issue with setMAC.
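The "set, then verify, then retry" workaround described in this issue can be sketched as follows. This is a minimal illustration, not the actual Mesos code: `setter` and `getter` are hypothetical stand-ins for the ioctl-based link::setMAC and net::mac calls.

```cpp
#include <functional>
#include <optional>
#include <string>

// Minimal sketch of the workaround: set the MAC address, re-read it to
// confirm it stuck, and retry if it did not. The `setter` and `getter`
// callbacks are hypothetical stand-ins for the ioctl-based link::setMAC
// and net::mac calls; they are NOT the real Mesos APIs.
bool setMACWithRetry(
    const std::string& desired,
    const std::function<bool(const std::string&)>& setter,
    const std::function<std::optional<std::string>()>& getter,
    int maxAttempts = 3)
{
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    if (!setter(desired)) {
      continue; // The set itself failed; retry.
    }

    // Verify after the set: only report success if the address stuck.
    const std::optional<std::string> actual = getter();
    if (actual.has_value() && actual.value() == desired) {
      return true;
    }
  }

  return false;
}
```

As scenario 5 in the description notes, a verify-and-retry loop like this cannot catch an overwrite that happens after the function has already returned.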
[jira] [Commented] (MESOS-10243) MAC Address changes from link::setMAC may not stick, leading to container launch failure with port mapping isolator.
[ https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866189#comment-17866189 ] Jason Zhou commented on MESOS-10243: Landed fix for peer link MAC setting on creation: {code:java} commit 70d9da223a204dc89431aa7b26db7beeb53a9f3c Author: Jason Zhou Date: Mon Jul 15 18:02:54 2024 -0400 [veth] Provide the ability to set veth peer link MAC address on creation. This addresses the previous todo where we want to set the MAC address of the peer link when we are creating a veth pair so that we can avoid the race condition we are racing against udev to see who will set the MAC address of the interface last. See: https://reviews.apache.org/r/75087/ See: https://issues.apache.org/jira/browse/MESOS-10243 Review: https://reviews.apache.org/r/75090/ {code}
[jira] [Comment Edited] (MESOS-10243) MAC Address changes from link::setMAC may not stick, leading to container launch failure with port mapping isolator.
[ https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866074#comment-17866074 ]

Jason Zhou edited comment on MESOS-10243 at 7/15/24 5:31 PM:
-------------------------------------------------------------

Update: We have discovered that in systems with systemd version above 242, there is a potential data race where udev will try to update the MAC address of the device at the same time as us if systemd's MacAddressPolicy is set to 'persistent'. To prevent udev from trying to set the veth device's MAC address by itself, we must set the device MAC address on creation so that addr_assign_type will be set to NET_ADDR_SET, which prevents udev from attempting to change the MAC address of the veth device. The reason we do not see this issue on CentOS 7 systems is that CentOS 7 uses systemd 219, which does not have the default MacAddressPolicy, while CentOS 9 uses systemd 255.

see: [https://github.com/torvalds/linux/commit/2afb9b533423a9b97f84181e773cf9361d98fed6]
see: [https://lore.kernel.org/netdev/cahxsexy8lkzocbdbzss_vjopc_tqmyzm87kc192hpmuhmcq...@mail.gmail.com/T/]

Patch for avoiding race condition for veth link: [https://reviews.apache.org/r/75086/]
Todo: also avoid race condition for the created peer link: [https://reviews.apache.org/r/75087/]

was (Author: JIRAUSER305893):
Update: We have discovered that in systems with systemd version above 242, there is a potential data race where udev will try to update the MAC address of the device at the same time as us if systemd's MacAddressPolicy is set to 'persistent'. To prevent udev from trying to set the veth device's MAC address by itself, we must set the device MAC address on creation so that addr_assign_type will be set to NET_ADDR_SET, which prevents udev from attempting to change the MAC address of the veth device.
see: [https://github.com/torvalds/linux/commit/2afb9b533423a9b97f84181e773cf9361d98fed6] see: [https://lore.kernel.org/netdev/cahxsexy8lkzocbdbzss_vjopc_tqmyzm87kc192hpmuhmcq...@mail.gmail.com/T/] Patch for avoiding race condition for veth link: [https://reviews.apache.org/r/75086/] Todo: also avoid race condition for the created peer link: [https://reviews.apache.org/r/75087/]
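The creation-time fix above works because the kernel records how an interface's MAC address was assigned in the sysfs attribute addr_assign_type: an address supplied at link creation is marked NET_ADDR_SET, which is what stops udev's persistent MAC policy from rewriting it. A small sketch of inspecting that attribute follows; the helper names and the path parameter are illustrative, not Mesos code.

```cpp
#include <fstream>
#include <string>

// Kernel addr_assign_type values, from include/uapi/linux/netdevice.h.
enum AddrAssignType {
  NET_ADDR_PERM = 0,   // permanent address, e.g. burned into hardware
  NET_ADDR_RANDOM = 1, // randomly generated by the kernel
  NET_ADDR_STOLEN = 2, // taken from another device
  NET_ADDR_SET = 3,    // explicitly set by userspace
};

// Read an interface's addr_assign_type from a sysfs-style file. On a
// real system the path would be "/sys/class/net/<dev>/addr_assign_type";
// it is a parameter here so the helper can be exercised against any
// file. Returns -1 if the file cannot be read.
int readAddrAssignType(const std::string& path)
{
  std::ifstream in(path);
  int type = -1;
  if (!(in >> type)) {
    return -1;
  }
  return type;
}

// Once the MAC is supplied at creation, the kernel reports NET_ADDR_SET
// and udev no longer races to rewrite the address.
bool macSetByUserspace(int type)
{
  return type == NET_ADDR_SET;
}
```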
[jira] [Commented] (MESOS-10243) MAC Address changes from link::setMAC may not stick, leading to container launch failure with port mapping isolator.
[ https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866134#comment-17866134 ] Benjamin Mahler commented on MESOS-10243: - Landed fix for host network namespace veth interface. Let's leave this open and mark as fixed once we also set the container network namespace eth0 interface's mac address on creation / update the script to stop setting it.
[jira] [Assigned] (MESOS-10243) MAC Address changes from link::setMAC may not stick
[ https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Zhou reassigned MESOS-10243: -- Assignee: Jason Zhou
[jira] [Comment Edited] (MESOS-10243) MAC Address changes from link::setMAC may not stick
[ https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866074#comment-17866074 ] Jason Zhou edited comment on MESOS-10243 at 7/15/24 1:43 PM: - Update: We have discovered that in systems with systemd version above 242, there is a potential data race where udev will try to update the MAC address of the device at the same time as us if the systemd's MacAddressPolicy is set to 'persistent'. To prevent udev from trying to set the veth device's MAC address by itself, we must set the device MAC address on creation so that addr_assign_type will be set to NET_ADDR_SET, which prevents udev from attempting to change the MAC address of the veth device. see: [https://github.com/torvalds/linux/commit/2afb9b533423a9b97f84181e773cf9361d98fed6] see: [https://lore.kernel.org/netdev/cahxsexy8lkzocbdbzss_vjopc_tqmyzm87kc192hpmuhmcq...@mail.gmail.com/T/] Patch for avoiding race condition for veth link: [https://reviews.apache.org/r/75086/] Todo: also avoid race condition for the created peer link: [https://reviews.apache.org/r/75087/] was (Author: JIRAUSER305893): Update: We have discovered that in systems with systemd version above 242, there is a potential data race where udev will try to update the MAC address of the device at the same time as us if the systemd's MacAddressPolicy is set to 'persistent'. To prevent udev from trying to set the veth device's MAC address by itself, we must set the device MAC address on creation so that addr_assign_type will be set to NET_ADDR_SET, which prevents udev from attempting to change the MAC address of the veth device. 
see: [https://github.com/torvalds/linux/commit/2afb9b533423a9b97f84181e773cf9361d98fed6] see: [https://lore.kernel.org/netdev/cahxsexy8lkzocbdbzss_vjopc_tqmyzm87kc192hpmuhmcq...@mail.gmail.com/T/] Patch for avoiding race condition: [https://reviews.apache.org/r/75086/] Todo: also avoid race condition for the created peer link: [https://reviews.apache.org/r/75087/]
[jira] [Commented] (MESOS-10243) MAC Address changes from link::setMAC may not stick
[ https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866074#comment-17866074 ] Jason Zhou commented on MESOS-10243: Update: We have discovered that in systems with systemd version above 242, there is a potential data race where udev will try to update the MAC address of the device at the same time as us if the systemd's MacAddressPolicy is set to 'persistent'. To prevent udev from trying to set the veth device's MAC address by itself, we must set the device MAC address on creation so that addr_assign_type will be set to NET_ADDR_SET, which prevents udev from attempting to change the MAC address of the veth device. see: [https://github.com/torvalds/linux/commit/2afb9b533423a9b97f84181e773cf9361d98fed6] see: [https://lore.kernel.org/netdev/cahxsexy8lkzocbdbzss_vjopc_tqmyzm87kc192hpmuhmcq...@mail.gmail.com/T/] Patch for avoiding race condition: [https://reviews.apache.org/r/75086/] Todo: also avoid race condition for the created peer link: [https://reviews.apache.org/r/75087/]
[jira] [Commented] (MESOS-9045) LogZooKeeperTest.WriteRead can segfault
[ https://issues.apache.org/jira/browse/MESOS-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862647#comment-17862647 ]

Benjamin Mahler commented on MESOS-9045:
----------------------------------------

Very different case, but also a segfault:
{noformat}
[--] 2 tests from LogZooKeeperTest
I0703 02:50:24.968773 185149 zookeeper.cpp:82] Using Java classpath: -Djava.class.path=/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/zookeeper-3.4.8.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/log4j-1.2.16.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/jline-0.9.94.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/slf4j-log4j12-1.6.1.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/netty-3.7.0.Final.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/slf4j-api-1.6.1.jar
[ RUN ] LogZooKeeperTest.WriteRead
I0703 02:50:25.058761 185149 jvm.cpp:590] Looking up method (Ljava/lang/String;)V
I0703 02:50:25.059170 185149 jvm.cpp:590] Looking up method deleteOnExit()V
I0703 02:50:25.060112 185149 jvm.cpp:590] Looking up method (Ljava/io/File;Ljava/io/File;)V
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.persistence.FileTxnSnapLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
I0703 02:50:25.206512 185149 jvm.cpp:590] Looking up method ()V
I0703 02:50:25.207417 185149 jvm.cpp:590] Looking up method (Lorg/apache/zookeeper/server/persistence/FileTxnSnapLog;Lorg/apache/zookeeper/server/ZooKeeperServer$DataTreeBuilder;)V
*** Aborted at 1719975025 (unix time) try "date -d @1719975025" if you are using GNU date ***
PC: @ 0x7f5edf914ccd OopStorage::Block::release_entries()
*** SIGSEGV (@0x238) received by PID 185149 (TID 0x7f5f5a15cb40) from PID 568; stack trace: ***
    @     0x7f5edf923929 os::Linux::chained_handler()
    @     0x7f5edf92963b JVM_handle_linux_signal
    @     0x7f5edf91c1dc signalHandler()
    @     0x7f5f5b7af420 (unknown)
    @     0x7f5edf914ccd OopStorage::Block::release_entries()
    @     0x7f5edf914f26 OopStorage::release()
    @     0x7f5edf617b21 jni_DeleteGlobalRef
    @     0x7f5f6aeffaf2 JNIEnv_::DeleteGlobalRef()
    @     0x7f5f6aefdc3a Jvm::deleteGlobalRef()
    @     0x55f0ed05a2ea Jvm::Object::~Object()
    @     0x55f0ed05f110 org::apache::zookeeper::server::ZooKeeperServer::DataTreeBuilder::~DataTreeBuilder()
    @     0x55f0ed061a14 org::apache::zookeeper::server::ZooKeeperServer::BasicDataTreeBuilder::~BasicDataTreeBuilder()
    @     0x55f0ed05da2f mesos::internal::tests::ZooKeeperTestServer::ZooKeeperTestServer()
    @     0x55f0eba05ef6 mesos::internal::tests::ZooKeeperTest::ZooKeeperTest()
    @     0x55f0eba0823f mesos::internal::tests::LogZooKeeperTest::LogZooKeeperTest()
    @     0x55f0eba08350 mesos::internal::tests::LogZooKeeperTest_WriteRead_Test::LogZooKeeperTest_WriteRead_Test()
    @     0x55f0eba73252 testing::internal::TestFactoryImpl<>::CreateTest()
    @     0x55f0ed0a42bc testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @     0x55f0ed09d88d testing::internal::HandleExceptionsInMethodIfSupported<>()
    @     0x55f0ed079ac5 testing::TestInfo::Run()
    @     0x55f0ed07a1cd testing::TestCase::Run()
    @     0x55f0ed081567 testing::internal::UnitTestImpl::RunAllTests()
    @     0x55f0ed0a54ea testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @     0x55f0ed09e3f3 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @     0x55f0ed08007f testing::UnitTest::Run()
    @     0x55f0eba90e05 RUN_ALL_TESTS()
    @     0x55f0eba907cc main
    @     0x7f5f5b5cd083 __libc_start_main
    @     0x55f0eaad675e _start
{noformat}

> LogZooKeeperTest.WriteRead can segfault
> ---------------------------------------
>
>                 Key: MESOS-9045
>                 URL: https://issues.apache.org/jira/browse/MESOS-9045
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>         Environment: macOS
>            Reporter: Jan Schlicht
>            Priority: Major
>              Labels: flaky-test, segfault
>
> The following segfault occured when testing the {{1.5.x}} branch (SHA
> {{64341865d}}) on macOS:
> {noformat}
> [ RUN ] LogZooKeeperTest.WriteRead
> I0702 00:49:46.259831 2560127808 jvm.cpp:590] Looking up method (Ljava/lang/String;)V
> I0702 00:49:46.260002 2560127808 jvm.cpp:590] Looking up method deleteOnExit()V
> I0702 00:49:46.260550 2560127808 jvm.cpp:590] Looking up method (Ljava/io/File;Ljava/io/File;)V
> log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.persistence.FileTxnSnapLog).
> log4j:WA
[jira] [Assigned] (MESOS-8867) CMake: Bundled libevent v2.1.5-beta doesn't compile with OpenSSL 1.1.0
[ https://issues.apache.org/jira/browse/MESOS-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-8867: -- Assignee: Jason Zhou > CMake: Bundled libevent v2.1.5-beta doesn't compile with OpenSSL 1.1.0 > -- > > Key: MESOS-8867 > URL: https://issues.apache.org/jira/browse/MESOS-8867 > Project: Mesos > Issue Type: Bug > Components: cmake > Environment: Fedora 28 with OpenSSL 1.1.0h, {{cmake -G Ninja -D > ENABLE_LIBEVENT=ON -D ENABLE_SSL=ON}} >Reporter: Jan Schlicht >Assignee: Jason Zhou >Priority: Major > > Compiling libevent 2.1.5 beta with OpenSSL 1.1.0 fails with errors like > {noformat} > /home/vagrant/mesos/build/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c: > In function ‘bio_bufferevent_new’: > /home/vagrant/mesos/build/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c:112:3: > error: dereferencing pointer to incomplete type ‘BIO’ {aka ‘struct bio_st’} > b->init = 0; >^~ > {noformat} > As this is the version currently bundled by CMake, builds with > {{ENABLE_LIBEVENT=ON, ENABLE_SSL=ON}} will fail to compile. > Libevent supports OpenSSL 1.1.0 beginning with v2.1.7-rc (see > https://github.com/libevent/libevent/pull/397) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (MESOS-10243) MAC Address changes from link::setMAC may not stick
Jason Zhou created MESOS-10243:
----------------------------------

             Summary: MAC Address changes from link::setMAC may not stick
                 Key: MESOS-10243
                 URL: https://issues.apache.org/jira/browse/MESOS-10243
             Project: Mesos
          Issue Type: Bug
            Reporter: Jason Zhou

It seems that there are scenarios where mesos containers cannot communicate with agents as the MAC addresses are set incorrectly, leading to dropped packets. A workaround for this behavior is to check that the MAC address is set correctly after the ioctl call, and retry the address setting if necessary. In our test, this workaround appears to reduce the frequency of this issue, but does not seem to prevent all such failures.

Reviewboard ticket for the workaround: [https://reviews.apache.org/r/75057/]

Observed scenarios with incorrectly assigned MAC addresses:
1. ioctl returns the correct MAC address, but not net::mac
2. both net::mac and ioctl return the same MAC address, but are both wrong
3. There are no cases where ioctl/net::mac come back with the same MAC address as before setting, i.e. there is no no-op observed.
4. There is a possibility that ioctl/net::mac results disagree with each other even before attempting to set our desired MAC address. As such, we check that the results agree before we set, and log a warning if we find a mismatch
5. There is a possibility that the MAC address we set ends up overwritten by a garbage value after setMAC has already completed and checked that the mac address was set correctly. Since this error happens after this function has finished, we cannot log nor detect it in setMAC. Our workaround cannot deal with this scenario as it occurs outside setMAC

Notes:
1. We have observed this behavior only on CentOS 9 systems at the moment. We have tried kernels 5.15.147, 5.15.160, 5.15.161, which all have this issue. CentOS 7 systems do not seem to have this issue with setMAC.
[jira] [Created] (MESOS-10242) [MSVC] Mesos failed to build due to error C2039: 'cgroupsV2': is not a member of 'mesos::internal::tests::ContainerizerTest'
Zhaojun created MESOS-10242: --- Summary: [MSVC] Mesos failed to build due to error C2039: 'cgroupsV2': is not a member of 'mesos::internal::tests::ContainerizerTest' Key: MESOS-10242 URL: https://issues.apache.org/jira/browse/MESOS-10242 Project: Mesos Issue Type: Bug Environment: The commit of Mesos we used: 5d6d386 VS version: VS 2022 17.9.5 OS: Windows Server 2022 Reporter: Zhaojun Attachments: image-2024-05-11-09-50-42-947.png Mesos failed to build with MSVC on Windows due to the errors below: src\tests\mesos.cpp(868,52): error C2039: 'cgroupsV2': is not a member of 'mesos::internal::tests::ContainerizerTest' src\tests\containerizer\memory_isolator_tests.cpp(155,37): error C2653: 'cgroups': is not a class or namespace name. Repro steps: # git clone [https://github.com/apache/mesos] C:\gitP\apache # set _CL_=/D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING /D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /wd4061 # mkdir C:\gitP\apache\mesos\build_amd64 and cd /d C:\gitP\apache\mesos\build_amd64 # set PATH=C:\Program Files\Git\usr\bin;%PATH% # cmake -G "Visual Studio 17 2022" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.22621.0 -DENABLE_LIBEVENT=1 -DHAS_AUTHENTICATION=0 -DPATCHEXE_PATH="C:\Program Files\Git\usr\bin" -T host=x64 .. # msbuild /maxcpucount:4 /p:Platform=x64 /p:Configuration=Debug Mesos.sln /t:Rebuild Note: when I added '#ifdef __linux__' and '#endif' to [https://github.com/apache/mesos/blob/master/src/tests/containerizer/memory_isolator_tests.cpp#L153:L167] to wrap lines 153-167, and updated [https://github.com/apache/mesos/blob/master/src/tests/mesos.cpp] by moving the '#endif // __linux__' on line 865 to line 880, the errors disappeared. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-7685) Issue using S3FS from docker container with the mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840095#comment-17840095 ] Mykel Alvis commented on MESOS-7685: In your run command: {{docker run --cap-add SYS_ADMIN --device /dev/fuse}} blahblahblah > Issue using S3FS from docker container with the mesos containerizer > --- > > Key: MESOS-7685 > URL: https://issues.apache.org/jira/browse/MESOS-7685 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.1.0 >Reporter: Andrei Filip >Assignee: Jie Yu >Priority: Major > > I have a docker image which uses S3FS to mount an amazon S3 bucket for use as > a local filesystem. Playing around with this container manually, using > docker, I am able to use S3FS as expected. > When trying to use this image with the mesos containerizer, I get the > following error: > fuse: device not found, try 'modprobe fuse' first > The way I'm launching a job that runs this s3fs command is via the aurora > scheduler. Somehow it seems that docker is able to use the fuse kernel > plugin, but the mesos containerizer does not. > I've also created a stackoverflow topic about this issue here: > https://stackoverflow.com/questions/44569238/using-s3fs-in-a-docker-container-ran-by-the-mesos-containerizer/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-7187) Master can neglect to update agent metadata in a re-registration corner case.
[ https://issues.apache.org/jira/browse/MESOS-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839811#comment-17839811 ] Benjamin Mahler commented on MESOS-7187: Added a mitigation of the bug I commented on above: https://github.com/apache/mesos/pull/558 It does not fix the overall issue here due to a lack of a connection construct, but it prevents the agent from getting stuck sending TASK_DROPPED for all incoming tasks. > Master can neglect to update agent metadata in a re-registration corner case. > - > > Key: MESOS-7187 > URL: https://issues.apache.org/jira/browse/MESOS-7187 > Project: Mesos > Issue Type: Bug >Reporter: Benjamin Mahler >Priority: Major > Labels: tech-debt > > If the agent is re-registering with the master for the first time, the master > will drop any re-registration messages that arrive while the registry > operation is in progress. > These dropped messages can have different metadata (e.g. version, > capabilities, etc) that gets dropped. Since the master doesn't distinguish > between different instances of the agent (both share the same UPID and there > is no instance identifying information), the master can't tell whether this > is a retry from the original instance of the agent or a re-registration from > a new instance of the agent. > The following is an example: > (1) Master restarts. > (2) Agent re-registers with OLD_VERSION / OLD_CAPABILITIES. > (3) While registry operation is in progress, agent is upgraded and > re-registers with NEW_VERSION / NEW_CAPABILITIES. > (4) Registry operation completes, new agent receives the re-registration > acknowledgement message and so, does not retry. > (5) Now, the master's memory reflects OLD_VERSION / OLD_CAPABILITIES for the > agent which remains inconsistent until a later re-registration occurs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-7187) Master can neglect to update agent metadata in a re-registration corner case.
[ https://issues.apache.org/jira/browse/MESOS-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836731#comment-17836731 ] Benjamin Mahler commented on MESOS-7187: Observed an actual instance of this; it occurred via the following sequence: 1. ZK session expired. 2. Master failover. 3. Agent run 1 sends re-registration message to new master with UUID 1. 4. Agent fails over (for upgrade). 5. Agent run 2 sends re-registration message to new master. 6. Master receives run 1 re-registration message. 7. Master ignores run 2 re-registration message (as agent is already re-registering). 8. Master completes re-registration, stores resource UUID 1, and notifies agent. 9. Agent receives re-registration completion, sends resource update with UUID 2. 10. Master *does not update* the agent's resource UUID (not because it ignores the update message, but because the logic simply doesn't make any update to it, which looks like a bug), so it remains UUID 1. At this point, any tasks launched on the agent will go to TASK_DROPPED due to "Task assumes outdated resource state". The agent must be restarted at this point to fix the issue. > Master can neglect to update agent metadata in a re-registration corner case. > - > > Key: MESOS-7187 > URL: https://issues.apache.org/jira/browse/MESOS-7187 > Project: Mesos > Issue Type: Bug >Reporter: Benjamin Mahler >Priority: Major > Labels: tech-debt > > If the agent is re-registering with the master for the first time, the master > will drop any re-registration messages that arrive while the registry > operation is in progress. > These dropped messages can have different metadata (e.g. version, > capabilities, etc) that gets dropped. 
Since the master doesn't distinguish > between different instances of the agent (both share the same UPID and there > is no instance identifying information), the master can't tell whether this > is a retry from the original instance of the agent or a re-registration from > a new instance of the agent. > The following is an example: > (1) Master restarts. > (2) Agent re-registers with OLD_VERSION / OLD_CAPABILITIES. > (3) While registry operation is in progress, agent is upgraded and > re-registers with NEW_VERSION / NEW_CAPABILITIES. > (4) Registry operation completes, new agent receives the re-registration > acknowledgement message and so, does not retry. > (5) Now, the master's memory reflects OLD_VERSION / OLD_CAPABILITIES for the > agent which remains inconsistent until a later re-registration occurs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-9752) ./configure fails in certain strange Cyrus SASL setups
[ https://issues.apache.org/jira/browse/MESOS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748913#comment-17748913 ] Ryan Carsten Schmidt commented on MESOS-9752: - Not realizing this had been reported already, I reported it again in bug MESOS-10241. > ./configure fails in certain strange Cyrus SASL setups > -- > > Key: MESOS-9752 > URL: https://issues.apache.org/jira/browse/MESOS-9752 > Project: Mesos > Issue Type: Task > Environment: MacOS X 10.13 > Cyrus SASL 2.1.27 installed through MacPorts >Reporter: David Gilman >Priority: Major > > I have an installation of Cyrus SASL that, for some unknown reason, has > duplicated SASL mechanisms installed. The crammd5_installed.c will print out > "found" for each CRAM-MD5 mechanism set up, resulting in output of > "foundfound" (once for each CRAM-MD5) which fails the Mesos ./configure test > which expects just "found". > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10241) checking SASL CRAM-MD5 support... configure: error: no
[ https://issues.apache.org/jira/browse/MESOS-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748912#comment-17748912 ] Ryan Carsten Schmidt commented on MESOS-10241: -- I just realized this was already reported to you in bug MESOS-9752 four years ago but nobody has commented on it yet. Perhaps it can be addressed now. > checking SASL CRAM-MD5 support... configure: error: no > -- > > Key: MESOS-10241 > URL: https://issues.apache.org/jira/browse/MESOS-10241 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.8.0, 1.11.0 > Environment: macOS 12.6.7 > Xcode 13.2.1 > Apple clang version 13.0.0 (clang-1300.0.29.30) > Cyrus SASL 2.1.28 >Reporter: Ryan Carsten Schmidt >Priority: Major > Attachments: mesos-crammd5-quoting.patch, mesos-crammd5-test.patch > > > mesos fails to configure: > {{checking SASL CRAM-MD5 support... configure: error: no}} > {{---}} > {{We need CRAM-MD5 support for SASL authentication.}} > {{---}} > The configure script is checking if a test program outputs the word "found", > but on my system, the program outputs "foundfound" so the test fails. The > simplest fix would be instead to check whether the test program outputs > anything at all, per the attached "test" patch. > Also, the configure check has incorrect syntax which was [introduced > here|https://github.com/apache/mesos/commit/c7d1e8055ea7c0cc6c01f2d7fca95a02b890d76b]: > the entire test program's code is not enclosed within square brackets. This > does not appear to cause a problem in autoconf 2.71 but it's probably best to > fix it before some future version of autoconf decides it is a problem, per > the attached "quoting" patch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10240) configure: error: cannot find libz
[ https://issues.apache.org/jira/browse/MESOS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748910#comment-17748910 ] Ryan Carsten Schmidt commented on MESOS-10240: -- For example, the attached patch changes it to check for just one of zlib's functions, and this works. Also, this bug report assumes bug MESOS-10241 has already been addressed; if not, that problem will be encountered first. > configure: error: cannot find libz > -- > > Key: MESOS-10240 > URL: https://issues.apache.org/jira/browse/MESOS-10240 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.8.0, 1.11.0 > Environment: macOS 12.6.7 > Xcode 13.2.1 > Apple clang version 13.0.0 (clang-1300.0.29.30) >Reporter: Ryan Carsten Schmidt >Priority: Major > Attachments: mesos-zlib.patch > > > mesos fails to configure: > {{checking for zlib.h... yes}} > {{checking for deflate, gzread, gzwrite, inflate in -lz... no}} > {{configure: error: cannot find libz}} > {{---}} > {{libz is required for Mesos to build.}} > {{---}} > I don't think [the zlib configure > check|https://github.com/apache/mesos/blob/8856d6fba11281df898fd65b0cafa1e20eb90fe8/configure.ac#L2301] > is correct. According to [autoconf > documentation|https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.71/html_node/Libraries.html#index-AC_005fCHECK_005fLIB-1], > the second argument of {{AC_CHECK_LIB}} is a function name, not a > comma-separated list of function names. It also says {{AC_CHECK_LIB}} "should > be avoided in some common cases", suggesting {{AC_SEARCH_LIBS}} be used > instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (MESOS-10241) checking SASL CRAM-MD5 support... configure: error: no
Ryan Carsten Schmidt created MESOS-10241: Summary: checking SASL CRAM-MD5 support... configure: error: no Key: MESOS-10241 URL: https://issues.apache.org/jira/browse/MESOS-10241 Project: Mesos Issue Type: Bug Components: build Affects Versions: 1.11.0, 1.8.0 Environment: macOS 12.6.7 Xcode 13.2.1 Apple clang version 13.0.0 (clang-1300.0.29.30) Cyrus SASL 2.1.28 Reporter: Ryan Carsten Schmidt Attachments: mesos-crammd5-quoting.patch, mesos-crammd5-test.patch mesos fails to configure: {{checking SASL CRAM-MD5 support... configure: error: no}} {{---}} {{We need CRAM-MD5 support for SASL authentication.}} {{---}} The configure script is checking if a test program outputs the word "found", but on my system, the program outputs "foundfound" so the test fails. The simplest fix would be instead to check whether the test program outputs anything at all, per the attached "test" patch. Also, the configure check has incorrect syntax which was [introduced here|https://github.com/apache/mesos/commit/c7d1e8055ea7c0cc6c01f2d7fca95a02b890d76b]: the entire test program's code is not enclosed within square brackets. This does not appear to cause a problem in autoconf 2.71 but it's probably best to fix it before some future version of autoconf decides it is a problem, per the attached "quoting" patch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (MESOS-10240) configure: error: cannot find libz
Ryan Carsten Schmidt created MESOS-10240: Summary: configure: error: cannot find libz Key: MESOS-10240 URL: https://issues.apache.org/jira/browse/MESOS-10240 Project: Mesos Issue Type: Bug Components: build Affects Versions: 1.11.0, 1.8.0 Environment: macOS 12.6.7 Xcode 13.2.1 Apple clang version 13.0.0 (clang-1300.0.29.30) Reporter: Ryan Carsten Schmidt mesos fails to configure: {{checking for zlib.h... yes}} {{checking for deflate, gzread, gzwrite, inflate in -lz... no}} {{configure: error: cannot find libz}} {{---}} {{libz is required for Mesos to build.}} {{---}} I don't think [the zlib configure check|https://github.com/apache/mesos/blob/8856d6fba11281df898fd65b0cafa1e20eb90fe8/configure.ac#L2301] is correct. According to [autoconf documentation|https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.71/html_node/Libraries.html#index-AC_005fCHECK_005fLIB-1], the second argument of {{AC_CHECK_LIB}} is a function name, not a comma-separated list of function names. It also says {{AC_CHECK_LIB}} "should be avoided in some common cases", suggesting {{AC_SEARCH_LIBS}} be used instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
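Following the autoconf documentation cited above, a hedged sketch of how the zlib check could be rewritten is shown below. This is an illustrative fragment, not the project's actual configure.ac: {{AC_CHECK_LIB}}'s second argument must be a single function, and {{AC_SEARCH_LIBS}} is the recommended form for this common case.

```m4
dnl Hypothetical replacement for the zlib check in configure.ac.
dnl Check the header, then search for a single representative zlib
dnl function (inflate) rather than a comma-separated list.
AC_CHECK_HEADERS([zlib.h], [], [AC_MSG_ERROR([cannot find zlib.h])])
AC_SEARCH_LIBS([inflate], [z], [],
  [AC_MSG_ERROR([cannot find libz
-------------------------------------------------------------------
libz is required for Mesos to build.
-------------------------------------------------------------------])])
```

Probing one function is sufficient here because a zlib that provides {{inflate}} will also provide {{deflate}}, {{gzread}}, and {{gzwrite}}; the attached patch takes the same single-function approach.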
[jira] [Commented] (MESOS-10239) Installing Mesos on Oracle Linux 8.3
[ https://issues.apache.org/jira/browse/MESOS-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605513#comment-17605513 ] Charles Natali commented on MESOS-10239: Hi [~Mar_zieh], You don't need Python to install Mesos, unless you use the Python bindings. If you're building from source, you can just pass {{--disable-python}} as described here: https://mesos.apache.org/documentation/latest/configuration/autotools/ Could you please detail the error you're getting? > Installing Mesos on Oracle Linux 8.3 > > > Key: MESOS-10239 > URL: https://issues.apache.org/jira/browse/MESOS-10239 > Project: Mesos > Issue Type: Task >Reporter: Marzieh >Priority: Major > > Some newer Linux distributions, such as Oracle Linux 8 and Red Hat 8, no longer support > Python 2; however, Mesos needs Python 2. So there is no way to > install Mesos in these environments. > Would you please update Mesos so it can be installed on new Linux > distributions? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (MESOS-10239) Installing Mesos on Oracle Linux 8.3
Marzieh created MESOS-10239: --- Summary: Installing Mesos on Oracle Linux 8.3 Key: MESOS-10239 URL: https://issues.apache.org/jira/browse/MESOS-10239 Project: Mesos Issue Type: Task Reporter: Marzieh Some newer Linux distributions, such as Oracle Linux 8 and Red Hat 8, no longer support Python 2; however, Mesos needs Python 2. So there is no way to install Mesos in these environments. Would you please update Mesos so it can be installed on new Linux distributions? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601327#comment-17601327 ] Sangita Nalkar edited comment on MESOS-10234 at 9/7/22 2:14 PM: Thank you [~cf.natali] and [~qianzhang] for your response. was (Author: JIRAUSER282507): Thank you [~cf.natali] and [~qianzhang] for your response. Closing this issue. > CVE-2021-44228 Log4j vulnerability for apache mesos > --- > > Key: MESOS-10234 > URL: https://issues.apache.org/jira/browse/MESOS-10234 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.11.0 >Reporter: Sangita Nalkar >Priority: Critical > > Hi, > Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache > mesos. > We see that log4j v1.2.17 is used while building apache mesos from source. > Snippet from build logs: > std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF > jvm/org/apache/.deps/libjava_la-log4j.Tpo -c > ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o > jvm/org/apache/.libs/libjava_la-log4j.o > Thanks, > Sangita -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601327#comment-17601327 ] Sangita Nalkar commented on MESOS-10234: Thank you [~cf.natali] and [~qianzhang] for your response. Closing this issue. > CVE-2021-44228 Log4j vulnerability for apache mesos > --- > > Key: MESOS-10234 > URL: https://issues.apache.org/jira/browse/MESOS-10234 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.11.0 >Reporter: Sangita Nalkar >Priority: Critical > > Hi, > Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache > mesos. > We see that log4j v1.2.17 is used while building apache mesos from source. > Snippet from build logs: > std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF > jvm/org/apache/.deps/libjava_la-log4j.Tpo -c > ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o > jvm/org/apache/.libs/libjava_la-log4j.o > Thanks, > Sangita -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585253#comment-17585253 ] Qian Zhang edited comment on MESOS-10234 at 8/26/22 9:09 AM: - According to [https://blogs.apache.org/security/entry/cve-2021-44228], it seems ZooKeeper is not affected by [CVE-2021-44228|https://www.cve.org/CVERecord?id=CVE-2021-44228]. was (Author: qianzhang): According to [https://blogs.apache.org/security/entry/cve-2021-44228,] it seems ZooKeeper is not affected by [CVE-2021-44228|https://www.cve.org/CVERecord?id=CVE-2021-44228]. > CVE-2021-44228 Log4j vulnerability for apache mesos > --- > > Key: MESOS-10234 > URL: https://issues.apache.org/jira/browse/MESOS-10234 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.11.0 >Reporter: Sangita Nalkar >Priority: Critical > > Hi, > Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache > mesos. > We see that log4j v1.2.17 is used while building apache mesos from source. > Snippet from build logs: > std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF > jvm/org/apache/.deps/libjava_la-log4j.Tpo -c > ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o > jvm/org/apache/.libs/libjava_la-log4j.o > Thanks, > Sangita -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585253#comment-17585253 ] Qian Zhang commented on MESOS-10234: According to [https://blogs.apache.org/security/entry/cve-2021-44228], it seems ZooKeeper is not affected by [CVE-2021-44228|https://www.cve.org/CVERecord?id=CVE-2021-44228]. > CVE-2021-44228 Log4j vulnerability for apache mesos > --- > > Key: MESOS-10234 > URL: https://issues.apache.org/jira/browse/MESOS-10234 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.11.0 >Reporter: Sangita Nalkar >Priority: Critical > > Hi, > Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache > mesos. > We see that log4j v1.2.17 is used while building apache mesos from source. > Snippet from build logs: > std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF > jvm/org/apache/.deps/libjava_la-log4j.Tpo -c > ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o > jvm/org/apache/.libs/libjava_la-log4j.o > Thanks, > Sangita -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584433#comment-17584433 ] Charles Natali commented on MESOS-10234: Hi Sangita, if this is an issue for you, you can simply use whatever zookeeper version you want; you do not need to use the shipped one. We could update zookeeper separately; the shipped version is quite old and has some known bugs - [~qianzhang] what do you think? > CVE-2021-44228 Log4j vulnerability for apache mesos > --- > > Key: MESOS-10234 > URL: https://issues.apache.org/jira/browse/MESOS-10234 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.11.0 >Reporter: Sangita Nalkar >Priority: Critical > > Hi, > Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache > mesos. > We see that log4j v1.2.17 is used while building apache mesos from source. > Snippet from build logs: > std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF > jvm/org/apache/.deps/libjava_la-log4j.Tpo -c > ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o > jvm/org/apache/.libs/libjava_la-log4j.o > Thanks, > Sangita -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584019#comment-17584019 ] Sangita Nalkar commented on MESOS-10234: Hello, A gentle reminder since we are waiting for your response. Could you please provide an update if the log4j version shipped with zookeeper has been updated or would be updated in the coming release? Thanks and Regards, Sangita > CVE-2021-44228 Log4j vulnerability for apache mesos > --- > > Key: MESOS-10234 > URL: https://issues.apache.org/jira/browse/MESOS-10234 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.11.0 >Reporter: Sangita Nalkar >Priority: Critical > > Hi, > Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache > mesos. > We see that log4j v1.2.17 is used while building apache mesos from source. > Snippet from build logs: > std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF > jvm/org/apache/.deps/libjava_la-log4j.Tpo -c > ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o > jvm/org/apache/.libs/libjava_la-log4j.o > Thanks, > Sangita -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Peters reassigned MESOS-10230: -- Assignee: Andreas Peters > Please update JQuery from 3.2.1 to 3.5.0+ > - > > Key: MESOS-10230 > URL: https://issues.apache.org/jira/browse/MESOS-10230 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.11.0 >Reporter: p engels >Assignee: Andreas Peters >Priority: Minor > > JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple > cross-site-scripting vulnerabilities. More info can be found on JQuery's > website: > blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/] > My organization's vulnerability scanner locates the out-of-date jquery at > this url (sanitized for security reasons): > [http://example.com:5050/assets/libs/jquery-3.2.1.min.js] > > Please remove the old version of JQuery and replace it with version 3.5.0 or > greater. If this is already planned for a future release, please comment on > this request with the version this will be fixed in. > > Keep up the good work, Apache community <3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581021#comment-17581021 ] Andreas Peters commented on MESOS-10230: Hi, the new JQuery version will ship with the next planned Mesos release. But I understand your pain. If you like, I can show you how to replace it manually. We also have a Mesos Slack (or Matrix) channel ([https://mesos.apache.org/community/]) if you need quick help. :) Cheers, Andreas > Please update JQuery from 3.2.1 to 3.5.0+ > - > > Key: MESOS-10230 > URL: https://issues.apache.org/jira/browse/MESOS-10230 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.11.0 >Reporter: p engels >Priority: Minor > > JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple > cross-site-scripting vulnerabilities. More info can be found on JQuery's > website: > blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/] > My organization's vulnerability scanner locates the out-of-date jquery at > this url (sanitized for security reasons): > [http://example.com:5050/assets/libs/jquery-3.2.1.min.js] > > Please remove the old version of JQuery and replace it with version 3.5.0 or > greater. If this is already planned for a future release, please comment on > this request with the version this will be fixed in. > > Keep up the good work, Apache community <3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580877#comment-17580877 ] p engels commented on MESOS-10230: -- Hate to bring this up after all this time. The old installation of jQuery is still showing on the scanner for my organization. We are on the latest Mesos version. Is there anything needed on my end to remove that jQuery from the system? > Please update JQuery from 3.2.1 to 3.5.0+ > - > > Key: MESOS-10230 > URL: https://issues.apache.org/jira/browse/MESOS-10230 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.11.0 >Reporter: p engels >Priority: Minor > > JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple > cross-site-scripting vulnerabilities. More info can be found on JQuery's > website: > blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/] > My organization's vulnerability scanner locates the out-of-date jquery at > this url (sanitized for security reasons): > [http://example.com:5050/assets/libs/jquery-3.2.1.min.js] > > Please remove the old version of JQuery and replace it with version 3.5.0 or > greater. If this is already planned for a future release, please comment on > this request with the version this will be fixed in. > > Keep up the good work, Apache community <3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (MESOS-10238) Error while compiling directory mesos/build using make
Calvin Valentino Gosal created MESOS-10238: -- Summary: Error while compiling directory mesos/build using make Key: MESOS-10238 URL: https://issues.apache.org/jira/browse/MESOS-10238 Project: Mesos Issue Type: Bug Reporter: Calvin Valentino Gosal I'm building Mesos on Ubuntu 20.04. I've followed the steps on this page: https://mesos.apache.org/documentation/latest/building/ But I encountered a problem when compiling with the make command. The error is *make[4]: Entering directory '/home/calvin/mesos/build/3rdparty/grpc-1.10.0'* *[CXX] Compiling src/core/lib/gpr/log_linux.cc* *src/core/lib/gpr/log_linux.cc:42:13: error: ambiguating new declaration of 'long int gettid()'* *42 | static long gettid(void) \{ return syscall(__NR_gettid); }* *| ^~~~* *In file included from /usr/include/unistd.h:1170,* *from src/core/lib/gpr/log_linux.cc:40:* */usr/includes/x86_64-linux-gnu/bits/unistd_ext.h:34:16: note: old declaration '__pid_t gettid()'* *34 | extern __pid_t gettid(void) __THROW;* *| ^~~~* *src/core/lib/gpr/log_linux.cc:42:13: warning: 'long int gettid()' defined but not used [-Wunused-function]* *42 | static long gettid(void) \{ return syscall(__NR_gettid); }* I'm stuck here now. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513217#comment-17513217 ] Sangita Nalkar commented on MESOS-10234: Hello [~cf.natali] , You mentioned the log4j version shipped with zookeeper would be updated; could you please provide an update on this? Regards, Sangita > CVE-2021-44228 Log4j vulnerability for apache mesos > --- > > Key: MESOS-10234 > URL: https://issues.apache.org/jira/browse/MESOS-10234 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.11.0 >Reporter: Sangita Nalkar >Priority: Critical > > Hi, > Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache > mesos. > We see that log4j v1.2.17 is used while building apache mesos from source. > Snippet from build logs: > std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF > jvm/org/apache/.deps/libjava_la-log4j.Tpo -c > ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o > jvm/org/apache/.libs/libjava_la-log4j.o > Thanks, > Sangita -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10237) Mesos-slave issue report
[ https://issues.apache.org/jira/browse/MESOS-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512168#comment-17512168 ] feixiachao commented on MESOS-10237: Thanks [~cf.natali], we don't have a specific problem; we were just confused by this warning log. > Mesos-slave issue report > - > > Key: MESOS-10237 > URL: https://issues.apache.org/jira/browse/MESOS-10237 > Project: Mesos > Issue Type: Bug >Reporter: feixiachao >Priority: Major > > we encountered an issue about mesos-slave , the mesos.ERROR log shown as > below: > E0323 22:56:03.278918 2848 memory.cpp:502] Listening on OOM events failed > for container ff408971-b610-4f84-bbc3-81b0c6be9499: Event listener is > terminating > E0323 22:58:06.018554 2834 memory.cpp:502] Listening on OOM events failed > for container 3afa2056-1976-4857-9121-cfad0f0ba73e: Event listener is > terminating > E0323 23:12:05.261996 2816 memory.cpp:502] Listening on OOM events failed > for container 56912877-5733-4050-bce8-0cc179cc0bc8: Event listener is > terminating > Could any someone to help for this issue ? > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10237) Mesos-slave issue report
[ https://issues.apache.org/jira/browse/MESOS-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512051#comment-17512051 ] Charles Natali commented on MESOS-10237: Hi [~feixiachao], Are you having a specific problem or just wondering about those error messages? Those errors are benign and can be ignored - they've actually been fixed in master: https://github.com/apache/mesos/commit/6bc5a5e114077f542f7258adffb78a54849ddf90 -- This message was sent by Atlassian Jira (v8.20.1#820001)
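For context, the listener behind these "Listening on OOM events failed" messages uses the cgroups v1 eventfd notification API: an eventfd is armed against the cgroup's memory.oom_control by writing a line to cgroup.event_control. A minimal sketch of building that registration line (the fd numbers are illustrative; this does not interact with a real cgroup):

```python
def oom_event_control_line(eventfd_fd: int, oom_control_fd: int) -> str:
    # cgroups v1 expects "<eventfd fd> <control file fd>" to be written
    # into the cgroup's cgroup.event_control file to arm OOM notifications.
    return f"{eventfd_fd} {oom_control_fd}"

print(oom_event_control_line(7, 8))  # 7 8
```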
[jira] [Created] (MESOS-10237) Mesos-slave issue report
feixiachao created MESOS-10237: -- Summary: Mesos-slave issue report Key: MESOS-10237 URL: https://issues.apache.org/jira/browse/MESOS-10237 Project: Mesos Issue Type: Bug Reporter: feixiachao We encountered an issue with mesos-slave; the mesos.ERROR log is shown below: E0323 22:56:03.278918 2848 memory.cpp:502] Listening on OOM events failed for container ff408971-b610-4f84-bbc3-81b0c6be9499: Event listener is terminating E0323 22:58:06.018554 2834 memory.cpp:502] Listening on OOM events failed for container 3afa2056-1976-4857-9121-cfad0f0ba73e: Event listener is terminating E0323 23:12:05.261996 2816 memory.cpp:502] Listening on OOM events failed for container 56912877-5733-4050-bce8-0cc179cc0bc8: Event listener is terminating Could someone help with this issue? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495409#comment-17495409 ] Sangita Nalkar edited comment on MESOS-10234 at 2/21/22, 9:15 AM: -- Thank you [~cf.natali] for your response. Please let me know once the log4j version is updated in ZooKeeper. Regards, Sangita -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (MESOS-10236) [/std:c++latest][MSVC] Mesos failed to build due to error C2440 with /std:c++latest on Windows using MSVC
QuellaZhang created MESOS-10236: --- Summary: [/std:c++latest][MSVC] Mesos failed to build due to error C2440 with /std:c++latest on Windows using MSVC Key: MESOS-10236 URL: https://issues.apache.org/jira/browse/MESOS-10236 Project: Mesos Issue Type: Bug Components: build Affects Versions: master Environment: VS 2019 + Windows Server 2016 Reporter: QuellaZhang Attachments: build.log Hi All, We tried to build Mesos on Windows with VS2019. It failed to build due to error C2440 with /std:c++latest using MSVC. It can be reproduced at the latest revision 97d9a40 on the master branch. Could you please take a look at this issue? Thanks a lot! Reproduce steps: # git clone https://github.com/apache/mesos F:\apache\mesos # Open a VS 2019 x64 command prompt as admin and browse to F:\apache\mesos # mkdir build_amd64 && pushd build_amd64 # cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.18362.0 -DENABLE_LIBEVENT=1 -DHAS_AUTHENTICATION=0 -DPATCHEXE_PATH="F:\tools\gnuwin32\bin" -T host=x64 .. 
# set _CL_= /std:c++latest /Zc:char8_t- # set _CL_= /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING %_CL_% # msbuild /maxcpucount:4 /p:Platform=x64 /p:Configuration=Debug Mesos.sln /t:Rebuild error: F:\gitP\apache\mesos\build_amd64\3rdparty\protobuf-3.5.0\src\protobuf-3.5.0\src\google\protobuf\compiler\objectivec\objectivec_helpers.cc(618,1): error C2440: 'return': cannot convert from 'int' to 'std::basic_string,std::allocator>' F:\gitP\apache\mesos\build_amd64\3rdparty\protobuf-3.5.0\src\protobuf-3.5.0\src\google\protobuf\compiler\objectivec\objectivec_helpers.cc(746,1): error C2440: 'return': cannot convert from 'int' to 'std::basic_string,std::allocator>' F:\gitP\apache\mesos\build_amd64\3rdparty\protobuf-3.5.0\src\protobuf-3.5.0\src\google\protobuf\compiler\objectivec\objectivec_helpers.cc(818,1): error C2440: 'return': cannot convert from 'int' to 'std::basic_string,std::allocator>' C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(241,5): error MSB8066: Custom build for 
'F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-mkdir.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-download.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-update.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-patch.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-configure.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-build.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-install.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\391b5defe1e7774ea47783fbd33671f6\protobuf-3.5.0-complete.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\3e040e27eb7e35ec141c0982bf4a7993\protobuf-3.5.0.rule;F:\gitP\apache\mesos\3rdparty\CMakeLists.txt' exited with code 1. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492857#comment-17492857 ] Charles Natali commented on MESOS-10234: Hi, I cannot see an explicit dependency on log4j v1.2.17 - are you sure the build is not picking up your system's version? Then again I'm really not familiar with the java bindings. Note that the only log4j which is shipped with Mesos is part of the zookeeper version packaged: {noformat} ./build/3rdparty/zookeeper-3.4.8/lib/slf4j-log4j12-1.6.1.jar ./build/3rdparty/zookeeper-3.4.8/lib/log4j-1.2.16.LICENSE.txt ./build/3rdparty/zookeeper-3.4.8/lib/log4j-1.2.16.jar ./build/3rdparty/zookeeper-3.4.8/src/java/lib/log4j-1.2.16.LICENSE.txt ./build/3rdparty/zookeeper-3.4.8/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties ./build/3rdparty/zookeeper-3.4.8/src/contrib/rest/conf/log4j.properties ./build/3rdparty/zookeeper-3.4.8/src/contrib/zooinspector/lib/log4j.properties ./build/3rdparty/zookeeper-3.4.8/conf/log4j.properties ./build/3rdparty/zookeeper-3.4.8/contrib/rest/lib/slf4j-log4j12-1.6.1.jar ./build/3rdparty/zookeeper-3.4.8/contrib/rest/lib/log4j-1.2.15.jar ./build/3rdparty/zookeeper-3.4.8/contrib/rest/conf/log4j.properties {noformat} I'm not sure if anyone uses the shipped version, but maybe we could update it, what do you think [~asekretenko]? Note that at work we experienced a zookeeper bug following a failover which IIRC caused some ephemeral nodes to not be deleted on the promoted leader, leading to inconsistencies in the Mesos registry - so updating could also solve this issue for whoever happens to use it. 
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492484#comment-17492484 ] Sangita Nalkar commented on MESOS-10234: Hello [~cf.natali], [~asekretenko], could you please help in answering my query in the above comment? Thanks, Sangita -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (MESOS-10235) v1 Operator API GET_MASTER is missing Capability.Type.QUOTA_V2
Dan Leary created MESOS-10235: - Summary: v1 Operator API GET_MASTER is missing Capability.Type.QUOTA_V2 Key: MESOS-10235 URL: https://issues.apache.org/jira/browse/MESOS-10235 Project: Mesos Issue Type: Bug Components: HTTP API Affects Versions: 1.11.0, 1.9.0 Environment: Ubuntu 18.04, Mesos 1.11.0 built from tarball. Reporter: Dan Leary A GET_MASTER call on the v1 HTTP Operator API at [http://master/api/v1] returns a master_info that is missing Capability.Type.QUOTA_V2. E.g.: {noformat} { "type" : "GET_MASTER", "get_master" : { "master_info" : { "capabilities" : [ { "type" : "AGENT_UPDATE" }, { "type" : "AGENT_DRAINING" }, {} ], etc... {noformat} I suspect that this change to include/mesos/mesos.proto: {noformat} 0bc857d672 ( Meng Zhu 2019-06-25 15:19:44 -0700 927) 0bc857d672 ( Meng Zhu 2019-06-25 15:19:44 -0700 928) // The master can handle the new quota API, which supports setting 0bc857d672 ( Meng Zhu 2019-06-25 15:19:44 -0700 929) // limits separately from guarantees (introduced in Mesos 1.9). 0bc857d672 ( Meng Zhu 2019-06-25 15:19:44 -0700 930) QUOTA_V2 = 3; {noformat} is also needed in include/mesos/v1/mesos.proto. Consequently, enums org.apache.mesos.Protos.MasterInfo.Capability.Type and org.apache.mesos.v1.Protos.MasterInfo.Capability.Type differ in the same respect for those of us using protobufs. -- This message was sent by Atlassian Jira (v8.20.1#820001)
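The empty {} entry in the capabilities list quoted above is what a capability unknown to the client-side protobuf definitions looks like after JSON serialization. A small Python sketch of detecting such an entry in a GET_MASTER response (the JSON below is reconstructed from the snippet in the report, not captured from a live master):

```python
import json

# Capabilities as in the v1 GET_MASTER response quoted above: the third
# capability (QUOTA_V2) is unknown to the v1 protobuf definitions, so it
# shows up as an empty object instead of {"type": "QUOTA_V2"}.
response = json.loads("""
{
  "type": "GET_MASTER",
  "get_master": {
    "master_info": {
      "capabilities": [
        {"type": "AGENT_UPDATE"},
        {"type": "AGENT_DRAINING"},
        {}
      ]
    }
  }
}
""")

capabilities = response["get_master"]["master_info"]["capabilities"]
known = [c["type"] for c in capabilities if "type" in c]
unknown = [c for c in capabilities if "type" not in c]

print(known)         # ['AGENT_UPDATE', 'AGENT_DRAINING']
print(len(unknown))  # 1
```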
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470538#comment-17470538 ] Sangita Nalkar commented on MESOS-10234: Hello, While building Mesos from source, I see that log4j v1.2.17 is being used. Since you mentioned that example frameworks and tests might be affected by log4j, do you plan to fix or update the log4j version? Thanks -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467809#comment-17467809 ] Sangita Nalkar commented on MESOS-10234: Thank you [~cf.natali] and [~asekretenko] for your response. Also, [~cf.natali], please let me know if you find out anything else later. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467338#comment-17467338 ] Andrei Sekretenko commented on MESOS-10234: --- Talking about production code: I don't see how the agent/master could be affected; the only potentially affected things are the Java scheduler libraries. At first glance, it indeed looks like the scheduler libraries do not use log4j, which would mean that only example frameworks and tests might be affected. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467229#comment-17467229 ] Charles Natali commented on MESOS-10234: Hi [~snalkar] Sorry for the delay, but Mesos has very few resources, and the holiday season doesn't help. I've had a quick look, and log4j only seems to be used for tests - Mesos is mostly written in C++, so that's not surprising. It's possible it's used in some of the bundled third-party dependencies, but I'd be surprised if it was exploitable. I'll have a more thorough look after the holidays. Cheers, -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
[ https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466312#comment-17466312 ] Sangita Nalkar commented on MESOS-10234: Hello, I would appreciate it if you could respond to the above query. Thanks, Sangita -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos
Sangita Nalkar created MESOS-10234: -- Summary: CVE-2021-44228 Log4j vulnerability for apache mesos Key: MESOS-10234 URL: https://issues.apache.org/jira/browse/MESOS-10234 Project: Mesos Issue Type: Bug Components: build Affects Versions: 1.11.0 Reporter: Sangita Nalkar Hi, Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache mesos. We see that log4j v1.2.17 is used while building apache mesos from source. Snippet from build logs: std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF jvm/org/apache/.deps/libjava_la-log4j.Tpo -c ../../src/jvm/org/apache/log4j.cpp -fPIC -DPIC -o jvm/org/apache/.libs/libjava_la-log4j.o Thanks, Sangita -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (MESOS-10233) containers were not cleaned up properly and left running.
Tan N. Le created MESOS-10233: - Summary: containers were not cleaned up properly and left running. Key: MESOS-10233 URL: https://issues.apache.org/jira/browse/MESOS-10233 Project: Mesos Issue Type: Task Environment: aurora-scheduler 0.25.0 mesos 1.11.0 executor plugin: DCE [https://github.com/paypal/dce-go] based on mesos-go v0.002 Reporter: Tan N. Le We observed that tasks were in STARTING and Mesos tried to kill and clean them up due to OOM. However, the cgroup freezer files are not there, so it assumes the containers have already been cleaned up. The containers were left running, but the tasks were reported as lost in Aurora/Mesos. Aurora logs: I1026 05:16:55.886 [TaskEventBatchWorker, StateMachine] -b0b685ca-3ded-4304-a591-9241d06d7728 state machine transition INIT -> PENDING I1026 05:16:55.984 [TaskGroupBatchWorker, StateMachine] mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 state machine transition PENDING -> ASSIGNED I1026 05:16:55.984 [TaskGroupBatchWorker, TaskAssignerImpl] Offer on agent gpma771518.gpf-prod.us-central1.gcp.dev.paypalinc.com (id a76961ab-bba0-46e5-ae7b-b234057b7a33-S307) is being assigned task for mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728. 
I1026 05:16:57.402 [Thread-1715969, MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for task mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 in state TASK_STARTING from SOURCE_EXECUTOR I1026 05:16:57.402 [TaskStatusHandlerImpl, StateMachine] mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 state machine transition ASSIGNED -> STARTING W1026 05:18:03.376 [Thread-1717148, MesosCallbackHandler$MesosCallbackHandlerImpl] Lost executor compose-mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 on slave a76961ab-bba0-46e5-ae7b-b234057b7a33-S307 with status -1 I1026 05:18:03.377 [Thread-1717149, MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for task mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 in state TASK_FAILED from SOURCE_AGENT with REASON_EXECUTOR_TERMINATED: Abnormal executor termination: Failed to kill all processes in the container: Timed out after 1mins I1026 05:18:03.390 [TaskStatusHandlerImpl, StateMachine] mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 state machine transition STARTING -> FAILED I1026 05:18:03.390 [TaskStatusHandlerImpl, StateManagerImpl] Task being rescheduled: mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 I1026 05:21:08.948 [Thread-1720928, MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for task mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 in state TASK_RUNNING from SOURCE_EXECUTOR I1026 05:21:08.949 [TaskStatusHandlerImpl, StateMachine] mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 state machine transition FAILED -> RUNNING (not allowed) I1026 05:21:08.950 [TaskStatusHandlerImpl, StateMachine] mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 state machine transition FAILED -> LOST (not allowed) = mesos-master 
logs I1026 05:16:55.991168 29973 master.cpp:3873] Adding executor 'compose-mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728' with resources cpus(allocated: aurora):0.1; mem(allocated: aurora):256 of framework 9f48d831-63e7-4556-86ab-463a69389e4d- (Aurora) at scheduler-bf829a38-5c60-46cb-82dc-9c7fc7be7130@10.180.52.175:8083 on agent a76961ab-bba0-46e5-ae7b-b234057b7a33-S307 at slave(1)@10.180.50.210:5051 (gpma771518.gpf-prod.us-central1.gcp.dev.paypalinc.com) I1026 05:16:55.991324 29973 master.cpp:3899] Adding task mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 with resources cpus(allocated: aurora):0.9; disk(allocated: aurora):100; mem(allocated: aurora):4096; ports(allocated: aurora):[10020-10020, 10076-10076, 10137-10137, 10139-10139, 10150-10150] of framework 9f48d831-63e7-4556-86ab-463a69389e4d- (Aurora) at scheduler-bf829a38-5c60-46cb-82dc-9c7fc7be7130@10.180.52.175:8083 on agent a76961ab-bba0-46e5-ae7b-b234057b7a33-S307 at slave(1)@10.180.50.210:5051 (gpma771518.gpf-prod.us-central1.gcp.dev.paypalinc.com) I1026 05:16:56.090255 29973 master.cpp:5035] Launching task mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 of framework 9f48d831-63e7-4556-86ab-463a69389e4d- (Aurora) at scheduler-bf829a38-5c60-46cb-82dc-9c7fc7be7130@10.180.52.175:8083 with resources [\{"allocation_info":{"role":"aurora"},"name":"cpus","scalar":\{"value":
[jira] [Assigned] (MESOS-9657) Launching a command task twice can crash the agent
[ https://issues.apache.org/jira/browse/MESOS-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Natali reassigned MESOS-9657: - Fix Version/s: 1.12.0 Assignee: Charles Natali Resolution: Fixed > Launching a command task twice can crash the agent > -- > > Key: MESOS-9657 > URL: https://issues.apache.org/jira/browse/MESOS-9657 > Project: Mesos > Issue Type: Bug >Reporter: Benno Evers >Assignee: Charles Natali >Priority: Major > Fix For: 1.12.0 > > > When launching a command task, we verify that the framework has no existing > executor for that task: > {noformat} > // We are dealing with command task; a new command executor will be > // launched. > CHECK(executor == nullptr); > {noformat} > and afterwards an executor is created with the same executor id as the task > id: > {noformat} > // (slave.cpp) > // Either the master explicitly requests launching a new executor > // or we are in the legacy case of launching one if there wasn't > // one already. Either way, let's launch executor now. > if (executor == nullptr) { > Try added = framework->addExecutor(executorInfo); > [...] 
> {noformat} > This means that if we relaunch the task with the same task id before the > executor is removed, it will crash the agent: > {noformat} > F0315 16:39:32.822818 38112 slave.cpp:2865] Check failed: executor == nullptr > *** Check failure stack trace: *** > @ 0x7feb29a407af google::LogMessage::Flush() > @ 0x7feb29a43c3f google::LogMessageFatal::~LogMessageFatal() > @ 0x7feb28a5a886 mesos::internal::slave::Slave::__run() > @ 0x7feb28af4f0e > _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSK_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaISU_EERKSK_IbESG_SJ_SO_SS_SY_S11_EEvRKNS1_3PIDIT_EEMS13_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSE_OSH_OSM_OSQ_OSW_OSZ_S3_E_JSE_SH_SM_SQ_SW_SZ_St12_PlaceholderILi1EEclEOS3_ > @ 0x7feb2998a620 process::ProcessBase::consume() > @ 0x7feb29987675 process::ProcessManager::resume() > @ 0x7feb299a2d2b > _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvE3$_8E6_M_runEv > @ 0x7feb2632f523 (unknown) > @ 0x7feb25e40594 start_thread > @ 0x7feb25b73e6f __GI___clone > Aborted (core dumped) > {noformat} > Instead of crashing, the agent should just drop the task with an appropriate > error in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
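The fix suggested at the end of the report, dropping the duplicate launch instead of CHECK-failing, can be modeled with a small hypothetical sketch (the Framework/executor bookkeeping here is a simplification for illustration, not the actual agent code):

```python
class Framework:
    """Minimal stand-in for the agent's per-framework executor bookkeeping."""

    def __init__(self):
        self.executors = {}  # executor_id -> executor info

    def launch_command_task(self, task_id):
        # For command tasks the executor id equals the task id, so a
        # relaunch with the same task id collides with the executor that
        # is still being torn down.  Instead of a fatal CHECK, drop the
        # task with an appropriate error.
        if task_id in self.executors:
            return f"TASK_DROPPED: executor '{task_id}' still exists"
        self.executors[task_id] = {"id": task_id}
        return "TASK_STARTING"


fw = Framework()
print(fw.launch_command_task("my-task"))  # TASK_STARTING
print(fw.launch_command_task("my-task"))  # TASK_DROPPED: executor 'my-task' still exists
```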
[jira] [Commented] (MESOS-10231) Mesos master crashes during framework teardown
[ https://issues.apache.org/jira/browse/MESOS-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423868#comment-17423868 ] Andreas Peters commented on MESOS-10231: Can you show us the configuration you use to start Spark? It would be helpful to try it out myself. > Mesos master crashes during framework teardown > -- > > Key: MESOS-10231 > URL: https://issues.apache.org/jira/browse/MESOS-10231 > Project: Mesos > Issue Type: Bug > Components: framework, master >Affects Versions: 1.9.0 > Environment: CentOS Linux release 7.9.2009 > Mesos version - 1.9.0 >Reporter: Divyansh Jamuaar >Priority: Major > > I have setup a Mesos cluster with a single Mesos Master and I submit spark > jobs to it in "cluster" mode. > After running few spark jobs correctly, the Mesos master crashes while trying > to shutdown one of the Spark frameworks with the following error - > > {code:java} > F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: > totalOfferedResources.filter(allocatedToRole).empty() > *** Check failure stack trace: *** > @ 0x7f1e024ded2e google::LogMessage::Fail() > @ 0x7f1e024dec8d google::LogMessage::SendToLog() > @ 0x7f1e024de637 google::LogMessage::Flush() > @ 0x7f1e024e191c google::LogMessageFatal::~LogMessageFatal() > @ 0x7f1dff93978d > mesos::internal::master::Framework::untrackUnderRole() > @ 0x7f1dffad004b mesos::internal::master::Master::removeFramework() > @ 0x7f1dfface859 mesos::internal::master::Master::teardown() > @ 0x7f1dffa8ba25 mesos::internal::master::Master::receive() > @ 0x7f1dffb2f1cf ProtobufProcess<>::handlerMutM<>() > @ 0x7f1dffbe6809 std::__invoke_impl<>() > @ 0x7f1dffbdae22 std::__invoke<>() > @ 0x7f1dffbc8079 > _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcS4_SD_St12_PlaceholderILi1EESO_ILi26__callIvJS8_SL_EJLm0ELm1ELm2ELm3T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x7f1dffbaaae5 
std::_Bind<>::operator()<>() > @ 0x7f1dffb833c9 std::_Function_handler<>::_M_invoke() > @ 0x7f1dff330281 std::function<>::operator()() > @ 0x7f1dffb13329 ProtobufProcess<>::consume() > @ 0x7f1dffa85436 mesos::internal::master::Master::_consume() > @ 0x7f1dffa84ad5 mesos::internal::master::Master::consume() > @ 0x7f1dffafb9ae > _ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE > @ 0x564c359f7002 process::ProcessBase::serve() > @ 0x7f1e023a7bbd process::ProcessManager::resume() > @ 0x7f1e023a407c > _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv > @ 0x7f1e023cf1ba > _ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_ > @ 0x7f1e023cd9c9 > _ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_ > @ 0x7f1e023cc482 > _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0vSt12_Index_tupleIJXspT_EEE > @ 0x7f1e023cb53b > _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv > @ 0x7f1e023ca3c4 > _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_E6_M_runEv > @ 0x7f1e051f419d execute_native_thread_routine > @ 0x7f1df4200ea5 start_thread > @ 0x7f1df3f2996d __clone > {code} > > > It seems like an assertion check is failing which is categorized as fatal but > I am not able to figure out the root cause of this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423785#comment-17423785 ] Andreas Peters commented on MESOS-10230: Here comes the PR. :) https://github.com/apache/mesos/pull/411 > Please update JQuery from 3.2.1 to 3.5.0+ > - > > Key: MESOS-10230 > URL: https://issues.apache.org/jira/browse/MESOS-10230 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.11.0 >Reporter: p engels >Priority: Minor > > JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple > cross-site-scripting vulnerabilities. More info can be found on JQuery's > website: > blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/] > My organization's vulnerability scanner locates the out-of-date jquery at > this url (sanitized for security reasons): > [http://example.com:5050/assets/libs/jquery-3.2.1.min.js] > > Please remove the old version of JQuery and replace it with version 3.5.0 or > greater. If this is already planned for a future release, please comment on > this request with the version this will be fixed in. > > Keep up the good work, Apache community <3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423666#comment-17423666 ] Andreas Peters commented on MESOS-10230: My dry run on a server (master and agent) works well with jQuery 3.6.0. Everything in the Mesos UI still works; nothing is broken and there are no error messages. I will build Mesos to see if the build scripts are missing anything; if everything is fine there too, I will open a PR. [~cf.natali]: As far as I know, "mesos-site" is generated via Jenkins and its source is "mesos/site", though I'm not 100% sure. I will change the website later. [~pengels]'s security scanner will not be affected by that. Have a nice weekend, Andreas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (MESOS-10232) Old sandboxes not being GC'ed caused frequent Mesos GC
HAO SU created MESOS-10232: -- Summary: Old sandboxes not being GC'ed caused frequent Mesos GC Key: MESOS-10232 URL: https://issues.apache.org/jira/browse/MESOS-10232 Project: Mesos Issue Type: Bug Components: agent Reporter: HAO SU Customers reported that their logs (sandbox files) go missing soon after a job completes. Mesos agent logs indicate that the files were GC'ed within minutes of container exit. Checking the host, there were a lot of old sandboxes dating back to Jan 2020. These occupy a lot of space (~88% of all sandbox usage) and likely cause frequent GC of recently running containers. Mesos does recognize these sandboxes and tries to schedule them for deletion {code:java} I0902 18:02:27.511576 467334 gc.cpp:95] Scheduling '/var/lib/mesos/meta/slaves/68caec4c-6ea5-44e7-9f8-fad1922d5-S162/frameworks/3dcc744f-016c-6579-9b82-6325402d2-/executors/fa00-29a3-4c47-95fd-808d52ac53-13-1' for gc -85.5641509780737weeks in the future {code} but the deletion never seems to happen. -- This message was sent by Atlassian Jira (v8.3.4#803005)
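The negative duration in the log line above is consistent with the removal time being computed as roughly the sandbox's modification time plus the agent's --gc_delay flag (whose documented default is one week): a sandbox last touched in January 2020 is, by September 2021, about 86 weeks overdue. A minimal sketch of that arithmetic, using hypothetical timestamps chosen only to match the dates in the report:

```python
from datetime import datetime, timedelta

# Hypothetical inputs: the real values are the sandbox mtime and the
# agent's --gc_delay flag (default 1 week per the agent docs).
sandbox_mtime = datetime(2020, 1, 2, 18, 0)   # stale sandbox from Jan 2020
now = datetime(2021, 9, 2, 18, 2)             # timestamp of the log line
gc_delay = timedelta(weeks=1)

# The scheduled removal time is mtime + delay; relative to "now" it is
# already far in the past, hence a negative "weeks in the future".
delta = (sandbox_mtime + gc_delay) - now
weeks = delta.total_seconds() / timedelta(weeks=1).total_seconds()
print(f"Scheduling for gc {weeks:.4f}weeks in the future")
```

This only shows where a negative "weeks in the future" can come from; why the overdue deletions then never actually run is the open question in the ticket.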
[jira] [Created] (MESOS-10231) Mesos master crashes during framework teardown
Divyansh Jamuaar created MESOS-10231: Summary: Mesos master crashes during framework teardown Key: MESOS-10231 URL: https://issues.apache.org/jira/browse/MESOS-10231 Project: Mesos Issue Type: Bug Components: framework, master Affects Versions: 1.9.0 Environment: CentOS Linux release 7.9.2009 Mesos version - 1.9.0 Reporter: Divyansh Jamuaar I have set up a Mesos cluster with a single Mesos master and I submit Spark jobs to it in "cluster" mode. After running a few Spark jobs correctly, the Mesos master crashes while trying to shut down one of the Spark frameworks with the following error - {code:java} F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: totalOfferedResources.filter(allocatedToRole).empty() *** Check failure stack trace: *** @ 0x7f1e024ded2e google::LogMessage::Fail() @ 0x7f1e024dec8d google::LogMessage::SendToLog() @ 0x7f1e024de637 google::LogMessage::Flush() @ 0x7f1e024e191c google::LogMessageFatal::~LogMessageFatal() @ 0x7f1dff93978d mesos::internal::master::Framework::untrackUnderRole() @ 0x7f1dffad004b mesos::internal::master::Master::removeFramework() @ 0x7f1dfface859 mesos::internal::master::Master::teardown() @ 0x7f1dffa8ba25 mesos::internal::master::Master::receive() @ 0x7f1dffb2f1cf ProtobufProcess<>::handlerMutM<>() @ 0x7f1dffbe6809 std::__invoke_impl<>() @ 0x7f1dffbdae22 std::__invoke<>() @ 0x7f1dffbc8079 _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcS4_SD_St12_PlaceholderILi1EESO_ILi26__callIvJS8_SL_EJLm0ELm1ELm2ELm3T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE @ 0x7f1dffbaaae5 std::_Bind<>::operator()<>() @ 0x7f1dffb833c9 std::_Function_handler<>::_M_invoke() @ 0x7f1dff330281 std::function<>::operator()() @ 0x7f1dffb13329 ProtobufProcess<>::consume() @ 0x7f1dffa85436 mesos::internal::master::Master::_consume() @ 0x7f1dffa84ad5 mesos::internal::master::Master::consume() @ 0x7f1dffafb9ae 
_ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE @ 0x564c359f7002 process::ProcessBase::serve() @ 0x7f1e023a7bbd process::ProcessManager::resume() @ 0x7f1e023a407c _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv @ 0x7f1e023cf1ba _ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_ @ 0x7f1e023cd9c9 _ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_ @ 0x7f1e023cc482 _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0vSt12_Index_tupleIJXspT_EEE @ 0x7f1e023cb53b _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv @ 0x7f1e023ca3c4 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_E6_M_runEv @ 0x7f1e051f419d execute_native_thread_routine @ 0x7f1df4200ea5 start_thread @ 0x7f1df3f2996d __clone {code} It seems like an assertion check is failing which is categorized as fatal but I am not able to figure out the root cause of this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420736#comment-17420736 ] p engels commented on MESOS-10230: Thank you! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420523#comment-17420523 ] Andreas Peters commented on MESOS-10230: No problem. :) I will. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10198) Mesos-master service is activating state
[ https://issues.apache.org/jira/browse/MESOS-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420368#comment-17420368 ] Charles Natali commented on MESOS-10198: [~kiranjshetty] I assume you've since moved on, so unless there is an update to this ticket soon, I will close it. Cheers, > Mesos-master service is activating state > > > Key: MESOS-10198 > URL: https://issues.apache.org/jira/browse/MESOS-10198 > Project: Mesos > Issue Type: Task >Affects Versions: 1.9.0 >Reporter: Kiran J Shetty >Priority: Major > > The mesos-master service shows an activating state on all 3 master nodes, which > in turn makes Marathon restart frequently. In the logs I can see the entries below. > Mesos-master logs: > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9 > mesos::internal::log::ReplicaProcess::ReplicaProcess() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854 > mesos::internal::log::Replica::Replica() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65 > mesos::internal::log::LogProcess::LogProcess() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34 > mesos::log::Log::Log() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a8207 > __libc_start_main > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown) > Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process > exited, code=killed, status=6/ABRT > Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered > failed state. > Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed. > Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time > over, scheduling restart. > Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master. > Nov 12 08:36:49 servername systemd[1]: Started Mesos Master. 
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024 > logging.cpp:201] INFO level logging started! > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024 > main.cpp:243] Build: 2019-10-21 12:10:14 by centos > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024 > main.cpp:244] Version: 1.9.0 > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024 > main.cpp:247] Git tag: 1.9.0 > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024 > main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024 > main.cpp:345] Using 'hierarchical' allocator > Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: > ./db/skiplist.h:344: void leveldb::SkipList::Insert(const > Key&) [with Key = const char*; Comparator = > leveldb::MemTable::KeyComparator]: Assertion `x == __null || !Equal(key, > x->key)' failed. 
> Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409 > (unix time) try "date -d @1605150409" if you are using GNU date *** > Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387 > __GI_raise > Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38) > received by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: *** > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown) > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6 > __assert_fail_base > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e6252 > __GI___assert_fail > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3dc2 > leveldb::SkipList<>::Insert() > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3735 > leveldb::MemTable::Add() > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00168 > leveldb::WriteBatch::Iterate() > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00424 > leveldb::WriteBatchInternal::InsertInto() > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5ce8575 > leveldb::DBImpl::RecoverLogFile() > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec0fc > leveldb::DBImpl::Recover() > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec3fa > leveldb::DB::Open() > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fd
[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
[ https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420364#comment-17420364 ] Charles Natali commented on MESOS-10230: [~apeters] Would you be able to look at this? I think [~pengels] might be referring to https://github.com/apache/mesos/blob/master/src/webui/assets/libs/jquery-3.2.1.min.js Note, however, that we are also using jQuery 1.10.1, which is likewise affected: https://github.com/apache/mesos/blob/master/site/source/assets/js/jquery-1.10.1.min.js and in mesos-site: https://github.com/apache/mesos-site/blob/asf-site/content/assets/js/jquery-1.10.1.min.js I am absolutely not familiar with web development, so even though I could probably update it, I wouldn't know how to check whether it broke anything. -- This message was sent by Atlassian Jira (v8.3.4#803005)
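When verifying an upgrade like the one discussed above, the jQuery version a Mesos webui actually serves can be read from the banner comment at the top of the minified bundle. A small sketch; the banner string below is an illustrative sample, not a live fetch (in practice you would first download e.g. http://<master>:5050/assets/libs/jquery-3.2.1.min.js and feed in its first line):

```python
import re

# Sample banner: minified jQuery releases begin with a comment of this
# shape. Replace with the first line of the bundle your master serves.
banner = "/*! jQuery v3.2.1 | (c) JS Foundation and other contributors | jquery.org/license */"

match = re.search(r"jQuery v(\d+)\.(\d+)\.(\d+)", banner)
version = tuple(int(part) for part in match.groups())

# The XSS fixes referenced in this ticket landed in jQuery 3.5.0.
print(version, "needs upgrade" if version < (3, 5, 0) else "ok")
```

Tuple comparison makes the version check trivial here; the same parse works for the jQuery 1.10.1 copies mentioned in the comment.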
[jira] [Commented] (MESOS-10228) My current problem is that after mesos-Agent added the option to support GPU, starting Docker through Marathon cannot succeed
[ https://issues.apache.org/jira/browse/MESOS-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420361#comment-17420361 ] Charles Natali commented on MESOS-10228: Hi [~barrylee], It's not clear to me if this is linked to the other issue you opened: https://issues.apache.org/jira/browse/MESOS-10227 Note that Marathon is a project distinct from Mesos, so you might want to report it with them (although I am not sure the project is still active). > My current problem is that after mesos-Agent added the option to support GPU, > starting Docker through Marathon cannot succeed > - > > Key: MESOS-10228 > URL: https://issues.apache.org/jira/browse/MESOS-10228 > Project: Mesos > Issue Type: Task > Components: agent, framework >Affects Versions: 1.11.0 >Reporter: barry lee >Priority: Major > Fix For: 1.11.0 > > Attachments: image-2021-08-19-19-22-51-456.png > > Original Estimate: 24h > Remaining Estimate: 24h > > My current problem is that after mesos-Agent added the option to support GPU, > starting Docker through Marathon cannot succeed. > mesos-agent \ > --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos > \ > --log_dir=/var/log/mesos \ > --containerizers=docker,mesos \ > --executor_registration_timeout=5mins \ > --hostname=192.168.10.19 \ > --ip=192.168.10.19 \ > --port=5051 \ > --work_dir=/var/lib/mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" \ > --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia" > > In the mesos-agent GPU option, this is useful when there is no GPU node. > > !image-2021-08-19-19-22-51-456.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10227) After mesos-agent starts, mesos-execute fails to be executed using the GPU
[ https://issues.apache.org/jira/browse/MESOS-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420360#comment-17420360 ] Charles Natali commented on MESOS-10227: Hi [~barrylee], Sorry for the delay. Is this still a problem? The log you're providing is truncated; it would be useful to get: - the agent logs from when the task is started - the executor log > After mesos-agent starts, mesos-execute fails to be executed using the GPU > - > > Key: MESOS-10227 > URL: https://issues.apache.org/jira/browse/MESOS-10227 > Project: Mesos > Issue Type: Task > Components: agent >Affects Versions: 1.11.0 > Environment: mesos-agent \ > --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos > \ > --log_dir=/var/log/mesos --containerizers=docker,mesos \ > --executor_registration_timeout=5mins \ > --hostname=192.168.10.19 \ > --ip=192.168.10.19 \ > --port=5051 \ > --work_dir=/var/lib/mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" \ > --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia" > > > mesos-execute \ > --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos > \ > --name=gpu-test \ > --docker_image=nvidia/cuda \ > --command="nvidia-smi" \ > --framework_capabilities="GPU_RESOURCES" \ > --resources="gpus:1" > >Reporter: barry lee >Priority: Major > Fix For: 1.11.0 > > I0819 18:14:26.088129 9337 containerizer.cpp:3414] Transitioning the state of > container fab468e6-bcbd-499c-9c24-ccd572c8317b from PROVISIONING to > DESTROYING after 2.207289088secs > I0819 18:14:26.089609 9339 slave.cpp:7100] Executor 'gpu-test' of framework > d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 has terminated with unknown status > I0819 18:14:26.091435 9339 slave.cpp:5981] Handling status update TASK_FAILED > (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of > framework 
d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 from @0.0.0.0:0 > E0819 18:14:26.092530 9346 slave.cpp:6357] Failed to update resources for > container fab468e6-bcbd-499c-9c24-ccd572c8317b of executor 'gpu-test' running > task gpu-test on status update for terminal task, destroying container: > Container not found > W0819 18:14:26.092737 9341 composing.cpp:614] Attempted to destroy unknown > container fab468e6-bcbd-499c-9c24-ccd572c8317b > I0819 18:14:26.092895 9331 task_status_update_manager.cpp:328] Received task > status update TASK_FAILED (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) > for task gpu-test of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 > I0819 18:14:26.093626 9333 slave.cpp:6527] Forwarding the update TASK_FAILED > (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of > framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 to > master@192.168.10.192:5050 > I0819 18:14:26.102195 9342 slave.cpp:4310] Shutting down framework > d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 > I0819 18:14:26.102257 9342 slave.cpp:7218] Cleaning up executor 'gpu-test' of > framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 > I0819 18:14:26.102448 9332 gc.cpp:95] Scheduling > '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test/runs/fab468e6-bcbd-499c-9c24-ccd572c8317b' > for gc 6.988156days in the future > I0819 18:14:26.102600 9332 gc.cpp:95] Scheduling > '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test' > for gc 6.9881303111days in the future > I0819 18:14:26.102725 9342 slave.cpp:7347] Cleaning up framework > d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 > I0819 18:14:26.102805 9335 task_status_update_manager.cpp:289] Closing task > status update streams for framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 > I0819 18:14:26.102901 9342 gc.cpp:95] Scheduling > 
'/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027' > for gc 6.9881020741days in the future > I0819 18:14:34.385221 9334 http.cpp:1436] HTTP GET for > /files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._67 > from 192.168.110.142:11640 with User-Agent='Mozill
[jira] [Created] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+
p engels created MESOS-10230: Summary: Please update JQuery from 3.2.1 to 3.5.0+ Key: MESOS-10230 URL: https://issues.apache.org/jira/browse/MESOS-10230 Project: Mesos Issue Type: Improvement Components: security Affects Versions: 1.11.0 Reporter: p engels JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple cross-site-scripting vulnerabilities. More info can be found on JQuery's website: blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/] My organization's vulnerability scanner locates the out-of-date jquery at this url (sanitized for security reasons): [http://example.com:5050/assets/libs/jquery-3.2.1.min.js] Please remove the old version of JQuery and replace it with version 3.5.0 or greater. If this is already planned for a future release, please comment on this request with the version this will be fixed in. Keep up the good work, Apache community <3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (MESOS-10229) [backport] Backport fixes to allow compilation of 1.11 on Ubuntu 20.04
Renan DelValle created MESOS-10229: -- Summary: [backport] Backport fixes to allow compilation of 1.11 on Ubuntu 20.04 Key: MESOS-10229 URL: https://issues.apache.org/jira/browse/MESOS-10229 Project: Mesos Issue Type: Task Affects Versions: 1.11.0 Reporter: Renan DelValle Two recently landed commits are necessary in order to compile Mesos 1.11 on Ubuntu 20.04. * [https://github.com/apache/mesos/commit/4ce33ca185fde0c6b258b85311fde3384e488f0d] * [https://github.com/apache/mesos/commit/7141572d64cc43d3aafe2b4f5de7492cc0803b78] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (MESOS-10228) My current problem is that after mesos-Agent added the option to support GPU, starting Docker through Marathon cannot succeed
barry lee created MESOS-10228: - Summary: My current problem is that after mesos-Agent added the option to support GPU, starting Docker through Marathon cannot succeed Key: MESOS-10228 URL: https://issues.apache.org/jira/browse/MESOS-10228 Project: Mesos Issue Type: Task Components: agent, framework Affects Versions: 1.11.0 Reporter: barry lee Fix For: 1.11.0 My current problem is that after mesos-Agent added the option to support GPU, starting Docker through Marathon cannot succeed -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (MESOS-10227) After mesos-agent starts, mesos-execute fails to be executed using the GPU
barry lee created MESOS-10227: - Summary: After mesos-agent starts, mesos-execute fails to be executed using the GPU Key: MESOS-10227 URL: https://issues.apache.org/jira/browse/MESOS-10227 Project: Mesos Issue Type: Task Components: agent Affects Versions: 1.11.0 Environment: mesos-agent \ --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos \ --log_dir=/var/log/mesos --containerizers=docker,mesos \ --executor_registration_timeout=5mins \ --hostname=192.168.10.19 \ --ip=192.168.10.19 \ --port=5051 \ --work_dir=/var/lib/mesos \ --image_providers=docker \ --executor_environment_variables="{}" \ --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia" mesos-execute \ --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos \ --name=gpu-test \ --docker_image=nvidia/cuda \ --command="nvidia-smi" \ --framework_capabilities="GPU_RESOURCES" \ --resources="gpus:1" Reporter: barry lee Fix For: 1.11.0 I0819 18:14:26.088129 9337 containerizer.cpp:3414] Transitioning the state of container fab468e6-bcbd-499c-9c24-ccd572c8317b from PROVISIONING to DESTROYING after 2.207289088secs I0819 18:14:26.089609 9339 slave.cpp:7100] Executor 'gpu-test' of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 has terminated with unknown status I0819 18:14:26.091435 9339 slave.cpp:5981] Handling status update TASK_FAILED (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 from @0.0.0.0:0 E0819 18:14:26.092530 9346 slave.cpp:6357] Failed to update resources for container fab468e6-bcbd-499c-9c24-ccd572c8317b of executor 'gpu-test' running task gpu-test on status update for terminal task, destroying container: Container not found W0819 18:14:26.092737 9341 composing.cpp:614] Attempted to destroy unknown container fab468e6-bcbd-499c-9c24-ccd572c8317b I0819 18:14:26.092895 9331 task_status_update_manager.cpp:328] Received task status update TASK_FAILED (Status 
UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 I0819 18:14:26.093626 9333 slave.cpp:6527] Forwarding the update TASK_FAILED (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 to master@192.168.10.192:5050 I0819 18:14:26.102195 9342 slave.cpp:4310] Shutting down framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 I0819 18:14:26.102257 9342 slave.cpp:7218] Cleaning up executor 'gpu-test' of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 I0819 18:14:26.102448 9332 gc.cpp:95] Scheduling '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test/runs/fab468e6-bcbd-499c-9c24-ccd572c8317b' for gc 6.988156days in the future I0819 18:14:26.102600 9332 gc.cpp:95] Scheduling '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test' for gc 6.9881303111days in the future I0819 18:14:26.102725 9342 slave.cpp:7347] Cleaning up framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 I0819 18:14:26.102805 9335 task_status_update_manager.cpp:289] Closing task status update streams for framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 I0819 18:14:26.102901 9342 gc.cpp:95] Scheduling '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027' for gc 6.9881020741days in the future I0819 18:14:34.385221 9334 http.cpp:1436] HTTP GET for /files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._67 from 192.168.110.142:11640 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36' I0819 18:14:45.385519 9344 http.cpp:1436] HTTP GET for 
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6a from 192.168.110.142:11690 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36' I0819 18:14:56.381196 9334 http.cpp:1436] HTTP GET for /files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6d from 192.168.110.142:11716 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36' I0819 18:15:07.385897 9340 http.cpp:1436] HTTP GET for /files
[jira] [Commented] (MESOS-10200) cmake target "install" not available in 1.10.x branch
[ https://issues.apache.org/jira/browse/MESOS-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399644#comment-17399644 ] Andreas Peters commented on MESOS-10200: I tested it a few minutes ago; the problem is gone with the current master. > cmake target "install" not available in 1.10.x branch > - > > Key: MESOS-10200 > URL: https://issues.apache.org/jira/browse/MESOS-10200 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.10.0 > Environment: OS: Mac OS X Catalina (10.15.7). >Reporter: PRUDHVI RAJ MULAGAPATI >Priority: Major > Attachments: 10198.html > > > I am trying to build Mesos on Mac OS X 10.15.7 (Catalina) following the > official documentation. While on the 1.10.x branch, the cmake target "install" is not > found. However, I was able to build and install with the 3.11.x and master > branches. Listed below are the available targets as shown by cmake --target > help. > > cmake --build . --target install > make: *** No rule to make target `install'. Stop. > > cmake --build . --target help > The following are some of the valid targets for this Makefile: > ... all (the default if no target is provided) > ... clean > ... depend > ... edit_cache > ... package > ... package_source > ... rebuild_cache > ... test > ... boost-1.65.0 > ... check > ... concurrentqueue-7b69a8f > ... csi_v0-0.2.0 > ... csi_v1-1.1.0 > ... dist > ... distcheck > ... elfio-3.2 > ... glog-0.4.0 > ... googletest-1.8.0 > ... grpc-1.10.0 > ... http_parser-2.6.2 > ... leveldb-1.19 > ... libarchive-3.3.2 > ... libev-4.22 > ... make_bin_include_dir > ... make_bin_java_dir > ... make_bin_jni_dir > ... make_bin_src_dir > ... nvml-352.79 > ... picojson-1.3.0 > ... protobuf-3.5.0 > ... rapidjson-1.1.0 > ... tests > ... zookeeper-3.4.8 > ... balloon-executor > ... balloon-framework > ... benchmarks > ... disk-full-framework > ... docker-no-executor-framework > ... dynamic-reservation-framework > ... example > ... examplemodule > ... fixed_resource_estimator > ... 
inverse-offer-framework > ... libprocess-tests > ... load-generator-framework > ... load_qos_controller > ... logrotate_container_logger > ... long-lived-executor > ... long-lived-framework > ... mesos > ... mesos-agent > ... mesos-cli > ... mesos-cni-port-mapper > ... mesos-containerizer > ... mesos-default-executor > ... mesos-docker-executor > ... mesos-execute > ... mesos-executor > ... mesos-fetcher > ... mesos-io-switchboard > ... mesos-local > ... mesos-log > ... mesos-logrotate-logger > ... mesos-master > ... mesos-protobufs > ... mesos-resolve > ... mesos-tcp-connect > ... mesos-tests > ... mesos-usage > ... no-executor-framework > ... operation-feedback-framework > ... persistent-volume-framework > ... process > ... stout-tests > ... test-csi-user-framework > ... test-executor > ... test-framework > ... test-helper > ... test-http-executor > ... test-http-framework > ... test-linkee > ... testallocator > ... testanonymous > ... testauthentication > ... testauthorizer > ... testcontainer_logger > ... testhook > ... testhttpauthenticator > ... testisolator > ... testmastercontender > ... testmasterdetector > ... testqos_controller > ... testresource_estimator > ... uri_disk_profile_adaptor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10219) 1.11.0 does not build on Windows
[ https://issues.apache.org/jira/browse/MESOS-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399577#comment-17399577 ] Andreas Peters commented on MESOS-10219: Hi [~acecile555], how is it going with this? > 1.11.0 does not build on Windows > > > Key: MESOS-10219 > URL: https://issues.apache.org/jira/browse/MESOS-10219 > Project: Mesos > Issue Type: Bug > Components: agent, build, cmake >Affects Versions: 1.11.0 >Reporter: acecile555 >Priority: Major > Attachments: mesos_slave_windows_longpath.png, > patch_1.10.0_windows_build.diff > > > Hello, > > I just tried building Mesos 1.11.0 on Windows and this is not working. > > The first issue is libarchive compilation, which can easily be worked around by > adding the following hunk to 3rdparty/libarchive-3.3.2.patch: > {noformat} > --- a/CMakeLists.txt > +++ b/CMakeLists.txt > @@ -137,7 +137,7 @@ ># This is added into CMAKE_C_FLAGS when CMAKE_BUILD_TYPE is "Debug" ># Enable level 4 C4061: The enumerate has no associated handler in a switch ># statement. > - SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /we4061") > + #SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /we4061") ># Enable level 4 C4254: A larger bit field was assigned to a smaller bit ># field. 
>SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /we4254") > {noformat} > Sadly it is failing later with issue I cannot solve myself: > {noformat} > C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot > open include file: 'csi/state.pb.h': No such file or directory (compiling > source file C:\Users\earthlab\mesos\src\slave\csi_server.cpp) > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > qos_controller.cpp > resource_estimator.cpp > slave.cpp > state.cpp > task_status_update_manager.cpp > sandbox.cpp > C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot > open include file: 'csi/state.pb.h': No such file or directory (compiling > source file C:\Users\earthlab\mesos\src\slave\slave.cpp) > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > composing.cpp > isolator.cpp > C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot > open include file: 'csi/state.pb.h': No such file or directory (compiling > source file C:\Users\earthlab\mesos\src\slave\task_status_update_manager.cpp) > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > isolator_tracker.cpp > launch.cpp > C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot > open include file: 'csi/state.pb.h': No such file or directory (compiling > source file C:\Users\earthlab\mesos\src\slave\containerizer\composing.cpp) > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > launcher.cpp > C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp(524,34): > error C2668: 'os::spawn': ambiguous call to overloaded function > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > C:\Users\earthlab\mesos\3rdparty\stout\include\stout/os/exec.hpp(52,20): > message : could be 'Option os::spawn(const std::string &,const > std::vector> &)' > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > with > [ > T=int > ] (compiling source file > C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp) > 
C:\Users\earthlab\mesos\3rdparty\stout\include\stout/os/windows/exec.hpp(412,20): > message : or 'Option os::spawn(const std::string &,const > std::vector> &,const > Option,std::allocator std::string,std::string>>>> &)' > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > with > [ > T=int > ] (compiling source file > C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp) > C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp(525,75): > message : while trying to match the argument list '(const char [3], > initializer list)' [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp(893,47): > error C2668: 'os::spawn': ambiguous call to overloaded function > [C:\Users\earthlab\mesos\build\src\mesos.vcxproj] > C:\Users\earthlab\mesos\3rdparty\stout\include\stout/os/exec.hpp(52,20): > message : could be 'Op
[jira] [Commented] (MESOS-10198) Mesos-master service is activating state
[ https://issues.apache.org/jira/browse/MESOS-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395290#comment-17395290 ] Charles Natali commented on MESOS-10198: Hi [~kiranjshetty], sorry for the delay, I know it's been a while. {noformat} Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: ./db/skiplist.h:344: void leveldb::SkipList::Insert(const Key&) [with Key = const char*; Comparator = leveldb::MemTable::KeyComparator]: Assertion `x == __null || !Equal(key, x->key)' failed. {noformat} This points to a corruption of the on-disk leveldb database - it's been a long time, but do you remember if: - this specific error was present in all the masters' logs? - did the hosts maybe crash prior to that? - I guess it's too late, but it would have been interesting to see the log of the first time the masters crashed. Looking at our code, it's not clear to me what we could do to introduce a leveldb corruption - the only possibilities I can think of are a leveldb bug, or maybe in specific conditions some unrelated code ends up writing to the leveldb file descriptors, which could cause such a corruption. But having it occur across all masters seems very unlikely. > Mesos-master service is activating state > > > Key: MESOS-10198 > URL: https://issues.apache.org/jira/browse/MESOS-10198 > Project: Mesos > Issue Type: Task >Affects Versions: 1.9.0 >Reporter: Kiran J Shetty >Priority: Major > > The mesos-master service is showing activating state on all 3 master nodes, which > in turn makes Marathon restart frequently. In the logs I can see the entry below. 
> Mesos-master logs: > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9 > mesos::internal::log::ReplicaProcess::ReplicaProcess() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854 > mesos::internal::log::Replica::Replica() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65 > mesos::internal::log::LogProcess::LogProcess() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34 > mesos::log::Log::Log() > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a8207 > __libc_start_main > Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown) > Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process > exited, code=killed, status=6/ABRT > Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered > failed state. > Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed. > Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time > over, scheduling restart. > Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master. > Nov 12 08:36:49 servername systemd[1]: Started Mesos Master. > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024 > logging.cpp:201] INFO level logging started! 
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024 > main.cpp:243] Build: 2019-10-21 12:10:14 by centos > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024 > main.cpp:244] Version: 1.9.0 > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024 > main.cpp:247] Git tag: 1.9.0 > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024 > main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e > Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024 > main.cpp:345] Using 'hierarchical' allocator > Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: > ./db/skiplist.h:344: void leveldb::SkipList::Insert(const > Key&) [with Key = const char*; Comparator = > leveldb::MemTable::KeyComparator]: Assertion `x == __null || !Equal(key, > x->key)' failed. > Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409 > (unix time) try "date -d @1605150409" if you are using GNU date *** > Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387 > __GI_raise > Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38) > received by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: *** > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown) > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort > Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6
[jira] [Commented] (MESOS-10200) cmake target "install" not available in 1.10.x branch
[ https://issues.apache.org/jira/browse/MESOS-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391782#comment-17391782 ] Charles Natali commented on MESOS-10200: [~apeters] It's not quite clear to me: is it still a problem in master? > cmake target "install" not available in 1.10.x branch > - > > Key: MESOS-10200 > URL: https://issues.apache.org/jira/browse/MESOS-10200 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.10.0 > Environment: OS: Mac OS X Catalina (10.15.7). >Reporter: PRUDHVI RAJ MULAGAPATI >Priority: Major
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391736#comment-17391736 ] Charles Natali commented on MESOS-10226: Hm, it's annoying - the gdb backtrace you posted shows that the regtest gets stuck in this test, but for some reason running this test on its own isn't enough to reproduce it. It's going to be very difficult to debug without being able to run them myself. > test suite hangs on ARM64 > - > > Key: MESOS-10226 > URL: https://issues.apache.org/jira/browse/MESOS-10226 > Project: Mesos > Issue Type: Bug >Reporter: Charles Natali >Assignee: Charles Natali >Priority: Major > Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, > gdb-thread-apply-bt-all-29.07.2021.txt > > > Reported by [~mgrigorov]. > > {noformat} > [ RUN ] > NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace > sh: 1: hadoop: not found > Marked '/' as rslave > I0726 11:59:17.812630 32 exec.cpp:164] Version: 1.12.0 > I0726 11:59:17.827512 31 exec.cpp:237] Executor registered on agent > 9076f44b-846d-4f00-a2dc-11f694cc1900-S0 > I0726 11:59:17.830999 36 executor.cpp:190] Received SUBSCRIBED event > I0726 11:59:17.832351 36 executor.cpp:194] Subscribed executor on > martin-arm64 > I0726 11:59:17.832775 36 executor.cpp:190] Received LAUNCH event > I0726 11:59:17.834415 36 executor.cpp:722] Starting task > d1bbb266-bee7-4c9d-929f-16aa41f4e9cf > I0726 11:59:17.839910 36 executor.cpp:740] Forked command at 38 > Preparing rootfs at > '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791' > Changing root to > 
/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791 > Failed to execute 'sh': Exec format error > I0726 11:59:18.113488 33 executor.cpp:1041] Command exited with status 1 > (pid: 38) > ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: > Failure > Mock function called more times than expected - returning directly. > Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte > object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 > 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 > A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 > 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 > 03-00 00-00>) > Expected: to be called twice > Actual: called 3 times - over-saturated and active > I0726 11:59:19.117401 37 process.cpp:935] Stopped the socket accept > loop{noformat} > > I asked him to provide a gdb traceback and we can see the following: > > {noformat} > Thread 1 (Thread 0xa3bc2c60 (LWP 173475)): > #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 > "/tmp/7VXP3w/pipe", oflag=) at > ../sysdeps/unix/sysv/linux/open64.c:48 > #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, > filename=, posix_mode=, prot=prot@entry=438, > read_write=8, is32not64=) at fileops.c:189 > #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, > filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode= out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@e > ntry=1) at fileops.c:281 > #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 > "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75 > #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at > 
../../3rdparty/stout/include/stout/os/read.hpp:136 > #5 0xd74f1c1c in > mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody > (this=0xaaab00f88f50) at ../../src/tests/containeri > zer/nested_mesos_containerizer_tests.cpp:1126 > {noformat} > > > Basically the test uses a named pipe to synchronize with the task being > started, and if the task fails to start - in this case because we're trying > to launch an x86 container on an arm64 host - the test will just hang reading > from the pipe. > I sent Martin a tentative fix for him to test, and I'll open an MR if > successful.
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390279#comment-17390279 ] Martin Tzvetanov Grigorov commented on MESOS-10226: --- `sudo ./bin/mesos-tests.sh --gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand* --verbose` didn't hang but failed with: {code:java} ... 7-b49a-765ac4cd1729/backends/overlay/rootfses/ba3ccd6c-bacf-4d88-a4fc-5104ca45d19e' for container a6a3e1b5-4322-4b07-b49a-765ac4cd1729 I0730 05:32:29.134213 2249744 master.cpp:1149] Master terminating I0730 05:32:29.134589 2249739 hierarchical.cpp:1232] Removed all filters for agent 4c43d934-41d8-4159-9b03-2dfdeee3f386-S0 I0730 05:32:29.134629 2249739 hierarchical.cpp:1108] Removed agent 4c43d934-41d8-4159-9b03-2dfdeee3f386-S0 [ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/4, where GetParam() = "quay.io/coreos/alpine-sh" (3751 ms) [--] 5 tests from ContainerImage/ProvisionerDockerTest (38953 ms total)[--] Global test environment tear-down [==] 5 tests from 1 test case ran. (38966 ms total) [ PASSED ] 0 tests. 
[ FAILED ] 5 tests, listed below: [ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0, where GetParam() = "alpine" [ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/1, where GetParam() = "library/alpine" [ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/2, where GetParam() = "gcr.io/google-containers/busybox:1.24" [ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/3, where GetParam() = "gcr.io/google-containers/busybox:1.27" [ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/4, where GetParam() = "quay.io/coreos/alpine-sh" 5 FAILED TESTS I0730 05:32:29.176168 2249746 process.cpp:935] Stopped the socket accept loop {code} > test suite hangs on ARM64 > - > > Key: MESOS-10226 > URL: https://issues.apache.org/jira/browse/MESOS-10226 > Project: Mesos > Issue Type: Bug >Reporter: Charles Natali >Assignee: Charles Natali >Priority: Major > Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, > gdb-thread-apply-bt-all-29.07.2021.txt > > > Reported by [~mgrigorov]. 
[jira] [Comment Edited] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390152#comment-17390152 ] Charles Natali edited comment on MESOS-10226 at 7/29/21, 8:44 PM: -- Hm, I can't reproduce it. I updated the test to run the arm64 alpine image to cause it to fail in a similar way that it should be failing for you, and it's not hanging, but failing: {noformat} # ./bin/mesos-tests.sh --gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand* [ RUN ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0 sh: 1: hadoop: not found Marked '/' as rslave I0729 21:40:16.121507 434157 exec.cpp:164] Version: 1.12.0 I0729 21:40:16.136072 434156 exec.cpp:237] Executor registered on agent 48863f87-f283-42ab-bd93-f301fdfbd73b-S0 I0729 21:40:16.139089 434154 executor.cpp:190] Received SUBSCRIBED event I0729 21:40:16.139974 434154 executor.cpp:194] Subscribed executor on thinkpad I0729 21:40:16.140264 434154 executor.cpp:190] Received LAUNCH event I0729 21:40:16.141703 434154 executor.cpp:722] Starting task 1461a266-1ead-4bdf-9165-9c0f6c5938b8 I0729 21:40:16.147071 434154 executor.cpp:740] Forked command at 434163 Preparing rootfs at '/tmp/ContainerImage_ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_0_GxQGxF/provisioner/containers/77c499a5-6d34-46aa-86a4-e993d53aa56a/backends/overlay/rootfses/629e6501-86d4-447e-bf17-412cd1cb6634' Changing root to /tmp/ContainerImage_ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_0_GxQGxF/provisioner/containers/77c499a5-6d34-46aa-86a4-e993d53aa56a/backends/overlay/rootfses/629e6501-86d4-447e-bf17-412cd1cb6634 Failed to execute '/bin/ls': Exec format error I0729 21:40:16.321754 434155 executor.cpp:1041] Command exited with status 1 (pid: 434163) ../../src/tests/containerizer/provisioner_docker_tests.cpp:785: Failure Expected: TASK_FINISHED To be equal to: statusFinished->state() Which is: TASK_FAILED I0729 21:40:16.333557 434157 
exec.cpp:478] Executor asked to shutdown I0729 21:40:16.334996 434158 executor.cpp:190] Received SHUTDOWN event I0729 21:40:16.335037 434158 executor.cpp:843] Shutting down [ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0, where GetParam() = "arm64v8/alpine" (5851 ms) {noformat} Could you try running {noformat} ./bin/mesos-tests.sh --gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand* --verbose {noformat} And see if it hangs, and post the result? Worst case we could just ignore the hang and update the test to use the arm64 image so it passes, but I'd like to understand why it hangs.
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390106#comment-17390106 ] Martin Tzvetanov Grigorov commented on MESOS-10226: --- Hi [~cf.natali]! It is still hanging, 6 hours in. This is the new thread dump: [^gdb-thread-apply-bt-all-29.07.2021-2.txt] > test suite hangs on ARM64 > - > > Key: MESOS-10226 > URL: https://issues.apache.org/jira/browse/MESOS-10226 > Project: Mesos > Issue Type: Bug >Reporter: Charles Natali >Assignee: Charles Natali >Priority: Major > Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, > gdb-thread-apply-bt-all-29.07.2021.txt
> Basically the test uses a named pipe to synchronize with the task being > started, and if the task fails to start - in this case because we're trying > to launch an x86 container on an arm64 host - the test will just hang reading > from the pipe. > I sent Martin a tentative fix for him to test, and I'll open an MR if > successful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390058#comment-17390058 ] Charles Natali edited comment on MESOS-10226 at 7/29/21, 6:09 PM: -- [~mgrigorov] Looking at the code corresponding to the backtrace, I don't think it should hang forever, but only for up to 10 minutes: {noformat} #13 0xb7ca1418 in AwaitAssertReady (expr=0xba1c1d58 "statusStarting", actual=..., duration=...) at ../../3rdparty/libprocess/include/process/gtest.hpp:126 #14 0xb97c588c in mesos::internal::tests::ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_Test::TestBody (this=0xcd4207a0) at ../../src/tests/containerizer/provisioner_docker_tests.cpp:782 {noformat} {noformat} AWAIT_READY_FOR(statusStarting, Minutes(10));{noformat} Are you sure it was stuck indefinitely and not just taking a long time? Also, it would help to have the output of running the tests with {{--verbose}}.
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390058#comment-17390058 ] Charles Natali commented on MESOS-10226: [~mgrigorov] Looking at the code corresponding to the backtrace, I don't think it should hang forever, but only for up to 10 minutes: {noformat} #13 0xb7ca1418 in AwaitAssertReady (expr=0xba1c1d58 "statusStarting", actual=..., duration=...) at ../../3rdparty/libprocess/include/process/gtest.hpp:126 #14 0xb97c588c in mesos::internal::tests::ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_Test::TestBody (this=0xcd4207a0) at ../../src/tests/containerizer/provisioner_docker_tests.cpp:782 {noformat} {noformat} AWAIT_READY_FOR(statusStarting, Minutes(10));{noformat} Are you sure it was stuck indefinitely and not just taking a long time?
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390055#comment-17390055 ] Charles Natali commented on MESOS-10226: Thanks, I'll have a look - I hope there won't be too many hanging tests...
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389882#comment-17389882 ] Martin Tzvetanov Grigorov commented on MESOS-10226: --- Attached [^gdb-thread-apply-bt-all-29.07.2021.txt]
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389867#comment-17389867 ] Martin Tzvetanov Grigorov commented on MESOS-10226: --- The test properly fails now: {noformat}[--] Global test environment tear-down [==] 34 tests from 2 test cases ran. (66593 ms total) [ PASSED ] 33 tests. [ FAILED ] 1 test, listed below: [ FAILED ] NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace{noformat} But `sudo make check` still hangs, probably on a different test this time. I am trying to get the backtraces with gdb but gdb also hangs ... I'll send you the new info once I have it!
[jira] [Commented] (MESOS-10226) test suite hangs on ARM64
[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389717#comment-17389717 ] Martin Tzvetanov Grigorov commented on MESOS-10226: --- Hi Charles, I could reproduce the issue easily with `sudo ./bin/mesos-tests.sh --gtest_filter=*NestedMesosContainerizerTest*`. Now I am re-building Mesos with your patch! I'll update this ticket in half an hour or so!
[jira] [Created] (MESOS-10226) test suite hangs on ARM64
Charles Natali created MESOS-10226: -- Summary: test suite hangs on ARM64 Key: MESOS-10226 URL: https://issues.apache.org/jira/browse/MESOS-10226 Project: Mesos Issue Type: Bug Reporter: Charles Natali Assignee: Charles Natali Reported by [~mgrigorov]. {noformat} [ RUN ] NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace sh: 1: hadoop: not found Marked '/' as rslave I0726 11:59:17.812630 32 exec.cpp:164] Version: 1.12.0 I0726 11:59:17.827512 31 exec.cpp:237] Executor registered on agent 9076f44b-846d-4f00-a2dc-11f694cc1900-S0 I0726 11:59:17.830999 36 executor.cpp:190] Received SUBSCRIBED event I0726 11:59:17.832351 36 executor.cpp:194] Subscribed executor on martin-arm64 I0726 11:59:17.832775 36 executor.cpp:190] Received LAUNCH event I0726 11:59:17.834415 36 executor.cpp:722] Starting task d1bbb266-bee7-4c9d-929f-16aa41f4e9cf I0726 11:59:17.839910 36 executor.cpp:740] Forked command at 38 Preparing rootfs at '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791' Changing root to /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791 Failed to execute 'sh': Exec format error I0726 11:59:18.113488 33 executor.cpp:1041] Command exited with status 1 (pid: 38) ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 
00-00 00-00 00-00 00-00 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 03-00 00-00>) Expected: to be called twice Actual: called 3 times - over-saturated and active I0726 11:59:19.117401 37 process.cpp:935] Stopped the socket accept loop{noformat} I asked him to provide a gdb traceback and we can see the following: {noformat} Thread 1 (Thread 0xa3bc2c60 (LWP 173475)): #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 "/tmp/7VXP3w/pipe", oflag=) at ../sysdeps/unix/sysv/linux/open64.c:48 #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, filename=, posix_mode=, prot=prot@entry=438, read_write=8, is32not64=) at fileops.c:189 #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=, mode@entry=0xd762f3c8 "r", is32not64=is32not64@e ntry=1) at fileops.c:281 #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75 #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at ../../3rdparty/stout/include/stout/os/read.hpp:136 #5 0xd74f1c1c in mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody (this=0xaaab00f88f50) at ../../src/tests/containeri zer/nested_mesos_containerizer_tests.cpp:1126 {noformat} Basically the test uses a named pipe to synchronize with the task being started, and if the task fails to start - in this case because we're trying to launch an x86 container on an arm64 host - the test will just hang reading from the pipe. I sent Martin a tentative fix for him to test, and I'll open an MR if successful.
[jira] [Commented] (MESOS-9352) Data in persistent volume deleted accidentally when using Docker container and Persistent volume
[ https://issues.apache.org/jira/browse/MESOS-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384479#comment-17384479 ] Charles Natali commented on MESOS-9352: --- If it's fixed feel free to close! > Data in persistent volume deleted accidentally when using Docker container > and Persistent volume > > > Key: MESOS-9352 > URL: https://issues.apache.org/jira/browse/MESOS-9352 > Project: Mesos > Issue Type: Bug > Components: agent, containerization, docker >Affects Versions: 1.5.1, 1.5.2 > Environment: DCOS 1.11.6 > Mesos 1.5.2 >Reporter: David Ko >Assignee: Joseph Wu >Priority: Critical > Labels: dcos, dcos-1.11.6, mesosphere, persistent-volumes > Attachments: image-2018-10-24-22-20-51-059.png, > image-2018-10-24-22-21-13-399.png > > > Using docker image w/ persistent volume to start a service, it will cause > data in persistent volume deleted accidentally when task killed and > restarted, also old mount points not unmounted, even the service already > deleted. 
> *The expected result should be data in persistent volume kept until task > deleted completely, also dangling mount points should be unmounted correctly.* > > *Step 1:* Use below JSON config to create a Mysql server using Docker image > and Persistent Volume > {code:javascript} > { > "env": { > "MYSQL_USER": "wordpress", > "MYSQL_PASSWORD": "secret", > "MYSQL_ROOT_PASSWORD": "supersecret", > "MYSQL_DATABASE": "wordpress" > }, > "id": "/mysqlgc", > "backoffFactor": 1.15, > "backoffSeconds": 1, > "constraints": [ > [ > "hostname", > "IS", > "172.27.12.216" > ] > ], > "container": { > "portMappings": [ > { > "containerPort": 3306, > "hostPort": 0, > "protocol": "tcp", > "servicePort": 1 > } > ], > "type": "DOCKER", > "volumes": [ > { > "persistent": { > "type": "root", > "size": 1000, > "constraints": [] > }, > "mode": "RW", > "containerPath": "mysqldata" > }, > { > "containerPath": "/var/lib/mysql", > "hostPath": "mysqldata", > "mode": "RW" > } > ], > "docker": { > "image": "mysql", > "forcePullImage": false, > "privileged": false, > "parameters": [] > } > }, > "cpus": 1, > "disk": 0, > "instances": 1, > "maxLaunchDelaySeconds": 3600, > "mem": 512, > "gpus": 0, > "networks": [ > { > "mode": "container/bridge" > } > ], > "residency": { > "relaunchEscalationTimeoutSeconds": 3600, > "taskLostBehavior": "WAIT_FOREVER" > }, > "requirePorts": false, > "upgradeStrategy": { > "maximumOverCapacity": 0, > "minimumHealthCapacity": 0 > }, > "killSelection": "YOUNGEST_FIRST", > "unreachableStrategy": "disabled", > "healthChecks": [], > "fetch": [] > } > {code} > *Step 2:* Kill mysqld process to force rescheduling new Mysql task, but found > 2 mount points to the same persistent volume, it means old mount point did > not be unmounted immediately. > !image-2018-10-24-22-20-51-059.png! > *Step 3:* After GC, data in persistent volume was deleted accidentally, but > mysqld (Mesos task) still running > !image-2018-10-24-22-21-13-399.png! 
> *Step 4:* Delete Mysql service from Marathon, all mount points unable to > unmount, even the service already deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-9352) Data in persistent volume deleted accidentally when using Docker container and Persistent volume
[ https://issues.apache.org/jira/browse/MESOS-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384020#comment-17384020 ] Andreas Peters commented on MESOS-9352: --- Is this still an issue in the current Mesos version? I tried to reproduce it in 1.11.0 and it works as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-6285) Agents may OOM during recovery if there are too many tasks or executors
[ https://issues.apache.org/jira/browse/MESOS-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383897#comment-17383897 ] Andreas Peters commented on MESOS-6285: --- Is this still an issue or can we close it? :) > Agents may OOM during recovery if there are too many tasks or executors > --- > > Key: MESOS-6285 > URL: https://issues.apache.org/jira/browse/MESOS-6285 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.1 >Reporter: Joseph Wu >Priority: Critical > Labels: foundations, mesosphere > > On a test cluster, we encountered a degenerate case where running the > example {{long-lived-framework}} for over a week would render the agent > un-recoverable. > The {{long-lived-framework}} creates one custom {{long-lived-executor}} and > launches a single task on that executor every time it receives an offer from > that agent. Over a week's worth of time, the framework manages to launch > some 400k tasks (short sleeps) on one executor. During runtime, this is not > problematic, as each completed task is quickly rotated out of the agent's > memory (and checkpointed to disk). > During recovery, however, the agent reads every single task into memory, > which leads to slow recovery; and often results in the agent being OOM-killed > before it finishes recovering. > To repro this condition quickly: > 1) Apply this patch to the {{long-lived-framework}}: > {code} > diff --git a/src/examples/long_lived_framework.cpp > b/src/examples/long_lived_framework.cpp > index 7c57eb5..1263d82 100644 > --- a/src/examples/long_lived_framework.cpp > +++ b/src/examples/long_lived_framework.cpp > @@ -358,16 +358,6 @@ private: >// Helper to launch a task using an offer. 
>void launch(const Offer& offer) >{ > -int taskId = tasksLaunched++; > -++metrics.tasks_launched; > - > -TaskInfo task; > -task.set_name("Task " + stringify(taskId)); > -task.mutable_task_id()->set_value(stringify(taskId)); > -task.mutable_agent_id()->MergeFrom(offer.agent_id()); > -task.mutable_resources()->CopyFrom(taskResources); > -task.mutable_executor()->CopyFrom(executor); > - > Call call; > call.set_type(Call::ACCEPT); > > @@ -380,7 +370,23 @@ private: > Offer::Operation* operation = accept->add_operations(); > operation->set_type(Offer::Operation::LAUNCH); > > -operation->mutable_launch()->add_task_infos()->CopyFrom(task); > +// Launch as many tasks as possible in the given offer. > +Resources remaining = Resources(offer.resources()).flatten(); > +while (remaining.contains(taskResources)) { > + int taskId = tasksLaunched++; > + ++metrics.tasks_launched; > + > + TaskInfo task; > + task.set_name("Task " + stringify(taskId)); > + task.mutable_task_id()->set_value(stringify(taskId)); > + task.mutable_agent_id()->MergeFrom(offer.agent_id()); > + task.mutable_resources()->CopyFrom(taskResources); > + task.mutable_executor()->CopyFrom(executor); > + > + operation->mutable_launch()->add_task_infos()->CopyFrom(task); > + > + remaining -= taskResources; > +} > > mesos->send(call); >} > {code} > 2) Run a master, agent, and {{long-lived-framework}}. On a 1 CPU, 1 GB agent > + this patch, it should take about 10 minutes to build up sufficient task > launches. > 3) Restart the agent and watch it flail during recovery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-8679) If the first KILL stuck in the default executor, all other KILLs will be ignored.
[ https://issues.apache.org/jira/browse/MESOS-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383894#comment-17383894 ] Andreas Peters commented on MESOS-8679: --- Is this still an issue or can we close it? :) > If the first KILL stuck in the default executor, all other KILLs will be > ignored. > - > > Key: MESOS-8679 > URL: https://issues.apache.org/jira/browse/MESOS-8679 > Project: Mesos > Issue Type: Bug > Components: executor >Reporter: Gilbert Song >Priority: Critical > Labels: default-executor, foundations > > If the first {{KILL}} call gets stuck in the default executor, all other > {{KILL}} requests will be ignored. It would make a particular task become > unkillable (stuck in {{TASK_KILLING}}) forever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-8608) RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.
[ https://issues.apache.org/jira/browse/MESOS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383887#comment-17383887 ] Andreas Peters commented on MESOS-8608: --- Is this still an issue or can we close it? :) > RmdirContinueOnErrorTest.RemoveWithContinueOnError fails. > - > > Key: MESOS-8608 > URL: https://issues.apache.org/jira/browse/MESOS-8608 > Project: Mesos > Issue Type: Bug > Components: cmake >Affects Versions: 1.4.1, 1.8.0 > Environment: Docker 17.12.0 > Ubuntu 16.04, Ubuntu 18.04 >Reporter: Pierre-Louis Chevallier >Priority: Critical > Labels: flaky-test, foundations, newbie, test > > I'm trying to run Mesos on Docker, and when I run "make check" one test > fails, even though I followed all the requirements & instructions in the Mesos getting > started guide. The failed test is > RmDirContinueOnErrorTest.RemoveWithContinuedOnError -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.
[ https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372510#comment-17372510 ] Martin Tzvetanov Grigorov commented on MESOS-10223: --- Running it! I will attach the logs once it finishes! > Crashes on ARM64 due to bad interaction of libunwind with libgcc. > -- > > Key: MESOS-10223 > URL: https://issues.apache.org/jira/browse/MESOS-10223 > Project: Mesos > Issue Type: Bug >Reporter: Martin Tzvetanov Grigorov >Assignee: Charles Natali >Priority: Major > Attachments: 0001-Fixed-crashes-on-ARM64-due-to-libunwind.patch, > mesos-on-arm64.tgz, sudo_make_check_output.txt > > > Running `make check` on Ubuntu 20.04.2 aarch64 fails with such errors: > > {code:java} > [--] 3 tests from JsonTest > [ RUN ] JsonTest.NumberFormat > [ OK ] JsonTest.NumberFormat (0 ms) > [ RUN ] JsonTest.Find > terminate called after throwing an instance of > 'boost::exception_detail::clone_impl > >' > terminate called recursively > *** Aborted at 1622796321 (unix time) try "date -d @1622796321" if you are > using GNU date *** > PC: @0x0 (unknown) > *** SIGABRT (@0x3e8090d) received by PID 2317 (TID 0xa80d9010) from > PID 2317; stack trace: *** > @ 0xa80e77fc ([vdso]+0x7fb) > @ 0xa7b71188 gsignal > @ 0xa7b5ddac abort > @ 0xa7d73848 __gnu_cxx::__verbose_terminate_handler() > @ 0xa7d711ec (unknown) > @ 0xa7d71250 std::terminate() > @ 0xa7d715b0 __cxa_rethrow > @ 0xa7d737e4 __gnu_cxx::__verbose_terminate_handler() > @ 0xa7d711ec (unknown) > @ 0xa7d71250 std::terminate() > @ 0xa7d71544 __cxa_throw > @ 0xab4ee114 boost::throw_exception<>() > @ 0xab5c512c boost::conversion::detail::throw_bad_cast<>() > @ 0xab5c2228 boost::lexical_cast<>() > @ 0xab5bf89c numify<>() > @ 0xab5e00e8 JSON::Object::find<>() > @ 0xab5e0584 JSON::Object::find<>() > @ 0xab5e0584 JSON::Object::find<>() > @ 0xab5cdd2c JsonTest_Find_Test::TestBody() > @ 0xab886fec > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0xab87f1d4 > 
testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0xab85a9d0 testing::Test::Run() > @ 0xab85b258 testing::TestInfo::Run() > @ 0xab85b8d0 testing::TestCase::Run() > @ 0xab862344 testing::internal::UnitTestImpl::RunAllTests() > @ 0xab888440 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0xab87ffd4 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0xab86100c testing::UnitTest::Run() > @ 0xab630950 RUN_ALL_TESTS() > @ 0xab630418 main > @ 0xa7b5e110 __libc_start_main > @ 0xab4b41d4 (unknown) > [FAIL]: 8 shard(s) have failed tests > make[6]: *** [Makefile:2092: check-local] Error 8 > make[6]: Leaving directory > '/home/ubuntu/git/apache/mesos/build/3rdparty/stout' > make[5]: *** [Makefile:1840: check-am] Error 2 > make[5]: Leaving directory > '/home/ubuntu/git/apache/mesos/build/3rdparty/stout' > make[4]: *** [Makefile:1685: check-recursive] Error 1 > make[4]: Leaving directory > '/home/ubuntu/git/apache/mesos/build/3rdparty/stout' > make[3]: *** [Makefile:1842: check] Error 2 > make[3]: Leaving directory > '/home/ubuntu/git/apache/mesos/build/3rdparty/stout' > make[2]: *** [Makefile:1153: check-recursive] Error 1 > make[2]: Leaving directory '/home/ubuntu/git/apache/mesos/build/3rdparty' > make[1]: *** [Makefile:1306: check] Error 2 > make[1]: Leaving directory '/home/ubuntu/git/apache/mesos/build/3rdparty' > make: *** [Makefile:785: check-recursive] Error 1 > {code} > > {code:java} > [--] 3 tests from JsonTest > [ RUN ] JsonTest.InvalidUTF8 > [ OK ] JsonTest.InvalidUTF8 (0 ms) > [ RUN ] JsonTest.ParseError > terminate called after throwing an instance of 'std::overflow_error' > terminate called recursively > *** Aborted at 1622796321 (unix time) try "date -d @1622796321" if you are > using GNU date *** > PC: @0x0 (unknown) > *** SIG
[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.
[ https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372475#comment-17372475 ] Charles Natali commented on MESOS-10223: It must be a different issue then. Could you run {noformat} # ./bin/mesos-tests.sh --verbose > mesos-tests.log 2>&1{noformat} And post the result?
[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.
[ https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372398#comment-17372398 ] Martin Tzvetanov Grigorov commented on MESOS-10223: --- [~cf.natali] *sudo make check* still hangs on the same step here. HEAD is: commit 72f16f68973bf7d2ce5c621539a21fc4eccfa56e Author: Charles-Francois Natali Date: Sat Jun 26 19:04:33 2021 +0100 Fixed a bug where timers wouldn't expire after `process:reinitialize`.
[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.
[ https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371617#comment-17371617 ] Charles Natali commented on MESOS-10223: [~mgrigorov] The hang should be fixed in master - it'd be great if you could give it a try.
[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
[ https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371513#comment-17371513 ] Andreas Peters commented on MESOS-10225: I opened a PR: https://github.com/apache/mesos/pull/398 > mention that systemd agent unit should have Delegate=yes > > > Key: MESOS-10225 > URL: https://issues.apache.org/jira/browse/MESOS-10225 > Project: Mesos > Issue Type: Documentation >Reporter: Charles Natali >Assignee: Andreas Peters >Priority: Major > > If managed by systemd, the agent unit should have > [Delegate=yes|https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#Delegate=] > to prevent systemd from manipulating cgroups created by the agent, which can > break things quite badly. > See for example https://issues.apache.org/jira/browse/MESOS-3488 and > https://issues.apache.org/jira/browse/MESOS-3009 for the kind of problems it > causes. > I think it's quite important and should figure in good place in the > documentation, maybe in the agent configuration page > [http://mesos.apache.org/documentation/latest/configuration/agent/] ? > > [~surahman] or [~apeters] if either one of you wants to have a look at it, I > think it's important that at least someone is familiar with the documentation > part. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
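For anyone landing here from a search, the recommendation boils down to a systemd drop-in along these lines for the agent unit (the unit and file names here are illustrative - adjust them to whatever your distribution calls the agent service):

```
# /etc/systemd/system/mesos-agent.service.d/delegate.conf
[Service]
Delegate=yes
```

After adding it, run `systemctl daemon-reload` and restart the agent; with Delegate=yes, systemd leaves the cgroup sub-hierarchy created by the agent alone instead of rewriting it.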
[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
[ https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371166#comment-17371166 ] Andreas Peters commented on MESOS-10225: I think so too - it's more visible that way. I'll create a section then. :) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
[ https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370808#comment-17370808 ] Charles Natali commented on MESOS-10225: Good question - I think having a dedicated section might be better, maybe "Interaction with systemd" or something like that? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
[ https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370464#comment-17370464 ]

Andreas Peters commented on MESOS-10225:

Where would the reference be most visible? As a notice under "--[no]-cgroups_cpu_enable_pids_and_tids_count", or would it be better to create a subheader like "Notice" where we add this kind of information?
[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
[ https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370208#comment-17370208 ]

Charles Natali commented on MESOS-10225:

Thanks Andreas, that'd be great - hopefully it will avoid some surprises for users.
[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
[ https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369850#comment-17369850 ]

Andreas Peters commented on MESOS-10225:

I will do it. I can also add the "Delegate" setting to the systemd scripts of the mesos-agent.
[jira] [Assigned] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
[ https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Peters reassigned MESOS-10225:

Assignee: Andreas Peters
[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up
[ https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369759#comment-17369759 ]

Subhajit Palit commented on MESOS-9950:

Thanks for that option [~cf.natali] - I will try it out further and share my observations.

> memory cgroup gone before isolator cleaning up
>
> Key: MESOS-9950
> URL: https://issues.apache.org/jira/browse/MESOS-9950
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: longfei
> Priority: Major
>
> The memcg created by Mesos may have been deleted before the cgroup/memory isolator cleans up.
> This lets the termination fail and loses the information in the old termination (before the failure).
> {code:java}
> I0821 15:16:03.025796 3354800 paths.cpp:745] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89' for user 'tiger'
> I0821 15:16:03.026199 3354800 paths.cpp:748] Creating sandbox '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.026304 3354800 slave.cpp:9064] Launching executor 'mt:z03584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d- with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in work directory '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.051795 3354800 slave.cpp:3520] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 for executor 'mt:z03584687:1' of framework 8e4967e5-736e-4a22-90c3-7b32d526914d-
> I0821 15:16:03.076608 3354807 containerizer.cpp:1325] Starting container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.076911 3354807 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PROVISIONING to PREPARING
> I0821 15:16:03.077906 3354802 memory.cpp:478] Started listening for OOM events for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079540 3354804 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079587 3354820 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079589 3354804 memory.cpp:227] Updated 'memory.limit_in_bytes' to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.080901 3354802 switchboard.cpp:316] Container logger module finished preparing container a0706ca0-fe2c-4477-8161-329b26ea5d89; IOSwitchboard server is not required
> I0821 15:16:03.081593 3354801 linux_launcher.cpp:492] Launching container a0706ca0-fe2c-4477-8161-329b26ea5d89 and cloning with namespaces
> I0821 15:16:03.083823 3354808 containerizer.cpp:2107] Checkpointing container's forked pid 1857418 to '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89/pids/forked.pid'
> I0821 15:16:03.084156 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PREPARING to ISOLATING
> I0821 15:16:03.091468 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from ISOLATING to FETCHING
> I0821 15:16:03.094933 3354808 containerizer.cpp:3185] Transitioning the state of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from FETCHING to RUNNING
> I0821 15:16:03.197753 3354808 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 4032MB for container
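The failure mode described above (the memory cgroup disappearing before the isolator cleans up) is commonly handled by treating "already gone" as success during cleanup. The sketch below illustrates that pattern only; Mesos's actual cleanup is C++ code in its cgroups helpers, and the function name and return values here are illustrative, not the real Mesos API.

```python
import os


def destroy_cgroup(path):
    """Remove a cgroup directory, treating "already gone" as success.

    Sketch of a defensive cleanup pattern, not the actual Mesos
    implementation: if the kernel (or an external manager such as
    systemd) already removed the cgroup, cleanup should not fail.
    """
    try:
        # cgroup directories are removed with rmdir(2), never unlink(2).
        os.rmdir(path)
        return "destroyed"
    except FileNotFoundError:
        # The cgroup was already removed out from under us; an isolator
        # that raised here would lose the original termination info.
        return "already-gone"
```

With this pattern, a cleanup race like the one in this ticket degrades to a no-op instead of a failed container termination.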
[jira] [Created] (MESOS-10225) mention that systemd agent unit should have Delegate=yes
Charles Natali created MESOS-10225:

Summary: mention that systemd agent unit should have Delegate=yes
Key: MESOS-10225
URL: https://issues.apache.org/jira/browse/MESOS-10225
Project: Mesos
Issue Type: Documentation
Reporter: Charles Natali

If managed by systemd, the agent unit should have [Delegate=yes|https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#Delegate=] to prevent systemd from manipulating cgroups created by the agent, which can break things quite badly.

See for example https://issues.apache.org/jira/browse/MESOS-3488 and https://issues.apache.org/jira/browse/MESOS-3009 for the kind of problems it causes.

I think it's quite important and should figure in a good place in the documentation, maybe on the agent configuration page [http://mesos.apache.org/documentation/latest/configuration/agent/]?

[~surahman] or [~apeters], if either one of you wants to have a look at it, I think it's important that at least someone is familiar with the documentation part.
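For context, a minimal sketch of the setting this ticket recommends, written as a systemd drop-in. The unit name and file path are illustrative assumptions; check how your distribution packages the Mesos agent:

```ini
# /etc/systemd/system/mesos-slave.service.d/delegate.conf  (illustrative path)
[Service]
# Tell systemd that this service manages its own cgroup subtree, so
# systemd will not rewrite the cgroups the Mesos agent creates for
# its containers.
Delegate=yes
```

After adding such a drop-in, `systemctl daemon-reload` and a restart of the agent unit would be needed for it to take effect.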
[jira] [Comment Edited] (MESOS-10129) Build fails on Maven javadoc generation when using JDK11
[ https://issues.apache.org/jira/browse/MESOS-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369026#comment-17369026 ]

Saad Ur Rahman edited comment on MESOS-10129 at 6/24/21, 6:46 PM:

I am on it. I just ran a clean build of Mesos from the updated _upstream:main_ without issues.

*OS:* Ubuntu 21.04
*Javac:* openjdk-11-jdk-headless:amd64: /usr/lib/jvm/java-11-openjdk-amd64/bin/javac
*Java:* openjdk-11-jre-headless:amd64: /usr/lib/jvm/java-11-openjdk-amd64/bin/java

Sorry, I am still a bit of a newbie with Mesos. Is there a specific build command I should run to try to replicate this? It might be a Debian-specific issue.

was (Author: surahman): I am on it.

> Build fails on Maven javadoc generation when using JDK11
>
> Key: MESOS-10129
> URL: https://issues.apache.org/jira/browse/MESOS-10129
> Project: Mesos
> Issue Type: Bug
> Components: build
> Affects Versions: master, 1.10.0
> Environment: Debian 10 Buster (2020-04-29) with OpenJDK 11.0.7 (2020-04-14)
> Reporter: Carlos Saltos
> Priority: Major
> Labels: Java11, beginner, build, java11, jdk11
> Attachments: mesos.10.0.maven.javadoc.fix.patch
>
> h3. CURRENT BEHAVIOR:
> When using Java 11 (or newer versions), the Javadoc generation step fails with the error:
> {{[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (build-and-attach-javadocs) on project mesos: MavenReportException: Error while creating archive:}}
> {{[ERROR] Exit code: 1 - javadoc: error - The code being documented uses modules but the packages defined in http://download.oracle.com/javase/6/docs/api/ are in the unnamed module.}}
> {{[ERROR]}}
> {{[ERROR] Command line was: /usr/lib/jvm/java-11-openjdk-amd64/bin/javadoc @options}}
> {{[ERROR]}}
> {{[ERROR] Refer to the generated Javadoc files in '/home/admin/mesos-deb-packaging/mesos-repo/build/src/java/target/apidocs' dir.}}
> {{[ERROR] -> [Help 1]}}
> {{[ERROR]}}
> {{[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.}}
> {{[ERROR] Re-run Maven using the -X switch to enable full debug logging.}}
> {{[ERROR]}}
> {{[ERROR] For more information about the errors and possible solutions, please read the following articles:}}
> {{[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException}}
> {{make[1]: *** [Makefile:17533: java/target/mesos-1.11.0.jar] Error 1}}
> {{make[1]: Leaving directory '/home/admin/mesos-deb-packaging/mesos-repo/build/src'}}
> {{make: *** [Makefile:785: all-recursive] Error 1}}
> *NOTE:* The error occurs when the Maven javadoc plugin tries to include references to the non-existent old Java 6 documentation.
> h3. POSSIBLE SOLUTION:
> Remove the old reference by adding false to the javadoc maven plugin configuration section
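The quoted "POSSIBLE SOLUTION" above lost an XML element name in transit (only "false" survived). One common way to suppress this class of javadoc error is the plugin's `detectJavaApiLink` parameter; whether that is the element the reporter (and the attached patch) intended is an assumption:

```xml
<!-- Sketch only: stops maven-javadoc-plugin from auto-linking a default
     JDK API javadoc URL. The element name is an assumption -- the
     original message lost the tag -- so compare against the attached
     mesos.10.0.maven.javadoc.fix.patch before relying on it. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <detectJavaApiLink>false</detectJavaApiLink>
  </configuration>
</plugin>
```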