[jira] [Created] (MESOS-10244) [MSVC] Mesos failed due to cannot open file 'libevent_pthreads.lib' and 'libevent_openssl.lib'

2024-09-24 Thread QuellaZhang (Jira)
QuellaZhang created MESOS-10244:
---

 Summary: [MSVC] Mesos failed due to cannot open file 
'libevent_pthreads.lib' and 'libevent_openssl.lib'
 Key: MESOS-10244
 URL: https://issues.apache.org/jira/browse/MESOS-10244
 Project: Mesos
  Issue Type: Bug
 Environment: VS2022 17.11.3 + windows
Reporter: QuellaZhang


Hi All,

We tried to build the latest Mesos source code with VS2022. The build failed 
because the linker cannot open 'libevent_pthreads.lib' and 
'libevent_openssl.lib'. This reproduces at the latest commit c1b42f7 on the 
master branch. Could you please take a look at this issue? Thanks a lot!

*Reproduce steps:*

# git clone https://github.com/apache/mesos F:\gitP\apache\mesos
# Open a VS 2022 x64 command prompt as admin and browse to F:\gitP\apache\mesos
# mkdir build_amd64 && pushd build_amd64
# set OPENSSL_ROOT_DIR=F:\Microsoft\vcpkg\installed\x64-windows
# cmake -G "Visual Studio 17 2022" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.22621.0  
-DENABLE_LIBEVENT=1 -DENABLE_SSL=1 -DHAS_AUTHENTICATION=0 
-DPATCHEXE_PATH="C:\Program Files\Git\usr\bin" -T host=x64 ..
# msbuild /m /p:Platform=x64 /p:Configuration=Debug Mesos.sln /t:Rebuild

*Error:*
"F:\gitP\apache\mesos\build_amd64\src\tests\mesos-tests.vcxproj.metaproj" 
(Rebuild target) (53) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj.metaproj" 
(Rebuild target) (54) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj" (Rebuild 
target) (104) ->
LINK : fatal error LNK1104: cannot open file 'libevent_pthreads.lib' 
[F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]
LINK : fatal error LNK1104: cannot open file 'libevent_pthreads.lib' 
[F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]
"F:\gitP\apache\mesos\build_amd64\Mesos.sln" (Rebuild target) (1) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\mesos-tests.vcxproj.metaproj" 
(Rebuild target) (52) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj.metaproj" 
(Rebuild target) (53) ->
"F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj" (Rebuild 
target) (103) ->
LINK : fatal error LNK1104: cannot open file 'libevent_openssl.lib' 
[F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]
LINK : fatal error LNK1104: cannot open file 'libevent_openssl.lib' 
[F:\gitP\apache\mesos\build_amd64\src\tests\test-helper.vcxproj]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (MESOS-10243) MAC Address changes from link::setMAC may not stick, leading to container launch failure with port mapping isolator.

2024-07-15 Thread Jason Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866189#comment-17866189
 ] 

Jason Zhou edited comment on MESOS-10243 at 7/15/24 10:26 PM:
--

Landed fix for peer link MAC setting on creation:
{code:java}
commit 70d9da223a204dc89431aa7b26db7beeb53a9f3c
Author: Jason Zhou
Date:   Mon Jul 15 18:02:54 2024 -0400

    [veth] Provide the ability to set the veth peer link MAC address on
    creation.

    This addresses the previous TODO: set the MAC address of the peer
    link when creating a veth pair, so that we avoid racing against udev
    to see who will set the MAC address of the interface last.

    See: https://reviews.apache.org/r/75087/
    See: https://issues.apache.org/jira/browse/MESOS-10243

    Review: https://reviews.apache.org/r/75090/
{code}
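For illustration, setting both MAC addresses at veth creation time maps onto a single iproute2 invocation. The sketch below is a hypothetical C++ helper, not Mesos code; the helper names and all argument values are made up:

```cpp
// Hypothetical sketch: build the iproute2 command that creates a veth pair
// with both MAC addresses fixed at creation time, so the kernel records
// addr_assign_type = NET_ADDR_SET and udev should leave the addresses alone.
// Helper names and argument values are illustrative, not Mesos code.
#include <cstdlib>
#include <string>

std::string vethCreateCommand(
    const std::string& link, const std::string& linkMac,
    const std::string& peer, const std::string& peerMac) {
  return "ip link add " + link + " address " + linkMac +
         " type veth peer name " + peer + " address " + peerMac;
}

// Shell out to iproute2 (requires CAP_NET_ADMIN); returns the exit status.
int createVethPair(
    const std::string& link, const std::string& linkMac,
    const std::string& peer, const std::string& peerMac) {
  return std::system(
      vethCreateCommand(link, linkMac, peer, peerMac).c_str());
}
```

Because both addresses are passed to one `ip link add` call, neither end of the pair ever carries a randomly generated address that udev might later rewrite.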
 

 



> MAC Address changes from link::setMAC may not stick, leading to container 
> launch failure with port mapping isolator.
> 
>
> Key: MESOS-10243
> URL: https://issues.apache.org/jira/browse/MESOS-10243
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Jason Zhou
>Assignee: Jason Zhou
>Priority: Major
>
> It seems that there are scenarios where Mesos containers cannot communicate 
> with agents because the MAC addresses are set incorrectly, leading to dropped 
> packets. A workaround for this behavior is to check that the MAC address is 
> set correctly after the ioctl call, and to retry setting the address if 
> necessary.
> In our test, this workaround appears to reduce the frequency of this issue, 
> but does not seem to prevent all such failures.
> Reviewboard ticket for the workaround: [https://reviews.apache.org/r/75057/] 
> Observed scenarios with incorrectly assigned MAC addresses:
> 1. ioctl returns the correct MAC address, but not net::mac
> 2. both net::mac and ioctl return the same MAC address, but are both wrong
> 3. There are no cases where ioctl/net::mac come back with the same MAC
>    address as before setting. i.e. there is no no-op observed.
> 4. There is a possibility that ioctl/net::mac results disagree with each
>    other even before attempting to set our desired MAC address. As such, we
>    check that the results agree before we set, and log a warning if we find
>    a mismatch
> 5. There is a possibility that the MAC address we set ends up overwritten by
>    a garbage value after setMAC has already completed and checked that the
>    mac address was set correctly. Since this error happens after this
>    function has finished, we can neither log nor detect it in setMAC. Our
>    workaround cannot deal with this scenario, as it occurs outside setMAC.
> Notes:
> 1. We have observed this behavior only on CentOS 9 systems at the moment.
>    We have tried kernels 5.15.147, 5.15.160, and 5.15.161, all of which
>    have this issue.
>    CentOS 7 systems do not seem to have this issue with setMAC.
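The set-and-verify workaround described above can be sketched roughly as follows. This is a hedged, standalone C++ illustration using the `SIOCSIFHWADDR`/`SIOCGIFHWADDR` ioctls, not Mesos's actual `link::setMAC` implementation; all helper names are invented for the example:

```cpp
// Hypothetical sketch of the "set, verify, retry" MAC workaround.
// This is NOT Mesos's actual link::setMAC; all names are illustrative.
#include <array>
#include <cstdio>
#include <cstring>
#include <string>

#include <net/if.h>
#include <net/if_arp.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Parse "aa:bb:cc:dd:ee:ff" into 6 raw bytes.
bool parseMac(const std::string& s, std::array<unsigned char, 6>* out) {
  unsigned int b[6];
  if (sscanf(s.c_str(), "%x:%x:%x:%x:%x:%x",
             &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) != 6) {
    return false;
  }
  for (int i = 0; i < 6; i++) (*out)[i] = static_cast<unsigned char>(b[i]);
  return true;
}

std::string formatMac(const std::array<unsigned char, 6>& mac) {
  char buf[18];
  snprintf(buf, sizeof(buf), "%02x:%02x:%02x:%02x:%02x:%02x",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
  return buf;
}

// Set `iface`'s MAC, read it back, and retry up to `attempts` times.
// Requires CAP_NET_ADMIN; returns true once the read-back address matches.
bool setMacWithRetry(const std::string& iface,
                     const std::array<unsigned char, 6>& mac,
                     int attempts = 3) {
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) return false;

  for (int i = 0; i < attempts; i++) {
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, iface.c_str(), IFNAMSIZ - 1);
    ifr.ifr_hwaddr.sa_family = ARPHRD_ETHER;
    memcpy(ifr.ifr_hwaddr.sa_data, mac.data(), 6);

    if (ioctl(fd, SIOCSIFHWADDR, &ifr) < 0) continue;  // set the address

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, iface.c_str(), IFNAMSIZ - 1);
    if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) continue;  // read it back

    if (memcmp(ifr.ifr_hwaddr.sa_data, mac.data(), 6) == 0) {
      close(fd);
      return true;  // the address stuck
    }
    // Mismatch: something (e.g. udev) overwrote it concurrently; retry.
  }

  close(fd);
  return false;
}
```

As scenario 5 above notes, a verify-and-retry loop like this can only catch overwrites that happen before the function returns; a later overwrite by udev is invisible to it.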







[jira] [Comment Edited] (MESOS-10243) MAC Address changes from link::setMAC may not stick, leading to container launch failure with port mapping isolator.

2024-07-15 Thread Jason Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866074#comment-17866074
 ] 

Jason Zhou edited comment on MESOS-10243 at 7/15/24 5:31 PM:
-

Update:

We have discovered that on systems with a systemd version above 242, there is a 
potential data race: udev may try to update the MAC address of the device at 
the same time as we do, if systemd's MacAddressPolicy is set to 'persistent'. 
To prevent udev from setting the veth device's MAC address by itself, we must 
set the MAC address when the device is created, so that addr_assign_type is 
set to NET_ADDR_SET, which stops udev from attempting to change the MAC 
address of the veth device.
We do not see this issue on CentOS 7 systems because CentOS 7 ships systemd 
219, which predates the 'persistent' MacAddressPolicy default, while CentOS 9 
ships systemd 255.
see: 
[https://github.com/torvalds/linux/commit/2afb9b533423a9b97f84181e773cf9361d98fed6]
see: 
[https://lore.kernel.org/netdev/cahxsexy8lkzocbdbzss_vjopc_tqmyzm87kc192hpmuhmcq...@mail.gmail.com/T/]

Patch avoiding the race condition for the veth link: 
[https://reviews.apache.org/r/75086/] 
TODO: also avoid the race condition for the created peer link: 
[https://reviews.apache.org/r/75087/] 
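For context, the kernel exposes how an interface's MAC address was assigned via sysfs. The sketch below is a hedged illustration (helper names are mine, not Mesos code) of classifying `/sys/class/net/<dev>/addr_assign_type`, whose values come from the kernel's netdevice definitions:

```cpp
// Hedged sketch: interpret the kernel's addr_assign_type for an interface.
// Values follow include/uapi/linux/netdevice.h:
//   NET_ADDR_PERM = 0, NET_ADDR_RANDOM = 1,
//   NET_ADDR_STOLEN = 2, NET_ADDR_SET = 3.
// Helper names are illustrative, not Mesos code.
#include <fstream>
#include <string>

std::string addrAssignName(int type) {
  switch (type) {
    case 0: return "permanent";  // burned-in hardware address
    case 1: return "random";     // randomly generated; udev's 'persistent'
                                 // MacAddressPolicy may rewrite it
    case 2: return "stolen";     // taken from another device
    case 3: return "set";        // explicitly set (what setting the MAC at
                                 // creation achieves); udev leaves it alone
    default: return "unknown";
  }
}

// Read /sys/class/net/<dev>/addr_assign_type; returns -1 on error.
int readAddrAssignType(const std::string& dev) {
  std::ifstream in("/sys/class/net/" + dev + "/addr_assign_type");
  int type = -1;
  in >> type;
  return in ? type : -1;
}
```

A quick check such as `addrAssignName(readAddrAssignType("veth0"))` returning "set" would indicate udev should no longer touch that interface's address.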








[jira] [Commented] (MESOS-10243) MAC Address changes from link::setMAC may not stick, leading to container launch failure with port mapping isolator.

2024-07-15 Thread Benjamin Mahler (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866134#comment-17866134
 ] 

Benjamin Mahler commented on MESOS-10243:
-

Landed fix for host network namespace veth interface.

Let's leave this open and mark it fixed once we also set the container network 
namespace eth0 interface's MAC address on creation / update the script to stop 
setting it.






[jira] [Assigned] (MESOS-10243) MAC Address changes from link::setMAC may not stick

2024-07-15 Thread Jason Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Zhou reassigned MESOS-10243:
--

Assignee: Jason Zhou








[jira] [Commented] (MESOS-9045) LogZooKeeperTest.WriteRead can segfault

2024-07-02 Thread Benjamin Mahler (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862647#comment-17862647
 ] 

Benjamin Mahler commented on MESOS-9045:


Very different case, but also a segfault:

{noformat}
[--] 2 tests from LogZooKeeperTest
I0703 02:50:24.968773 185149 zookeeper.cpp:82] Using Java classpath: 
-Djava.class.path=/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/zookeeper-3.4.8.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/log4j-1.2.16.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/jline-0.9.94.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/slf4j-log4j12-1.6.1.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/netty-3.7.0.Final.jar:/tmp/SRC/build/mesos-1.12.0/_build/sub/3rdparty/zookeeper-3.4.8/lib/slf4j-api-1.6.1.jar
[ RUN  ] LogZooKeeperTest.WriteRead
I0703 02:50:25.058761 185149 jvm.cpp:590] Looking up method 
(Ljava/lang/String;)V
I0703 02:50:25.059170 185149 jvm.cpp:590] Looking up method deleteOnExit()V
I0703 02:50:25.060112 185149 jvm.cpp:590] Looking up method 
(Ljava/io/File;Ljava/io/File;)V
log4j:WARN No appenders could be found for logger 
(org.apache.zookeeper.server.persistence.FileTxnSnapLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
I0703 02:50:25.206512 185149 jvm.cpp:590] Looking up method ()V
I0703 02:50:25.207417 185149 jvm.cpp:590] Looking up method 
(Lorg/apache/zookeeper/server/persistence/FileTxnSnapLog;Lorg/apache/zookeeper/server/ZooKeeperServer$DataTreeBuilder;)V
*** Aborted at 1719975025 (unix time) try "date -d @1719975025" if you are 
using GNU date ***
PC: @ 0x7f5edf914ccd OopStorage::Block::release_entries()
*** SIGSEGV (@0x238) received by PID 185149 (TID 0x7f5f5a15cb40) from PID 568; 
stack trace: ***
@ 0x7f5edf923929 os::Linux::chained_handler()
@ 0x7f5edf92963b JVM_handle_linux_signal
@ 0x7f5edf91c1dc signalHandler()
@ 0x7f5f5b7af420 (unknown)
@ 0x7f5edf914ccd OopStorage::Block::release_entries()
@ 0x7f5edf914f26 OopStorage::release()
@ 0x7f5edf617b21 jni_DeleteGlobalRef
@ 0x7f5f6aeffaf2 JNIEnv_::DeleteGlobalRef()
@ 0x7f5f6aefdc3a Jvm::deleteGlobalRef()
@ 0x55f0ed05a2ea Jvm::Object::~Object()
@ 0x55f0ed05f110 
org::apache::zookeeper::server::ZooKeeperServer::DataTreeBuilder::~DataTreeBuilder()
@ 0x55f0ed061a14 
org::apache::zookeeper::server::ZooKeeperServer::BasicDataTreeBuilder::~BasicDataTreeBuilder()
@ 0x55f0ed05da2f 
mesos::internal::tests::ZooKeeperTestServer::ZooKeeperTestServer()
@ 0x55f0eba05ef6 mesos::internal::tests::ZooKeeperTest::ZooKeeperTest()
@ 0x55f0eba0823f 
mesos::internal::tests::LogZooKeeperTest::LogZooKeeperTest()
@ 0x55f0eba08350 
mesos::internal::tests::LogZooKeeperTest_WriteRead_Test::LogZooKeeperTest_WriteRead_Test()
@ 0x55f0eba73252 testing::internal::TestFactoryImpl<>::CreateTest()
@ 0x55f0ed0a42bc 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x55f0ed09d88d 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x55f0ed079ac5 testing::TestInfo::Run()
@ 0x55f0ed07a1cd testing::TestCase::Run()
@ 0x55f0ed081567 testing::internal::UnitTestImpl::RunAllTests()
@ 0x55f0ed0a54ea 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x55f0ed09e3f3 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x55f0ed08007f testing::UnitTest::Run()
@ 0x55f0eba90e05 RUN_ALL_TESTS()
@ 0x55f0eba907cc main
@ 0x7f5f5b5cd083 __libc_start_main
@ 0x55f0eaad675e _start
{noformat}


> LogZooKeeperTest.WriteRead can segfault
> ---
>
> Key: MESOS-9045
> URL: https://issues.apache.org/jira/browse/MESOS-9045
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.5.1
> Environment: macOS
>Reporter: Jan Schlicht
>Priority: Major
>  Labels: flaky-test, segfault
>
> The following segfault occured when testing the {{1.5.x}} branch (SHA 
> {{64341865d}}) on macOS:
> {noformat}
> [ RUN  ] LogZooKeeperTest.WriteRead
> I0702 00:49:46.259831 2560127808 jvm.cpp:590] Looking up method 
> (Ljava/lang/String;)V
> I0702 00:49:46.260002 2560127808 jvm.cpp:590] Looking up method 
> deleteOnExit()V
> I0702 00:49:46.260550 2560127808 jvm.cpp:590] Looking up method 
> (Ljava/io/File;Ljava/io/File;)V
> log4j:WARN No appenders could be found for logger 
> (org.apache.zookeeper.server.persistence.FileTxnSnapLog).
> log4j:WA

[jira] [Assigned] (MESOS-8867) CMake: Bundled libevent v2.1.5-beta doesn't compile with OpenSSL 1.1.0

2024-07-02 Thread Benjamin Mahler (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-8867:
--

Assignee: Jason Zhou

> CMake: Bundled libevent v2.1.5-beta doesn't compile with OpenSSL 1.1.0
> --
>
> Key: MESOS-8867
> URL: https://issues.apache.org/jira/browse/MESOS-8867
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
> Environment: Fedora 28 with OpenSSL 1.1.0h, {{cmake -G Ninja -D 
> ENABLE_LIBEVENT=ON -D ENABLE_SSL=ON}}
>Reporter: Jan Schlicht
>Assignee: Jason Zhou
>Priority: Major
>
> Compiling libevent 2.1.5 beta with OpenSSL 1.1.0 fails with errors like
> {noformat}
> /home/vagrant/mesos/build/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c:
>  In function ‘bio_bufferevent_new’:
> /home/vagrant/mesos/build/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c:112:3:
>  error: dereferencing pointer to incomplete type ‘BIO’ {aka ‘struct bio_st’}
>   b->init = 0;
>^~
> {noformat}
> As this is the version currently bundled by CMake, builds with 
> {{ENABLE_LIBEVENT=ON, ENABLE_SSL=ON}} will fail to compile.
> Libevent supports OpenSSL 1.1.0 beginning with v2.1.7-rc (see 
> https://github.com/libevent/libevent/pull/397) 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (MESOS-10243) MAC Address changes from link::setMAC may not stick

2024-06-19 Thread Jason Zhou (Jira)
Jason Zhou created MESOS-10243:
--

 Summary: MAC Address changes from link::setMAC may not stick
 Key: MESOS-10243
 URL: https://issues.apache.org/jira/browse/MESOS-10243
 Project: Mesos
  Issue Type: Bug
Reporter: Jason Zhou


There appear to be scenarios where Mesos containers cannot communicate with 
agents because the MAC addresses are set incorrectly, leading to dropped 
packets. A workaround for this behavior is to check that the MAC address is set 
correctly after the ioctl call, and retry the address setting if necessary.
In our tests, this workaround appears to reduce the frequency of the issue, but 
does not prevent all such failures.

Reviewboard ticket for the workaround: [https://reviews.apache.org/r/75057/] 

Observed scenarios with incorrectly assigned MAC addresses:
1. ioctl returns the correct MAC address, but net::mac does not.
2. Both net::mac and ioctl return the same MAC address, but both are wrong.
3. There are no cases where ioctl/net::mac return the same MAC address as
   before the set, i.e. no no-op was observed.
4. The ioctl/net::mac results may disagree with each other even before we
   attempt to set the desired MAC address. As such, we check that the results
   agree before we set, and log a warning if we find a mismatch.
5. The MAC address we set may end up overwritten by a garbage value after
   setMAC has already completed and verified that the address was set
   correctly. Since this happens after the function has returned, we can
   neither detect nor log it in setMAC, and the workaround cannot deal with
   this scenario.
Notes:
1. So far we have observed this behavior only on CentOS 9 systems. Kernels
   5.15.147, 5.15.160, and 5.15.161 all have this issue.
   CentOS 7 systems do not seem to have this issue with setMAC.





[jira] [Created] (MESOS-10242) [MSVC] Mesos failed to build due to error C2039: 'cgroupsV2': is not a member of 'mesos::internal::tests::ContainerizerTest'

2024-05-10 Thread Zhaojun (Jira)
Zhaojun created MESOS-10242:
---

 Summary: [MSVC] Mesos failed to build due to error C2039: 
'cgroupsV2': is not a member of 
'mesos::internal::tests::ContainerizerTest'
 Key: MESOS-10242
 URL: https://issues.apache.org/jira/browse/MESOS-10242
 Project: Mesos
  Issue Type: Bug
 Environment: The commit of Mesos we used: 5d6d386

VS version: VS 2022 17.9.5

OS: Windows Server 2022

 
Reporter: Zhaojun
 Attachments: image-2024-05-11-09-50-42-947.png

Mesos failed to build with MSVC on Windows due to the errors below:

src\tests\mesos.cpp(868,52): error C2039: 'cgroupsV2': is not a member of 
'mesos::internal::tests::ContainerizerTest'

src\tests\containerizer\memory_isolator_tests.cpp(155,37): error C2653: 
'cgroups': is not a class or namespace name.

 

Repro steps:
 # git clone [https://github.com/apache/mesos] C:\gitP\apache
 # set _CL_=/D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING 
/D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /wd4061
 # mkdir C:\gitP\apache\mesos\build_amd64 and cd /d 
C:\gitP\apache\mesos\build_amd64
 # set PATH=C:\Program Files\Git\usr\bin;%PATH%
 # cmake -G "Visual Studio 17 2022" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.22621.0 
-DENABLE_LIBEVENT=1 -DHAS_AUTHENTICATION=0 -DPATCHEXE_PATH="C:\Program 
Files\Git\usr\bin" -T host=x64 ..
 # msbuild /maxcpucount:4 /p:Platform=x64 /p:Configuration=Debug Mesos.sln 
/t:Rebuild

 

Note:

I tried adding '#ifdef __linux__' and '#endif' to 
[https://github.com/apache/mesos/blob/master/src/tests/containerizer/memory_isolator_tests.cpp#L153:L167]
 to wrap lines 153-167, and updating 
[https://github.com/apache/mesos/blob/master/src/tests/mesos.cpp] by moving the 
'#endif // __linux__' on line 865 to line 880; with these changes the errors disappeared.





[jira] [Commented] (MESOS-7685) Issue using S3FS from docker container with the mesos containerizer

2024-04-23 Thread Mykel Alvis (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840095#comment-17840095
 ] 

Mykel Alvis commented on MESOS-7685:


In your run command:
{{docker run --cap-add SYS_ADMIN --device /dev/fuse}} blahblahblah

> Issue using S3FS from docker container with the mesos containerizer
> ---
>
> Key: MESOS-7685
> URL: https://issues.apache.org/jira/browse/MESOS-7685
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.1.0
>Reporter: Andrei Filip
>Assignee: Jie Yu
>Priority: Major
>
> I have a docker image which uses S3FS to mount an Amazon S3 bucket for use as 
> a local filesystem. Playing around with this container manually, using 
> docker, I am able to use S3FS as expected.
> When trying to use this image with the mesos containerizer, I get the 
> following error:
> fuse: device not found, try 'modprobe fuse' first
> The way I'm launching a job that runs this s3fs command is via the aurora 
> scheduler. Somehow it seems that docker is able to use the fuse kernel 
> plugin, but the mesos containerizer does not.
> I've also created a stackoverflow topic about this issue here: 
> https://stackoverflow.com/questions/44569238/using-s3fs-in-a-docker-container-ran-by-the-mesos-containerizer/





[jira] [Commented] (MESOS-7187) Master can neglect to update agent metadata in a re-registration corner case.

2024-04-22 Thread Benjamin Mahler (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839811#comment-17839811
 ] 

Benjamin Mahler commented on MESOS-7187:


Added a mitigation of the bug I commented on above: 
https://github.com/apache/mesos/pull/558
It does not fix the overall issue here due to a lack of a connection construct, 
but it prevents the agent from getting stuck sending TASK_DROPPED for all 
incoming tasks.

> Master can neglect to update agent metadata in a re-registration corner case.
> -
>
> Key: MESOS-7187
> URL: https://issues.apache.org/jira/browse/MESOS-7187
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Mahler
>Priority: Major
>  Labels: tech-debt
>
> If the agent is re-registering with the master for the first time, the master 
> will drop any re-registration messages that arrive while the registry 
> operation is in progress.
> These dropped messages can have different metadata (e.g. version, 
> capabilities, etc) that gets dropped. Since the master doesn't distinguish 
> between different instances of the agent (both share the same UPID and there 
> is no instance identifying information), the master can't tell whether this 
> is a retry from the original instance of the agent or a re-registration from 
> a new instance of the agent.
> The following is an example:
> (1) Master restarts.
> (2) Agent re-registers with OLD_VERSION / OLD_CAPABILITIES.
> (3) While registry operation is in progress, agent is upgraded and 
> re-registers with NEW_VERSION / NEW_CAPABILITIES.
> (4) Registry operation completes, new agent receives the re-registration 
> acknowledgement message and so, does not retry.
> (5) Now, the master's memory reflects OLD_VERSION / OLD_CAPABILITIES for the 
> agent which remains inconsistent until a later re-registration occurs.





[jira] [Commented] (MESOS-7187) Master can neglect to update agent metadata in a re-registration corner case.

2024-04-12 Thread Benjamin Mahler (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836731#comment-17836731
 ] 

Benjamin Mahler commented on MESOS-7187:


Observed an actual instance of this, which occurred via the following sequence:

1. ZK session expired
2. Master failover
3. Agent run 1 sends re-registration message to new master with UUID 1.
4. Agent fails over (for upgrade)
5. Agent run 2 sends re-registration message to new master
6. Master receives run 1 re-registration message.
7. Master ignores run 2 re-registration message (as agent is already 
re-registering).
8. Master completes re-registration and stores resource UUID 1 and notifies 
agent.
9. Agent receives re-registration completion, sends resource update with UUID 2.
10. Master *does not update* the agent's resource UUID (not because it ignores 
the update message, but because the logic simply doesn't make any update to it, 
which looks like a bug), so it remains UUID 1.

At this point, any tasks launched on the agent will go to TASK_DROPPED due to 
"Task assumes outdated resource state". The agent must be restarted at this 
point to fix the issue.


> Master can neglect to update agent metadata in a re-registration corner case.
> -
>
> Key: MESOS-7187
> URL: https://issues.apache.org/jira/browse/MESOS-7187
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Mahler
>Priority: Major
>  Labels: tech-debt
>
> If the agent is re-registering with the master for the first time, the master 
> will drop any re-registration messages that arrive while the registry 
> operation is in progress.
> These dropped messages can have different metadata (e.g. version, 
> capabilities, etc) that gets dropped. Since the master doesn't distinguish 
> between different instances of the agent (both share the same UPID and there 
> is no instance identifying information), the master can't tell whether this 
> is a retry from the original instance of the agent or a re-registration from 
> a new instance of the agent.
> The following is an example:
> (1) Master restarts.
> (2) Agent re-registers with OLD_VERSION / OLD_CAPABILITIES.
> (3) While registry operation is in progress, agent is upgraded and 
> re-registers with NEW_VERSION / NEW_CAPABILITIES.
> (4) Registry operation completes, new agent receives the re-registration 
> acknowledgement message and so, does not retry.
> (5) Now, the master's memory reflects OLD_VERSION / OLD_CAPABILITIES for the 
> agent which remains inconsistent until a later re-registration occurs.





[jira] [Commented] (MESOS-9752) ./configure fails in certain strange Cyrus SASL setups

2023-07-30 Thread Ryan Carsten Schmidt (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748913#comment-17748913
 ] 

Ryan Carsten Schmidt commented on MESOS-9752:
-

Not realizing this had been reported already, I reported it again in bug 
MESOS-10241.

> ./configure fails in certain strange Cyrus SASL setups
> --
>
> Key: MESOS-9752
> URL: https://issues.apache.org/jira/browse/MESOS-9752
> Project: Mesos
>  Issue Type: Task
> Environment: MacOS X 10.13
> Cyrus SASL 2.1.27 installed through MacPorts
>Reporter: David Gilman
>Priority: Major
>
> I have an installation of Cyrus SASL that, for some unknown reason, has 
> duplicated SASL mechanisms installed. crammd5_installed.c will print 
> "found" once for each CRAM-MD5 mechanism that is set up, resulting in the 
> output "foundfound", which fails the Mesos ./configure test that expects 
> just "found".
>  
>  





[jira] [Commented] (MESOS-10241) checking SASL CRAM-MD5 support... configure: error: no

2023-07-30 Thread Ryan Carsten Schmidt (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748912#comment-17748912
 ] 

Ryan Carsten Schmidt commented on MESOS-10241:
--

I just realized this was already reported to you in bug MESOS-9752 four years 
ago but nobody has commented on it yet. Perhaps it can be addressed now.

> checking SASL CRAM-MD5 support... configure: error: no
> --
>
> Key: MESOS-10241
> URL: https://issues.apache.org/jira/browse/MESOS-10241
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.8.0, 1.11.0
> Environment: macOS 12.6.7
> Xcode 13.2.1
> Apple clang version 13.0.0 (clang-1300.0.29.30)
> Cyrus SASL 2.1.28
>Reporter: Ryan Carsten Schmidt
>Priority: Major
> Attachments: mesos-crammd5-quoting.patch, mesos-crammd5-test.patch
>
>
> mesos fails to configure:
> {{checking SASL CRAM-MD5 support... configure: error: no}}
> {{---}}
> {{We need CRAM-MD5 support for SASL authentication.}}
> {{---}}
> The configure script is checking if a test program outputs the word "found", 
> but on my system, the program outputs "foundfound" so the test fails. The 
> simplest fix would be instead to check whether the test program outputs 
> anything at all, per the attached "test" patch.
> Also, the configure check has incorrect syntax which was [introduced 
> here|https://github.com/apache/mesos/commit/c7d1e8055ea7c0cc6c01f2d7fca95a02b890d76b]:
>  the entire test program's code is not enclosed within square brackets. This 
> does not appear to cause a problem in autoconf 2.71 but it's probably best to 
> fix it before some future version of autoconf decides it is a problem, per 
> the attached "quoting" patch.





[jira] [Commented] (MESOS-10240) configure: error: cannot find libz

2023-07-30 Thread Ryan Carsten Schmidt (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748910#comment-17748910
 ] 

Ryan Carsten Schmidt commented on MESOS-10240:
--

For example, the attached patch changes it to check for just one of zlib's 
functions, and this works.

Also, this bug report assumes bug MESOS-10241 has already been addressed; if 
not, that problem will be encountered first.

> configure: error: cannot find libz
> --
>
> Key: MESOS-10240
> URL: https://issues.apache.org/jira/browse/MESOS-10240
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.8.0, 1.11.0
> Environment: macOS 12.6.7
> Xcode 13.2.1
> Apple clang version 13.0.0 (clang-1300.0.29.30)
>Reporter: Ryan Carsten Schmidt
>Priority: Major
> Attachments: mesos-zlib.patch
>
>
> mesos fails to configure:
> {{checking for zlib.h... yes}}
> {{checking for deflate, gzread, gzwrite, inflate in -lz... no}}
> {{configure: error: cannot find libz}}
> {{---}}
> {{libz is required for Mesos to build.}}
> {{---}}
> I don't think [the zlib configure 
> check|https://github.com/apache/mesos/blob/8856d6fba11281df898fd65b0cafa1e20eb90fe8/configure.ac#L2301]
>  is correct. According to [autoconf 
> documentation|https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.71/html_node/Libraries.html#index-AC_005fCHECK_005fLIB-1],
>  the second argument of {{AC_CHECK_LIB}} is a function name, not a 
> comma-separated list of function names. It also says {{AC_CHECK_LIB}} "should 
> be avoided in some common cases", suggesting {{AC_SEARCH_LIBS}} be used 
> instead.





[jira] [Created] (MESOS-10241) checking SASL CRAM-MD5 support... configure: error: no

2023-07-30 Thread Ryan Carsten Schmidt (Jira)
Ryan Carsten Schmidt created MESOS-10241:


 Summary: checking SASL CRAM-MD5 support... configure: error: no
 Key: MESOS-10241
 URL: https://issues.apache.org/jira/browse/MESOS-10241
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 1.11.0, 1.8.0
 Environment: macOS 12.6.7
Xcode 13.2.1
Apple clang version 13.0.0 (clang-1300.0.29.30)
Cyrus SASL 2.1.28
Reporter: Ryan Carsten Schmidt
 Attachments: mesos-crammd5-quoting.patch, mesos-crammd5-test.patch

mesos fails to configure:

{{checking SASL CRAM-MD5 support... configure: error: no}}
{{---}}
{{We need CRAM-MD5 support for SASL authentication.}}
{{---}}

The configure script is checking if a test program outputs the word "found", 
but on my system, the program outputs "foundfound" so the test fails. The 
simplest fix would be instead to check whether the test program outputs 
anything at all, per the attached "test" patch.

Also, the configure check has incorrect syntax which was [introduced 
here|https://github.com/apache/mesos/commit/c7d1e8055ea7c0cc6c01f2d7fca95a02b890d76b]:
 the entire test program's code is not enclosed within square brackets. This 
does not appear to cause a problem in autoconf 2.71 but it's probably best to 
fix it before some future version of autoconf decides it is a problem, per the 
attached "quoting" patch.





[jira] [Created] (MESOS-10240) configure: error: cannot find libz

2023-07-30 Thread Ryan Carsten Schmidt (Jira)
Ryan Carsten Schmidt created MESOS-10240:


 Summary: configure: error: cannot find libz
 Key: MESOS-10240
 URL: https://issues.apache.org/jira/browse/MESOS-10240
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 1.11.0, 1.8.0
 Environment: macOS 12.6.7
Xcode 13.2.1
Apple clang version 13.0.0 (clang-1300.0.29.30)
Reporter: Ryan Carsten Schmidt


mesos fails to configure:

{{checking for zlib.h... yes}}
{{checking for deflate, gzread, gzwrite, inflate in -lz... no}}
{{configure: error: cannot find libz}}
{{---}}
{{libz is required for Mesos to build.}}
{{---}}

I don't think [the zlib configure 
check|https://github.com/apache/mesos/blob/8856d6fba11281df898fd65b0cafa1e20eb90fe8/configure.ac#L2301]
 is correct. According to [autoconf 
documentation|https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.71/html_node/Libraries.html#index-AC_005fCHECK_005fLIB-1],
 the second argument of {{AC_CHECK_LIB}} is a function name, not a 
comma-separated list of function names. It also says {{AC_CHECK_LIB}} "should 
be avoided in some common cases", suggesting {{AC_SEARCH_LIBS}} be used instead.
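As a sketch of the suggested direction (illustrative only, not the attached patch), an {{AC_SEARCH_LIBS}}-based check over a single zlib entry point might look like:

```
dnl Replace the broken AC_CHECK_LIB call, whose second argument must be a
dnl single function name, with AC_SEARCH_LIBS over one zlib entry point.
dnl Sketch only; the error text mirrors the existing configure message.
AC_SEARCH_LIBS([inflate], [z],
               [],
               [AC_MSG_ERROR([cannot find libz
-------------------------------------------------------------------
libz is required for Mesos to build.
-------------------------------------------------------------------])])
```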





[jira] [Commented] (MESOS-10239) Installing Mesos on Oracle Linux 8.3

2022-09-15 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605513#comment-17605513
 ] 

Charles Natali commented on MESOS-10239:


Hi [~Mar_zieh],

You don't need Python to install Mesos, unless you use the Python bindings.
If you're building from source, you can just pass {{--disable-python}} as 
described here: 
https://mesos.apache.org/documentation/latest/configuration/autotools/

Could you please detail the error you're getting?

> Installing Mesos on Oracle Linux 8.3
> 
>
> Key: MESOS-10239
>     URL: https://issues.apache.org/jira/browse/MESOS-10239
> Project: Mesos
>  Issue Type: Task
>Reporter: Marzieh
>Priority: Major
>
> Some newer Linux distributions, such as Oracle Linux 8 and Red Hat 8, no 
> longer support Python 2; however, Mesos requires Python 2. So there is no 
> way to install Mesos in these environments.
> Would you please update Mesos so that it can be installed on newer Linux 
> distributions?





[jira] [Created] (MESOS-10239) Installing Mesos on Oracle Linux 8.3

2022-09-13 Thread Marzieh (Jira)
Marzieh created MESOS-10239:
---

 Summary: Installing Mesos on Oracle Linux 8.3
 Key: MESOS-10239
 URL: https://issues.apache.org/jira/browse/MESOS-10239
 Project: Mesos
  Issue Type: Task
Reporter: Marzieh


Some newer Linux distributions, such as Oracle Linux 8 and Red Hat 8, no longer 
support Python 2; however, Mesos requires Python 2. So there is no way to 
install Mesos in these environments.

Would you please update Mesos so that it can be installed on newer Linux 
distributions?





[jira] [Comment Edited] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-09-07 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601327#comment-17601327
 ] 

Sangita Nalkar edited comment on MESOS-10234 at 9/7/22 2:14 PM:


Thank you [~cf.natali] and [~qianzhang] for your response.

 


was (Author: JIRAUSER282507):
Thank you [~cf.natali] and [~qianzhang] for your response.

Closing this issue.

> CVE-2021-44228 Log4j vulnerability for apache mesos
> ---
>
> Key: MESOS-10234
> URL: https://issues.apache.org/jira/browse/MESOS-10234
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11.0
>Reporter: Sangita Nalkar
>Priority: Critical
>
> Hi,
> Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache 
> mesos.
> We see that log4j v1.2.17 is used while building apache mesos from source.
> Snippet from build logs:
> std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
> jvm/org/apache/.deps/libjava_la-log4j.Tpo -c 
> ../../src/jvm/org/apache/log4j.cpp  -fPIC -DPIC -o 
> jvm/org/apache/.libs/libjava_la-log4j.o
> Thanks,
> Sangita





[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-09-07 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601327#comment-17601327
 ] 

Sangita Nalkar commented on MESOS-10234:


Thank you [~cf.natali] and [~qianzhang] for your response.

Closing this issue.

> CVE-2021-44228 Log4j vulnerability for apache mesos
> ---
>
> Key: MESOS-10234
> URL: https://issues.apache.org/jira/browse/MESOS-10234
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11.0
>Reporter: Sangita Nalkar
>Priority: Critical
>
> Hi,
> Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache 
> mesos.
> We see that log4j v1.2.17 is used while building apache mesos from source.
> Snippet from build logs:
> std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
> jvm/org/apache/.deps/libjava_la-log4j.Tpo -c 
> ../../src/jvm/org/apache/log4j.cpp  -fPIC -DPIC -o 
> jvm/org/apache/.libs/libjava_la-log4j.o
> Thanks,
> Sangita





[jira] [Comment Edited] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-08-26 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585253#comment-17585253
 ] 

Qian Zhang edited comment on MESOS-10234 at 8/26/22 9:09 AM:
-

According to 
[https://blogs.apache.org/security/entry/cve-2021-44228], it seems ZooKeeper is 
not affected by 
[CVE-2021-44228|https://www.cve.org/CVERecord?id=CVE-2021-44228].


was (Author: qianzhang):
According to [https://blogs.apache.org/security/entry/cve-2021-44228,] it seems 
ZooKeeper is not affected by 
[CVE-2021-44228|https://www.cve.org/CVERecord?id=CVE-2021-44228].

> CVE-2021-44228 Log4j vulnerability for apache mesos
> ---
>
> Key: MESOS-10234
> URL: https://issues.apache.org/jira/browse/MESOS-10234
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11.0
>Reporter: Sangita Nalkar
>Priority: Critical
>
> Hi,
> Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache 
> mesos.
> We see that log4j v1.2.17 is used while building apache mesos from source.
> Snippet from build logs:
> std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
> jvm/org/apache/.deps/libjava_la-log4j.Tpo -c 
> ../../src/jvm/org/apache/log4j.cpp  -fPIC -DPIC -o 
> jvm/org/apache/.libs/libjava_la-log4j.o
> Thanks,
> Sangita





[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-08-26 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585253#comment-17585253
 ] 

Qian Zhang commented on MESOS-10234:


According to [https://blogs.apache.org/security/entry/cve-2021-44228], it seems 
ZooKeeper is not affected by 
[CVE-2021-44228|https://www.cve.org/CVERecord?id=CVE-2021-44228].

> CVE-2021-44228 Log4j vulnerability for apache mesos
> ---
>
> Key: MESOS-10234
> URL: https://issues.apache.org/jira/browse/MESOS-10234
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11.0
>Reporter: Sangita Nalkar
>Priority: Critical
>
> Hi,
> Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache 
> mesos.
> We see that log4j v1.2.17 is used while building apache mesos from source.
> Snippet from build logs:
> std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
> jvm/org/apache/.deps/libjava_la-log4j.Tpo -c 
> ../../src/jvm/org/apache/log4j.cpp  -fPIC -DPIC -o 
> jvm/org/apache/.libs/libjava_la-log4j.o
> Thanks,
> Sangita





[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-08-24 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584433#comment-17584433
 ] 

Charles Natali commented on MESOS-10234:


Hi Sangita,

If this is an issue for you, you can simply use whatever ZooKeeper version you 
want; you do not need to use the shipped one.

We could update ZooKeeper separately; the shipped version is quite old and has 
some known bugs. [~qianzhang], what do you think?

> CVE-2021-44228 Log4j vulnerability for apache mesos
> ---
>
> Key: MESOS-10234
> URL: https://issues.apache.org/jira/browse/MESOS-10234
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11.0
>Reporter: Sangita Nalkar
>Priority: Critical
>
> Hi,
> Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache 
> mesos.
> We see that log4j v1.2.17 is used while building apache mesos from source.
> Snippet from build logs:
> std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
> jvm/org/apache/.deps/libjava_la-log4j.Tpo -c 
> ../../src/jvm/org/apache/log4j.cpp  -fPIC -DPIC -o 
> jvm/org/apache/.libs/libjava_la-log4j.o
> Thanks,
> Sangita





[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-08-23 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584019#comment-17584019
 ] 

Sangita Nalkar commented on MESOS-10234:


Hello,

A gentle reminder, since we are waiting for your response.

Could you please provide an update on whether the log4j version shipped with 
ZooKeeper has been updated, or will be updated in an upcoming release?

 

Thanks and Regards,

Sangita

 

> CVE-2021-44228 Log4j vulnerability for apache mesos
> ---
>
> Key: MESOS-10234
> URL: https://issues.apache.org/jira/browse/MESOS-10234
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11.0
>Reporter: Sangita Nalkar
>Priority: Critical
>
> Hi,
> Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache 
> mesos.
> We see that log4j v1.2.17 is used while building apache mesos from source.
> Snippet from build logs:
> std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
> jvm/org/apache/.deps/libjava_la-log4j.Tpo -c 
> ../../src/jvm/org/apache/log4j.cpp  -fPIC -DPIC -o 
> jvm/org/apache/.libs/libjava_la-log4j.o
> Thanks,
> Sangita





[jira] [Assigned] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2022-08-18 Thread Andreas Peters (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Peters reassigned MESOS-10230:
--

Assignee: Andreas Peters

> Please update JQuery from 3.2.1 to 3.5.0+
> -
>
> Key: MESOS-10230
> URL: https://issues.apache.org/jira/browse/MESOS-10230
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 1.11.0
>Reporter: p engels
>Assignee: Andreas Peters
>Priority: Minor
>
> JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
> cross-site-scripting vulnerabilities. More info can be found on JQuery's 
> website:
> blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]
> My organization's vulnerability scanner locates the out-of-date jquery at 
> this url (sanitized for security reasons):
> [http://example.com:5050/assets/libs/jquery-3.2.1.min.js]
>  
> Please remove the old version of JQuery and replace it with version 3.5.0 or 
> greater. If this is already planned for a future release, please comment on 
> this request with the version this will be fixed in.
>  
> Keep up the good work, Apache community <3





[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2022-08-17 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581021#comment-17581021
 ] 

Andreas Peters commented on MESOS-10230:


Hi,

the new jQuery version will be shipped with the next planned Mesos release. But 
I understand your pain. If you like, I can show you how to replace it manually. 
We also have a Mesos Slack (or Matrix) channel 
([https://mesos.apache.org/community/]) if you need quick help. :)

Cheers,

Andreas

> Please update JQuery from 3.2.1 to 3.5.0+
> -
>
> Key: MESOS-10230
> URL: https://issues.apache.org/jira/browse/MESOS-10230
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 1.11.0
>Reporter: p engels
>Priority: Minor
>
> JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
> cross-site-scripting vulnerabilities. More info can be found on JQuery's 
> website:
> blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]
> My organization's vulnerability scanner locates the out-of-date jquery at 
> this url (sanitized for security reasons):
> [http://example.com:5050/assets/libs/jquery-3.2.1.min.js]
>  
> Please remove the old version of JQuery and replace it with version 3.5.0 or 
> greater. If this is already planned for a future release, please comment on 
> this request with the version this will be fixed in.
>  
> Keep up the good work, Apache community <3





[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2022-08-17 Thread p engels (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580877#comment-17580877
 ] 

p engels commented on MESOS-10230:
--

Hate to bring this up after all this time. The old installation of jQuery is 
still showing on the scanner for my organization. We are on the latest Mesos 
version. Is there anything needed on my end to remove that jQuery from the 
system?




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (MESOS-10238) Error while compiling directory mesos/build using make

2022-04-04 Thread Calvin Valentino Gosal (Jira)
Calvin Valentino Gosal created MESOS-10238:
--

 Summary: Error while compiling directory mesos/build using make
 Key: MESOS-10238
 URL: https://issues.apache.org/jira/browse/MESOS-10238
 Project: Mesos
  Issue Type: Bug
Reporter: Calvin Valentino Gosal


I'm building Mesos on Ubuntu 20.04. I followed the steps on this page: 
https://mesos.apache.org/documentation/latest/building/

But I ran into a problem when compiling with the make command. The error is:

{noformat}
make[4]: Entering directory '/home/calvin/mesos/build/3rdparty/grpc-1.10.0'
[CXX] Compiling src/core/lib/gpr/log_linux.cc
src/core/lib/gpr/log_linux.cc:42:13: error: ambiguating new declaration of 'long int gettid()'
   42 | static long gettid(void) { return syscall(__NR_gettid); }
      |             ^~~~~~
In file included from /usr/include/unistd.h:1170,
                 from src/core/lib/gpr/log_linux.cc:40:
/usr/include/x86_64-linux-gnu/bits/unistd_ext.h:34:16: note: old declaration '__pid_t gettid()'
   34 | extern __pid_t gettid(void) __THROW;
      |                ^~~~~~
src/core/lib/gpr/log_linux.cc:42:13: warning: 'long int gettid()' defined but not used [-Wunused-function]
   42 | static long gettid(void) { return syscall(__NR_gettid); }
{noformat}
I'm stuck here.





[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-03-28 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513217#comment-17513217
 ] 

Sangita Nalkar commented on MESOS-10234:


Hello [~cf.natali] ,

You mentioned that the log4j version shipped with ZooKeeper would be updated; 
could you please provide an update on this?

Regards,

Sangita

 

> CVE-2021-44228 Log4j vulnerability for apache mesos
> ---
>
> Key: MESOS-10234
> URL: https://issues.apache.org/jira/browse/MESOS-10234
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11.0
>Reporter: Sangita Nalkar
>Priority: Critical
>
> Hi,
> Wanted to know if CVE-2021-44228 Log4j vulnerability is affecting Apache 
> mesos.
> We see that log4j v1.2.17 is used while building apache mesos from source.
> Snippet from build logs:
> std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
> jvm/org/apache/.deps/libjava_la-log4j.Tpo -c 
> ../../src/jvm/org/apache/log4j.cpp  -fPIC -DPIC -o 
> jvm/org/apache/.libs/libjava_la-log4j.o
> Thanks,
> Sangita





[jira] [Commented] (MESOS-10237) Mesos-slave issue report

2022-03-24 Thread feixiachao (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512168#comment-17512168
 ] 

feixiachao commented on MESOS-10237:


Thanks [~cf.natali], we don't have a specific problem; we were just confused by 
this warning log.

> Mesos-slave issue report 
> -
>
> Key: MESOS-10237
> URL: https://issues.apache.org/jira/browse/MESOS-10237
> Project: Mesos
>  Issue Type: Bug
>Reporter: feixiachao
>Priority: Major
>
> we encountered an issue about mesos-slave , the mesos.ERROR log shown as 
> below:
> E0323 22:56:03.278918  2848 memory.cpp:502] Listening on OOM events failed 
> for container ff408971-b610-4f84-bbc3-81b0c6be9499: Event listener is 
> terminating
> E0323 22:58:06.018554  2834 memory.cpp:502] Listening on OOM events failed 
> for container 3afa2056-1976-4857-9121-cfad0f0ba73e: Event listener is 
> terminating
> E0323 23:12:05.261996  2816 memory.cpp:502] Listening on OOM events failed 
> for container 56912877-5733-4050-bce8-0cc179cc0bc8: Event listener is 
> terminating
> Could any someone to help for this issue ?
>  





[jira] [Commented] (MESOS-10237) Mesos-slave issue report

2022-03-24 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512051#comment-17512051
 ] 

Charles Natali commented on MESOS-10237:


Hi [~feixiachao],

Are you having a specific problem or just wondering about those error messages?
Those errors are benign and can be ignored - they've actually been fixed in 
master: 
https://github.com/apache/mesos/commit/6bc5a5e114077f542f7258adffb78a54849ddf90






[jira] [Created] (MESOS-10237) Mesos-slave issue report

2022-03-23 Thread feixiachao (Jira)
feixiachao created MESOS-10237:
--

 Summary: Mesos-slave issue report 
 Key: MESOS-10237
 URL: https://issues.apache.org/jira/browse/MESOS-10237
 Project: Mesos
  Issue Type: Bug
Reporter: feixiachao


We encountered an issue with mesos-slave; the mesos.ERROR log is shown below:

E0323 22:56:03.278918  2848 memory.cpp:502] Listening on OOM events failed for 
container ff408971-b610-4f84-bbc3-81b0c6be9499: Event listener is terminating
E0323 22:58:06.018554  2834 memory.cpp:502] Listening on OOM events failed for 
container 3afa2056-1976-4857-9121-cfad0f0ba73e: Event listener is terminating
E0323 23:12:05.261996  2816 memory.cpp:502] Listening on OOM events failed for 
container 56912877-5733-4050-bce8-0cc179cc0bc8: Event listener is terminating



Could someone help with this issue?

 





[jira] [Comment Edited] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-02-21 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495409#comment-17495409
 ] 

Sangita Nalkar edited comment on MESOS-10234 at 2/21/22, 9:15 AM:
--

Thank you [~cf.natali] for your response.

Please let me know once the version for log4j is updated in zookeeper.

Regards,

Sangita








[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-02-21 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495409#comment-17495409
 ] 

Sangita Nalkar commented on MESOS-10234:


Thank you [~cf.natali] for your response.

Please let me know once the version for log4j version is updated in zookeeper.

Regards,

Sangita






[jira] [Created] (MESOS-10236) [/std:c++latest][MSVC] Mesos failed to build due to error C2440 with /std:c++latest on Windows using MSVC

2022-02-16 Thread QuellaZhang (Jira)
QuellaZhang created MESOS-10236:
---

 Summary: [/std:c++latest][MSVC] Mesos failed to build due to error 
C2440 with /std:c++latest on Windows using MSVC
 Key: MESOS-10236
 URL: https://issues.apache.org/jira/browse/MESOS-10236
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: master
 Environment: VS 2019 + Windows Server 2016
Reporter: QuellaZhang
 Attachments: build.log

Hi All,

We tried to build Mesos on Windows with VS2019. The build failed with error 
C2440 when using /std:c++latest with MSVC. It can be reproduced at the latest 
revision, 97d9a40, on the master branch. Could you please take a look at this 
issue? Thanks a lot!

Reproduce steps:
# git clone https://github.com/apache/mesos F:\apache\mesos
# Open a VS 2019 x64 command prompt as admin and browse to F:\apache\mesos
# mkdir build_amd64 && pushd build_amd64
# cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.18362.0 
-DENABLE_LIBEVENT=1 -DHAS_AUTHENTICATION=0 
-DPATCHEXE_PATH="F:\tools\gnuwin32\bin" -T host=x64 ..
# set _CL_= /std:c++latest /Zc:char8_t-
# set _CL_= /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING %_CL_%
# msbuild /maxcpucount:4 /p:Platform=x64 /p:Configuration=Debug Mesos.sln 
/t:Rebuild

Error:
F:\gitP\apache\mesos\build_amd64\3rdparty\protobuf-3.5.0\src\protobuf-3.5.0\src\google\protobuf\compiler\objectivec\objectivec_helpers.cc(618,1):
 error C2440: 'return': cannot convert from 'int' to 
'std::basic_string<char,std::char_traits<char>,std::allocator<char>>'
F:\gitP\apache\mesos\build_amd64\3rdparty\protobuf-3.5.0\src\protobuf-3.5.0\src\google\protobuf\compiler\objectivec\objectivec_helpers.cc(746,1):
 error C2440: 'return': cannot convert from 'int' to 
'std::basic_string<char,std::char_traits<char>,std::allocator<char>>'
F:\gitP\apache\mesos\build_amd64\3rdparty\protobuf-3.5.0\src\protobuf-3.5.0\src\google\protobuf\compiler\objectivec\objectivec_helpers.cc(818,1):
 error C2440: 'return': cannot convert from 'int' to 
'std::basic_string<char,std::char_traits<char>,std::allocator<char>>'
C:\Program Files (x86)\Microsoft Visual 
Studio\2019\Enterprise\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(241,5):
 error MSB8066: Custom build for 
'F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-mkdir.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-download.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-update.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-patch.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-configure.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-build.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\c2387485d1f3dbc1b6a74fd6bcbf1c42\protobuf-3.5.0-install.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\391b5defe1e7774ea47783fbd33671f6\protobuf-3.5.0-complete.rule;F:\gitP\apache\mesos\build_amd64\CMakeFiles\3e040e27eb7e35ec141c0982bf4a7993\protobuf-3.5.0.rule;F:\gitP\apache\mesos\3rdparty\CMakeLists.txt'
 exited with code 1.





[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-02-15 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492857#comment-17492857
 ] 

Charles Natali commented on MESOS-10234:


Hi,

I cannot see an explicit dependency on log4j v1.2.17 - are you sure the build 
is not picking up your system's version?

Then again, I'm really not familiar with the Java bindings.

Note that the only log4j shipped with Mesos is part of the packaged ZooKeeper 
version:


{noformat}
./build/3rdparty/zookeeper-3.4.8/lib/slf4j-log4j12-1.6.1.jar
./build/3rdparty/zookeeper-3.4.8/lib/log4j-1.2.16.LICENSE.txt
./build/3rdparty/zookeeper-3.4.8/lib/log4j-1.2.16.jar
./build/3rdparty/zookeeper-3.4.8/src/java/lib/log4j-1.2.16.LICENSE.txt
./build/3rdparty/zookeeper-3.4.8/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties
./build/3rdparty/zookeeper-3.4.8/src/contrib/rest/conf/log4j.properties
./build/3rdparty/zookeeper-3.4.8/src/contrib/zooinspector/lib/log4j.properties
./build/3rdparty/zookeeper-3.4.8/conf/log4j.properties
./build/3rdparty/zookeeper-3.4.8/contrib/rest/lib/slf4j-log4j12-1.6.1.jar
./build/3rdparty/zookeeper-3.4.8/contrib/rest/lib/log4j-1.2.15.jar
./build/3rdparty/zookeeper-3.4.8/contrib/rest/conf/log4j.properties

{noformat}
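A listing like the one above can be reproduced with a simple scan of the build tree. A sketch (the /tmp/zk-demo tree below is an illustrative stand-in for build/3rdparty/zookeeper-3.4.8):

```shell
# Sketch: enumerate bundled log4j artifacts under a build tree to confirm
# which versions ship. Create a stand-in tree, then scan it.
mkdir -p /tmp/zk-demo/lib
touch /tmp/zk-demo/lib/log4j-1.2.16.jar /tmp/zk-demo/lib/slf4j-log4j12-1.6.1.jar
find /tmp/zk-demo -type f -name '*log4j*' | sort
```

Running the same find over the real build/3rdparty directory shows whether any log4j 2.x jars (the ones affected by CVE-2021-44228) are present; the jars above are all 1.x.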


I'm not sure whether anyone uses the shipped version, but maybe we could 
update it - what do you think, [~asekretenko]?

Note that at work we experienced a zookeeper bug following a failover which 
IIRC caused some ephemeral nodes to not be deleted on the promoted leader, 
leading to inconsistencies in the Mesos registry - so updating could also solve 
this issue for whoever happens to use it.






[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-02-15 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492484#comment-17492484
 ] 

Sangita Nalkar commented on MESOS-10234:


Hello [~cf.natali] , [~asekretenko] ,

Could you please help answer my query in the above comment?

Thanks,

Sangita






[jira] [Created] (MESOS-10235) v1 Operator API GET_MASTER is missing Capability.Type.QUOTA_V2

2022-01-26 Thread Dan Leary (Jira)
Dan Leary created MESOS-10235:
-

 Summary: v1 Operator API GET_MASTER is missing 
Capability.Type.QUOTA_V2
 Key: MESOS-10235
 URL: https://issues.apache.org/jira/browse/MESOS-10235
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Affects Versions: 1.11.0, 1.9.0
 Environment: Ubuntu 18.04, Mesos 1.11.0 built from tarball.

 
Reporter: Dan Leary


A GET_MASTER call on the v1 HTTP Operator API at [http://master/api/v1] returns 
a master_info that is missing Capability.Type.QUOTA_V2.  E.g.:
{noformat}
{
   "type" : "GET_MASTER",
   "get_master" : {
      "master_info" : {
         "capabilities" : [
            {
               "type" : "AGENT_UPDATE"
            },
            {
               "type" : "AGENT_DRAINING"
            },
            {}
         ], etc...
{noformat}
I suspect that this change to include/mesos/mesos.proto:
{noformat}
0bc857d672  (Meng Zhu  2019-06-25 15:19:44 -0700  927)
0bc857d672  (Meng Zhu  2019-06-25 15:19:44 -0700  928)      // The master can handle the new quota API, which supports setting
0bc857d672  (Meng Zhu  2019-06-25 15:19:44 -0700  929)      // limits separately from guarantees (introduced in Mesos 1.9).
0bc857d672  (Meng Zhu  2019-06-25 15:19:44 -0700  930)      QUOTA_V2 = 3;
{noformat}
is also needed in include/mesos/v1/mesos.proto.

Consequently, enums org.apache.mesos.Protos.MasterInfo.Capability.Type and 
org.apache.mesos.v1.Protos.MasterInfo.Capability.Type differ in the same 
respect for those of us using protobufs.
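A sketch of the corresponding v1 addition follows. The QUOTA_V2 comment and value are taken from the v0 blame output above; the neighboring enum values and the exact nesting in the v1 file are assumptions:

```proto
// include/mesos/v1/mesos.proto (sketch, mirroring the v0 definition)
message MasterInfo {
  message Capability {
    enum Type {
      UNKNOWN = 0;
      AGENT_UPDATE = 1;    // assumed existing value
      AGENT_DRAINING = 2;  // assumed existing value

      // The master can handle the new quota API, which supports setting
      // limits separately from guarantees (introduced in Mesos 1.9).
      QUOTA_V2 = 3;
    }
  }
}
```

With the value present in the v1 proto, the capability would serialize as {"type": "QUOTA_V2"} instead of the empty {} shown in the GET_MASTER response above.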





[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-01-07 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470538#comment-17470538
 ] 

Sangita Nalkar commented on MESOS-10234:


Hello,

While building Mesos from source, I see that log4j v1.2.17 is being used.

Since you mentioned that example frameworks and tests might be affected by 
log4j, do you plan to fix or update the log4j version?

Thanks






[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2022-01-02 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467809#comment-17467809
 ] 

Sangita Nalkar commented on MESOS-10234:


Thank you [~cf.natali] and [~asekretenko] for your response.
Also [~cf.natali] please let me know if in case you find out anything else 
later.






[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2021-12-31 Thread Andrei Sekretenko (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467338#comment-17467338
 ] 

Andrei Sekretenko commented on MESOS-10234:
---

Talking about production code - I don't see how the agent/master could be 
affected; the only potentially affected components are the Java scheduler 
libraries.

At first glance, the scheduler libraries do not appear to use log4j, which 
would mean that only example frameworks and tests might be affected.






[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2021-12-31 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467229#comment-17467229
 ] 

Charles Natali commented on MESOS-10234:


Hi [~snalkar]

Sorry for the delay - the Mesos project has very limited resources, and the 
holiday season doesn't help.

I've had a quick look, and log4j only seems to be used for tests - Mesos is 
mostly written in C++, so that's not surprising.
It's possible it's used in some bundled third-party dependencies, but I'd be 
surprised if it were exploitable.

I'll have a more thorough look after the holidays.

Cheers,






[jira] [Commented] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2021-12-28 Thread Sangita Nalkar (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466312#comment-17466312
 ] 

Sangita Nalkar commented on MESOS-10234:


Hello,

I would appreciate it if you could respond to the above query.

Thanks,

Sangita






[jira] [Created] (MESOS-10234) CVE-2021-44228 Log4j vulnerability for apache mesos

2021-12-22 Thread Sangita Nalkar (Jira)
Sangita Nalkar created MESOS-10234:
--

 Summary: CVE-2021-44228 Log4j vulnerability for apache mesos
 Key: MESOS-10234
 URL: https://issues.apache.org/jira/browse/MESOS-10234
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 1.11.0
Reporter: Sangita Nalkar


Hi,

We wanted to know whether the CVE-2021-44228 Log4j vulnerability affects Apache 
Mesos. We see that log4j v1.2.17 is used while building Apache Mesos from source.

Snippet from build logs:
std=c++11 -MT jvm/org/apache/libjava_la-log4j.lo -MD -MP -MF 
jvm/org/apache/.deps/libjava_la-log4j.Tpo -c ../../src/jvm/org/apache/log4j.cpp 
 -fPIC -DPIC -o jvm/org/apache/.libs/libjava_la-log4j.o
Thanks,

Sangita





[jira] [Created] (MESOS-10233) containers were not cleaned up properly and left running.

2021-10-29 Thread Tan N. Le (Jira)
Tan N. Le created MESOS-10233:
-

 Summary: containers were not cleaned up properly and left running.
 Key: MESOS-10233
 URL: https://issues.apache.org/jira/browse/MESOS-10233
 Project: Mesos
  Issue Type: Task
 Environment: aurora-scheduler 0.25.0

mesos 1.11.0

executor plugin: DCE [https://github.com/paypal/dce-go] based on mesos-go v0.002

 
Reporter: Tan N. Le


We observed that tasks were stuck in STARTING and Mesos tried to kill and 
clean them up due to OOM.

However, the cgroup freezer files were no longer present, so Mesos assumed the 
containers had already been cleaned up.

The containers were left running, but the tasks were reported lost in 
Aurora/Mesos.

 



aurora logs
I1026 05:16:55.886 [TaskEventBatchWorker, StateMachine] 
-b0b685ca-3ded-4304-a591-9241d06d7728 state machine transition INIT -> PENDING
I1026 05:16:55.984 [TaskGroupBatchWorker, StateMachine] 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 
state machine transition PENDING -> ASSIGNED
I1026 05:16:55.984 [TaskGroupBatchWorker, TaskAssignerImpl] Offer on agent 
gpma771518.gpf-prod.us-central1.gcp.dev.paypalinc.com (id 
a76961ab-bba0-46e5-ae7b-b234057b7a33-S307) is being assigned task for 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728.
I1026 05:16:57.402 [Thread-1715969, 
MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for task 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 in 
state TASK_STARTING from SOURCE_EXECUTOR
I1026 05:16:57.402 [TaskStatusHandlerImpl, StateMachine] 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 
state machine transition ASSIGNED -> STARTING
W1026 05:18:03.376 [Thread-1717148, 
MesosCallbackHandler$MesosCallbackHandlerImpl] Lost executor 
compose-mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728
 on slave a76961ab-bba0-46e5-ae7b-b234057b7a33-S307 with status -1
I1026 05:18:03.377 [Thread-1717149, 
MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for task 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 in 
state TASK_FAILED from SOURCE_AGENT with REASON_EXECUTOR_TERMINATED: Abnormal 
executor termination: Failed to kill all processes in the container: Timed out 
after 1mins
I1026 05:18:03.390 [TaskStatusHandlerImpl, StateMachine] 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 
state machine transition STARTING -> FAILED
I1026 05:18:03.390 [TaskStatusHandlerImpl, StateManagerImpl] Task being 
rescheduled: 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728
I1026 05:21:08.948 [Thread-1720928, 
MesosCallbackHandler$MesosCallbackHandlerImpl] Received status update for task 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 in 
state TASK_RUNNING from SOURCE_EXECUTOR
I1026 05:21:08.949 [TaskStatusHandlerImpl, StateMachine] 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 
state machine transition FAILED -> RUNNING (not allowed)
I1026 05:21:08.950 [TaskStatusHandlerImpl, StateMachine] 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 
state machine transition FAILED -> LOST (not allowed)

=

mesos-master logs

I1026 05:16:55.991168 29973 master.cpp:3873] Adding executor 
'compose-mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728'
 with resources cpus(allocated: aurora):0.1; mem(allocated: aurora):256 of 
framework 9f48d831-63e7-4556-86ab-463a69389e4d- (Aurora) at 
scheduler-bf829a38-5c60-46cb-82dc-9c7fc7be7130@10.180.52.175:8083 on agent 
a76961ab-bba0-46e5-ae7b-b234057b7a33-S307 at slave(1)@10.180.50.210:5051 
(gpma771518.gpf-prod.us-central1.gcp.dev.paypalinc.com)
I1026 05:16:55.991324 29973 master.cpp:3899] Adding task 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 
with resources cpus(allocated: aurora):0.9; disk(allocated: aurora):100; 
mem(allocated: aurora):4096; ports(allocated: aurora):[10020-10020, 
10076-10076, 10137-10137, 10139-10139, 10150-10150] of framework 
9f48d831-63e7-4556-86ab-463a69389e4d- (Aurora) at 
scheduler-bf829a38-5c60-46cb-82dc-9c7fc7be7130@10.180.52.175:8083 on agent 
a76961ab-bba0-46e5-ae7b-b234057b7a33-S307 at slave(1)@10.180.50.210:5051 
(gpma771518.gpf-prod.us-central1.gcp.dev.paypalinc.com)
I1026 05:16:56.090255 29973 master.cpp:5035] Launching task 
mstestenv-msmaster4int-g-mppnodeweb-a-3-b0b685ca-3ded-4304-a591-9241d06d7728 of 
framework 9f48d831-63e7-4556-86ab-463a69389e4d- (Aurora) at 
scheduler-bf829a38-5c60-46cb-82dc-9c7fc7be7130@10.180.52.175:8083 with 
resources 
[\{"allocation_info":{"role":"aurora"},"name":"cpus","scalar":\{"value":

[jira] [Assigned] (MESOS-9657) Launching a command task twice can crash the agent

2021-10-16 Thread Charles Natali (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Natali reassigned MESOS-9657:
-

Fix Version/s: 1.12.0
 Assignee: Charles Natali
   Resolution: Fixed

> Launching a command task twice can crash the agent
> --
>
> Key: MESOS-9657
> URL: https://issues.apache.org/jira/browse/MESOS-9657
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Assignee: Charles Natali
>Priority: Major
> Fix For: 1.12.0
>
>
> When launching a command task, we verify that the framework has no existing 
> executor for that task:
> {noformat}
>   // We are dealing with command task; a new command executor will be
>   // launched.
>   CHECK(executor == nullptr);
> {noformat}
> and afterwards an executor is created with the same executor id as the task 
> id:
> {noformat}
>   // (slave.cpp)
>   // Either the master explicitly requests launching a new executor
>   // or we are in the legacy case of launching one if there wasn't
>   // one already. Either way, let's launch executor now.
>   if (executor == nullptr) {
> Try added = framework->addExecutor(executorInfo);
>   [...]
> {noformat}
> This means that if we relaunch the task with the same task id before the 
> executor is removed, it will crash the agent:
> {noformat}
> F0315 16:39:32.822818 38112 slave.cpp:2865] Check failed: executor == nullptr 
> *** Check failure stack trace: ***
> @ 0x7feb29a407af  google::LogMessage::Flush()
> @ 0x7feb29a43c3f  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7feb28a5a886  mesos::internal::slave::Slave::__run()
> @ 0x7feb28af4f0e  
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSK_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaISU_EERKSK_IbESG_SJ_SO_SS_SY_S11_EEvRKNS1_3PIDIT_EEMS13_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSE_OSH_OSM_OSQ_OSW_OSZ_S3_E_JSE_SH_SM_SQ_SW_SZ_St12_PlaceholderILi1EEclEOS3_
> @ 0x7feb2998a620  process::ProcessBase::consume()
> @ 0x7feb29987675  process::ProcessManager::resume()
> @ 0x7feb299a2d2b  
> _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvE3$_8E6_M_runEv
> @ 0x7feb2632f523  (unknown)
> @ 0x7feb25e40594  start_thread
> @ 0x7feb25b73e6f  __GI___clone
> Aborted (core dumped)
> {noformat}
> Instead of crashing, the agent should just drop the task with an appropriate 
> error in this case.
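The check-then-add flow described in the report can be sketched in a few lines. This is an illustrative model with hypothetical names, not the actual agent code in `Slave::__run`; it shows the graceful variant the ticket asks for, where a duplicate command-task launch is rejected instead of CHECK-failing:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative model of the agent's command-task launch path. The real code
// CHECK-fails when an executor with the task's id already exists; this
// variant rejects the duplicate launch instead of aborting the agent.
struct LaunchResult {
  bool launched;
  std::string error;
};

LaunchResult launchCommandTask(std::map<std::string, bool>& executors,
                               const std::string& taskId) {
  // A command task reuses its task id as the executor id, so relaunching
  // the same task id while the first executor is still registered collides.
  if (executors.count(taskId) != 0) {
    return {false, "Executor '" + taskId + "' already exists; dropping task"};
  }
  executors[taskId] = true;  // Launch a new command executor.
  return {true, ""};
}
```

With this shape, the second launch of the same task id surfaces an error (e.g. a TASK_DROPPED status, as the ticket suggests) to the framework rather than killing the agent.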



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-10231) Mesos master crashes during framework teardown

2021-10-04 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423868#comment-17423868
 ] 

Andreas Peters commented on MESOS-10231:


Can you show us the configuration you use to start Spark? It would be helpful 
so we can try it out ourselves.

> Mesos master crashes during framework teardown
> --
>
> Key: MESOS-10231
> URL: https://issues.apache.org/jira/browse/MESOS-10231
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, master
>Affects Versions: 1.9.0
> Environment: CentOS Linux release 7.9.2009
> Mesos version - 1.9.0
>Reporter: Divyansh Jamuaar
>Priority: Major
>
> I have set up a Mesos cluster with a single Mesos master and I submit Spark 
> jobs to it in "cluster" mode.
> After running a few Spark jobs correctly, the Mesos master crashes while 
> trying to shut down one of the Spark frameworks with the following error:
>  
> {code:java}
> F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: 
> totalOfferedResources.filter(allocatedToRole).empty() 
> *** Check failure stack trace: ***
> @ 0x7f1e024ded2e  google::LogMessage::Fail()
> @ 0x7f1e024dec8d  google::LogMessage::SendToLog()
> @ 0x7f1e024de637  google::LogMessage::Flush()
> @ 0x7f1e024e191c  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f1dff93978d  
> mesos::internal::master::Framework::untrackUnderRole()
> @ 0x7f1dffad004b  mesos::internal::master::Master::removeFramework()
> @ 0x7f1dfface859  mesos::internal::master::Master::teardown()
> @ 0x7f1dffa8ba25  mesos::internal::master::Master::receive()
> @ 0x7f1dffb2f1cf  ProtobufProcess<>::handlerMutM<>()
> @ 0x7f1dffbe6809  std::__invoke_impl<>()
> @ 0x7f1dffbdae22  std::__invoke<>()
> @ 0x7f1dffbc8079  
> _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcS4_SD_St12_PlaceholderILi1EESO_ILi26__callIvJS8_SL_EJLm0ELm1ELm2ELm3T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7f1dffbaaae5  std::_Bind<>::operator()<>()
> @ 0x7f1dffb833c9  std::_Function_handler<>::_M_invoke()
> @ 0x7f1dff330281  std::function<>::operator()()
> @ 0x7f1dffb13329  ProtobufProcess<>::consume()
> @ 0x7f1dffa85436  mesos::internal::master::Master::_consume()
> @ 0x7f1dffa84ad5  mesos::internal::master::Master::consume()
> @ 0x7f1dffafb9ae  
> _ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE
> @ 0x564c359f7002  process::ProcessBase::serve()
> @ 0x7f1e023a7bbd  process::ProcessManager::resume()
> @ 0x7f1e023a407c  
> _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
> @ 0x7f1e023cf1ba  
> _ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
> @ 0x7f1e023cd9c9  
> _ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_
> @ 0x7f1e023cc482  
> _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0vSt12_Index_tupleIJXspT_EEE
> @ 0x7f1e023cb53b  
> _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv
> @ 0x7f1e023ca3c4  
> _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_E6_M_runEv
> @ 0x7f1e051f419d  execute_native_thread_routine
> @ 0x7f1df4200ea5  start_thread
> @ 0x7f1df3f2996d  __clone
> {code}
>  
>  
> It seems that a fatal assertion check is failing, but I am not able to 
> figure out its root cause.





[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2021-10-04 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423785#comment-17423785
 ] 

Andreas Peters commented on MESOS-10230:


Here comes the PR. :)

https://github.com/apache/mesos/pull/411

> Please update JQuery from 3.2.1 to 3.5.0+
> -
>
> Key: MESOS-10230
> URL: https://issues.apache.org/jira/browse/MESOS-10230
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 1.11.0
>Reporter: p engels
>Priority: Minor
>
> JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
> cross-site-scripting vulnerabilities. More info can be found on JQuery's 
> website:
> blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]
> My organization's vulnerability scanner locates the out-of-date jquery at 
> this url (sanitized for security reasons):
> [http://example.com:5050/assets/libs/jquery-3.2.1.min.js]
>  
> Please remove the old version of JQuery and replace it with version 3.5.0 or 
> greater. If this is already planned for a future release, please comment on 
> this request with the version this will be fixed in.
>  
> Keep up the good work, Apache community <3





[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2021-10-03 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423666#comment-17423666
 ] 

Andreas Peters commented on MESOS-10230:


My dry run on a server (master and agent) works well with jQuery 3.6.0. 
Everything in the Mesos UI still works: nothing is broken, no error messages. I 
will build Mesos to see whether the build scripts are missing anything. If 
everything is fine there too, I will open a PR.

[~cf.natali]: As far as I know, "mesos-site" is generated via Jenkins, and the 
source is "mesos/site"; that said, I'm not 100% sure. :) I will change the 
website later; [~pengels]'s security scanner will not be affected by that.

 

Have a nice weekend

Andreas

> Please update JQuery from 3.2.1 to 3.5.0+
> -
>
> Key: MESOS-10230
> URL: https://issues.apache.org/jira/browse/MESOS-10230
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 1.11.0
>Reporter: p engels
>Priority: Minor
>
> JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
> cross-site-scripting vulnerabilities. More info can be found on JQuery's 
> website:
> blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]
> My organization's vulnerability scanner locates the out-of-date jquery at 
> this url (sanitized for security reasons):
> [http://example.com:5050/assets/libs/jquery-3.2.1.min.js]
>  
> Please remove the old version of JQuery and replace it with version 3.5.0 or 
> greater. If this is already planned for a future release, please comment on 
> this request with the version this will be fixed in.
>  
> Keep up the good work, Apache community <3





[jira] [Created] (MESOS-10232) Old sandboxes not being GC'ed caused frequent Mesos GC

2021-09-30 Thread HAO SU (Jira)
HAO SU created MESOS-10232:
--

 Summary: Old sandboxes not being GC'ed caused frequent Mesos GC
 Key: MESOS-10232
 URL: https://issues.apache.org/jira/browse/MESOS-10232
 Project: Mesos
  Issue Type: Bug
  Components: agent
Reporter: HAO SU


Customers reported that their logs (sandbox files) go missing soon after a 
job completes. Mesos agent logs indicate that the files were GC-ed within 
minutes of container exit. On checking the host, we found a lot of old 
sandboxes dating back to Jan 2020. These occupy a large amount of space (~88% 
of all sandbox usage) and are likely causing frequent GC of recently running 
containers.

Mesos does recognize these sandboxes and tries to schedule them for deletion:
{code:java}
 I0902 18:02:27.511576 467334 gc.cpp:95] Scheduling 
'/var/lib/mesos/meta/slaves/68caec4c-6ea5-44e7-9f8-fad1922d5-S162/frameworks/3dcc744f-016c-6579-9b82-6325402d2-/executors/fa00-29a3-4c47-95fd-808d52ac53-13-1'
 for gc -85.5641509780737weeks in the future
{code}
but the deletion seems to never happen.
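One plausible explanation for a deletion that never fires is that the negative delay (the path was due for removal ~85 weeks ago) is handed to a timer that cannot trigger in the past. A defensive fix, sketched below as an illustration rather than as Mesos's actual gc.cpp logic, is to clamp the delay to zero so overdue paths are collected on the next pass:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Illustrative helper: GC schedules a path for removal `requested` from now.
// If the computed delay is already negative (the removal time is in the
// past), clamp it to zero so the path is deleted immediately instead of
// being scheduled for a moment that will never arrive.
using Seconds = std::chrono::duration<double>;

Seconds clampGcDelay(Seconds requested) {
  return std::max(requested, Seconds::zero());
}
```

Under this policy, a path scheduled "-85.56 weeks in the future" would simply be removed at the next GC pass.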





[jira] [Created] (MESOS-10231) Mesos master crashes during framework teardown

2021-09-28 Thread Divyansh Jamuaar (Jira)
Divyansh Jamuaar created MESOS-10231:


 Summary: Mesos master crashes during framework teardown
 Key: MESOS-10231
 URL: https://issues.apache.org/jira/browse/MESOS-10231
 Project: Mesos
  Issue Type: Bug
  Components: framework, master
Affects Versions: 1.9.0
 Environment: CentOS Linux release 7.9.2009

Mesos version - 1.9.0
Reporter: Divyansh Jamuaar


I have set up a Mesos cluster with a single Mesos master and I submit Spark 
jobs to it in "cluster" mode.

After running a few Spark jobs correctly, the Mesos master crashes while trying 
to shut down one of the Spark frameworks with the following error:

 
{code:java}
F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: 
totalOfferedResources.filter(allocatedToRole).empty() 
*** Check failure stack trace: ***
@ 0x7f1e024ded2e  google::LogMessage::Fail()
@ 0x7f1e024dec8d  google::LogMessage::SendToLog()
@ 0x7f1e024de637  google::LogMessage::Flush()
@ 0x7f1e024e191c  google::LogMessageFatal::~LogMessageFatal()
@ 0x7f1dff93978d  mesos::internal::master::Framework::untrackUnderRole()
@ 0x7f1dffad004b  mesos::internal::master::Master::removeFramework()
@ 0x7f1dfface859  mesos::internal::master::Master::teardown()
@ 0x7f1dffa8ba25  mesos::internal::master::Master::receive()
@ 0x7f1dffb2f1cf  ProtobufProcess<>::handlerMutM<>()
@ 0x7f1dffbe6809  std::__invoke_impl<>()
@ 0x7f1dffbdae22  std::__invoke<>()
@ 0x7f1dffbc8079  
_ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcS4_SD_St12_PlaceholderILi1EESO_ILi26__callIvJS8_SL_EJLm0ELm1ELm2ELm3T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
@ 0x7f1dffbaaae5  std::_Bind<>::operator()<>()
@ 0x7f1dffb833c9  std::_Function_handler<>::_M_invoke()
@ 0x7f1dff330281  std::function<>::operator()()
@ 0x7f1dffb13329  ProtobufProcess<>::consume()
@ 0x7f1dffa85436  mesos::internal::master::Master::_consume()
@ 0x7f1dffa84ad5  mesos::internal::master::Master::consume()
@ 0x7f1dffafb9ae  
_ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE
@ 0x564c359f7002  process::ProcessBase::serve()
@ 0x7f1e023a7bbd  process::ProcessManager::resume()
@ 0x7f1e023a407c  
_ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
@ 0x7f1e023cf1ba  
_ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
@ 0x7f1e023cd9c9  
_ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_
@ 0x7f1e023cc482  
_ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0vSt12_Index_tupleIJXspT_EEE
@ 0x7f1e023cb53b  
_ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv
@ 0x7f1e023ca3c4  
_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_E6_M_runEv
@ 0x7f1e051f419d  execute_native_thread_routine
@ 0x7f1df4200ea5  start_thread
@ 0x7f1df3f2996d  __clone

{code}
 

 

It seems that a fatal assertion check is failing, but I am not able to figure 
out its root cause.
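The failed CHECK encodes an invariant: by the time a framework is untracked from a role, none of its offered resources may still be allocated to that role, so teardown has to rescind outstanding offers first. The sketch below models that invariant in miniature; the names and data structures are hypothetical, not the master's real `Framework` class:

```cpp
#include <cassert>
#include <map>
#include <string>

// Miniature model of the invariant behind Framework::untrackUnderRole():
// offered resources allocated to a role must be empty before the framework
// is untracked from that role.
struct Framework {
  std::map<std::string, double> offeredPerRole;  // role -> offered cpus

  void rescindOffers(const std::string& role) {
    offeredPerRole[role] = 0.0;  // Recall all outstanding offers first.
  }

  bool untrackUnderRole(const std::string& role) {
    // Mirrors the CHECK: totalOfferedResources.filter(role).empty().
    if (offeredPerRole.count(role) != 0 && offeredPerRole[role] > 0.0) {
      return false;  // Invariant violated; the real code aborts here.
    }
    offeredPerRole.erase(role);
    return true;
  }
};
```

A teardown path that calls rescindOffers() before untrackUnderRole() preserves the invariant; the crash suggests some teardown path in 1.9.0 reached untrackUnderRole() with offers still outstanding.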





[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2021-09-27 Thread p engels (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420736#comment-17420736
 ] 

p engels commented on MESOS-10230:
--

Thank you!

 

> Please update JQuery from 3.2.1 to 3.5.0+
> -
>
> Key: MESOS-10230
> URL: https://issues.apache.org/jira/browse/MESOS-10230
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 1.11.0
>Reporter: p engels
>Priority: Minor
>
> JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
> cross-site-scripting vulnerabilities. More info can be found on JQuery's 
> website:
> blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]
> My organization's vulnerability scanner locates the out-of-date jquery at 
> this url (sanitized for security reasons):
> [http://example.com:5050/assets/libs/jquery-3.2.1.min.js]
>  
> Please remove the old version of JQuery and replace it with version 3.5.0 or 
> greater. If this is already planned for a future release, please comment on 
> this request with the version this will be fixed in.
>  
> Keep up the good work, Apache community <3





[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2021-09-27 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420523#comment-17420523
 ] 

Andreas Peters commented on MESOS-10230:


No problem. :)  I will.

> Please update JQuery from 3.2.1 to 3.5.0+
> -
>
> Key: MESOS-10230
> URL: https://issues.apache.org/jira/browse/MESOS-10230
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 1.11.0
>Reporter: p engels
>Priority: Minor
>
> JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
> cross-site-scripting vulnerabilities. More info can be found on JQuery's 
> website:
> blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]
> My organization's vulnerability scanner locates the out-of-date jquery at 
> this url (sanitized for security reasons):
> [http://example.com:5050/assets/libs/jquery-3.2.1.min.js]
>  
> Please remove the old version of JQuery and replace it with version 3.5.0 or 
> greater. If this is already planned for a future release, please comment on 
> this request with the version this will be fixed in.
>  
> Keep up the good work, Apache community <3





[jira] [Commented] (MESOS-10198) Mesos-master service is activating state

2021-09-26 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420368#comment-17420368
 ] 

Charles Natali commented on MESOS-10198:


[~kiranjshetty]

I assume you've since moved on, so unless there is an update to this ticket 
soon, I will close it.

Cheers,


> Mesos-master service is activating state
> 
>
> Key: MESOS-10198
> URL: https://issues.apache.org/jira/browse/MESOS-10198
> Project: Mesos
>  Issue Type: Task
>Affects Versions: 1.9.0
>Reporter: Kiran J Shetty
>Priority: Major
>
> The mesos-master service is stuck in the activating state on all 3 master 
> nodes, which in turn makes Marathon restart frequently. In the logs I can 
> see the entries below.
>  Mesos-master logs:
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9 
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854 
> mesos::internal::log::Replica::Replica()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65 
> mesos::internal::log::LogProcess::LogProcess()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34 
> mesos::log::Log::Log()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a8207 
> __libc_start_main
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown)
>  Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process 
> exited, code=killed, status=6/ABRT
>  Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered 
> failed state.
>  Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed.
>  Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time 
> over, scheduling restart.
>  Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master.
>  Nov 12 08:36:49 servername systemd[1]: Started Mesos Master.
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024 
> logging.cpp:201] INFO level logging started!
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024 
> main.cpp:243] Build: 2019-10-21 12:10:14 by centos
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024 
> main.cpp:244] Version: 1.9.0
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024 
> main.cpp:247] Git tag: 1.9.0
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024 
> main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024 
> main.cpp:345] Using 'hierarchical' allocator
>  Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: 
> ./db/skiplist.h:344: void leveldb::SkipList<Key, Comparator>::Insert(const 
> Key&) [with Key = const char*; Comparator = 
> leveldb::MemTable::KeyComparator]: Assertion `x == __null || !Equal(key, 
> x->key)' failed.
>  Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409 
> (unix time) try "date -d @1605150409" if you are using GNU date ***
>  Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387 
> __GI_raise
>  Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38) 
> received by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: ***
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown)
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6 
> __assert_fail_base
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e6252 
> __GI___assert_fail
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3dc2 
> leveldb::SkipList<>::Insert()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3735 
> leveldb::MemTable::Add()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00168 
> leveldb::WriteBatch::Iterate()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00424 
> leveldb::WriteBatchInternal::InsertInto()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5ce8575 
> leveldb::DBImpl::RecoverLogFile()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec0fc 
> leveldb::DBImpl::Recover()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec3fa 
> leveldb::DB::Open()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fd

[jira] [Commented] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2021-09-26 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420364#comment-17420364
 ] 

Charles Natali commented on MESOS-10230:


[~apeters]
Would you be able to look at this?

I think [~pengels] might be referring to 
https://github.com/apache/mesos/blob/master/src/webui/assets/libs/jquery-3.2.1.min.js

Note, however, that we are also using jQuery 1.10.1, which is also affected:
https://github.com/apache/mesos/blob/master/site/source/assets/js/jquery-1.10.1.min.js

and in mesos-site: 
https://github.com/apache/mesos-site/blob/asf-site/content/assets/js/jquery-1.10.1.min.js

I am not at all familiar with web development, so even though I could probably 
update it, I wouldn't know how to check whether it broke anything.

> Please update JQuery from 3.2.1 to 3.5.0+
> -
>
> Key: MESOS-10230
> URL: https://issues.apache.org/jira/browse/MESOS-10230
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 1.11.0
>Reporter: p engels
>Priority: Minor
>
> JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
> cross-site-scripting vulnerabilities. More info can be found on JQuery's 
> website:
> blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]
> My organization's vulnerability scanner locates the out-of-date jquery at 
> this url (sanitized for security reasons):
> [http://example.com:5050/assets/libs/jquery-3.2.1.min.js]
>  
> Please remove the old version of JQuery and replace it with version 3.5.0 or 
> greater. If this is already planned for a future release, please comment on 
> this request with the version this will be fixed in.
>  
> Keep up the good work, Apache community <3





[jira] [Commented] (MESOS-10228) My current problem is that after mesos-Agent added the option to support GPU, starting Docker through Marathon cannot succeed

2021-09-26 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420361#comment-17420361
 ] 

Charles Natali commented on MESOS-10228:


Hi [~barrylee],

It's not clear to me if this is linked to the other issue you opened: 
https://issues.apache.org/jira/browse/MESOS-10227

Note that Marathon is a project distinct from Mesos, so you might want to 
report it to them (although I am not sure that project is still active).

> My current problem is that after mesos-Agent added the option to support GPU, 
> starting Docker through Marathon cannot succeed
> -
>
> Key: MESOS-10228
> URL: https://issues.apache.org/jira/browse/MESOS-10228
> Project: Mesos
>  Issue Type: Task
>  Components: agent, framework
>Affects Versions: 1.11.0
>Reporter: barry lee
>Priority: Major
> Fix For: 1.11.0
>
> Attachments: image-2021-08-19-19-22-51-456.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> My current problem is that after mesos-Agent added the option to support GPU, 
> starting Docker through Marathon cannot succeed.
> mesos-agent \
> --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos
>  \
> --log_dir=/var/log/mesos \
> --containerizers=docker,mesos \
> --executor_registration_timeout=5mins \
> --hostname=192.168.10.19 \
> --ip=192.168.10.19 \
> --port=5051 \
> --work_dir=/var/lib/mesos \
> --image_providers=docker \
> --executor_environment_variables="{}" \
> --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"
>  
> In the MESos-Agent GPU option, this is useful when there is no GPU node.
>  
> !image-2021-08-19-19-22-51-456.png!





[jira] [Commented] (MESOS-10227) After mesos-agent starts, mesos-exeute fails to be executed using the GPU

2021-09-26 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420360#comment-17420360
 ] 

Charles Natali commented on MESOS-10227:


Hi [~barrylee],

Sorry for the delay.
Is this still a problem?
The log you're providing is truncated; it would be useful to get:
- the agent logs from when the task was started
- the executor log



> After mesos-agent starts, mesos-exeute fails to be executed using the GPU
> -
>
> Key: MESOS-10227
> URL: https://issues.apache.org/jira/browse/MESOS-10227
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Affects Versions: 1.11.0
> Environment: mesos-agent \
> --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos
>  \
> --log_dir=/var/log/mesos --containerizers=docker,mesos \
> --executor_registration_timeout=5mins \
> --hostname=192.168.10.19 \
> --ip=192.168.10.19 \
> --port=5051 \
> --work_dir=/var/lib/mesos \
> --image_providers=docker \
> --executor_environment_variables="{}" \
> --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"
>  
>  
> mesos-execute \
> --master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos
>  \
> --name=gpu-test \
> --docker_image=nvidia/cuda \
> --command="nvidia-smi" \
> --framework_capabilities="GPU_RESOURCES" \
> --resources="gpus:1"
>  
>Reporter: barry lee
>Priority: Major
> Fix For: 1.11.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I0819 18:14:26.088129 9337 containerizer.cpp:3414] Transitioning the state of 
> container fab468e6-bcbd-499c-9c24-ccd572c8317b from PROVISIONING to 
> DESTROYING after 2.207289088secs
> I0819 18:14:26.089609 9339 slave.cpp:7100] Executor 'gpu-test' of framework 
> d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 has terminated with unknown status
> I0819 18:14:26.091435 9339 slave.cpp:5981] Handling status update TASK_FAILED 
> (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of 
> framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 from @0.0.0.0:0
> E0819 18:14:26.092530 9346 slave.cpp:6357] Failed to update resources for 
> container fab468e6-bcbd-499c-9c24-ccd572c8317b of executor 'gpu-test' running 
> task gpu-test on status update for terminal task, destroying container: 
> Container not found
> W0819 18:14:26.092737 9341 composing.cpp:614] Attempted to destroy unknown 
> container fab468e6-bcbd-499c-9c24-ccd572c8317b
> I0819 18:14:26.092895 9331 task_status_update_manager.cpp:328] Received task 
> status update TASK_FAILED (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) 
> for task gpu-test of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
> I0819 18:14:26.093626 9333 slave.cpp:6527] Forwarding the update TASK_FAILED 
> (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of 
> framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 to 
> master@192.168.10.192:5050
> I0819 18:14:26.102195 9342 slave.cpp:4310] Shutting down framework 
> d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
> I0819 18:14:26.102257 9342 slave.cpp:7218] Cleaning up executor 'gpu-test' of 
> framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
> I0819 18:14:26.102448 9332 gc.cpp:95] Scheduling 
> '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test/runs/fab468e6-bcbd-499c-9c24-ccd572c8317b'
>  for gc 6.988156days in the future
> I0819 18:14:26.102600 9332 gc.cpp:95] Scheduling 
> '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test'
>  for gc 6.9881303111days in the future
> I0819 18:14:26.102725 9342 slave.cpp:7347] Cleaning up framework 
> d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
> I0819 18:14:26.102805 9335 task_status_update_manager.cpp:289] Closing task 
> status update streams for framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
> I0819 18:14:26.102901 9342 gc.cpp:95] Scheduling 
> '/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027'
>  for gc 6.9881020741days in the future
> I0819 18:14:34.385221 9334 http.cpp:1436] HTTP GET for 
> /files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._67
>  from 192.168.110.142:11640 with User-Agent='Mozill

[jira] [Created] (MESOS-10230) Please update JQuery from 3.2.1 to 3.5.0+

2021-09-16 Thread p engels (Jira)
p engels created MESOS-10230:


 Summary: Please update JQuery from 3.2.1 to 3.5.0+
 Key: MESOS-10230
 URL: https://issues.apache.org/jira/browse/MESOS-10230
 Project: Mesos
  Issue Type: Improvement
  Components: security
Affects Versions: 1.11.0
Reporter: p engels


JQuery versions between 1.2 and 3.5.0 are vulnerable to multiple 
cross-site-scripting vulnerabilities. More info can be found on JQuery's 
website:

blog.jquery.com: [https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/]

My organization's vulnerability scanner locates the out-of-date jquery at this 
url (sanitized for security reasons):

[http://example.com:5050/assets/libs/jquery-3.2.1.min.js]

 

Please remove the old version of JQuery and replace it with version 3.5.0 or 
greater. If this is already planned for a future release, please comment on 
this request with the version this will be fixed in.

 

Keep up the good work, Apache community <3





[jira] [Created] (MESOS-10229) [backport] Backport fixes to allow compilation of 1.11 on Ubuntu 20.04

2021-08-25 Thread Renan DelValle (Jira)
Renan DelValle created MESOS-10229:
--

 Summary: [backport] Backport fixes to allow compilation of 1.11 on 
Ubuntu 20.04
 Key: MESOS-10229
 URL: https://issues.apache.org/jira/browse/MESOS-10229
 Project: Mesos
  Issue Type: Task
Affects Versions: 1.11.0
Reporter: Renan DelValle


Two recently landed commits are necessary in order to compile Mesos 1.11 on 
Ubuntu 20.04.

 
 * 
[https://github.com/apache/mesos/commit/4ce33ca185fde0c6b258b85311fde3384e488f0d]
 * 
[https://github.com/apache/mesos/commit/7141572d64cc43d3aafe2b4f5de7492cc0803b78]





[jira] [Created] (MESOS-10228) My current problem is that after mesos-Agent added the option to support GPU, starting Docker through Marathon cannot succeed

2021-08-19 Thread barry lee (Jira)
barry lee created MESOS-10228:
-

 Summary: My current problem is that after mesos-Agent added the 
option to support GPU, starting Docker through Marathon cannot succeed
 Key: MESOS-10228
 URL: https://issues.apache.org/jira/browse/MESOS-10228
 Project: Mesos
  Issue Type: Task
  Components: agent, framework
Affects Versions: 1.11.0
Reporter: barry lee
 Fix For: 1.11.0


My current problem is that after the GPU support option was added to the 
mesos-agent configuration, starting a Docker container through Marathon does 
not succeed.





[jira] [Created] (MESOS-10227) After mesos-agent starts, mesos-execute fails to be executed using the GPU

2021-08-19 Thread barry lee (Jira)
barry lee created MESOS-10227:
-

 Summary: After mesos-agent starts, mesos-execute fails to be 
executed using the GPU
 Key: MESOS-10227
 URL: https://issues.apache.org/jira/browse/MESOS-10227
 Project: Mesos
  Issue Type: Task
  Components: agent
Affects Versions: 1.11.0
 Environment: mesos-agent \
--master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos \
--log_dir=/var/log/mesos --containerizers=docker,mesos \
--executor_registration_timeout=5mins \
--hostname=192.168.10.19 \
--ip=192.168.10.19 \
--port=5051 \
--work_dir=/var/lib/mesos \
--image_providers=docker \
--executor_environment_variables="{}" \
--isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"

mesos-execute \
--master=zk://192.168.10.191:2181,192.168.10.192:2181,192.168.10.193:2181/mesos \
--name=gpu-test \
--docker_image=nvidia/cuda \
--command="nvidia-smi" \
--framework_capabilities="GPU_RESOURCES" \
--resources="gpus:1"

 
Reporter: barry lee
 Fix For: 1.11.0


I0819 18:14:26.088129 9337 containerizer.cpp:3414] Transitioning the state of 
container fab468e6-bcbd-499c-9c24-ccd572c8317b from PROVISIONING to DESTROYING 
after 2.207289088secs
I0819 18:14:26.089609 9339 slave.cpp:7100] Executor 'gpu-test' of framework 
d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 has terminated with unknown status
I0819 18:14:26.091435 9339 slave.cpp:5981] Handling status update TASK_FAILED 
(Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of 
framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 from @0.0.0.0:0
E0819 18:14:26.092530 9346 slave.cpp:6357] Failed to update resources for 
container fab468e6-bcbd-499c-9c24-ccd572c8317b of executor 'gpu-test' running 
task gpu-test on status update for terminal task, destroying container: 
Container not found
W0819 18:14:26.092737 9341 composing.cpp:614] Attempted to destroy unknown 
container fab468e6-bcbd-499c-9c24-ccd572c8317b
I0819 18:14:26.092895 9331 task_status_update_manager.cpp:328] Received task 
status update TASK_FAILED (Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) 
for task gpu-test of framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.093626 9333 slave.cpp:6527] Forwarding the update TASK_FAILED 
(Status UUID: 0abd4e4b-59a6-4610-b624-05762ab9fc17) for task gpu-test of 
framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027 to 
master@192.168.10.192:5050
I0819 18:14:26.102195 9342 slave.cpp:4310] Shutting down framework 
d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102257 9342 slave.cpp:7218] Cleaning up executor 'gpu-test' of 
framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102448 9332 gc.cpp:95] Scheduling 
'/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test/runs/fab468e6-bcbd-499c-9c24-ccd572c8317b'
 for gc 6.988156days in the future
I0819 18:14:26.102600 9332 gc.cpp:95] Scheduling 
'/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027/executors/gpu-test'
 for gc 6.9881303111days in the future
I0819 18:14:26.102725 9342 slave.cpp:7347] Cleaning up framework 
d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102805 9335 task_status_update_manager.cpp:289] Closing task 
status update streams for framework d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027
I0819 18:14:26.102901 9342 gc.cpp:95] Scheduling 
'/var/lib/mesos/slaves/d5cb56f3-1f2f-49e6-b63b-a401e445104d-S125/frameworks/d5cb56f3-1f2f-49e6-b63b-a401e445104d-0027'
 for gc 6.9881020741days in the future
I0819 18:14:34.385221 9334 http.cpp:1436] HTTP GET for 
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._67
 from 192.168.110.142:11640 with User-Agent='Mozilla/5.0 (Windows NT 10.0; 
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 
Safari/537.36'
I0819 18:14:45.385519 9344 http.cpp:1436] HTTP GET for 
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6a
 from 192.168.110.142:11690 with User-Agent='Mozilla/5.0 (Windows NT 10.0; 
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 
Safari/537.36'
I0819 18:14:56.381196 9334 http.cpp:1436] HTTP GET for 
/files/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2Fd5cb56f3-1f2f-49e6-b63b-a401e445104d-S125&jsonp=angular.callbacks._6d
 from 192.168.110.142:11716 with User-Agent='Mozilla/5.0 (Windows NT 10.0; 
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 
Safari/537.36'
I0819 18:15:07.385897 9340 http.cpp:1436] HTTP GET for 
/files

[jira] [Commented] (MESOS-10200) cmake target "install" not available in 1.10.x branch

2021-08-16 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399644#comment-17399644
 ] 

Andreas Peters commented on MESOS-10200:


I tested it a few minutes ago; the problem is gone with the current master.

> cmake target "install" not available in 1.10.x branch
> -
>
> Key: MESOS-10200
> URL: https://issues.apache.org/jira/browse/MESOS-10200
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.10.0
> Environment: OS: Mac OS X Catalina (10.15.7).
>Reporter: PRUDHVI RAJ MULAGAPATI
>Priority: Major
> Attachments: 10198.html
>
>
> I am trying to build Mesos on Mac OS X 10.15.7 (Catalina) following the 
> official documentation. In the 1.10.x branch the cmake target "install" is 
> not found; however, I was able to build and install with the 1.11.x and 
> master branches. Listed below are the available targets as shown by 
> cmake --build . --target help.
>  
> cmake --build . --target install
> make: *** No rule to make target `install'. Stop.
>  
> cmake --build . --target help
> The following are some of the valid targets for this Makefile:
> ... all (the default if no target is provided)
> ... clean
> ... depend
> ... edit_cache
> ... package
> ... package_source
> ... rebuild_cache
> ... test
> ... boost-1.65.0
> ... check
> ... concurrentqueue-7b69a8f
> ... csi_v0-0.2.0
> ... csi_v1-1.1.0
> ... dist
> ... distcheck
> ... elfio-3.2
> ... glog-0.4.0
> ... googletest-1.8.0
> ... grpc-1.10.0
> ... http_parser-2.6.2
> ... leveldb-1.19
> ... libarchive-3.3.2
> ... libev-4.22
> ... make_bin_include_dir
> ... make_bin_java_dir
> ... make_bin_jni_dir
> ... make_bin_src_dir
> ... nvml-352.79
> ... picojson-1.3.0
> ... protobuf-3.5.0
> ... rapidjson-1.1.0
> ... tests
> ... zookeeper-3.4.8
> ... balloon-executor
> ... balloon-framework
> ... benchmarks
> ... disk-full-framework
> ... docker-no-executor-framework
> ... dynamic-reservation-framework
> ... example
> ... examplemodule
> ... fixed_resource_estimator
> ... inverse-offer-framework
> ... libprocess-tests
> ... load-generator-framework
> ... load_qos_controller
> ... logrotate_container_logger
> ... long-lived-executor
> ... long-lived-framework
> ... mesos
> ... mesos-agent
> ... mesos-cli
> ... mesos-cni-port-mapper
> ... mesos-containerizer
> ... mesos-default-executor
> ... mesos-docker-executor
> ... mesos-execute
> ... mesos-executor
> ... mesos-fetcher
> ... mesos-io-switchboard
> ... mesos-local
> ... mesos-log
> ... mesos-logrotate-logger
> ... mesos-master
> ... mesos-protobufs
> ... mesos-resolve
> ... mesos-tcp-connect
> ... mesos-tests
> ... mesos-usage
> ... no-executor-framework
> ... operation-feedback-framework
> ... persistent-volume-framework
> ... process
> ... stout-tests
> ... test-csi-user-framework
> ... test-executor
> ... test-framework
> ... test-helper
> ... test-http-executor
> ... test-http-framework
> ... test-linkee
> ... testallocator
> ... testanonymous
> ... testauthentication
> ... testauthorizer
> ... testcontainer_logger
> ... testhook
> ... testhttpauthenticator
> ... testisolator
> ... testmastercontender
> ... testmasterdetector
> ... testqos_controller
> ... testresource_estimator
> ... uri_disk_profile_adaptor





[jira] [Commented] (MESOS-10219) 1.11.0 does not build on Windows

2021-08-16 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399577#comment-17399577
 ] 

Andreas Peters commented on MESOS-10219:


Hi [~acecile555], how is it going with this?

> 1.11.0 does not build on Windows
> 
>
> Key: MESOS-10219
> URL: https://issues.apache.org/jira/browse/MESOS-10219
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, build, cmake
>Affects Versions: 1.11.0
>Reporter: acecile555
>Priority: Major
> Attachments: mesos_slave_windows_longpath.png, 
> patch_1.10.0_windows_build.diff
>
>
> Hello,
>  
> I just tried building Mesos 1.11.0 on Windows and this is not working.
>  
> The first issue is libarchive compilation, which can easily be worked around 
> by adding the following hunk to 3rdparty/libarchive-3.3.2.patch:
> {noformat}
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -137,7 +137,7 @@
># This is added into CMAKE_C_FLAGS when CMAKE_BUILD_TYPE is "Debug"
># Enable level 4 C4061: The enumerate has no associated handler in a switch
>#   statement.
> -  SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /we4061")
> +  #SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /we4061")
># Enable level 4 C4254: A larger bit field was assigned to a smaller bit
>#   field.
>SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /we4254")
> {noformat}
> Sadly it fails later with an issue I cannot solve myself:
> {noformat}
> C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot 
> open include file: 'csi/state.pb.h': No such file or directory (compiling 
> source file C:\Users\earthlab\mesos\src\slave\csi_server.cpp) 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
>   qos_controller.cpp
>   resource_estimator.cpp
>   slave.cpp
>   state.cpp
>   task_status_update_manager.cpp
>   sandbox.cpp
> C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot 
> open include file: 'csi/state.pb.h': No such file or directory (compiling 
> source file C:\Users\earthlab\mesos\src\slave\slave.cpp) 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
>   composing.cpp
>   isolator.cpp
> C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot 
> open include file: 'csi/state.pb.h': No such file or directory (compiling 
> source file C:\Users\earthlab\mesos\src\slave\task_status_update_manager.cpp) 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
>   isolator_tracker.cpp
>   launch.cpp
> C:\Users\earthlab\mesos\src\csi/state.hpp(22,10): fatal error C1083: Cannot 
> open include file: 'csi/state.pb.h': No such file or directory (compiling 
> source file C:\Users\earthlab\mesos\src\slave\containerizer\composing.cpp) 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
>   launcher.cpp
> C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp(524,34): 
> error C2668: 'os::spawn': ambiguous call to overloaded function 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
> C:\Users\earthlab\mesos\3rdparty\stout\include\stout/os/exec.hpp(52,20): 
> message : could be 'Option<T> os::spawn(const std::string &,const 
> std::vector<std::string> &)' 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
>   with
>   [
>   T=int
>   ] (compiling source file 
> C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp)
> C:\Users\earthlab\mesos\3rdparty\stout\include\stout/os/windows/exec.hpp(412,20):
>  message : or   'Option<T> os::spawn(const std::string &,const 
> std::vector<std::string> &,const 
> Option<std::map<std::string,std::string>> &)' 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
>   with
>   [
>   T=int
>   ] (compiling source file 
> C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp)
> C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp(525,75): 
> message : while trying to match the argument list '(const char [3], 
> initializer list)' [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
> C:\Users\earthlab\mesos\src\slave\containerizer\mesos\launch.cpp(893,47): 
> error C2668: 'os::spawn': ambiguous call to overloaded function 
> [C:\Users\earthlab\mesos\build\src\mesos.vcxproj]
> C:\Users\earthlab\mesos\3rdparty\stout\include\stout/os/exec.hpp(52,20): 
> message : could be 'Op

[jira] [Commented] (MESOS-10198) Mesos-master service is activating state

2021-08-07 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395290#comment-17395290
 ] 

Charles Natali commented on MESOS-10198:


Hi [~kiranjshetty], sorry for the delay, I know it's been a while.


{noformat}
Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: 
./db/skiplist.h:344: void leveldb::SkipList::Insert(const 
Key&) [with Key = const char*; Comparator = leveldb::MemTable::KeyComparator]: 
Assertion `x == __null || !Equal(key, x->key)' failed.
{noformat}


This points to a corruption of the on-disk leveldb database - it's been a long 
time, but do you remember:
- was this specific error present in all the masters' logs?
- did the hosts maybe crash prior to that?
- I guess it's too late now, but it would have been interesting to see the 
logs from the first time the masters crashed.

Looking at our code, it's not clear to me what we could do to introduce a 
leveldb corruption - the only possibilities I can think of are a leveldb bug, 
or maybe in specific conditions some unrelated code ends up writing to the 
leveldb file descriptors, which could cause such a corruption.
But having it occur across all masters seems very unlikely.

> Mesos-master service is activating state
> 
>
> Key: MESOS-10198
> URL: https://issues.apache.org/jira/browse/MESOS-10198
> Project: Mesos
>  Issue Type: Task
>Affects Versions: 1.9.0
>Reporter: Kiran J Shetty
>Priority: Major
>
> The mesos-master service is stuck in the activating state on all 3 master 
> nodes, which in turn is making Marathon restart frequently. In the logs I 
> can see the entries below.
>  Mesos-master logs:
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9 
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854 
> mesos::internal::log::Replica::Replica()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65 
> mesos::internal::log::LogProcess::LogProcess()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34 
> mesos::log::Log::Log()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a8207 
> __libc_start_main
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown)
>  Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process 
> exited, code=killed, status=6/ABRT
>  Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered 
> failed state.
>  Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed.
>  Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time 
> over, scheduling restart.
>  Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master.
>  Nov 12 08:36:49 servername systemd[1]: Started Mesos Master.
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024 
> logging.cpp:201] INFO level logging started!
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024 
> main.cpp:243] Build: 2019-10-21 12:10:14 by centos
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024 
> main.cpp:244] Version: 1.9.0
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024 
> main.cpp:247] Git tag: 1.9.0
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024 
> main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024 
> main.cpp:345] Using 'hierarchical' allocator
>  Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: 
> ./db/skiplist.h:344: void leveldb::SkipList::Insert(const 
> Key&) [with Key = const char*; Comparator = 
> leveldb::MemTable::KeyComparator]: Assertion `x == __null || !Equal(key, 
> x->key)' failed.
>  Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409 
> (unix time) try "date -d @1605150409" if you are using GNU date ***
>  Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387 
> __GI_raise
>  Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38) 
> received by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: ***
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown)
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6 

[jira] [Commented] (MESOS-10200) cmake target "install" not available in 1.10.x branch

2021-08-02 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391782#comment-17391782
 ] 

Charles Natali commented on MESOS-10200:


[~apeters]
It's not quite clear to me, is it still a problem in master?

> cmake target "install" not available in 1.10.x branch
> -
>
> Key: MESOS-10200
>     URL: https://issues.apache.org/jira/browse/MESOS-10200
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.10.0
> Environment: OS: Mac OS X Catalina (10.15.7).
>Reporter: PRUDHVI RAJ MULAGAPATI
>Priority: Major
> Attachments: 10198.html
>
>
> I am trying to build Mesos on Mac OS X 10.15.7 (Catalina) following the 
> official documentation. In the 1.10.x branch the cmake target "install" is 
> not found; however, I was able to build and install with the 1.11.x and 
> master branches. Listed below are the available targets as shown by 
> cmake --build . --target help.
>  
> cmake --build . --target install
> make: *** No rule to make target `install'. Stop.
>  
> cmake --build . --target help
> The following are some of the valid targets for this Makefile:
> ... all (the default if no target is provided)
> ... clean
> ... depend
> ... edit_cache
> ... package
> ... package_source
> ... rebuild_cache
> ... test
> ... boost-1.65.0
> ... check
> ... concurrentqueue-7b69a8f
> ... csi_v0-0.2.0
> ... csi_v1-1.1.0
> ... dist
> ... distcheck
> ... elfio-3.2
> ... glog-0.4.0
> ... googletest-1.8.0
> ... grpc-1.10.0
> ... http_parser-2.6.2
> ... leveldb-1.19
> ... libarchive-3.3.2
> ... libev-4.22
> ... make_bin_include_dir
> ... make_bin_java_dir
> ... make_bin_jni_dir
> ... make_bin_src_dir
> ... nvml-352.79
> ... picojson-1.3.0
> ... protobuf-3.5.0
> ... rapidjson-1.1.0
> ... tests
> ... zookeeper-3.4.8
> ... balloon-executor
> ... balloon-framework
> ... benchmarks
> ... disk-full-framework
> ... docker-no-executor-framework
> ... dynamic-reservation-framework
> ... example
> ... examplemodule
> ... fixed_resource_estimator
> ... inverse-offer-framework
> ... libprocess-tests
> ... load-generator-framework
> ... load_qos_controller
> ... logrotate_container_logger
> ... long-lived-executor
> ... long-lived-framework
> ... mesos
> ... mesos-agent
> ... mesos-cli
> ... mesos-cni-port-mapper
> ... mesos-containerizer
> ... mesos-default-executor
> ... mesos-docker-executor
> ... mesos-execute
> ... mesos-executor
> ... mesos-fetcher
> ... mesos-io-switchboard
> ... mesos-local
> ... mesos-log
> ... mesos-logrotate-logger
> ... mesos-master
> ... mesos-protobufs
> ... mesos-resolve
> ... mesos-tcp-connect
> ... mesos-tests
> ... mesos-usage
> ... no-executor-framework
> ... operation-feedback-framework
> ... persistent-volume-framework
> ... process
> ... stout-tests
> ... test-csi-user-framework
> ... test-executor
> ... test-framework
> ... test-helper
> ... test-http-executor
> ... test-http-framework
> ... test-linkee
> ... testallocator
> ... testanonymous
> ... testauthentication
> ... testauthorizer
> ... testcontainer_logger
> ... testhook
> ... testhttpauthenticator
> ... testisolator
> ... testmastercontender
> ... testmasterdetector
> ... testqos_controller
> ... testresource_estimator
> ... uri_disk_profile_adaptor





[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-08-02 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391736#comment-17391736
 ] 

Charles Natali commented on MESOS-10226:


Hm, it's annoying - the gdb backtrace you posted shows that the test run gets 
stuck in this test, but for some reason running this test on its own isn't 
enough to reproduce it.
It's going to be very difficult to debug without being able to run the tests 
myself.

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
>     URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, 
> gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", oflag=) at 
> ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, 
> filename=, posix_mode=, prot=prot@entry=438, 
> read_write=8, is32not64=) at fileops.c:189
> #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, 
> filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode= out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@e
> ntry=1) at fileops.c:281 
> #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
> #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at 
> ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0xd74f1c1c in 
> mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody
>  (this=0xaaab00f88f50) at ../../src/tests/containeri
> zer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>  
>  
> Basically the test uses a named pipe to synchronize with the task being 
> started, and if the task fails to start - in this case because we're trying 
> to launch an x86 container on an arm64 host - the test will just hang reading 
> from the pipe.
> I sent Martin a tentative fix for him to test, and I'll open an MR if it is 
> successful.





[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Martin Tzvetanov Grigorov (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390279#comment-17390279
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10226:
---

`sudo ./bin/mesos-tests.sh 
--gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand* 
--verbose` didn't hang but failed with:

 
{code:java}
...
7-b49a-765ac4cd1729/backends/overlay/rootfses/ba3ccd6c-bacf-4d88-a4fc-5104ca45d19e'
 for container a6a3e1b5-4322-4b07-b49a-765ac4cd1729
I0730 05:32:29.134213 2249744 master.cpp:1149] Master terminating
I0730 05:32:29.134589 2249739 hierarchical.cpp:1232] Removed all filters for 
agent 4c43d934-41d8-4159-9b03-2dfdeee3f386-S0
I0730 05:32:29.134629 2249739 hierarchical.cpp:1108] Removed agent 
4c43d934-41d8-4159-9b03-2dfdeee3f386-S0
[  FAILED  ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/4, where 
GetParam() = "quay.io/coreos/alpine-sh" (3751 ms)
[--] 5 tests from ContainerImage/ProvisionerDockerTest (38953 ms total)
[--] Global test environment tear-down
[==] 5 tests from 1 test case ran. (38966 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 5 tests, listed below:
[  FAILED  ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0, where 
GetParam() = "alpine"
[  FAILED  ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/1, where 
GetParam() = "library/alpine"
[  FAILED  ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/2, where 
GetParam() = "gcr.io/google-containers/busybox:1.24"
[  FAILED  ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/3, where 
GetParam() = "gcr.io/google-containers/busybox:1.27"
[  FAILED  ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/4, where 
GetParam() = "quay.io/coreos/alpine-sh"
 5 FAILED TESTS
I0730 05:32:29.176168 2249746 process.cpp:935] Stopped the socket accept loop
 {code}
 

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, 
> gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{nofo

[jira] [Comment Edited] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390152#comment-17390152
 ] 

Charles Natali edited comment on MESOS-10226 at 7/29/21, 8:44 PM:
--

Hm, I can't reproduce it.

I updated the test to run the arm64 alpine image, to make it fail in a similar 
way to how it should be failing for you, and for me it's not hanging but 
failing:



{noformat}
# ./bin/mesos-tests.sh 
--gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand*

[ RUN ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0
sh: 1: hadoop: not found
Marked '/' as rslave
I0729 21:40:16.121507 434157 exec.cpp:164] Version: 1.12.0
I0729 21:40:16.136072 434156 exec.cpp:237] Executor registered on agent 
48863f87-f283-42ab-bd93-f301fdfbd73b-S0
I0729 21:40:16.139089 434154 executor.cpp:190] Received SUBSCRIBED event
I0729 21:40:16.139974 434154 executor.cpp:194] Subscribed executor on thinkpad
I0729 21:40:16.140264 434154 executor.cpp:190] Received LAUNCH event
I0729 21:40:16.141703 434154 executor.cpp:722] Starting task 
1461a266-1ead-4bdf-9165-9c0f6c5938b8
I0729 21:40:16.147071 434154 executor.cpp:740] Forked command at 434163
Preparing rootfs at 
'/tmp/ContainerImage_ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_0_GxQGxF/provisioner/containers/77c499a5-6d34-46aa-86a4-e993d53aa56a/backends/overlay/rootfses/629e6501-86d4-447e-bf17-412cd1cb6634'
Changing root to 
/tmp/ContainerImage_ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_0_GxQGxF/provisioner/containers/77c499a5-6d34-46aa-86a4-e993d53aa56a/backends/overlay/rootfses/629e6501-86d4-447e-bf17-412cd1cb6634
Failed to execute '/bin/ls': Exec format error
I0729 21:40:16.321754 434155 executor.cpp:1041] Command exited with status 1 
(pid: 434163)
../../src/tests/containerizer/provisioner_docker_tests.cpp:785: Failure
 Expected: TASK_FINISHED
To be equal to: statusFinished->state()
 Which is: TASK_FAILED
I0729 21:40:16.333557 434157 exec.cpp:478] Executor asked to shutdown
I0729 21:40:16.334996 434158 executor.cpp:190] Received SHUTDOWN event
I0729 21:40:16.335037 434158 executor.cpp:843] Shutting down
[ FAILED ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0, where 
GetParam() = "arm64v8/alpine" (5851 ms)

{noformat}


 

Could you try running


{noformat}
./bin/mesos-tests.sh 
--gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand* 
--verbose

{noformat}
 

And see if it hangs, and post the result?

 

Worst case we could just ignore the hang and update the test to use the arm64 
image so it passes, but I'd like to understand why it hangs.



[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390152#comment-17390152
 ] 

Charles Natali commented on MESOS-10226:


Hm, I can't reproduce it.

I updated the test to run the arm64 alpine image so that it fails in a 
similar way to how it should be failing for you, and it's not hanging, but 
failing:

```

# ./bin/mesos-tests.sh 
--gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand*

[ RUN ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0
sh: 1: hadoop: not found
Marked '/' as rslave
I0729 21:40:16.121507 434157 exec.cpp:164] Version: 1.12.0
I0729 21:40:16.136072 434156 exec.cpp:237] Executor registered on agent 
48863f87-f283-42ab-bd93-f301fdfbd73b-S0
I0729 21:40:16.139089 434154 executor.cpp:190] Received SUBSCRIBED event
I0729 21:40:16.139974 434154 executor.cpp:194] Subscribed executor on thinkpad
I0729 21:40:16.140264 434154 executor.cpp:190] Received LAUNCH event
I0729 21:40:16.141703 434154 executor.cpp:722] Starting task 
1461a266-1ead-4bdf-9165-9c0f6c5938b8
I0729 21:40:16.147071 434154 executor.cpp:740] Forked command at 434163
Preparing rootfs at 
'/tmp/ContainerImage_ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_0_GxQGxF/provisioner/containers/77c499a5-6d34-46aa-86a4-e993d53aa56a/backends/overlay/rootfses/629e6501-86d4-447e-bf17-412cd1cb6634'
Changing root to 
/tmp/ContainerImage_ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_0_GxQGxF/provisioner/containers/77c499a5-6d34-46aa-86a4-e993d53aa56a/backends/overlay/rootfses/629e6501-86d4-447e-bf17-412cd1cb6634
Failed to execute '/bin/ls': Exec format error
I0729 21:40:16.321754 434155 executor.cpp:1041] Command exited with status 1 
(pid: 434163)
../../src/tests/containerizer/provisioner_docker_tests.cpp:785: Failure
 Expected: TASK_FINISHED
To be equal to: statusFinished->state()
 Which is: TASK_FAILED
I0729 21:40:16.333557 434157 exec.cpp:478] Executor asked to shutdown
I0729 21:40:16.334996 434158 executor.cpp:190] Received SHUTDOWN event
I0729 21:40:16.335037 434158 executor.cpp:843] Shutting down
[ FAILED ] 
ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0, where 
GetParam() = "arm64v8/alpine" (5851 ms)

```

 

Could you try running

```

./bin/mesos-tests.sh 
--gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand* 
--verbose

```

 

And see if it hangs, and post the result?

 

Worst case we could just ignore the hang and update the test to use the arm64 
image so it passes, but I'd like to understand why it hangs.

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, 
> gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B

[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Martin Tzvetanov Grigorov (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390106#comment-17390106
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10226:
---

Hi [~cf.natali] !

It still hangs since 6 hours ago.

This is the new thread dump - [^gdb-thread-apply-bt-all-29.07.2021-2.txt]

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, 
> gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", oflag=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, 
> filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, 
> read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, 
> filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized 
> out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
> #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
> #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at 
> ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0xd74f1c1c in 
> mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody
>  (this=0xaaab00f88f50) at ../../src/tests/containeri
> zer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>  
>  
> Basically the test uses a named pipe to synchronize with the task being 
> started, and if the task fails to start - in this case because we're trying 
> to launch an x86 container on an arm64 host - the test will just hang reading 
> from the pipe.
> I sent Martin a tentative fix for him to test, and I'll open an MR if 
> successful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390058#comment-17390058
 ] 

Charles Natali edited comment on MESOS-10226 at 7/29/21, 6:09 PM:
--

[~mgrigorov] Looking at the code corresponding to the backtrace, I don't think 
it should hang forever, but only for up to 10 minutes:

 
{noformat}
#13 0xb7ca1418 in AwaitAssertReady 
(expr=0xba1c1d58 "statusStarting", actual=..., duration=...) at 
../../3rdparty/libprocess/include/process/gtest.hpp:126
#14 0xb97c588c in 
mesos::internal::tests::ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_Test::TestBody
 (this=0xcd4207a0) at 
../../src/tests/containerizer/provisioner_docker_tests.cpp:782
{noformat}
 

 
{noformat}
AWAIT_READY_FOR(statusStarting, Minutes(10));
{noformat}
 

Are you sure it was stuck indefinitely and not just taking a long time?

 

Also, it would help to have the output of running the tests with {{--verbose}}.



> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
>     URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", oflag=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0x000

[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390058#comment-17390058
 ] 

Charles Natali commented on MESOS-10226:


[~mgrigorov] Looking at the code corresponding to the backtrace, I don't think 
it should hang forever, but only for up to 10 minutes:

 
{noformat}
#13 0xb7ca1418 in AwaitAssertReady 
(expr=0xba1c1d58 "statusStarting", actual=..., duration=...) at 
../../3rdparty/libprocess/include/process/gtest.hpp:126
#14 0xb97c588c in 
mesos::internal::tests::ProvisionerDockerTest_ROOT_INTERNET_CURL_SimpleCommand_Test::TestBody
 (this=0xcd4207a0) at 
../../src/tests/containerizer/provisioner_docker_tests.cpp:782
{noformat}
 

 
{noformat}
AWAIT_READY_FOR(statusStarting, Minutes(10));
{noformat}
 

Are you sure it was stuck indefinitely and not just taking a long time?

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
>     URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", oflag=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, 
> filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, 
> read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, 
> filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized 
> out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
> #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
> #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at 
> ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0xd74f1c1c in 
> mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDeb

[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390055#comment-17390055
 ] 

Charles Natali commented on MESOS-10226:


Thanks, I'll have a look - I hope there won't be too many hanging tests...

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", oflag=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, 
> filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, 
> read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, 
> filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized 
> out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
> #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
> #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at 
> ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0xd74f1c1c in 
> mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody
>  (this=0xaaab00f88f50) at ../../src/tests/containeri
> zer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>  
>  
> Basically the test uses a named pipe to synchronize with the task being 
> started, and if the task fails to start - in this case because we're trying 
> to launch an x86 container on an arm64 host - the test will just hang reading 
> from the pipe.
> I sent Martin a tentative fix for him to test, and I'll open an MR if 
> successful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Martin Tzvetanov Grigorov (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389882#comment-17389882
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10226:
---

Attached [^gdb-thread-apply-bt-all-29.07.2021.txt]

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", oflag=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, 
> filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, 
> read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, 
> filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized 
> out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
> #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
> #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at 
> ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0xd74f1c1c in 
> mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody
>  (this=0xaaab00f88f50) at ../../src/tests/containeri
> zer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>  
>  
> Basically the test uses a named pipe to synchronize with the task being 
> started, and if the task fails to start - in this case because we're trying 
> to launch an x86 container on an arm64 host - the test will just hang reading 
> from the pipe.
> I sent Martin a tentative fix for him to test, and I'll open an MR if 
> successful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Martin Tzvetanov Grigorov (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389867#comment-17389867
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10226:
---

The test properly fails now:

{noformat}
[--] Global test environment tear-down
[==] 34 tests from 2 test cases ran. (66593 ms total)
[  PASSED  ] 33 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] 
NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
{noformat}

But {{sudo make check}} still hangs, probably on a different test this time.
I am trying to get the backtraces with gdb, but gdb also hangs...
I'll send you the new info once I have it!
 

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 "/tmp/7VXP3w/pipe", oflag=<optimized out>) at ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
> #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
> #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0xd74f1c1c in mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody (this=0xaaab00f88f50) at ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>  
>  
> Basically the test uses a named pipe to sync

[jira] [Commented] (MESOS-10226) test suite hangs on ARM64

2021-07-29 Thread Martin Tzvetanov Grigorov (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389717#comment-17389717
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10226:
---

Hi Charles,

 

I could reproduce the issue easily with `sudo ./bin/mesos-tests.sh --gtest_filter=*NestedMesosContainerizerTest*`.

Now I am re-building Mesos with your patch!

I'll update this ticket in half an hour or so!

> test suite hangs on ARM64
> -
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
>  Issue Type: Bug
>Reporter: Charles Natali
>Assignee: Charles Natali
>Priority: Major
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb traceback and we can see the following:
>  
> {noformat}
> Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
> #0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 "/tmp/7VXP3w/pipe", oflag=<optimized out>) at ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
> #3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
> #4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0xd74f1c1c in mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody (this=0xaaab00f88f50) at ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>  
>  
> Basically the test uses a named pipe to synchronize with the task being 
> started, and if the task fails to start - in this case because we're trying 
> to launch an x86 container on an arm64 host - the test will just hang reading 
> from the pipe.
> I sent Martin a tentative fix for him to test, and I'll open an MR if 
> successful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (MESOS-10226) test suite hangs on ARM64

2021-07-28 Thread Charles Natali (Jira)
Charles Natali created MESOS-10226:
--

 Summary: test suite hangs on ARM64
 Key: MESOS-10226
 URL: https://issues.apache.org/jira/browse/MESOS-10226
 Project: Mesos
  Issue Type: Bug
Reporter: Charles Natali
Assignee: Charles Natali


Reported by [~mgrigorov].

 
{noformat}
[ RUN      ] 
NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
sh: 1: hadoop: not found
Marked '/' as rslave
I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
9076f44b-846d-4f00-a2dc-11f694cc1900-S0
I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
martin-arm64
I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
Preparing rootfs at 
'/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
Changing root to 
/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
Failed to execute 'sh': Exec format error
I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
(pid: 38)
../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:: Failure
Mock function called more times than expected - returning directly.
    Function call: statusUpdate(0xc28527f0, @0xa2cf3a60 136-byte object 
<08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 00-00 A8-F6 
C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 A0-F1 05-94 
FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 20-BD 01-78 
FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 03-00 00-00>)
         Expected: to be called twice
           Actual: called 3 times - over-saturated and active
I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
loop{noformat}
 

I asked him to provide a gdb traceback and we can see the following:

 
{noformat}


Thread 1 (Thread 0xa3bc2c60 (LWP 173475)):
#0 0xa518db20 in __libc_open64 (file=0xaaab00f342e0 "/tmp/7VXP3w/pipe", oflag=<optimized out>) at ../sysdeps/unix/sysv/linux/open64.c:48
#1 0xa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, read_write=8, is32not64=<optimized out>) at fileops.c:189
#2 0xa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized out>, mode@entry=0xd762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
#3 0xa512e0dc in __fopen_internal (filename=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=0xd762f3c8 "r", is32=1) at iofopen.c:75
#4 0xd54f5350 in os::read (path="/tmp/7VXP3w/pipe") at ../../3rdparty/stout/include/stout/os/read.hpp:136
#5 0xd74f1c1c in mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody (this=0xaaab00f88f50) at ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1126
{noformat}
 

 

Basically the test uses a named pipe to synchronize with the task being 
started, and if the task fails to start - in this case because we're trying to 
launch an x86 container on an arm64 host - the test will just hang reading from 
the pipe.

I sent Martin a tentative fix for him to test, and I'll open an MR if 
successful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-9352) Data in persistent volume deleted accidentally when using Docker container and Persistent volume

2021-07-20 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384479#comment-17384479
 ] 

Charles Natali commented on MESOS-9352:
---

If it's fixed feel free to close!

> Data in persistent volume deleted accidentally when using Docker container 
> and Persistent volume
> 
>
> Key: MESOS-9352
> URL: https://issues.apache.org/jira/browse/MESOS-9352
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization, docker
>Affects Versions: 1.5.1, 1.5.2
> Environment: DCOS 1.11.6
> Mesos 1.5.2
>Reporter: David Ko
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: dcos, dcos-1.11.6, mesosphere, persistent-volumes
> Attachments: image-2018-10-24-22-20-51-059.png, 
> image-2018-10-24-22-21-13-399.png
>
>
> Using a Docker image with a persistent volume to start a service causes 
> data in the persistent volume to be deleted accidentally when the task is 
> killed and restarted; old mount points are also left mounted, even after 
> the service is deleted. 
> *The expected behavior is that data in the persistent volume is kept until 
> the task is deleted completely, and dangling mount points are unmounted correctly.*
>  
> *Step 1:* Use below JSON config to create a Mysql server using Docker image 
> and Persistent Volume
> {code:javascript}
> {
>   "env": {
> "MYSQL_USER": "wordpress",
> "MYSQL_PASSWORD": "secret",
> "MYSQL_ROOT_PASSWORD": "supersecret",
> "MYSQL_DATABASE": "wordpress"
>   },
>   "id": "/mysqlgc",
>   "backoffFactor": 1.15,
>   "backoffSeconds": 1,
>   "constraints": [
> [
>   "hostname",
>   "IS",
>   "172.27.12.216"
> ]
>   ],
>   "container": {
> "portMappings": [
>   {
> "containerPort": 3306,
> "hostPort": 0,
> "protocol": "tcp",
> "servicePort": 1
>   }
> ],
> "type": "DOCKER",
> "volumes": [
>   {
> "persistent": {
>   "type": "root",
>   "size": 1000,
>   "constraints": []
> },
> "mode": "RW",
> "containerPath": "mysqldata"
>   },
>   {
> "containerPath": "/var/lib/mysql",
> "hostPath": "mysqldata",
> "mode": "RW"
>   }
> ],
> "docker": {
>   "image": "mysql",
>   "forcePullImage": false,
>   "privileged": false,
>   "parameters": []
> }
>   },
>   "cpus": 1,
>   "disk": 0,
>   "instances": 1,
>   "maxLaunchDelaySeconds": 3600,
>   "mem": 512,
>   "gpus": 0,
>   "networks": [
> {
>   "mode": "container/bridge"
> }
>   ],
>   "residency": {
> "relaunchEscalationTimeoutSeconds": 3600,
> "taskLostBehavior": "WAIT_FOREVER"
>   },
>   "requirePorts": false,
>   "upgradeStrategy": {
> "maximumOverCapacity": 0,
> "minimumHealthCapacity": 0
>   },
>   "killSelection": "YOUNGEST_FIRST",
>   "unreachableStrategy": "disabled",
>   "healthChecks": [],
>   "fetch": []
> }
> {code}
> *Step 2:* Kill the mysqld process to force rescheduling of a new MySQL 
> task. Two mount points to the same persistent volume were found, which 
> means the old mount point was not unmounted immediately.
> !image-2018-10-24-22-20-51-059.png!
> *Step 3:* After GC, data in the persistent volume was deleted accidentally, 
> while mysqld (the Mesos task) was still running.
> !image-2018-10-24-22-21-13-399.png!
> *Step 4:* Delete the MySQL service from Marathon; all mount points fail to 
> unmount, even though the service has already been deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-9352) Data in persistent volume deleted accidentally when using Docker container and Persistent volume

2021-07-20 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384020#comment-17384020
 ] 

Andreas Peters commented on MESOS-9352:
---

Is this still an issue in the current Mesos version? I tried to reproduce it in 
1.11.0, but it works as expected.

> Data in persistent volume deleted accidentally when using Docker container 
> and Persistent volume
> 
>
> Key: MESOS-9352
> URL: https://issues.apache.org/jira/browse/MESOS-9352
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization, docker
>Affects Versions: 1.5.1, 1.5.2
> Environment: DCOS 1.11.6
> Mesos 1.5.2
>Reporter: David Ko
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: dcos, dcos-1.11.6, mesosphere, persistent-volumes
> Attachments: image-2018-10-24-22-20-51-059.png, 
> image-2018-10-24-22-21-13-399.png
>
>
> Using a Docker image with a persistent volume to start a service causes 
> data in the persistent volume to be deleted accidentally when the task is 
> killed and restarted; old mount points are also left mounted, even after 
> the service is deleted. 
> *The expected behavior is that data in the persistent volume is kept until 
> the task is deleted completely, and dangling mount points are unmounted correctly.*
>  
> *Step 1:* Use below JSON config to create a Mysql server using Docker image 
> and Persistent Volume
> {code:javascript}
> {
>   "env": {
> "MYSQL_USER": "wordpress",
> "MYSQL_PASSWORD": "secret",
> "MYSQL_ROOT_PASSWORD": "supersecret",
> "MYSQL_DATABASE": "wordpress"
>   },
>   "id": "/mysqlgc",
>   "backoffFactor": 1.15,
>   "backoffSeconds": 1,
>   "constraints": [
> [
>   "hostname",
>   "IS",
>   "172.27.12.216"
> ]
>   ],
>   "container": {
> "portMappings": [
>   {
> "containerPort": 3306,
> "hostPort": 0,
> "protocol": "tcp",
> "servicePort": 1
>   }
> ],
> "type": "DOCKER",
> "volumes": [
>   {
> "persistent": {
>   "type": "root",
>   "size": 1000,
>   "constraints": []
> },
> "mode": "RW",
> "containerPath": "mysqldata"
>   },
>   {
> "containerPath": "/var/lib/mysql",
> "hostPath": "mysqldata",
> "mode": "RW"
>   }
> ],
> "docker": {
>   "image": "mysql",
>   "forcePullImage": false,
>   "privileged": false,
>   "parameters": []
> }
>   },
>   "cpus": 1,
>   "disk": 0,
>   "instances": 1,
>   "maxLaunchDelaySeconds": 3600,
>   "mem": 512,
>   "gpus": 0,
>   "networks": [
> {
>   "mode": "container/bridge"
> }
>   ],
>   "residency": {
> "relaunchEscalationTimeoutSeconds": 3600,
> "taskLostBehavior": "WAIT_FOREVER"
>   },
>   "requirePorts": false,
>   "upgradeStrategy": {
> "maximumOverCapacity": 0,
> "minimumHealthCapacity": 0
>   },
>   "killSelection": "YOUNGEST_FIRST",
>   "unreachableStrategy": "disabled",
>   "healthChecks": [],
>   "fetch": []
> }
> {code}
> *Step 2:* Kill the mysqld process to force rescheduling of a new MySQL 
> task. Two mount points to the same persistent volume were found, which 
> means the old mount point was not unmounted immediately.
> !image-2018-10-24-22-20-51-059.png!
> *Step 3:* After GC, data in the persistent volume was deleted accidentally, 
> while mysqld (the Mesos task) was still running.
> !image-2018-10-24-22-21-13-399.png!
> *Step 4:* Delete the MySQL service from Marathon; all mount points fail to 
> unmount, even though the service has already been deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-6285) Agents may OOM during recovery if there are too many tasks or executors

2021-07-20 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383897#comment-17383897
 ] 

Andreas Peters commented on MESOS-6285:
---

Is this still an issue or can we close it? :)

> Agents may OOM during recovery if there are too many tasks or executors
> ---
>
> Key: MESOS-6285
> URL: https://issues.apache.org/jira/browse/MESOS-6285
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Joseph Wu
>Priority: Critical
>  Labels: foundations, mesosphere
>
> On a test cluster, we encountered a degenerate case where running the 
> example {{long-lived-framework}} for over a week would render the agent 
> un-recoverable.  
> The {{long-lived-framework}} creates one custom {{long-lived-executor}} and 
> launches a single task on that executor every time it receives an offer from 
> that agent.  Over a week's worth of time, the framework manages to launch 
> some 400k tasks (short sleeps) on one executor.  During runtime, this is not 
> problematic, as each completed task is quickly rotated out of the agent's 
> memory (and checkpointed to disk).
> During recovery, however, the agent reads every single task into memory, 
> which leads to slow recovery; and often results in the agent being OOM-killed 
> before it finishes recovering.
> To repro this condition quickly:
> 1) Apply this patch to the {{long-lived-framework}}:
> {code}
> diff --git a/src/examples/long_lived_framework.cpp 
> b/src/examples/long_lived_framework.cpp
> index 7c57eb5..1263d82 100644
> --- a/src/examples/long_lived_framework.cpp
> +++ b/src/examples/long_lived_framework.cpp
> @@ -358,16 +358,6 @@ private:
>// Helper to launch a task using an offer.
>void launch(const Offer& offer)
>{
> -int taskId = tasksLaunched++;
> -++metrics.tasks_launched;
> -
> -TaskInfo task;
> -task.set_name("Task " + stringify(taskId));
> -task.mutable_task_id()->set_value(stringify(taskId));
> -task.mutable_agent_id()->MergeFrom(offer.agent_id());
> -task.mutable_resources()->CopyFrom(taskResources);
> -task.mutable_executor()->CopyFrom(executor);
> -
>  Call call;
>  call.set_type(Call::ACCEPT);
>  
> @@ -380,7 +370,23 @@ private:
>  Offer::Operation* operation = accept->add_operations();
>  operation->set_type(Offer::Operation::LAUNCH);
>  
> -operation->mutable_launch()->add_task_infos()->CopyFrom(task);
> +// Launch as many tasks as possible in the given offer.
> +Resources remaining = Resources(offer.resources()).flatten();
> +while (remaining.contains(taskResources)) {
> +  int taskId = tasksLaunched++;
> +  ++metrics.tasks_launched;
> +
> +  TaskInfo task;
> +  task.set_name("Task " + stringify(taskId));
> +  task.mutable_task_id()->set_value(stringify(taskId));
> +  task.mutable_agent_id()->MergeFrom(offer.agent_id());
> +  task.mutable_resources()->CopyFrom(taskResources);
> +  task.mutable_executor()->CopyFrom(executor);
> +
> +  operation->mutable_launch()->add_task_infos()->CopyFrom(task);
> +
> +  remaining -= taskResources;
> +}
>  
>  mesos->send(call);
>}
> {code}
> 2) Run a master, agent, and {{long-lived-framework}}.  On a 1 CPU, 1 GB agent 
> + this patch, it should take about 10 minutes to build up sufficient task 
> launches.
> 3) Restart the agent and watch it flail during recovery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-8679) If the first KILL stuck in the default executor, all other KILLs will be ignored.

2021-07-20 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383894#comment-17383894
 ] 

Andreas Peters commented on MESOS-8679:
---

Is this still an issue or can we close it? :)

> If the first KILL stuck in the default executor, all other KILLs will be 
> ignored.
> -
>
> Key: MESOS-8679
> URL: https://issues.apache.org/jira/browse/MESOS-8679
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Gilbert Song
>Priority: Critical
>  Labels: default-executor, foundations
>
> If the first {{KILL}} call gets stuck in the default executor, all other 
> {{KILL}} requests will be ignored. It would make a particular task become 
> unkillable (stuck in {{TASK_KILLING}}) forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-8608) RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.

2021-07-20 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383887#comment-17383887
 ] 

Andreas Peters commented on MESOS-8608:
---

Is this still an issue or can we close it? :)

> RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.
> -
>
> Key: MESOS-8608
> URL: https://issues.apache.org/jira/browse/MESOS-8608
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Affects Versions: 1.4.1, 1.8.0
> Environment: Docker 17.12.0
> Ubuntu 16.04, Ubuntu 18.04
>Reporter: Pierre-Louis Chevallier
>Priority: Critical
>  Labels: flaky-test, foundations, newbie, test
>
I'm trying to run Mesos in Docker, and when I run "make check", one test 
fails, even though I followed all the requirements and instructions in the 
Mesos getting-started guide. The failed test is 
RmdirContinueOnErrorTest.RemoveWithContinueOnError.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.

2021-07-01 Thread Martin Tzvetanov Grigorov (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372510#comment-17372510
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10223:
---

Running it! I will attach the logs once it finishes!

> Crashes on ARM64 due to bad interaction of libunwind with libgcc. 
> --
>
> Key: MESOS-10223
> URL: https://issues.apache.org/jira/browse/MESOS-10223
> Project: Mesos
>  Issue Type: Bug
>Reporter: Martin Tzvetanov Grigorov
>Assignee: Charles Natali
>Priority: Major
> Attachments: 0001-Fixed-crashes-on-ARM64-due-to-libunwind.patch, 
> mesos-on-arm64.tgz, sudo_make_check_output.txt
>
>
> Running `make check` on Ubuntu 20.04.2 aarch64 fails with errors such as:
>  
> {code:java}
>  [--] 3 tests from JsonTest
> [ RUN  ] JsonTest.NumberFormat
> [   OK ] JsonTest.NumberFormat (0 ms)
> [ RUN  ] JsonTest.Find
> terminate called after throwing an instance of 
> 'boost::exception_detail::clone_impl
>  >'
> terminate called recursively
> *** Aborted at 1622796321 (unix time) try "date -d @1622796321" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGABRT (@0x3e8090d) received by PID 2317 (TID 0xa80d9010) from 
> PID 2317; stack trace: ***
> @ 0xa80e77fc ([vdso]+0x7fb)
> @ 0xa7b71188 gsignal
> @ 0xa7b5ddac abort
> @ 0xa7d73848 __gnu_cxx::__verbose_terminate_handler()
> @ 0xa7d711ec (unknown)
> @ 0xa7d71250 std::terminate()
> @ 0xa7d715b0 __cxa_rethrow
> @ 0xa7d737e4 __gnu_cxx::__verbose_terminate_handler()
> @ 0xa7d711ec (unknown)
> @ 0xa7d71250 std::terminate()
> @ 0xa7d71544 __cxa_throw
> @ 0xab4ee114 boost::throw_exception<>()
> @ 0xab5c512c boost::conversion::detail::throw_bad_cast<>()
> @ 0xab5c2228 boost::lexical_cast<>()
> @ 0xab5bf89c numify<>()
> @ 0xab5e00e8 JSON::Object::find<>()
> @ 0xab5e0584 JSON::Object::find<>()
> @ 0xab5e0584 JSON::Object::find<>()
> @ 0xab5cdd2c JsonTest_Find_Test::TestBody()
> @ 0xab886fec 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0xab87f1d4 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0xab85a9d0 testing::Test::Run()
> @ 0xab85b258 testing::TestInfo::Run()
> @ 0xab85b8d0 testing::TestCase::Run()
> @ 0xab862344 testing::internal::UnitTestImpl::RunAllTests()
> @ 0xab888440 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0xab87ffd4 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0xab86100c testing::UnitTest::Run()
> @ 0xab630950 RUN_ALL_TESTS()
> @ 0xab630418 main
> @ 0xa7b5e110 __libc_start_main
> @ 0xab4b41d4 (unknown)
> [FAIL]: 8 shard(s) have failed tests
> make[6]: *** [Makefile:2092: check-local] Error 8
> make[6]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[5]: *** [Makefile:1840: check-am] Error 2
> make[5]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[4]: *** [Makefile:1685: check-recursive] Error 1
> make[4]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[3]: *** [Makefile:1842: check] Error 2
> make[3]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[2]: *** [Makefile:1153: check-recursive] Error 1
> make[2]: Leaving directory '/home/ubuntu/git/apache/mesos/build/3rdparty'
> make[1]: *** [Makefile:1306: check] Error 2
> make[1]: Leaving directory '/home/ubuntu/git/apache/mesos/build/3rdparty'
> make: *** [Makefile:785: check-recursive] Error 1
> {code}
>  
> {code:java}
> [--] 3 tests from JsonTest
> [ RUN  ] JsonTest.InvalidUTF8
> [   OK ] JsonTest.InvalidUTF8 (0 ms)
> [ RUN  ] JsonTest.ParseError
> terminate called after throwing an instance of 'std::overflow_error'
> terminate called recursively
> *** Aborted at 1622796321 (unix time) try "date -d @1622796321" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIG

[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.

2021-07-01 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372475#comment-17372475
 ] 

Charles Natali commented on MESOS-10223:


It must be a different issue then.

 

Could you run

 
{noformat}
# ./bin/mesos-tests.sh --verbose > mesos-tests.log 2>&1{noformat}
And post the result?

> Crashes on ARM64 due to bad interaction of libunwind with libgcc. 
> --
>
> Key: MESOS-10223
> URL: https://issues.apache.org/jira/browse/MESOS-10223
> Project: Mesos
>  Issue Type: Bug
>Reporter: Martin Tzvetanov Grigorov
>Assignee: Charles Natali
>Priority: Major
> Attachments: 0001-Fixed-crashes-on-ARM64-due-to-libunwind.patch, 
> mesos-on-arm64.tgz, sudo_make_check_output.txt
>
>
> Running `make check` on Ubuntu 20.04.2 aarch64 fails with errors such as:
>  
> {code:java}
>  [--] 3 tests from JsonTest
> [ RUN  ] JsonTest.NumberFormat
> [   OK ] JsonTest.NumberFormat (0 ms)
> [ RUN  ] JsonTest.Find
> terminate called after throwing an instance of 
> 'boost::exception_detail::clone_impl
>  >'
> terminate called recursively
> *** Aborted at 1622796321 (unix time) try "date -d @1622796321" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGABRT (@0x3e8090d) received by PID 2317 (TID 0xa80d9010) from 
> PID 2317; stack trace: ***
> @ 0xa80e77fc ([vdso]+0x7fb)
> @ 0xa7b71188 gsignal
> @ 0xa7b5ddac abort
> @ 0xa7d73848 __gnu_cxx::__verbose_terminate_handler()
> @ 0xa7d711ec (unknown)

[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.

2021-06-30 Thread Martin Tzvetanov Grigorov (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372398#comment-17372398
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10223:
---

[~cf.natali] *sudo make check* still hangs on the same step here.

HEAD is:

commit 72f16f68973bf7d2ce5c621539a21fc4eccfa56e
Author: Charles-Francois Natali
Date:   Sat Jun 26 19:04:33 2021 +0100

    Fixed a bug where timers wouldn't expire after `process:reinitialize`.

> Crashes on ARM64 due to bad interaction of libunwind with libgcc. 
> --
>
> Key: MESOS-10223
> URL: https://issues.apache.org/jira/browse/MESOS-10223
> Project: Mesos
>  Issue Type: Bug
>Reporter: Martin Tzvetanov Grigorov
>Assignee: Charles Natali
>Priority: Major
> Attachments: 0001-Fixed-crashes-on-ARM64-due-to-libunwind.patch, 
> mesos-on-arm64.tgz, sudo_make_check_output.txt
>
>
> Running `make check` on Ubuntu 20.04.2 aarch64 fails with such errors:
>  
> {code:java}
>  [--] 3 tests from JsonTest
> [ RUN  ] JsonTest.NumberFormat
> [   OK ] JsonTest.NumberFormat (0 ms)
> [ RUN  ] JsonTest.Find
> terminate called after throwing an instance of 
> 'boost::exception_detail::clone_impl
>  >'
> terminate called recursively
> *** Aborted at 1622796321 (unix time) try "date -d @1622796321" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGABRT (@0x3e8090d) received by PID 2317 (TID 0xa80d9010) from 
> PID 2317; stack trace: ***
> @ 0xa80e77fc ([vdso]+0x7fb)
> @ 0xa7b71188 gsignal
> @ 0xa7b5ddac abort
> @ 0xa7d73848 __gnu_cxx::__verbose_terminate_handler()
> @ 0xa7d711ec (unknown)
> @ 0xa7d71250 std::terminate()
> @ 0xa7d715b0 __cxa_rethrow
> @ 0xa7d737e4 __gnu_cxx::__verbose_terminate_handler()
> @ 0xa7d711ec (unknown)
> @ 0xa7d71250 std::terminate()
> @ 0xa7d71544 __cxa_throw
> @ 0xab4ee114 boost::throw_exception<>()
> @ 0xab5c512c boost::conversion::detail::throw_bad_cast<>()
> @ 0xab5c2228 boost::lexical_cast<>()
> @ 0xab5bf89c numify<>()
> @ 0xab5e00e8 JSON::Object::find<>()
> @ 0xab5e0584 JSON::Object::find<>()
> @ 0xab5e0584 JSON::Object::find<>()
> @ 0xab5cdd2c JsonTest_Find_Test::TestBody()
> @ 0xab886fec 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0xab87f1d4 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0xab85a9d0 testing::Test::Run()
> @ 0xab85b258 testing::TestInfo::Run()
> @ 0xab85b8d0 testing::TestCase::Run()
> @ 0xab862344 testing::internal::UnitTestImpl::RunAllTests()
> @ 0xab888440 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0xab87ffd4 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0xab86100c testing::UnitTest::Run()
> @ 0xab630950 RUN_ALL_TESTS()
> @ 0xab630418 main
> @ 0xa7b5e110 __libc_start_main
> @ 0xab4b41d4 (unknown)
> [FAIL]: 8 shard(s) have failed tests
> make[6]: *** [Makefile:2092: check-local] Error 8
> make[6]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[5]: *** [Makefile:1840: check-am] Error 2
> make[5]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[4]: *** [Makefile:1685: check-recursive] Error 1
> make[4]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[3]: *** [Makefile:1842: check] Error 2
> make[3]: Leaving directory 
> '/home/ubuntu/git/apache/mesos/build/3rdparty/stout'
> make[2]: *** [Makefile:1153: check-recursive] Error 1
> make[2]: Leaving directory '/home/ubuntu/git/apache/mesos/build/3rdparty'
> make[1]: *** [Makefile:1306: check] Error 2
> make[1]: Leaving directory '/home/ubuntu/git/apache/mesos/build/3rdparty'
> make: *** [Makefile:785: check-recursive] Error 1
> {code}
>  
> {code:java}
> [--] 3 tests from JsonTest
> [ RUN  ] JsonTest.InvalidUTF8
> [   OK ] JsonTest.InvalidUTF8 (0 ms)
> [ RUN  ] JsonTest.ParseError
> terminate called after throwing an instance of 'std

[jira] [Commented] (MESOS-10223) Crashes on ARM64 due to bad interaction of libunwind with libgcc.

2021-06-29 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371617#comment-17371617
 ] 

Charles Natali commented on MESOS-10223:


[~mgrigorov]

The hang should be fixed in master - it'd be great if you could give it a try.


> Crashes on ARM64 due to bad interaction of libunwind with libgcc. 
> --
>
> Key: MESOS-10223
> URL: https://issues.apache.org/jira/browse/MESOS-10223
> Project: Mesos
>  Issue Type: Bug
>Reporter: Martin Tzvetanov Grigorov
>Assignee: Charles Natali
>Priority: Major
> Attachments: 0001-Fixed-crashes-on-ARM64-due-to-libunwind.patch, 
> mesos-on-arm64.tgz, sudo_make_check_output.txt
>
>

[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-29 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371513#comment-17371513
 ] 

Andreas Peters commented on MESOS-10225:


I opened a PR: https://github.com/apache/mesos/pull/398

> mention that systemd agent unit should have Delegate=yes
> 
>
> Key: MESOS-10225
> URL: https://issues.apache.org/jira/browse/MESOS-10225
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Charles Natali
>Assignee: Andreas Peters
>Priority: Major
>
> If managed by systemd, the agent unit should have 
> [Delegate=yes|https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#Delegate=]
>  to prevent systemd from manipulating cgroups created by the agent, which can 
> break things quite badly.
> See for example https://issues.apache.org/jira/browse/MESOS-3488 and 
> https://issues.apache.org/jira/browse/MESOS-3009 for the kind of problems it 
> causes.
> I think it's quite important and should figure in good place in the 
> documentation, maybe in the agent configuration page 
> [http://mesos.apache.org/documentation/latest/configuration/agent/] ?
>  
> [~surahman] or [~apeters] if either one of you wants to have a look at it, I 
> think it's important that at least someone is familiar with the documentation 
> part.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-28 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371166#comment-17371166
 ] 

Andreas Peters commented on MESOS-10225:


I think so too, it is more visible. Then I will create a section. :)



[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-28 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370808#comment-17370808
 ] 

Charles Natali commented on MESOS-10225:


Good question - I think having a dedicated section might be better, maybe 
"Interaction with systemd" or something like that?



[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-28 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370464#comment-17370464
 ] 

Andreas Peters commented on MESOS-10225:


Where would the reference be most visible? As a notice on 
"--[no]-cgroups_cpu_enable_pids_and_tids_count", or would it be better to 
create a subheader like "Notice" where we add this kind of information?



[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-27 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370208#comment-17370208
 ] 

Charles Natali commented on MESOS-10225:


Thanks Andreas, that'd be great - hopefully it will avoid some surprises for users.



[jira] [Commented] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-26 Thread Andreas Peters (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369850#comment-17369850
 ] 

Andreas Peters commented on MESOS-10225:


I will do it. I can also add the "Delegate" setting to the systemd scripts of 
the mesos-agent.



[jira] [Assigned] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-26 Thread Andreas Peters (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Peters reassigned MESOS-10225:
--

Assignee: Andreas Peters



[jira] [Commented] (MESOS-9950) memory cgroup gone before isolator cleaning up

2021-06-25 Thread Subhajit Palit (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369759#comment-17369759
 ] 

Subhajit Palit commented on MESOS-9950:
---

Thanks for that option [~cf.natali] - I will try it out further and share my 
observation.

> memory cgroup gone before isolator cleaning up
> --
>
> Key: MESOS-9950
> URL: https://issues.apache.org/jira/browse/MESOS-9950
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: longfei
>Priority: Major
>
> The memcg created by Mesos may have been deleted before the cgroup/memory 
> isolator cleans up.
> This would make the termination fail and lose information from the old 
> termination (before the failure). 
> {code:java}
> I0821 15:16:03.025796 3354800 paths.cpp:745] Creating sandbox 
> '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
>  for user 'tiger'
> I0821 15:16:03.026199 3354800 paths.cpp:748] Creating sandbox 
> '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.026304 3354800 slave.cpp:9064] Launching executor 
> 'mt:z03584687:1' of framework 
> 8e4967e5-736e-4a22-90c3-7b32d526914d- with resources 
> [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}]
>  in work directory 
> '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89'
> I0821 15:16:03.051795 3354800 slave.cpp:3520] Launching container 
> a0706ca0-fe2c-4477-8161-329b26ea5d89 for executor 
> 'mt:z03584687:1' of framework 
> 8e4967e5-736e-4a22-90c3-7b32d526914d-
> I0821 15:16:03.076608 3354807 containerizer.cpp:1325] Starting container 
> a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.076911 3354807 containerizer.cpp:3185] Transitioning the state 
> of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PROVISIONING to 
> PREPARING
> I0821 15:16:03.077906 3354802 memory.cpp:478] Started listening for OOM 
> events for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079540 3354804 memory.cpp:198] Updated 
> 'memory.soft_limit_in_bytes' to 4032MB for container 
> a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079587 3354820 cpu.cpp:92] Updated 'cpu.shares' to 1126 (cpus 
> 1.1) for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.079589 3354804 memory.cpp:227] Updated 'memory.limit_in_bytes' 
> to 4032MB for container a0706ca0-fe2c-4477-8161-329b26ea5d89
> I0821 15:16:03.080901 3354802 switchboard.cpp:316] Container logger module 
> finished preparing container a0706ca0-fe2c-4477-8161-329b26ea5d89; 
> IOSwitchboard server is not required
> I0821 15:16:03.081593 3354801 linux_launcher.cpp:492] Launching container 
> a0706ca0-fe2c-4477-8161-329b26ea5d89 and cloning with namespaces
> I0821 15:16:03.083823 3354808 containerizer.cpp:2107] Checkpointing 
> container's forked pid 1857418 to 
> '/opt/tiger/mesos_deploy_videoarch/mesos_zeus/slave/meta/slaves/fb5c1a5b-e106-47c1-9fe3-6ebd311b30ee-S628/frameworks/8e4967e5-736e-4a22-90c3-7b32d526914d-/executors/mt:z03584687:1/runs/a0706ca0-fe2c-4477-8161-329b26ea5d89/pids/forked.pid'
> I0821 15:16:03.084156 3354808 containerizer.cpp:3185] Transitioning the state 
> of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from PREPARING to ISOLATING
> I0821 15:16:03.091468 3354808 containerizer.cpp:3185] Transitioning the state 
> of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from ISOLATING to FETCHING
> I0821 15:16:03.094933 3354808 containerizer.cpp:3185] Transitioning the state 
> of container a0706ca0-fe2c-4477-8161-329b26ea5d89 from FETCHING to RUNNING
> I0821 15:16:03.197753 3354808 memory.cpp:198] Updated 
> 'memory.soft_limit_in_bytes' to 4032MB for container 
>

[jira] [Created] (MESOS-10225) mention that systemd agent unit should have Delegate=yes

2021-06-25 Thread Charles Natali (Jira)
Charles Natali created MESOS-10225:
--

 Summary: mention that systemd agent unit should have Delegate=yes
 Key: MESOS-10225
 URL: https://issues.apache.org/jira/browse/MESOS-10225
 Project: Mesos
  Issue Type: Documentation
Reporter: Charles Natali


If managed by systemd, the agent unit should have 
[Delegate=yes|https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#Delegate=]
 to prevent systemd from manipulating cgroups created by the agent, which can 
break things quite badly.

See for example https://issues.apache.org/jira/browse/MESOS-3488 and 
https://issues.apache.org/jira/browse/MESOS-3009 for the kind of problems it 
causes.


I think it's quite important and should figure in good place in the 
documentation, maybe in the agent configuration page 
[http://mesos.apache.org/documentation/latest/configuration/agent/] ?

 

[~surahman] or [~apeters] if either one of you wants to have a look at it, I 
think it's important that at least someone is familiar with the documentation 
part.
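To make the recommendation concrete, a minimal sketch of an agent unit with delegation enabled might look like the following (the unit name, paths, and flags here are illustrative assumptions, not taken from the Mesos repository):

{code}
# /etc/systemd/system/mesos-slave.service -- illustrative sketch only
[Unit]
Description=Mesos Agent
After=network.target

[Service]
ExecStart=/usr/sbin/mesos-slave --master=zk://localhost:2181/mesos --work_dir=/var/lib/mesos
# Delegate=yes tells systemd to leave the cgroup subtree below this unit
# alone, so the agent's cgroup isolators retain full control of it.
Delegate=yes
Restart=always

[Install]
WantedBy=multi-user.target
{code}

After editing the unit, a `systemctl daemon-reload` and a restart of the agent are needed for the delegation setting to take effect.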

 





[jira] [Comment Edited] (MESOS-10129) Build fails on Maven javadoc generation when using JDK11

2021-06-24 Thread Saad Ur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369026#comment-17369026
 ] 

Saad Ur Rahman edited comment on MESOS-10129 at 6/24/21, 6:46 PM:
--

I am on it. I just ran a clean build of Mesos from the updated _upstream:main_ 
without issues.

*OS:* Ubuntu 21.04

*Javac:* openjdk-11-jdk-headless:amd64: 
/usr/lib/jvm/java-11-openjdk-amd64/bin/javac

*Java:* openjdk-11-jre-headless:amd64: 
/usr/lib/jvm/java-11-openjdk-amd64/bin/java

Sorry, I am still a bit of a newbie with Mesos; is there a specific build 
command I should run to try to replicate this? It might be a Debian-specific 
issue.


was (Author: surahman):
I am on it.

> Build fails on Maven javadoc generation when using JDK11
> 
>
> Key: MESOS-10129
> URL: https://issues.apache.org/jira/browse/MESOS-10129
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: master, 1.10.0
> Environment: Debian 10 Buster (2020-04-29) with OpenJdk 11.0.7 
> (2020-04-14)
>Reporter: Carlos Saltos
>Priority: Major
>  Labels: Java11, beginner, build, java11, jdk11
> Attachments: mesos.10.0.maven.javadoc.fix.patch
>
>
> h3. CURRENT BEHAVIOR:
> When using Java 11 (or newer versions) the Javadoc generation step fails with 
> the error:
> {{[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar 
> (build-and-attach-javadocs) on project mesos: MavenReportException: Error 
> while creating archive:}}
> {{[ERROR] Exit code: 1 - javadoc: error - The code being documented uses 
> modules but the packages defined in 
> http://download.oracle.com/javase/6/docs/api/ are in the unnamed module.}}
> {{[ERROR]}}
> {{[ERROR] Command line was: /usr/lib/jvm/java-11-openjdk-amd64/bin/javadoc 
> @options}}
> {{[ERROR]}}
> {{[ERROR] Refer to the generated Javadoc files in 
> '/home/admin/mesos-deb-packaging/mesos-repo/build/src/java/target/apidocs' 
> dir.}}
> {{[ERROR] -> [Help 1]}}
> {{[ERROR]}}
> {{[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.}}
> {{[ERROR] Re-run Maven using the -X switch to enable full debug logging.}}
> {{[ERROR]}}
> {{[ERROR] For more information about the errors and possible solutions, 
> please read the following articles:}}
> {{[ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException}}
> {{make[1]: *** [Makefile:17533: java/target/mesos-1.11.0.jar] Error 1}}
> {{make[1]: Leaving directory 
> '/home/admin/mesos-deb-packaging/mesos-repo/build/src'}}
> {{make: *** [Makefile:785: all-recursive] Error 1}}
> *NOTE:* The error is at the Maven javadoc plugin call when it tries to 
> include references to the non-existent old Java 6 documentation.
> h3. POSSIBLE SOLUTION:
> Just remove the old reference by setting the relevant option to false in the 
> maven-javadoc-plugin configuration section.
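As a hedged sketch of that solution: the maven-javadoc-plugin option that disables the automatic JDK API link is commonly `detectJavaApiLink`; whether the attached patch uses exactly this element is an assumption.

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <version>2.8.1</version>
  <configuration>
    <!-- Stop linking against http://download.oracle.com/javase/6/docs/api/,
         which fails under JDK 11's module system. -->
    <detectJavaApiLink>false</detectJavaApiLink>
  </configuration>
</plugin>
{code}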




