[jira] [Comment Edited] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-29 Thread Andrei Budnik (Jira)


[ https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119576#comment-17119576 ]

Andrei Budnik edited comment on MESOS-10131 at 5/29/20, 1:04 PM:
-

Please keep posting the error messages whenever the agent crashes. Hopefully, 
we'll capture the part of `mountinfo` that contains the loop.
I think it might also be worth capturing the mount info right after the crash 
happens. We could then check for duplicate records, detect a loop, or find 
other anomalies: `mount && cat /proc/1/mountinfo` && `cat /proc//mountinfo`


was (Author: abudnik):
Please keep posting the error messages whenever the agent crashes. Hopefully, 
we'll capture the part of `mountinfo` that contains the loop.
I think it might be worth capturing the mount info right after it happens. We 
could then check whether there are duplicate records, or even detect a loop or 
find some other anomalies. `mount && cat /proc/1/mountinfo` && `cat /proc//mountinfo`


[jira] [Comment Edited] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-28 Thread Andrei Budnik (Jira)


[ https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118896#comment-17118896 ]

Andrei Budnik edited comment on MESOS-10131 at 5/28/20, 5:21 PM:
-

I think the message containing the whole mount table is long enough (~30k 
bytes) to hit the size limit of the logger buffer, so the end of it is cut off.
[~tomplummer] Could you capture both the truncated log message and the output 
of `cat /proc//mountinfo` the next time it crashes? (And/or `mount 
&& cat /proc/1/mountinfo` if the Mesos agent can't start.)
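
To check the truncation theory and keep a copy for later, something like the 
Python sketch below could be run on the agent host. It is only an 
illustration: it assumes the agent logs through glog, whose per-message buffer 
is, as far as I know, about 30,000 bytes by default; the 200-byte allowance 
for the log prefix is a rough guess, and all names here are hypothetical.

```python
import os
import sys
import time

# Assumption: glog's default per-message buffer is about 30,000 bytes
# (LogMessage::kMaxLogMessageLen); anything past it is dropped from the line.
ASSUMED_LOG_LIMIT_BYTES = 30000


def mountinfo_size(path: str) -> int:
    """Size in bytes of a mountinfo file. /proc files report st_size == 0,
    so the content has to be read instead of stat()ed."""
    with open(path, "rb") as f:
        return len(f.read())


def would_truncate(table_bytes: int, prefix_bytes: int = 200,
                   limit: int = ASSUMED_LOG_LIMIT_BYTES) -> bool:
    """True if `prefix_bytes` of header text plus the whole table would
    exceed `limit`. The 200-byte default prefix is a rough guess."""
    return prefix_bytes + table_bytes > limit


def snapshot(src: str, dest_dir: str) -> str:
    """Copy the current contents of `src` into a timestamped file in
    `dest_dir` and return the new file's path (same-second calls overwrite)."""
    with open(src, "rb") as f:
        data = f.read()
    name = "mountinfo-" + time.strftime("%Y%m%dT%H%M%S") + ".txt"
    dest = os.path.join(dest_dir, name)
    with open(dest, "wb") as f:
        f.write(data)
    return dest


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python3 capture_mountinfo.py <mountinfo path> <output dir>
    size = mountinfo_size(sys.argv[1])
    print(f"{sys.argv[1]}: {size} bytes, likely truncated in the log: {would_truncate(size)}")
    print("saved to", snapshot(sys.argv[1], sys.argv[2]))
```

Reading the file instead of calling `os.path.getsize` matters here, because 
procfs reports a size of zero for these files.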


was (Author: abudnik):
I think the message containing the whole mount table is long enough (~30k 
bytes) to reach the limit of the logger buffer...
[~tomplummer] Could you capture both the truncated log message and the output 
of "cat /proc//mountinfo" the next time it crashes?


[jira] [Comment Edited] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-27 Thread Thomas Plummer (Jira)


[ https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118003#comment-17118003 ]

Thomas Plummer edited comment on MESOS-10131 at 5/27/20, 6:41 PM:
--

[~abudnik] I have attached the relevant portion of the log. I was initially 
copying it from a terminal and didn't realize it was several screens long.

Thanks for looking into this for us. Please let me know if there is anything 
else you need.


was (Author: tomplummer):
[~abudnik] I have attached the relevant portion of the log. I was initially 
copying it from a terminal and didn't realize it was several screens long.

> Agent frequently dies with error "Cycle found in mount table hierarchy"
> ---
>
> Key: MESOS-10131
> URL: https://issues.apache.org/jira/browse/MESOS-10131
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, framework
>Affects Versions: 1.9.0
>Reporter: Thomas Plummer
>Assignee: Andrei Budnik
>Priority: Major
> Attachments: log.txt
>
>
> Our Mesos agent frequently dies with the following error in the slave logs:
>  
> {code:java}
> F0509 22:10:33.036993 17723 fs.cpp:217] Check failed: !visitedParents.contains(parentId) Cycle found in mount table hierarchy at entry '1954': 
> 18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> 19 41 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 20 41 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=65852208k,nr_inodes=16463052,mode=755
> 21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
> 22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec shared:3 - tmpfs tmpfs rw,seclabel
> 23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> 24 41 0:20 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,seclabel,mode=755
> 25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs ro,seclabel,mode=755
> 26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - pstore pstore rw
> 28 18 0:24 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:21 - efivarfs efivarfs rw
> 29 25 0:25 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,seclabel,perf_event
> 30 25 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,seclabel,net_prio,net_cls
> 31 25 0:27 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,seclabel,cpuset
> 32 25 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,seclabel,blkio
> 33 25 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,seclabel,freezer
> 34 25 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,seclabel,hugetlb
> 35 25 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,seclabel,devices
> 36 25 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,seclabel,cpuacct,cpu
> 37 25 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,seclabel,memory
> 38 25 0:34 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,seclabel,pids
> 39 18 0:35 / /sys/kernel/config rw,relatime shared:22 - configfs configfs rw
> 41 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/vg_system-root rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 42 18 0:16 / /sys/fs/selinux rw,relatime shared:23 - selinuxfs selinuxfs rw
> 43 19 0:37 / /proc/sys/fs/binfmt_misc rw,relatime shared:25 - autofs systemd-1 rw,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11414
> 44 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
> 45 20 0:15 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
> 46 20 0:38 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,seclabel
> 47 41 8:2 / /boot rw,relatime shared:29 - xfs /dev/sda2 rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 48 47 8:1 / /boot/efi rw,relatime shared:30 - vfat /dev/sda1 rw,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro
> 49 41 253:2 / /var rw,relatime shared:31 - xfs /dev/mapper/vg_system-var 
>