[jira] [Comment Edited] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"
[ https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119576#comment-17119576 ]

Andrei Budnik edited comment on MESOS-10131 at 5/29/20, 1:04 PM:
----------------------------------------------------------------

Please keep posting the error messages whenever the agent crashes. Hopefully, we'll capture the part of `mountinfo` that contains the loop.

I think it is also worth capturing the mount info right after the crash happens. We could then check for duplicate records, detect a loop, or find other anomalies.
`mount && cat /proc/1/mountinfo` && `cat /proc//mountinfo`

was (Author: abudnik):
Please keep posting error messages on agent crash. Hopefully, we'll capture a part of `mountinfo` containing the loop.

I think it might be worth capturing mount info after the moment it happens. We could check then if there are duplicate records or even detect a loop or find some other anomalies.
`mount && cat /proc/1/mountinfo` && `cat /proc//mountinfo`

> Agent frequently dies with error "Cycle found in mount table hierarchy"
> -----------------------------------------------------------------------
>
> Key: MESOS-10131
> URL: https://issues.apache.org/jira/browse/MESOS-10131
> Project: Mesos
> Issue Type: Bug
> Components: agent, framework
> Affects Versions: 1.9.0
> Reporter: Thomas Plummer
> Assignee: Andrei Budnik
> Priority: Major
> Attachments: log.txt
>
>
> Our Mesos agent frequently dies with the following error in the slave logs:
>
> {code:java}
> F0509 22:10:33.036993 17723 fs.cpp:217] Check failed: !visitedParents.contains(parentId) Cycle found in mount table hierarchy at entry '1954':
> 18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> 19 41 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 20 41 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=65852208k,nr_inodes=16463052,mode=755
> 21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
> 22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec shared:3 - tmpfs tmpfs rw,seclabel
> 23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> 24 41 0:20 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,seclabel,mode=755
> 25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs ro,seclabel,mode=755
> 26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - pstore pstore rw
> 28 18 0:24 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:21 - efivarfs efivarfs rw
> 29 25 0:25 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,seclabel,perf_event
> 30 25 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,seclabel,net_prio,net_cls
> 31 25 0:27 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,seclabel,cpuset
> 32 25 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,seclabel,blkio
> 33 25 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,seclabel,freezer
> 34 25 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,seclabel,hugetlb
> 35 25 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,seclabel,devices
> 36 25 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,seclabel,cpuacct,cpu
> 37 25 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,seclabel,memory
> 38 25 0:34 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,seclabel,pids
> 39 18 0:35 / /sys/kernel/config rw,relatime shared:22 - configfs configfs rw
> 41 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/vg_system-root rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 42 18 0:16 / /sys/fs/selinux rw,relatime shared:23 - selinuxfs selinuxfs rw
> 43 19 0:37 / /proc/sys/fs/binfmt_misc rw,relatime shared:25 - autofs systemd-1 rw,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11414
> 44 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
> 45 20 0:15 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
> 46 20 0:38 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,seclabel
> 47 41 8:2 / /boot rw,relatime shared:29 - xfs /dev/sda2 rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
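
The checks suggested in the comment (looking for duplicate records or a loop in a captured `mountinfo`) could be sketched roughly as below. This is only an illustration under stated assumptions: the function names are hypothetical, the parser reads just the first five whitespace-separated fields of each `mountinfo` line (mount ID, parent ID, device, root, mount point), and the loop check simply follows parent IDs. It is not the logic used in Mesos's `fs.cpp`.

```python
from collections import Counter


def parse_mountinfo(text):
    """Return a list of (mount_id, parent_id, mount_point) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        entries.append((int(fields[0]), int(fields[1]), fields[4]))
    return entries


def find_duplicate_ids(entries):
    """Return the mount IDs that appear in more than one record."""
    counts = Counter(mount_id for mount_id, _, _ in entries)
    return sorted(mount_id for mount_id, n in counts.items() if n > 1)


def find_cycles(entries):
    """Return the mount IDs whose parent chain leads back to themselves."""
    # If a mount ID is duplicated, the last record wins here.
    parent_of = {mount_id: parent_id for mount_id, parent_id, _ in entries}
    in_cycle = []
    for start in parent_of:
        seen = set()
        current = start
        # Walk towards the root until the chain leaves the table or repeats.
        while current in parent_of and current not in seen:
            seen.add(current)
            current = parent_of[current]
        if current == start:
            in_cycle.append(start)
    return sorted(in_cycle)
```

To use it on a captured file, read the saved output of `cat /proc/1/mountinfo` into a string, pass it to `parse_mountinfo`, and inspect the results of `find_duplicate_ids` and `find_cycles`; both return an empty list for a healthy table.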
[jira] [Comment Edited] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"
[ https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118896#comment-17118896 ]

Andrei Budnik edited comment on MESOS-10131 at 5/28/20, 5:21 PM:
----------------------------------------------------------------

I think the message containing the whole mount table is long enough (~30k bytes) to hit the limit of the logger buffer, so it gets truncated.
[~tomplummer] Could you capture both the truncated log message and the output of "cat /proc//mountinfo" the next time it crashes (and/or `mount && cat /proc/1/mountinfo` if the Mesos agent can't start)?

was (Author: abudnik):
I think the message containing the whole mount table is long enough (~30k bytes) to reach the limit of the logger buffer...
[~tomplummer] Could you capture both truncated log message and the output of "cat /proc//mountinfo" next time it crashes?

> Agent frequently dies with error "Cycle found in mount table hierarchy"
> -----------------------------------------------------------------------
>
> Key: MESOS-10131
> URL: https://issues.apache.org/jira/browse/MESOS-10131
> Project: Mesos
> Issue Type: Bug
> Components: agent, framework
> Affects Versions: 1.9.0
> Reporter: Thomas Plummer
> Assignee: Andrei Budnik
> Priority: Major
> Attachments: log.txt
>
>
> Our Mesos agent frequently dies with the following error in the slave logs:
>
> {code:java}
> F0509 22:10:33.036993 17723 fs.cpp:217] Check failed: !visitedParents.contains(parentId) Cycle found in mount table hierarchy at entry '1954':
> 18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> 19 41 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 20 41 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=65852208k,nr_inodes=16463052,mode=755
> 21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
> 22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec shared:3 - tmpfs tmpfs rw,seclabel
> 23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> 24 41 0:20 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,seclabel,mode=755
> 25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs ro,seclabel,mode=755
> 26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - pstore pstore rw
> 28 18 0:24 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:21 - efivarfs efivarfs rw
> 29 25 0:25 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,seclabel,perf_event
> 30 25 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,seclabel,net_prio,net_cls
> 31 25 0:27 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,seclabel,cpuset
> 32 25 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,seclabel,blkio
> 33 25 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,seclabel,freezer
> 34 25 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,seclabel,hugetlb
> 35 25 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,seclabel,devices
> 36 25 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,seclabel,cpuacct,cpu
> 37 25 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,seclabel,memory
> 38 25 0:34 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,seclabel,pids
> 39 18 0:35 / /sys/kernel/config rw,relatime shared:22 - configfs configfs rw
> 41 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/vg_system-root rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 42 18 0:16 / /sys/fs/selinux rw,relatime shared:23 - selinuxfs selinuxfs rw
> 43 19 0:37 / /proc/sys/fs/binfmt_misc rw,relatime shared:25 - autofs systemd-1 rw,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11414
> 44 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
> 45 20 0:15 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
> 46 20 0:38 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,seclabel
> 47 41 8:2 / /boot rw,relatime shared:29 - xfs /dev/sda2 rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 48 47 8:1 / /boot/efi rw,relatime shared:30 - vfat /dev/sda1
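
Because the fatal log message may be cut off by the logger, saving the mount table to a file avoids relying on the log at all. The snippet below is a minimal sketch of such a capture step, not part of Mesos: `snapshot_mountinfo`, its parameters, and the output file naming are all hypothetical, and the process ID to inspect has to be supplied by the caller (`1` or `"self"` are common choices).

```python
import time
from pathlib import Path


def snapshot_mountinfo(pid="self", out_dir=".", proc_root="/proc"):
    """Copy <proc_root>/<pid>/mountinfo to a timestamped file; return its path."""
    source = Path(proc_root) / str(pid) / "mountinfo"
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = Path(out_dir) / f"mountinfo-{pid}-{stamp}.txt"
    # The full table is written to disk, so no logger buffer limit applies.
    target.write_text(source.read_text())
    return target
```

Calling `snapshot_mountinfo(1, "/tmp")` right after a crash would leave a file such as `/tmp/mountinfo-1-<timestamp>.txt` that can be attached to the issue. The `proc_root` parameter exists only so the function can be exercised against a test directory.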
[jira] [Comment Edited] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"
[ https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118003#comment-17118003 ]

Thomas Plummer edited comment on MESOS-10131 at 5/27/20, 6:41 PM:
-----------------------------------------------------------------

[~abudnik] I have attached the relevant portion of the log. I was initially copying from a terminal, not realizing that it was several terminal screens long.

Thanks for looking into this for us. Please let me know if there is anything else you may need.

was (Author: tomplummer):
[~abudnik] I have attached the appropriate portion of the log. I was initially copying from a terminal, not realized that it was several terminal screens long.

> Agent frequently dies with error "Cycle found in mount table hierarchy"
> -----------------------------------------------------------------------
>
> Key: MESOS-10131
> URL: https://issues.apache.org/jira/browse/MESOS-10131
> Project: Mesos
> Issue Type: Bug
> Components: agent, framework
> Affects Versions: 1.9.0
> Reporter: Thomas Plummer
> Assignee: Andrei Budnik
> Priority: Major
> Attachments: log.txt
>
>
> Our Mesos agent frequently dies with the following error in the slave logs:
>
> {code:java}
> F0509 22:10:33.036993 17723 fs.cpp:217] Check failed: !visitedParents.contains(parentId) Cycle found in mount table hierarchy at entry '1954':
> 18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> 19 41 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 20 41 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=65852208k,nr_inodes=16463052,mode=755
> 21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
> 22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec shared:3 - tmpfs tmpfs rw,seclabel
> 23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> 24 41 0:20 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,seclabel,mode=755
> 25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs ro,seclabel,mode=755
> 26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - pstore pstore rw
> 28 18 0:24 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:21 - efivarfs efivarfs rw
> 29 25 0:25 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,seclabel,perf_event
> 30 25 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,seclabel,net_prio,net_cls
> 31 25 0:27 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,seclabel,cpuset
> 32 25 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,seclabel,blkio
> 33 25 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,seclabel,freezer
> 34 25 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,seclabel,hugetlb
> 35 25 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,seclabel,devices
> 36 25 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,seclabel,cpuacct,cpu
> 37 25 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,seclabel,memory
> 38 25 0:34 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,seclabel,pids
> 39 18 0:35 / /sys/kernel/config rw,relatime shared:22 - configfs configfs rw
> 41 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/vg_system-root rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 42 18 0:16 / /sys/fs/selinux rw,relatime shared:23 - selinuxfs selinuxfs rw
> 43 19 0:37 / /proc/sys/fs/binfmt_misc rw,relatime shared:25 - autofs systemd-1 rw,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11414
> 44 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
> 45 20 0:15 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
> 46 20 0:38 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,seclabel
> 47 41 8:2 / /boot rw,relatime shared:29 - xfs /dev/sda2 rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 48 47 8:1 / /boot/efi rw,relatime shared:30 - vfat /dev/sda1 rw,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro
> 49 41 253:2 / /var rw,relatime shared:31 - xfs /dev/mapper/vg_system-var