[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2021-04-14 Thread Charles Natali (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321152#comment-17321152
 ] 

Charles Natali commented on MESOS-10131:


I think this could happen even without an actual loop in {{/proc/PID/mountinfo}}, 
because reading from {{/proc/PID/mountinfo}} isn't atomic: it certainly isn't 
if the file can't be read in a single {{read}} syscall, which is very likely 
the case here since it's larger than 30K.

That could explain why it happens randomly, especially if many short-lived 
tasks are being started.
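
To illustrate how a non-atomic read could produce a "cycle" that never existed in any single mount table, here is a minimal Python sketch under stated assumptions. The two tables, their mount IDs, and the helpers `torn_read` and `find_cycle` are hypothetical; `torn_read` assumes the second read() resumes at the same line index in a table that has changed in the meantime (real kernel behaviour varies by version), and `find_cycle` is only in the spirit of the `visitedParents` check in `fs.cpp`, not Mesos's actual code.

```python
# Hypothetical mount tables as (mount_id, parent_id) pairs, in the order the
# lines would appear in mountinfo. Parent 0 is outside the table, like the
# parent of the root entry "41 0 253:0 / / ..." in the log below.
before = [(41, 0), (100, 200), (200, 41), (300, 41)]  # acyclic
after = [(41, 0), (100, 41), (300, 41), (200, 100)]   # acyclic, IDs 100/200 reused


def torn_read(old, new, k):
    """Stitch two snapshots together the way two read() calls might.

    Assumes the first read() returns the first k lines of the old table and
    the second read() resumes at line k of the table after mounts changed.
    """
    return old[:k] + new[k:]


def find_cycle(parents):
    """Return a mount ID reached twice while following parent links, else None."""
    for start in parents:
        visited = set()
        current = start
        while current in parents:  # stop at a parent outside the table
            if current in visited:
                return current
            visited.add(current)
            current = parents[current]
    return None


print(find_cycle(dict(before)))                       # None
print(find_cycle(dict(after)))                        # None
print(find_cycle(dict(torn_read(before, after, 2))))  # 100
```

Each snapshot on its own is a valid tree, but the stitched result says that 100 is mounted on 200 and 200 on 100, which is exactly the kind of inconsistency a cycle check would reject.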

 

Since it hasn't recurred and a potential fix would be far from trivial, it's 
probably time to close this issue.

 

> Agent frequently dies with error "Cycle found in mount table hierarchy"
> ---
>
> Key: MESOS-10131
> URL: https://issues.apache.org/jira/browse/MESOS-10131
> Project: Mesos
> Issue Type: Bug
> Components: agent, framework
> Affects Versions: 1.9.0
> Reporter: Thomas Plummer
> Assignee: Andrei Budnik
> Priority: Major
> Attachments: log.txt
>
>
> Our Mesos agent frequently dies with the following error in the slave logs:
>  
> {code:java}
> F0509 22:10:33.036993 17723 fs.cpp:217] Check failed: 
> !visitedParents.contains(parentId) Cycle found in mount table hierarchy at 
> entry '1954': 
> 18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs 
> rw,seclabel
> 19 41 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 20 41 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs 
> rw,seclabel,size=65852208k,nr_inodes=16463052,mode=755
> 21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - 
> securityfs securityfs rw
> 22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec shared:3 - tmpfs tmpfs 
> rw,seclabel
> 23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts 
> rw,seclabel,gid=5,mode=620,ptmxmode=000
> 24 41 0:20 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,seclabel,mode=755
> 25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs 
> ro,seclabel,mode=755
> 26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 
> - cgroup cgroup 
> rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - 
> pstore pstore rw
> 28 18 0:24 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime 
> shared:21 - efivarfs efivarfs rw
> 29 25 0:25 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime 
> shared:10 - cgroup cgroup rw,seclabel,perf_event
> 30 25 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime 
> shared:11 - cgroup cgroup rw,seclabel,net_prio,net_cls
> 31 25 0:27 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 
> - cgroup cgroup rw,seclabel,cpuset
> 32 25 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - 
> cgroup cgroup rw,seclabel,blkio
> 33 25 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 
> - cgroup cgroup rw,seclabel,freezer
> 34 25 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 
> - cgroup cgroup rw,seclabel,hugetlb
> 35 25 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:16 
> - cgroup cgroup rw,seclabel,devices
> 36 25 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime 
> shared:17 - cgroup cgroup rw,seclabel,cpuacct,cpu
> 37 25 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:18 
> - cgroup cgroup rw,seclabel,memory
> 38 25 0:34 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - 
> cgroup cgroup rw,seclabel,pids
> 39 18 0:35 / /sys/kernel/config rw,relatime shared:22 - configfs configfs rw
> 41 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/vg_system-root 
> rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 42 18 0:16 / /sys/fs/selinux rw,relatime shared:23 - selinuxfs selinuxfs rw
> 43 19 0:37 / /proc/sys/fs/binfmt_misc rw,relatime shared:25 - autofs 
> systemd-1 
> rw,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11414
> 44 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
> 45 20 0:15 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
> 46 20 0:38 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs 
> rw,seclabel
> 47 41 8:2 / /boot rw,relatime shared:29 - xfs /dev/sda2 
> rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 48 47 8:1 / /boot/efi rw,relatime shared:30 - vfat /dev/sda1 
> rw,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro
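> 49 41 253:2 / /var rw,relatime shared:31 - xfs /dev/mapper/vg_system-var
> rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 50 41 253:5 / /home rw,nodev,relatime shared:32 - xfs
> /dev/mapper/vg_system-home
> rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 51 41 253:4 / /tmp rw,nosuid,nodev,noexec,relatime shared:33 - xfs
> /dev/mapper/vg_system-tmp
> rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,sw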

[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-06-15 Thread Thomas Plummer (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135978#comment-17135978
 ] 

Thomas Plummer commented on MESOS-10131:


We haven't seen this issue in over two weeks now. We are not sure what the 
root cause was, but here are a few of the things we changed:

1) We set --gc_delay=1days, down from the default of 1 week.

2) We upgraded to new hardware.

3) We found some issues with our custom Mesos framework: we were taking a long 
time to respond to status updates and resource offers.

We are comfortable closing this issue and will be sure to reopen it if the 
error comes back.


[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-29 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119576#comment-17119576
 ] 

Andrei Budnik commented on MESOS-10131:
---

Please keep posting the error messages whenever the agent crashes. Hopefully 
we'll capture the part of `mountinfo` that contains the loop.
I think it might also be worth capturing the mount info right after it happens. 
We could then check whether there are duplicate records, detect a loop, or find 
some other anomalies: `mount && cat /proc/1/mountinfo` && `cat /proc//mountinfo`
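
The checks suggested above could be scripted roughly as follows. This is a minimal Python sketch, not Mesos code: it assumes the captured file follows the proc(5) mountinfo layout, where the first two fields are the mount ID and the parent mount ID, and the helper names `parse_ids` and `find_anomalies` are hypothetical. It reports duplicate mount IDs, parent IDs missing from the table, and parent-link cycles.

```python
from collections import Counter


def parse_ids(mountinfo_text):
    """Return (mount_id, parent_id) pairs from mountinfo text.

    Per proc(5), field 1 is the mount ID and field 2 the parent ID;
    the remaining fields are ignored here.
    """
    pairs = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def find_anomalies(pairs):
    """Report duplicate IDs, parents missing from the table, and parent cycles."""
    ids = [mount_id for mount_id, _ in pairs]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    known = set(ids)
    dangling = sorted({parent for _, parent in pairs if parent not in known})

    parents = dict(pairs)  # for duplicated IDs, the last line wins
    cycles = set()
    for start in parents:
        path = []
        current = start
        while current in parents and current not in path:
            path.append(current)
            current = parents[current]
        if current in path:
            # Identify each cycle by its smallest member so it is reported once.
            cycles.add(min(path[path.index(current):]))
    return {"duplicates": duplicates, "dangling": dangling, "cycles": sorted(cycles)}


# Entries from the log below, with everything after the mount options dropped.
sample = """\
41 0 253:0 / / rw,relatime
18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime
20 41 0:5 / /dev rw,nosuid
22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec
"""
print(find_anomalies(parse_ids(sample)))
# {'duplicates': [], 'dangling': [0], 'cycles': []}
```

One dangling parent is usually expected (the root entry's parent, `0` in this log); duplicate IDs or any cycle in a capture taken right after a crash would be the interesting signal.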


[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-28 Thread Rick Naik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119177#comment-17119177
 ] 

Rick Naik commented on MESOS-10131:
---

Andrei, is this something we need to capture at the moment it happens? If so, 
that will be challenging. I think what Tom posted is the extent of what we can 
capture hours after the fact.


[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-28 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118896#comment-17118896
 ] 

Andrei Budnik commented on MESOS-10131:
---

I think the message containing the whole mount table (~30k bytes) is long 
enough to reach the limit of the logger buffer...
[~tomplummer] Could you capture both the truncated log message and the output 
of "cat /proc//mountinfo" the next time it crashes?
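
If a per-message size limit is what truncates the mount table, one workaround would be to log the table as several numbered messages. A minimal Python sketch, assuming the limit is passed in as `limit` (defaulting to 30000 only because of the ~30k figure above; the real limit depends on the logger) and that the mountinfo text is plain ASCII, so characters and bytes coincide. `chunk_lines` is a hypothetical helper, not part of Mesos or its logging library.

```python
def chunk_lines(text, limit=30000):
    """Split text into chunks of at most `limit` characters.

    Chunks break between lines where possible; a single line longer than
    `limit` is cut into `limit`-sized pieces.
    """
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        pieces = [line[i:i + limit] for i in range(0, len(line), limit)]
        for piece in pieces:
            if len(current) + len(piece) > limit:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical table with 2000 short entries, emitted as numbered parts.
table = "".join(f"{i} 41 0:{i} / /mnt/{i} rw - tmpfs tmpfs rw\n" for i in range(100, 2100))
parts = chunk_lines(table, limit=4000)
for n, part in enumerate(parts, start=1):
    # A real agent would pass each part to its logger instead of printing a summary.
    print(f"mount table part {n}/{len(parts)}: {len(part)} chars")
```

Breaking only between lines keeps every mountinfo entry intact within a single part, so the parts can be concatenated back into the exact original table.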


[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-27 Thread Thomas Plummer (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118003#comment-17118003
 ] 

Thomas Plummer commented on MESOS-10131:


[~abudnik] I have attached the relevant portion of the log. I was initially 
copying from a terminal and didn't realize that it was several terminal screens long.


[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-27 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117903#comment-17117903
 ] 

Andrei Budnik commented on MESOS-10131:
---

[~tomplummer] It seems that the tail of the log message is missing. Could you 
please provide the whole log message containing the mount table? We will try to 
reproduce the problem by running a unit test to ensure that this is not a bug 
in the code.


[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-27 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117892#comment-17117892
 ] 

Andrei Budnik commented on MESOS-10131:
---

Mount table without extra newlines:
{code:java}
18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
19 41 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
20 41 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=65852208k,nr_inodes=16463052,mode=755
21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec shared:3 - tmpfs tmpfs rw,seclabel
23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
24 41 0:20 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,seclabel,mode=755
25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs ro,seclabel,mode=755
26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - pstore pstore rw
28 18 0:24 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:21 - efivarfs efivarfs rw
29 25 0:25 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,seclabel,perf_event
30 25 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,seclabel,net_prio,net_cls
31 25 0:27 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,seclabel,cpuset
32 25 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,seclabel,blkio
33 25 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,seclabel,freezer
34 25 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,seclabel,hugetlb
35 25 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,seclabel,devices
36 25 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,seclabel,cpuacct,cpu
37 25 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,seclabel,memory
38 25 0:34 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,seclabel,pids
39 18 0:35 / /sys/kernel/config rw,relatime shared:22 - configfs configfs rw
41 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/vg_system-root rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
42 18 0:16 / /sys/fs/selinux rw,relatime shared:23 - selinuxfs selinuxfs rw
43 19 0:37 / /proc/sys/fs/binfmt_misc rw,relatime shared:25 - autofs systemd-1 rw,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11414
44 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
45 20 0:15 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
46 20 0:38 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,seclabel
47 41 8:2 / /boot rw,relatime shared:29 - xfs /dev/sda2 rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
48 47 8:1 / /boot/efi rw,relatime shared:30 - vfat /dev/sda1 rw,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro
49 41 253:2 / /var rw,relatime shared:31 - xfs /dev/mapper/vg_system-var rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
50 41 253:5 / /home rw,nodev,relatime shared:32 - xfs /dev/mapper/vg_system-home rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
51 41 253:4 / /tmp rw,nosuid,nodev,noexec,relatime shared:33 - xfs /dev/mapper/vg_system-tmp rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
53 49 253:4 / /var/tmp rw,nosuid,nodev,noexec,relatime shared:33 - xfs /dev/mapper/vg_system-tmp rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
52 49 253:3 / /var/log rw,relatime shared:34 - xfs /dev/mapper/vg_system-varlog rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
54 52 253:6 / /var/log/audit rw,relatime shared:35 - xfs /dev/mapper/vg_system-varlogaudit rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
187 41 0:41 / /mnt/receipt rw,relatime shared:165 - nfs4 dtmetlnfsa01p.a.carfax.us:/ rw,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.18.154.117,local_lock=none,addr=172.18.138.237
188 41 0:42 / /mnt/receipt_web_dev rw,relatime shared:169 - nfs4 dtmetlnfsa01b.a.carfax.us:/ rw,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.18.154.117,local_lock=none,addr=172.18.137.248
192 41 0:41 / /mnt/rece

[jira] [Commented] (MESOS-10131) Agent frequently dies with error "Cycle found in mount table hierarchy"

2020-05-27 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117873#comment-17117873
 ] 

Andrei Budnik commented on MESOS-10131:
---

I've copy-pasted the mount table from the log excerpt into one of our unit 
tests (`FsTest.MountInfoTableReadSortedParentOfSelf`). It failed with the 
following error message:

{code:java}
../../src/tests/containerizer/fs_tests.cpp:344: Failure
table: Failed to parse entry 
'docker/overlay2/l/LOG7DILAFLJBIQ7CKDQVFXJLP7:/var/lib/docker/overlay2/l/6JVIPP3XCCWKZPFAUWKXCDWYXL:/var/lib/docker/overlay2/l/L5VKHJHVOWG24VJPJCAKGTQX5G:/var/lib/docker/overlay2/l/ZIIS5MWCIF4C6KXI2LVKVU4TMF:/var/lib/docker/overlay2/l/4JXI':
 Could not find separator ' - '
{code}

It looks like memory corruption occurred. I'm investigating what could have 
caused it.
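
The `' - '` in the error above is the separator that proc(5) places between a mountinfo line's optional fields (such as `shared:6`) and its filesystem type. Below is a minimal sketch of why a fragment like the `docker/overlay2/...` string cannot be parsed: a simplified line parser that rejects any line lacking that separator. `MountEntry` and `parseMountInfoLine` are hypothetical names. This is not the Mesos parser in `fs.cpp`, and it reads only a few of the fields.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical simplified entry (not Mesos's MountInfoTable::Entry).
struct MountEntry {
  int id = 0;          // field 1: mount ID
  int parent = 0;      // field 2: parent mount ID
  std::string target;  // field 5: mount point
  std::string fsType;  // first field after the " - " separator
};

// Parse one /proc/<pid>/mountinfo line. Returns std::nullopt when the
// " - " separator is missing or the leading fields cannot be read.
// Paths in mountinfo escape spaces (e.g. as \040), so the first " - "
// found is taken to be the separator.
std::optional<MountEntry> parseMountInfoLine(const std::string& line) {
  const std::string separator = " - ";
  const std::string::size_type pos = line.find(separator);
  if (pos == std::string::npos) {
    return std::nullopt;  // e.g. a fragment of a longer line
  }

  MountEntry entry;
  std::istringstream before(line.substr(0, pos));
  std::string devno;  // field 3: major:minor (unused here)
  std::string root;   // field 4: root of the mount (unused here)
  if (!(before >> entry.id >> entry.parent >> devno >> root >> entry.target)) {
    return std::nullopt;
  }

  std::istringstream after(line.substr(pos + separator.size()));
  if (!(after >> entry.fsType)) {
    return std::nullopt;
  }
  return entry;
}
```

For the `/sys` line from the table above, this sketch yields ID 18, parent 41, target `/sys` and type `sysfs`, while a bare fragment such as `rw,seclabel` returns `std::nullopt`.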

> Agent frequently dies with error "Cycle found in mount table hierarchy"
> ---
>
> Key: MESOS-10131
> URL: https://issues.apache.org/jira/browse/MESOS-10131
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, framework
>Affects Versions: 1.9.0
>Reporter: Thomas Plummer
>Assignee: Andrei Budnik
>Priority: Major
>
> Our mesos agent frequently dies with the following error in the slave logs:
>  
> {code:java}
> F0509 22:10:33.036993 17723 fs.cpp:217] Check failed: 
> !visitedParents.contains(parentId) Cycle found in mount table hierarchy at 
> entry '1954': 
> 18 41 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs 
> rw,seclabel
> 19 41 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 20 41 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs 
> rw,seclabel,size=65852208k,nr_inodes=16463052,mode=755
> 21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - 
> securityfs securityfs rw
> 22 20 0:19 / /dev/shm rw,nosuid,nodev,noexec shared:3 - tmpfs tmpfs 
> rw,seclabel
> 23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts 
> rw,seclabel,gid=5,mode=620,ptmxmode=000
> 24 41 0:20 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,seclabel,mode=755
> 25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs 
> ro,seclabel,mode=755
> 26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 
> - cgroup cgroup 
> rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - 
> pstore pstore rw
> 28 18 0:24 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime 
> shared:21 - efivarfs efivarfs rw
> 29 25 0:25 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime 
> shared:10 - cgroup cgroup rw,seclabel,perf_event
> 30 25 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime 
> shared:11 - cgroup cgroup rw,seclabel,net_prio,net_cls
> 31 25 0:27 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 
> - cgroup cgroup rw,seclabel,cpuset
> 32 25 0:28 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - 
> cgroup cgroup rw,seclabel,blkio
> 33 25 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 
> - cgroup cgroup rw,seclabel,freezer
> 34 25 0:30 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 
> - cgroup cgroup rw,seclabel,hugetlb
> 35 25 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:16 
> - cgroup cgroup rw,seclabel,devices
> 36 25 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime 
> shared:17 - cgroup cgroup rw,seclabel,cpuacct,cpu
> 37 25 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:18 
> - cgroup cgroup rw,seclabel,memory
> 38 25 0:34 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - 
> cgroup cgroup rw,seclabel,pids
> 39 18 0:35 / /sys/kernel/config rw,relatime shared:22 - configfs configfs rw
> 41 0 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/vg_system-root 
> rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 42 18 0:16 / /sys/fs/selinux rw,relatime shared:23 - selinuxfs selinuxfs rw
> 43 19 0:37 / /proc/sys/fs/binfmt_misc rw,relatime shared:25 - autofs 
> systemd-1 
> rw,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11414
> 44 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
> 45 20 0:15 / /dev/mqueue rw,relatime shared:27 - mqueue mqueue rw,seclabel
> 46 20 0:38 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs 
> rw,seclabel
> 47 41 8:2 / /boot rw,relatime shared:29 - xfs /dev/sda2 
> rw,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
> 48 47 8:1 / /boot/efi rw,relatime shared:30 - vfat /dev/sda1 
> rw,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro
> 49 41 2