Hi Mark,

> On 3 Nov 2017, at 15:51, Mark Vitale <[email protected]> wrote:
>
> Ragge,
>
>> On Nov 3, 2017, at 9:46 AM, Ragnar Sundblad <[email protected]> wrote:
>>
>> We have compute clusters where the nodes have almost everything of their
>> roots in afs; most things in /, as /etc and /usr, are soft links into a
>> complete os installation in afs. To be able to have some writable files and
>> directories, such as /etc/adjtime or /var/tmp, we bind mount files and
>> directories in the tree which is actually in afs (mainly using the rwtab
>> functionality), and a lustre client that also gets mounted in the afs tree.
>>
>> When we upgraded from CentOS 7.3 to 7.4, kernel 3.10.0-693.5.2.el7.x86_64,
>> and using OpenAFS client 1.6.21.1 or 1.6.20.1, when users having home
>> directories in afs log in and start accessing their data, mounts in the afs
>> tree start to get randomly unmounted. In the lustre case, the lustre client
>> nicely reports that it unmounts, so the unmounts seem to be handled in an
>> orderly manner.
>>
>> We have a suspicion this may be related to the problem reported in the
>> thread “getcwd() error for RHEL 7.4 kernel”, and that the kernel for
>> some reason decides that path to the mount point is no good and unmounts.
>> In addition, when this has started to happen, we are not able to mount
>> anything more into afs, mount returns ENOENT.
>>
>> This is pretty easy to repeat.
>
> Thank you for your detailed report.
> I have an idea about what this may be, but I will try to duplicate it on my
> test system first.
Thanks for investigating! :-)

>> Our workaround for now is to use the tmpfs based root all the way down to
>> the mount points, and have soft links into afs further down for the rest,
>> which seems to work.
>
> It’s good that you have a workaround; thank you for sharing that as well.
>
>> Please let us know if we can provide any help debugging this.
>
> For now I would like to see your afsd options, and also the output from
> “cmdebug <client> -cache” for an affected client.

We start it like so:
/bin/chroot /sysimage /usr/vice/etc/afsd -memcache -verbose -nosettime -dynroot -mountdir /afs
(Before systemd is started, we set up the runtime root in /sysimage,
then chroot there, and start systemd to let it bring up the system.)

Here is a cmdebug:
# cmdebug tegner-login-2 -cache
Chunk files: 1562
Stat caches: 2343
Data caches: 1562
Volume caches: 200
Chunk size: 65536
Cache size: 100000 kB
Set time: no
Cache type: memory

I now see that I forgot to mention that we use memory cache (since the
nodes are diskless).

> Although you haven’t reported the getcwd() problem, could you please
> confirm if you’ve seen it or not?

We have not seen it, but we haven’t really looked for it either.
Is there some test we could try?

> And finally, just to confirm, you have seen bind mounts in /afs unmounted at
> CentOS 7.4 with both OpenAFS 1.6.21.1 and 1.6.20.1, but _not_ with CentOS 7.3
> and those same OpenAFS client releases - correct?

With 7.3 (kernel 3.10.0-514.26.2.el7.x86_64) we actually used
openafs client 1.6.20.2, but with that combination this
mount-within-afs thing worked just fine.

Thanks!

/ragge
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
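
For the question above about a test to try, here is a minimal Python sketch of one possible check, not an established OpenAFS test. It assumes a Linux client where /proc/self/mounts is readable. The helper names (mounts_under, missing_mounts, getcwd_works) and the example paths are made up for illustration. It lists the mounts below /afs, reports expected ones that have disappeared, and checks whether getcwd() succeeds in a given directory.

```python
import os


def mounts_under(prefix, mounts_text):
    """Return mount points strictly below `prefix`, parsed from text in
    /proc/mounts format (device, mount point, fstype, options, ...)."""
    base = prefix.rstrip("/") + "/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        if mount_point.startswith(base) and mount_point != prefix:
            found.append(mount_point)
    return found


def missing_mounts(expected, prefix="/afs", mounts_path="/proc/self/mounts"):
    """Return the expected mount points that are no longer mounted."""
    with open(mounts_path) as handle:
        current = set(mounts_under(prefix, handle.read()))
    return sorted(set(expected) - current)


def getcwd_works(directory):
    """Change into `directory` and report whether getcwd() succeeds there."""
    previous = os.getcwd()
    try:
        os.chdir(directory)
        os.getcwd()
        return True
    except OSError:
        return False
    finally:
        os.chdir(previous)
```

Run periodically on an affected node, for example missing_mounts(["/afs/example/var/tmp"]) with the real bind-mount paths substituted, this would show roughly when a mount goes away, and getcwd_works() could be pointed at a directory in a user's afs home.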
