Ragge,

> On Nov 3, 2017, at 9:46 AM, Ragnar Sundblad <[email protected]> wrote:
>
> We have compute clusters where the nodes have almost everything of their
> roots in afs; most things in /, as /etc and /usr, are soft links into a
> complete os installation in afs. To be able to have some writable files and
> directories, such as /etc/adjtime or /var/tmp, we bind mount files and
> directories in the tree which is actually in afs (mainly using the rwtab
> functionality), and a lustre client that also gets mounted in the afs tree.
>
> When we upgraded from CentOS 7.3 to 7.4, kernel 3.10.0-693.5.2.el7.x86_64,
> and using OpenAFS client 1.6.21.1 or 1.6.20.1, when users having home
> directories in afs log in and start accessing their data, mounts in the afs
> tree starts to get randomly unmounted. In the lustre case, the lustre client
> nicely reports that it unmounts, so the unmounts seem to be handled in an
> orderly manner.
>
> We have a suspicion this may be related to the problem reported in the thread
> “getcwd() error for RHEL 7.4 kernel”, and that the kernel for some reason
> decides that path to the mount point is no good and unmounts.
> In addition, when this has started to happen, we are not able to mount
> anything more into afs, mount returns ENOENT.
>
> This is pretty easy to repeat.

Thank you for your detailed report. I have an idea about what this may be,
but I will try to duplicate it on my test system first.
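
When reproducing this, it may help to record exactly when a mount under
/afs goes away. The following is only a rough sketch, not part of OpenAFS:
it assumes a Linux client with a readable /proc/self/mounts and an AFS root
at /afs, and the names mounts_under() and watch() are made up for this
illustration.

```python
import time


def mounts_under(prefix, mounts_text):
    """Return the set of mount points at or below `prefix`.

    `mounts_text` is the content of /proc/self/mounts (fstab-like format:
    device, mount point, fs type, options, ...).
    """
    prefix = prefix.rstrip("/") or "/"
    below = prefix.rstrip("/") + "/"
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        if mount_point == prefix or mount_point.startswith(below):
            points.add(mount_point)
    return points


def read_mounts(prefix):
    """Snapshot the current mounts under `prefix` (Linux only)."""
    with open("/proc/self/mounts") as f:
        return mounts_under(prefix, f.read())


def watch(prefix="/afs", interval=5.0):
    """Poll forever and print every mount point that disappears."""
    previous = read_mounts(prefix)
    while True:
        time.sleep(interval)
        current = read_mounts(prefix)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for gone in sorted(previous - current):
            print(stamp, "unmounted:", gone)
        previous = current
```

Calling watch("/afs") in a root shell before users log in should give
timestamps that can be compared against the lustre client's unmount
messages and the cmdebug output.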

> Our workaround for now is to use the tmpfs based root all the way down to the
> mount points, and have soft links into afs further down for the rest, which
> seems to work.

It’s good that you have a workaround; thank you for sharing that as well.

> Please let us know if we can provide any help debugging this.

For now I would like to see your afsd options, and also the output from
‘cmdebug <client> -cache’ for an affected client.

Although you haven’t reported the getcwd() problem, could you please
confirm whether you’ve seen it or not?

And finally, just to confirm, you have seen bind mounts in /afs unmounted
on CentOS 7.4 with both OpenAFS 1.6.21.1 and 1.6.20.1, but _not_ on
CentOS 7.3 with those same OpenAFS client releases - correct?

Thanks,
--
Mark Vitale
OpenAFS release team
