Hi all,

We have compute clusters where the nodes keep almost all of their root file 
system in AFS; most things in /, such as /etc and /usr, are soft links into a 
complete OS installation in AFS. To have some writable files and directories, 
such as /etc/adjtime or /var/tmp, we bind mount files and directories onto 
paths that actually reside in the AFS tree (mainly using the rwtab 
functionality). A Lustre client also gets mounted in the AFS tree.

After we upgraded from CentOS 7.3 to 7.4 (kernel 3.10.0-693.5.2.el7.x86_64), 
with OpenAFS client 1.6.21.1 or 1.6.20.1, mounts in the AFS tree start to get 
randomly unmounted once users with home directories in AFS log in and start 
accessing their data. In the Lustre case, the Lustre client nicely reports 
that it unmounts, so the unmounts seem to be handled in an orderly manner.
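
For anyone trying to observe this, here is a minimal sketch of how the 
disappearing mounts could be logged by polling /proc/self/mountinfo. It is 
not something from our setup: the /afs prefix, the one-second interval and 
the function names (mount_points, vanished, watch) are assumptions chosen for 
illustration.

```python
import time

MOUNTINFO = "/proc/self/mountinfo"  # Linux-specific, described in proc(5)


def mount_points(mountinfo_text, prefix="/afs"):
    """Return the set of mount points at or below `prefix` (assumed /afs)."""
    points = set()
    base = prefix.rstrip("/") + "/"
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        # The fifth field is the mount point; spaces are escaped as \040.
        point = fields[4].replace("\\040", " ")
        if point == prefix or point.startswith(base):
            points.add(point)
    return points


def vanished(before, after):
    """Mount points present in `before` but missing from `after`."""
    return sorted(before - after)


def watch(prefix="/afs", interval=1.0):
    """Poll the mount table forever and print every mount that disappears."""
    with open(MOUNTINFO) as f:
        previous = mount_points(f.read(), prefix)
    while True:
        time.sleep(interval)
        with open(MOUNTINFO) as f:
            current = mount_points(f.read(), prefix)
        for point in vanished(previous, current):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "unmounted:", point)
        previous = current
```

Running watch() on a node while a user logs in should print a timestamped 
line for each mount that goes away, which can then be compared against the 
kernel and Lustre logs. Polling can miss a mount that disappears and comes 
back within one interval.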

We suspect this may be related to the problem reported in the thread 
“getcwd() error for RHEL 7.4 kernel”, and that the kernel for some reason 
decides that the path to the mount point is no good and unmounts it.
In addition, once this has started to happen, we are no longer able to mount 
anything more into AFS; mount returns ENOENT.

This is pretty easy to reproduce.

Our workaround for now is to use the tmpfs-based root all the way down to the 
mount points, and have soft links into AFS further down for the rest, which 
seems to work.

Please let us know if we can provide any help debugging this.


/ragge

PDC Center for High Performance Computing, KTH Royal Institute of Technology, 
Stockholm, Sweden

_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-devel