Hello,

We're having some strange issues with OpenAFS lately.
It started after installing the base RHEL 7.4 kernel, 3.10.0-693.el7.x86_64, back in August, with the latest version of the OpenAFS client at the time, 1.6.21. We've since tried the current latest version, 1.6.21.1, and still have the same issues. This happens with all the subsequent RHEL 7.4 kernels as well, including the latest kernel, 3.10.0-693.2.2.el7.x86_64.

When a user logs in, they sometimes get a message similar to this:

    shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
    tcsh: No such file or directory
    tcsh: Trying to start from "<user AFS home directory>"

This doesn't happen for every user and seems to be a transient issue. We've had trouble reproducing it reliably internally. The users are able to access their files just fine afterwards, though.

Then, for what seem like random applications, they get an error message like '<application name>: getcwd() failed'. For example, this has happened often with the qsub command that is used to submit jobs to our batch system. So, an example message would be:

    qsub: getcwd() failed

We've also seen it with other applications, including git.

This is a major issue that has forced us to stay on the latest pre-RHEL 7.4 kernel for as long as the problem has existed. It may be related to previous issues with getcwd(), but something in the RHEL 7.4 kernel seems to have made it much worse. Simply rebooting a system does not fix it, nor does clearing the AFS cache.

Has anyone else experienced this issue with RHEL 7.4? Is there anything that we can do to narrow down what is causing this?

Thank you in advance for any assistance!
