Hello,

We're having some strange issues with OpenAFS lately.

The problems started back in August, after we installed the base RHEL 7.4
kernel, 3.10.0-693.el7.x86_64, with what was then the latest OpenAFS
client, 1.6.21. We have since tried the current release, 1.6.21.1, and
still see the same issues. They occur with all of the subsequent RHEL 7.4
kernels as well, including the latest one, 3.10.0-693.2.2.el7.x86_64.

When a user logs in, they sometimes get a message similar to this:

shell-init: error retrieving current directory: getcwd: cannot access
parent directories: No such file or directory
tcsh: No such file or directory
tcsh: Trying to start from "<user AFS home directory>"


This doesn't happen for every user and seems to be transient. We have not
been able to reproduce it reliably ourselves. Affected users are still
able to access their files just fine afterwards.

After that, seemingly random applications fail with an error message
like '<application name>: getcwd() failed'. For example, this has happened
often with the qsub command that is used to submit jobs to our batch
system. A typical message would be:

qsub: getcwd() failed


We've also seen it with other applications, including git.

This is a major issue for us: it has forced us to stay on the latest
pre-RHEL 7.4 kernel for as long as the problem has existed. It may be
related to previous issues with getcwd(), but something in the RHEL 7.4
kernel seems to have made it much worse. Rebooting a system does not fix
it, nor does clearing the AFS cache.

Has anyone else experienced this issue with RHEL 7.4? Is there anything
that we can do to narrow down what is causing this?
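
As a starting point, here is a minimal sketch of a probe that could be used
to try to catch the failure outside of login shells and qsub. It changes
into a directory and calls getcwd() repeatedly, recording the errno of any
call that fails. The function name probe_getcwd and its parameters are
made up for illustration; this is not something taken from OpenAFS.

```python
import errno
import os
import time


def probe_getcwd(path, attempts=100, delay=0.0):
    """chdir into `path`, call getcwd() `attempts` times, and return a
    list of (attempt_number, errno_name) for every call that failed.

    Hypothetical helper for illustration only.
    """
    failures = []
    # Remember the starting directory by file descriptor so that we can
    # return to it without relying on getcwd() ourselves.
    start_fd = os.open(".", os.O_RDONLY)
    try:
        os.chdir(path)
        for i in range(attempts):
            try:
                os.getcwd()
            except OSError as exc:
                # ENOENT would match the "No such file or directory"
                # text in the messages quoted above.
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                failures.append((i, name))
            if delay:
                time.sleep(delay)
    finally:
        os.fchdir(start_fd)
        os.close(start_fd)
    return failures
```

Calling probe_getcwd() on an affected user's AFS home directory, on both a
RHEL 7.4 kernel and the older kernel, and comparing the returned lists
might show whether the failures depend on the kernel alone. An empty list
means every getcwd() call succeeded.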

Thank you in advance for any assistance!
