> On Nov 16, 2017, at 12:26 PM, Stephan Wiesand <[email protected]> wrote:
>
>
> On Nov 16, 2017, at 07:06 , Benjamin Kaduk wrote:
>
>> On Wed, Nov 15, 2017 at 01:02:15PM -0500, Matt Vander Werf wrote:
>>> Hello,
>>>
>>> Are there any updates or progress on a potential fix for this issue?
>>> Anything we can do to help figure things out?
>>
>> This topic was on the agenda for our release-team meeting yesterday.
>
> Well, it has been for the last couple of weeks.
>
>> If I remember correctly, multiple developers have gotten fairly
>> reliable ways to reproduce the issue locally.
>> It also seems that as a workaround, reverting
>> https://gerrit.openafs.org/#/c/12451/ is likely to reduce the
>> likelihood of triggering events.
>
> Yes, but there’s at least one known client configuration (small stat cache,
> -disable-dynamic-vcaches) for which reverting that change actually makes
> things worse.

The root cause is that the semantics of Linux d_invalidate() changed
between 3.10.0-514 (RH/CentOS 7.3) and 3.10.0-693 (RH/CentOS 7.4).
The former would return -EBUSY if you attempted to invalidate the
current working directory. The latter will invalidate (unhash) the
current working directory’s dentry without a second thought.

OpenAFS code in afs_ShakeLooseVCaches() currently relies on the former
behavior to prevent the getcwd() ENOENT problem.

I am working on a patch and will submit it to gerrit when it passes my
tests.

Thank you to everyone who shared debugging and test results. I will
post here again when the patch is available in gerrit, so that anyone
who wishes may test it in their setup.

Regards,
--
Mark Vitale
Sine Nomine Associates
