> On Nov 16, 2017, at 12:26 PM, Stephan Wiesand <[email protected]> wrote:
> 
> 
> On Nov 16, 2017, at 07:06 , Benjamin Kaduk wrote:
> 
>> On Wed, Nov 15, 2017 at 01:02:15PM -0500, Matt Vander Werf wrote:
>>> Hello,
>>> 
>>> Are there any updates or progress on a potential fix for this issue?
>>> Anything we can do to help figure things out?
>> 
>> This topic was on the agenda for our release-team meeting yesterday.
> 
> Well, it has been for the last couple of weeks.
> 
>> If I remember correctly, multiple developers have gotten fairly
>> reliable ways to reproduce the issue locally.
>> It also seems that as a workaround, reverting
>> https://gerrit.openafs.org/#/c/12451/ is likely to reduce the
>> likelihood of triggering events.
> 
> Yes, but there’s at least one known client configuration (small stat cache, 
> -disable-dynamic-vcaches) for which reverting that change actually makes 
> things worse.

The root cause is that the semantics of Linux d_invalidate() changed between
3.10.0-514 (RH/CentOS 7.3) and 3.10.0-693 (RH/CentOS 7.4).  
The former would return -EBUSY if you attempted to invalidate the
current working directory.  The latter will invalidate (unhash)
the current working directory’s dentry without a second thought.
OpenAFS code in afs_ShakeLooseVCaches() currently relies on the former behavior
to prevent the getcwd() ENOENT problem.
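
For anyone who wants to see the symptom in isolation, here is a minimal
userspace sketch. It does not involve OpenAFS or d_invalidate() at all. It
assumes a Linux system and relies on rmdir() also unhashing the dentry of a
removed directory, so a process whose working directory was removed gets the
same ENOENT from getcwd(). The function name and the temporary directory
layout are made up for illustration.

```python
import errno
import os
import tempfile


def getcwd_errno_after_unhash():
    """Return the errno from getcwd() after the cwd's dentry is unhashed.

    Illustration only: the unhash here comes from rmdir(), not from
    d_invalidate(), but getcwd() fails the same way in both cases.
    Returns 0 if getcwd() unexpectedly succeeds.
    """
    saved = os.getcwd()
    parent = tempfile.mkdtemp()
    victim = os.path.join(parent, "victim")
    os.mkdir(victim)
    os.chdir(victim)
    # Linux allows removing an empty directory that is some process's cwd
    # when it is named by its absolute path.
    os.rmdir(victim)
    try:
        os.getcwd()
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        # Restore the original working directory and clean up.
        os.chdir(saved)
        os.rmdir(parent)


if __name__ == "__main__":
    code = getcwd_errno_after_unhash()
    print(errno.errorcode.get(code, code))
```

On Linux I would expect this to report ENOENT, which is what affected
clients see when the cache manager invalidates the dentry of a directory
that is still some process's working directory.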

I am working on a patch and will submit it to gerrit when it passes my tests.

Thank you to everyone who shared debugging and test results.
I will post here again when the patch is available in gerrit, so that anyone
who wishes may test it in their setup.

Regards,
--
Mark Vitale
Sine Nomine Associates
