I noticed you added your patch(es) to gerrit for the RHEL 7.4 getcwd issue
(Thanks!).

Regarding your comment on the latest commit ("I can submit an equivalent,
but simpler, 'emergency' 1.6.x backport of just this top commit on
request."): yes, we would definitely prefer that! It would let us test just
the getcwd patch on the 1.6.x branch, which is what we run. Once it is
available, I can test it in our setup to confirm that it fixes the getcwd
issue for us as well.

Thanks!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692

On Sun, Nov 19, 2017 at 3:41 PM, Mark Vitale <[email protected]> wrote:

>
> > On Nov 16, 2017, at 12:26 PM, Stephan Wiesand <[email protected]> wrote:
> >
> >
> > On Nov 16, 2017, at 07:06 , Benjamin Kaduk wrote:
> >
> >> On Wed, Nov 15, 2017 at 01:02:15PM -0500, Matt Vander Werf wrote:
> >>> Hello,
> >>>
> >>> Are there any updates or progress on a potential fix for this issue?
> >>> Anything we can do to help figure things out?
> >>
> >> This topic was on the agenda for our release-team meeting yesterday.
> >
> > Well, it has been for the last couple of weeks.
> >
> >> If I remember correctly, multiple developers have found fairly
> >> reliable ways to reproduce the issue locally.
> >> It also seems that, as a workaround, reverting
> >> https://gerrit.openafs.org/#/c/12451/ is likely to make the
> >> triggering events less frequent.
> >
> > Yes, but there’s at least one known client configuration (small stat
> > cache, -disable-dynamic-vcaches) for which reverting that change actually
> > makes things worse.
>
> The root cause is that the semantics of Linux d_invalidate() changed
> between 3.10.0-514 (RH/CentOS 7.3) and 3.10.0-693 (RH/CentOS 7.4).
> The former would return -EBUSY if you attempted to invalidate the
> current working directory.  The latter will invalidate (unhash)
> the current working directory’s dentry without a second thought.
> OpenAFS code in afs_ShakeLooseVCaches() currently relies on the former
> behavior to prevent the getcwd() ENOENT problem.
>
> I am working on a patch and will submit it to gerrit when it passes my
> tests.
>
> Thank you to everyone who shared debugging and test results.
> I will post here again when the patch is available in gerrit, so that
> anyone who wishes may test it in their setup.
>
> Regards,
> —
> Mark Vitale
> Sine Nomine Associates
>
>
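For anyone who wants to see the user-visible symptom outside of AFS: as far as we can tell from the kernel source, Linux getcwd(2) fails with ENOENT whenever the current directory's dentry is unhashed, which is the state Mark describes d_invalidate() leaving the cwd in on 7.4. The sketch below is a hypothetical userspace analogue, not OpenAFS code. It reaches the same unhashed-cwd state by removing the directory with rmdir(2) instead of via d_invalidate(), and it assumes a Linux system with a writable /tmp. The function name getcwd_errno_after_rmdir() is made up for illustration.

```c
#define _XOPEN_SOURCE 700 /* expose mkdtemp() */

#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

enum { BUF_LEN = 4096 };

/* Hypothetical helper: returns the errno set by getcwd() after the cwd
 * has been removed, 0 if getcwd() unexpectedly succeeded, or -1 if the
 * setup itself failed. Assumes Linux and a writable /tmp. */
int getcwd_errno_after_rmdir(void)
{
    char dir[] = "/tmp/cwdtestXXXXXX";
    char saved[BUF_LEN];
    char buf[BUF_LEN];
    int result;

    /* Remember where we started so we can restore it afterwards. */
    if (getcwd(saved, sizeof saved) == NULL)
        return -1;

    /* Create a scratch directory and make it the working directory. */
    if (mkdtemp(dir) == NULL)
        return -1;
    if (chdir(dir) != 0) {
        rmdir(dir);
        return -1;
    }

    /* Remove the directory while it is still our cwd. This unhashes its
     * dentry, analogous to (but not the same path as) d_invalidate(). */
    if (rmdir(dir) != 0) {
        if (chdir(saved) != 0) {
            /* best effort only; nothing more we can do here */
        }
        return -1;
    }

    /* getcwd() should now fail; capture errno immediately. */
    errno = 0;
    result = (getcwd(buf, sizeof buf) == NULL) ? errno : 0;

    /* Restore a valid working directory for the rest of the process. */
    if (chdir(saved) != 0)
        return -1;
    return result;
}
```

Calling it from main() and comparing the result with ENOENT is enough to check the behavior on a given kernel; getcwd(3) documents ENOENT as "the current working directory has been unlinked."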
