On Oct 20, 2017, at 21:17, Mark Vitale wrote:
>
>> On Oct 20, 2017, at 8:27 AM, Stephan Wiesand <[email protected]> wrote:
>>
>> [taking this thread to -devel]
>>
>>> On 20. Oct 2017, at 12:04, Stephan Wiesand <[email protected]> wrote:
>>>
>>> I ran configure against the EL7.3 and EL7.4 GA kernels (3.10.0-514.el7 and
>>> 3.10.0-693.el7) and compared the results.
>>>
>>> Besides the fact that in the 7.4 case conftest.c is compiled with an
>>> additional -DCONFIG_AVX512, which I doubt makes a difference, there are
>>> some differences in configure test results:
>>>
>>>                          7.3   7.4
>>> locks_lock_file_wait     no    yes
>>> inode_lock               no    yes
>>> exported tasklist_lock   yes   no
>>
>
> Thank you for this good information, Stephan. Were those 3 the only OpenAFS
> config differences you found?
Yes of course.
>> It turns out the EL7.4 kernel turns tasklist_lock from an rwlock_t into a
>> qrwlock_t and all read_{,un}lock() calls into qread_{,un}lock() ones. And
>> no, it's not what mainline kernels do, including 4.14-rc5.
>>
>> We should probably adapt to this, and I guess it shouldn’t be too hard, but
>> is this change likely to be the reason for more frequent getcwd() problems?
>
>
> I took a look at all three differences with regard to the OpenAFS 1.6.20.2
> code, and I don’t see a way that any of them could be causing the getcwd
> problems.
>
> In particular, the tasklist_lock references in the OpenAFS 1.6.20.2 source
> will not actually result in any references from the OpenAFS kernel module,
> due to the results of other configure tests for RHEL 7.4. You can verify
> this for yourself by issuing: 'nm <openafs.ko> | grep tasklist_lock'
>
> However, don’t rely on the nm trick to look for the other symbols referenced
> above. inode_lock() is defined as static inline and is thus inlined as a
> mutex_lock(&inode->i_mutex), which is indistinguishable from other
> mutex_lock references. And locks_lock_file_wait() is also static inline -
> it shows up as locks_lock_inode_wait in the nm output.
>
> So in summary, thank you, but I don’t believe any of these explain the
> current getcwd symptoms.
>
> Has anyone seen this with RHEL 7.4 and the previous OpenAFS releases -
> 1.6.20.1 or older?
Not here. It was 1.6.21, and the statistics aren't exactly great.
You mean it could simply be "shake harder" unmasking the actual issue again?
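
For anyone who wants to repeat the nm check on their own build, here is a
minimal sketch. The helper name references_symbol and the module path are
made up for illustration, nothing from the OpenAFS tree. It only looks for an
exact match among the undefined ("U") entries, so it avoids false hits on
longer names that merely contain the string, and as noted above it cannot see
anything that was inlined.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenAFS): succeed if nm-style output on
# stdin lists the given symbol as undefined ("U"), i.e. the module references
# the symbol but does not define it itself.
references_symbol() {
    # $1: exact symbol name to look for; stdin: output of nm
    awk -v sym="$1" '
        NF >= 2 && $(NF-1) == "U" && $NF == sym { found = 1 }
        END { exit !found }
    '
}

# Usage sketch; the module path is an assumption, adjust it to your build:
#   nm /path/to/openafs.ko | references_symbol tasklist_lock \
#       && echo "module references tasklist_lock"
```

An exit status of 0 means the symbol is referenced, 1 means it is not.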
--
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany