On Oct 20, 2017, at 21:17, Mark Vitale wrote:

> 
>> On Oct 20, 2017, at 8:27 AM, Stephan Wiesand <[email protected]> wrote:
>> 
>> [taking this thread to -devel]
>> 
>>> On 20. Oct 2017, at 12:04, Stephan Wiesand <[email protected]> wrote:
>>> 
>>> I ran configure against the EL7.3 and EL7.4 GA kernels (3.10.0-514.el7 and 
>>> 3.10.0-696.el7) and compared the results.
>>> 
>>> Besides the fact that in the 7.4 case conftest.c is compiled with an 
>>> additional -DCONFIG_AVX512, which I doubt makes a difference, there are 
>>> some differences in configure test results:
>>> 
>>>                           7.3     7.4
>>> locks_lock_file_wait      no      yes
>>> inode_lock                no      yes
>>> exported tasklist_lock    yes     no
>> 
> 
> Thank you for this good information, Stephan.  Were those 3 the only OpenAFS 
> config differences you found?

Yes, of course.

>> It turns out the EL7.4 kernel turns tasklist_lock from an rwlock_t into a 
>> qrwlock_t and all read_{,un}lock() calls into qread_{,un}lock() ones. And 
>> no, it's not what mainline kernels do, including 4.14-rc5.
>> 
>> We should probably adapt to this, and I guess it shouldn’t be too hard, but 
>> is this change likely to be the reason for more frequent getcwd() problems?
> 
> 
> I took a look at all three differences with regard to the OpenAFS 1.6.20.2 
> code, and I don’t see a way that any of them could be causing the getcwd 
> problems.  
> 
> In particular, the tasklist_lock references in the OpenAFS 1.6.20.2 source 
> will not actually result in any references from the OpenAFS kernel module, 
> due to the results of other autoconf tests for RHEL 7.4.  You can verify 
> this for yourself by issuing:  'nm <openafs.ko> | grep tasklist_lock'
> 
> However, don’t rely on the nm trick to look for the other symbols referenced 
> above. inode_lock() is defined as static inline and is thus inlined as a 
> mutex_lock(&inode->i_mutex), which is indistinguishable from other 
> mutex_lock references.  And locks_lock_file_wait() is also static inline - 
> it shows up as locks_lock_inode_wait in the nm output. 
> 
> So in summary, thank you, but I don’t believe any of these explain the 
> current getcwd symptoms.
> 
> Has anyone seen this with RHEL 7.4 and the previous OpenAFS releases -  
> 1.6.20.1 or older?


Not here. It was 1.6.21, and the statistics aren't exactly great.

You mean it could simply be "shake harder" unmasking the actual issue again?
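
For anyone who wants to repeat the nm check quoted above on several modules, here is a minimal sketch in Python. The function names and the module path are placeholders of mine, not anything from the OpenAFS tree. It only looks at undefined ("U") entries, so, as noted above, it will not reveal static inline helpers such as inode_lock().

```python
import subprocess


def undefined_symbols(nm_output):
    """Return the set of undefined ('U') symbol names found in nm output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols carry no address column, so the line is "U name".
        if len(fields) == 2 and fields[0] == "U":
            symbols.add(fields[1])
    return symbols


def check_module(path, names):
    """Run nm on a module and report which of the given names it references."""
    result = subprocess.run(
        ["nm", path], check=True, capture_output=True, text=True
    )
    undefined = undefined_symbols(result.stdout)
    return {name: name in undefined for name in names}
```

Usage would be something like check_module("openafs.ko", ["tasklist_lock", "locks_lock_inode_wait"]), with the path adjusted to wherever the module was built.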

-- 
Stephan Wiesand
DESY -DV-
Platanenenallee 6
15738 Zeuthen, Germany
