Re: [Nfs-ganesha-devel] deadlock in lru_reap_impl()

2017-08-04 Thread Daniel Gryniewicz
I think we need to ensure that the partition lock is taken before the 
qlane lock.  I have a patch for this, but it introduced a refcount 
issue, so I'm debugging.


Daniel

On 08/03/2017 08:52 PM, Pradeep wrote:
Thanks Frank. I merged your patch and am now hitting another deadlock. 
Here are the two threads:


The thread below holds the partition lock in 'read' mode and is trying to 
acquire the queue lock:


Thread 143 (Thread 0x7faf82f72700 (LWP 143573)):
#0  0x7fafd1c371bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7fafd1c32d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7fafd1c32c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x005221fd in _mdcache_lru_ref (entry=0x7fae78d19000, 
flags=2, func=0x58ec80 <__func__.23467> "mdcache_find_keyed", line=881) 
at 
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1813
#4  0x00532686 in mdcache_find_keyed (key=0x7faf82f70760, 
entry=0x7faf82f707e8) at 
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:881


 874 *entry = cih_get_by_key_latch(key, &latch,
 875 CIH_GET_RLOCK | CIH_GET_UNLOCK_ON_MISS,
 876 __func__, __LINE__);
 877 if (likely(*entry)) {
 878 fsal_status_t status;
 879
 880 /* Initial Ref on entry */
 881 status = mdcache_lru_ref(*entry, LRU_REQ_INITIAL);


This thread already holds the queue lock and is trying to acquire the 
partition lock in write mode:


Thread 188 (Thread 0x7faf9979f700 (LWP 143528)):
#0  0x7fafd1c3403e in pthread_rwlock_wrlock () from 
/lib64/libpthread.so.0
#1  0x0052fc61 in cih_remove_checked (entry=0x7fad62914e00) at 
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2  0x00530b3e in mdc_clean_entry (entry=0x7fad62914e00) at 
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3  0x0051df7e in mdcache_lru_clean (entry=0x7fad62914e00) at 
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4  0x00522cca in _mdcache_lru_unref (entry=0x7fad62914e00, 
flags=8, func=0x58b700 <__func__.23710> "lru_reap_impl", line=690) at 
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1922
#5  0x0051ea38 in lru_reap_impl (qid=LRU_ENTRY_L1) at 
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:690





On Fri, Jul 28, 2017 at 1:34 PM, Frank Filz <ffilz...@mindspring.com> wrote:


Hmm, well, that’s easy to fix…

Instead of:

mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);
goto next_lane;

It could:

QUNLOCK(qlane);
mdcache_put(entry);
continue;

Fix posted here:

https://review.gerrithub.io/371764

Frank

*From:* Pradeep [mailto:pradeep.tho...@gmail.com]
*Sent:* Friday, July 28, 2017 12:44 PM
*To:* nfs-ganesha-devel@lists.sourceforge.net
*Subject:* [Nfs-ganesha-devel] deadlock in lru_reap_impl()

I'm hitting another deadlock in mdcache with 2.5.1 base.  In this
case two threads are in different places in lru_reap_impl()

Thread 1:

 636 QLOCK(qlane);
 637 lru = glist_first_entry(&lq->q, mdcache_lru_t, q);
 638 if (!lru)
 639 goto next_lane;
 640 refcnt = atomic_inc_int32_t(&lru->refcnt);
 641 entry = container_of(lru, mdcache_entry_t, lru);
 642 if (unlikely(refcnt != (LRU_SENTINEL_REFCOUNT + 1))) {
 643 /* cant use it. */
 644 mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);

mdcache_lru_unref() could lead to the set of calls below:

mdcache_lru_unref() -> mdcache_lru_clean() -> mdc_clean_entry()
-> cih_remove_checked()

This tries to get partition lock which is held by 'Thread 2' which
is trying to acquire queue lane lock.

Thread 2:

 650 if (cih_latch_entry(&entry->fh_hk.key, &latch, CIH_GET_WLOCK,

 651 

Re: [Nfs-ganesha-devel] deadlock in lru_reap_impl()

2017-08-03 Thread Pradeep
Thanks Frank. I merged your patch and am now hitting another deadlock. Here
are the two threads:

The thread below holds the partition lock in 'read' mode and is trying to
acquire the queue lock:

Thread 143 (Thread 0x7faf82f72700 (LWP 143573)):
#0  0x7fafd1c371bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7fafd1c32d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7fafd1c32c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x005221fd in _mdcache_lru_ref (entry=0x7fae78d19000, flags=2,
func=0x58ec80 <__func__.23467> "mdcache_find_keyed", line=881) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1813
#4  0x00532686 in mdcache_find_keyed (key=0x7faf82f70760,
entry=0x7faf82f707e8) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:881

874 *entry = cih_get_by_key_latch(key, &latch,
875 CIH_GET_RLOCK | CIH_GET_UNLOCK_ON_MISS,
876 __func__, __LINE__);
877 if (likely(*entry)) {
878 fsal_status_t status;
879
880 /* Initial Ref on entry */
881 status = mdcache_lru_ref(*entry, LRU_REQ_INITIAL);


This thread already holds the queue lock and is trying to acquire the
partition lock in write mode:

Thread 188 (Thread 0x7faf9979f700 (LWP 143528)):
#0  0x7fafd1c3403e in pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
#1  0x0052fc61 in cih_remove_checked (entry=0x7fad62914e00) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2  0x00530b3e in mdc_clean_entry (entry=0x7fad62914e00) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3  0x0051df7e in mdcache_lru_clean (entry=0x7fad62914e00) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4  0x00522cca in _mdcache_lru_unref (entry=0x7fad62914e00,
flags=8, func=0x58b700 <__func__.23710> "lru_reap_impl", line=690) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1922
#5  0x0051ea38 in lru_reap_impl (qid=LRU_ENTRY_L1) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:690




On Fri, Jul 28, 2017 at 1:34 PM, Frank Filz <ffilz...@mindspring.com> wrote:

> Hmm, well, that’s easy to fix…
>
> Instead of:
>
> mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);
> goto next_lane;
>
> It could:
>
> QUNLOCK(qlane);
> mdcache_put(entry);
> continue;
>
> Fix posted here:
>
> https://review.gerrithub.io/371764
>
> Frank
>
> *From:* Pradeep [mailto:pradeep.tho...@gmail.com]
> *Sent:* Friday, July 28, 2017 12:44 PM
> *To:* nfs-ganesha-devel@lists.sourceforge.net
> *Subject:* [Nfs-ganesha-devel] deadlock in lru_reap_impl()
>
> I'm hitting another deadlock in mdcache with 2.5.1 base.  In this case two
> threads are in different places in lru_reap_impl()
>
> Thread 1:
>
> 636 QLOCK(qlane);
> 637 lru = glist_first_entry(&lq->q, mdcache_lru_t, q);
> 638 if (!lru)
> 639 goto next_lane;
> 640 refcnt = atomic_inc_int32_t(&lru->refcnt);
> 641 entry = container_of(lru, mdcache_entry_t, lru);
> 642 if (unlikely(refcnt != (LRU_SENTINEL_REFCOUNT + 1))) {
> 643 /* cant use it. */
> 644 mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);
>
> mdcache_lru_unref() could lead to the set of calls below:
>
> mdcache_lru_unref() -> mdcache_lru_clean() -> mdc_clean_entry()
> -> cih_remove_checked()
>
> This tries to get partition lock which is held by 'Thread 2' which is
> trying to acquire queue lane lock.
>
> Thread 2:
>
> 650 if (cih_latch_entry(&entry->fh_hk.key, &latch, CIH_GET_WLOCK,
> 651 __func__, __LINE__)) {
> 652 QLOCK(qlane);
>
> Stack traces:
>
> Thread 1:
>
> #0  0x7f571328103e in pthread_rwlock_wrlock () from
> /lib64/libpthread.so.0
> #1  0x0052f928 in cih_remove_checked (entry=0x7f548e86c400)
> at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
>
> 

Re: [Nfs-ganesha-devel] deadlock in lru_reap_impl()

2017-07-28 Thread Frank Filz
Hmm, well, that’s easy to fix…

Instead of:

mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);
goto next_lane;

It could:

QUNLOCK(qlane);
mdcache_put(entry);
continue;

Fix posted here:

https://review.gerrithub.io/371764



Frank





From: Pradeep [mailto:pradeep.tho...@gmail.com]
Sent: Friday, July 28, 2017 12:44 PM
To: nfs-ganesha-devel@lists.sourceforge.net
Subject: [Nfs-ganesha-devel] deadlock in lru_reap_impl()





I'm hitting another deadlock in mdcache with 2.5.1 base.  In this case two 
threads are in different places in lru_reap_impl()

Thread 1:

636 QLOCK(qlane);
637 lru = glist_first_entry(&lq->q, mdcache_lru_t, q);
638 if (!lru)
639 goto next_lane;
640 refcnt = atomic_inc_int32_t(&lru->refcnt);
641 entry = container_of(lru, mdcache_entry_t, lru);
642 if (unlikely(refcnt != (LRU_SENTINEL_REFCOUNT + 1))) {
643 /* cant use it. */
644 mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);

mdcache_lru_unref() could lead to the set of calls below:

mdcache_lru_unref() -> mdcache_lru_clean() -> mdc_clean_entry() -> 
cih_remove_checked()

This tries to get partition lock which is held by 'Thread 2' which is trying to 
acquire queue lane lock.

Thread 2:

650 if (cih_latch_entry(&entry->fh_hk.key, &latch, CIH_GET_WLOCK,
651 __func__, __LINE__)) {
652 QLOCK(qlane);

Stack traces:

Thread 1:

#0  0x7f571328103e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x0052f928 in cih_remove_checked (entry=0x7f548e86c400)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2  0x00530805 in mdc_clean_entry (entry=0x7f548e86c400)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3  0x0051df7e in mdcache_lru_clean (entry=0x7f548e86c400)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4  0x005229c0 in _mdcache_lru_unref (entry=0x7f548e86c400, flags=8, 
func=0x58b5c0 <__func__.23710> "lru_reap_impl", line=687)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1918
#5  0x0051e83a in lru_reap_impl (qid=LRU_ENTRY_L1)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:687

Thread 2:

#0  0x7f57132841bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f571327fd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7f571327fc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0051e4f5 in lru_reap_impl (qid=LRU_ENTRY_L1)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:652









Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


[Nfs-ganesha-devel] deadlock in lru_reap_impl()

2017-07-28 Thread Pradeep
I'm hitting another deadlock in mdcache with 2.5.1 base.  In this case two
threads are in different places in lru_reap_impl()

Thread 1:

636 QLOCK(qlane);
637 lru = glist_first_entry(&lq->q, mdcache_lru_t, q);
638 if (!lru)
639 goto next_lane;
640 refcnt = atomic_inc_int32_t(&lru->refcnt);
641 entry = container_of(lru, mdcache_entry_t, lru);
642 if (unlikely(refcnt != (LRU_SENTINEL_REFCOUNT + 1))) {
643 /* cant use it. */
644 mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);

mdcache_lru_unref() could lead to the set of calls below:

mdcache_lru_unref() -> mdcache_lru_clean() -> mdc_clean_entry()
-> cih_remove_checked()

This tries to get partition lock which is held by 'Thread 2' which is
trying to acquire queue lane lock.

Thread 2:
650 if (cih_latch_entry(&entry->fh_hk.key, &latch, CIH_GET_WLOCK,
651 __func__, __LINE__)) {
652 QLOCK(qlane);

Stack traces:

Thread 1:

#0  0x7f571328103e in pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
#1  0x0052f928 in cih_remove_checked (entry=0x7f548e86c400)
at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2  0x00530805 in mdc_clean_entry (entry=0x7f548e86c400)
at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3  0x0051df7e in mdcache_lru_clean (entry=0x7f548e86c400)
at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4  0x005229c0 in _mdcache_lru_unref (entry=0x7f548e86c400,
flags=8, func=0x58b5c0 <__func__.23710> "lru_reap_impl", line=687)
at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1918
#5  0x0051e83a in lru_reap_impl (qid=LRU_ENTRY_L1)
at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:687

Thread 2:
#0  0x7f57132841bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f571327fd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7f571327fc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0051e4f5 in lru_reap_impl (qid=LRU_ENTRY_L1)
at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:652