Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/06/2014 11:38 AM, Pranith Kumar Karampuri wrote: On 06/06/2014 11:37 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 10:56 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: On 06/06/2014 10:47 AM, Pranith Kumar Karampuri wrote: On 06/06/2014 10:43 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 9:54 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: On 06/06/2014 10:02 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: This sounds a bit complicated. I think there is a much simpler solution: - First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously) - Make grant_blocked_locks() double up and do the job of update_refkeeper() internally. Something which looks like this: diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c index f6c71c1..38df385 100644 --- a/xlators/features/locks/src/common.c +++ b/xlators/features/locks/src/common.c @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) if (!list_empty (&dom->entrylk_list)) is_empty = 0; +if (!list_empty (&dom->blocked_entrylks)) + is_empty = 0; + if (!list_empty (&dom->inodelk_list)) is_empty = 0; + +if (!list_empty (&dom->blocked_inodelks)) + is_empty = 0; } return is_empty; @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) struct list_head granted_list; posix_lock_t *tmp = NULL; posix_lock_t *lock = NULL; + inode_t *unref = NULL; INIT_LIST_HEAD (&granted_list); pthread_mutex_lock (&pl_inode->mutex); { __grant_blocked_locks (this, pl_inode, &granted_list); + + if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) { + unref = pl_inode->refkeeper; + pl_inode->refkeeper = NULL; + } } pthread_mutex_unlock (&pl_inode->mutex); @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) GF_FREE (lock); } + if (unref) + inode_unref (unref); + return; } This should make pl_disconnect_cbk() pretty much race free w.r.t refkpeer. Thoughts? Lets say C1 is doing pl_inodelk_client_cleanup. After the second for-loop(All granted and blocked locks are out of the domain) if an unlock on the final granted lock on that inode from client C2 completes, refkeeper would be set to NULL and unrefed leading to zero refs on that inode i.e. pl_forget will also happen. In 3rd for-loop pl_inode is already freed and leads to free'd memory access and will crash. We also need: diff --git a/xlators/features/locks/src/inodelk.c b/xlators/features/locks/src/inodelk.c index c76cb7f..2aceb8a 100644 --- a/xlators/features/locks/src/inodelk.c +++ b/xlators/features/locks/src/inodelk.c @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t *ctx) dom = get_domain (pl_inode, l->volume); - grant_blocked_inode_locks (this, pl_inode, dom); - pthread_mutex_lock (&pl_inode->mutex); { __pl_inodelk_unref (l); } pthread_mutex_unlock (&pl_inode->mutex); + + grant_blocked_inode_locks (this, pl_inode, dom); } return 0; Missed this in the last patch. It still doesn't solve the problem I described earlier. By the time it executes this third loop refkeeper is already unreffed when C2 unlocks. Right, we will need to introduce an "in_cleanup" counter, if set pl_update_refkeeper() should not unref. Increment the in_cleanup() in the first lookup, and decrement it in the last loop, just before calling grant_blocked_locks() (along with the patches in my last 2 mails) s/first lookup/first loop/ ? Consider the following scenario: There are two granted locks L1, L2 from C1, C2 clients respectively on same inode. C1 gets disconnected. C2 issues a unlock. This is the se
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/06/2014 11:37 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 10:56 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: On 06/06/2014 10:47 AM, Pranith Kumar Karampuri wrote: On 06/06/2014 10:43 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 9:54 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: On 06/06/2014 10:02 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: This sounds a bit complicated. I think there is a much simpler solution: - First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously) - Make grant_blocked_locks() double up and do the job of update_refkeeper() internally. Something which looks like this: diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c index f6c71c1..38df385 100644 --- a/xlators/features/locks/src/common.c +++ b/xlators/features/locks/src/common.c @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) if (!list_empty (&dom->entrylk_list)) is_empty = 0; +if (!list_empty (&dom->blocked_entrylks)) + is_empty = 0; + if (!list_empty (&dom->inodelk_list)) is_empty = 0; + +if (!list_empty (&dom->blocked_inodelks)) + is_empty = 0; } return is_empty; @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) struct list_head granted_list; posix_lock_t *tmp = NULL; posix_lock_t *lock = NULL; + inode_t *unref = NULL; INIT_LIST_HEAD (&granted_list); pthread_mutex_lock (&pl_inode->mutex); { __grant_blocked_locks (this, pl_inode, &granted_list); + + if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) { + unref = pl_inode->refkeeper; + pl_inode->refkeeper = NULL; + } } pthread_mutex_unlock (&pl_inode->mutex); @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) GF_FREE (lock); } + if (unref) + inode_unref (unref); + return; } This should make pl_disconnect_cbk() pretty much race free w.r.t refkpeer. Thoughts? Lets say C1 is doing pl_inodelk_client_cleanup. After the second for-loop(All granted and blocked locks are out of the domain) if an unlock on the final granted lock on that inode from client C2 completes, refkeeper would be set to NULL and unrefed leading to zero refs on that inode i.e. pl_forget will also happen. In 3rd for-loop pl_inode is already freed and leads to free'd memory access and will crash. We also need: diff --git a/xlators/features/locks/src/inodelk.c b/xlators/features/locks/src/inodelk.c index c76cb7f..2aceb8a 100644 --- a/xlators/features/locks/src/inodelk.c +++ b/xlators/features/locks/src/inodelk.c @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t *ctx) dom = get_domain (pl_inode, l->volume); - grant_blocked_inode_locks (this, pl_inode, dom); - pthread_mutex_lock (&pl_inode->mutex); { __pl_inodelk_unref (l); } pthread_mutex_unlock (&pl_inode->mutex); + + grant_blocked_inode_locks (this, pl_inode, dom); } return 0; Missed this in the last patch. It still doesn't solve the problem I described earlier. By the time it executes this third loop refkeeper is already unreffed when C2 unlocks. Right, we will need to introduce an "in_cleanup" counter, if set pl_update_refkeeper() should not unref. Increment the in_cleanup() in the first lookup, and decrement it in the last loop, just before calling grant_blocked_locks() (along with the patches in my last 2 mails) s/first lookup/first loop/ ? Consider the following scenario: There are two granted locks L1, L2 from C1, C2 clients respectively on same inode. C1 gets disconnected. C2 issues a unlock. This is the sequence of steps: 1) C1 executes first loop, increment
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On Thu, Jun 5, 2014 at 10:56 PM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > > On 06/06/2014 10:47 AM, Pranith Kumar Karampuri wrote: > > > On 06/06/2014 10:43 AM, Anand Avati wrote: > > > > > On Thu, Jun 5, 2014 at 9:54 PM, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > >> >> On 06/06/2014 10:02 AM, Anand Avati wrote: >> >> On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri < >> pkara...@redhat.com> wrote: >> >>> >>> This sounds a bit complicated. I think there is a much simpler >>> solution: >>> >>> - First, make update_refkeeper() check for blocked locks (which I >>> mentioned as "optional" previously) >>> >>> - Make grant_blocked_locks() double up and do the job of >>> update_refkeeper() internally. >>> >>> Something which looks like this: >>> >>> diff --git a/xlators/features/locks/src/common.c >>> b/xlators/features/locks/src/common.c >>> index f6c71c1..38df385 100644 >>> --- a/xlators/features/locks/src/common.c >>> +++ b/xlators/features/locks/src/common.c >>> @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) >>> if (!list_empty (&dom->entrylk_list)) >>> is_empty = 0; >>> >>> +if (!list_empty (&dom->blocked_entrylks)) >>> +is_empty = 0; >>> + >>> if (!list_empty (&dom->inodelk_list)) >>> is_empty = 0; >>> + >>> +if (!list_empty (&dom->blocked_inodelks)) >>> +is_empty = 0; >>> } >>> >>> return is_empty; >>> @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t >>> *pl_inode) >>> struct list_head granted_list; >>> posix_lock_t *tmp = NULL; >>> posix_lock_t *lock = NULL; >>> + inode_t *unref = NULL; >>> >>> INIT_LIST_HEAD (&granted_list); >>> >>> pthread_mutex_lock (&pl_inode->mutex); >>> { >>> __grant_blocked_locks (this, pl_inode, &granted_list); >>> + >>> + if (__pl_inode_is_empty (pl_inode) && >>> pl_inode->refkeeper) { >>> + unref = pl_inode->refkeeper; >>> + pl_inode->refkeeper = NULL; >>> + } >>> } >>> pthread_mutex_unlock (&pl_inode->mutex); >>> >>> @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t >>> *pl_inode) >>> GF_FREE (lock); >>> } >>> >>> + if (unref) >>> + inode_unref (unref); >>> + >>> return; >>> } >>> >>> >>> This should make pl_disconnect_cbk() pretty much race free w.r.t >>> refkpeer. Thoughts? >>> >>> Lets say C1 is doing pl_inodelk_client_cleanup. After the second >>> for-loop(All granted and blocked locks are out of the domain) if an unlock >>> on the final granted lock on that inode from client C2 completes, refkeeper >>> would be set to NULL and unrefed leading to zero refs on that inode i.e. >>> pl_forget will also happen. In 3rd for-loop pl_inode is already freed and >>> leads to free'd memory access and will crash. >>> >> >> >> We also need: >> >> diff --git a/xlators/features/locks/src/inodelk.c >> b/xlators/features/locks/src/inodelk.c >> index c76cb7f..2aceb8a 100644 >> --- a/xlators/features/locks/src/inodelk.c >> +++ b/xlators/features/locks/src/inodelk.c >> @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t >> *ctx) >> >> dom = get_domain (pl_inode, l->volume); >> >> - grant_blocked_inode_locks (this, pl_inode, dom); >> - >> pthread_mutex_lock (&pl_inode->mutex); >> { >> __pl_inodelk_unref (l); >> } >> pthread_mutex_unlock (&pl_inode->mutex); >> + >> + grant_blocked_inode_locks (this, pl_inode, dom); >> } >> >> >> return 0; >> >> Missed this in the last patch. >> >> It still doesn't solve the problem I described earlier. By the time it >> executes this third loop refkeeper is already unreffed when C2 unlocks. >> > > > Right, we will need to introduce an "in_cleanup" counter, if set > pl_update_refkeeper() should not unref. Increment the in_cleanup() in the > first lookup, and decrement it in the last loop, just before calling > grant_blocked_locks() (along with the patches in my last 2 mails) > > s/first lookup/first loop/ ? > > Consider the following scenario: > There are two granted locks L1, L2 from C1, C2 clients respectively on > same inode. > C1 gets disconnected. > C2 issues a unlock. > > This is the sequence of steps: > 1) C1 executes first loop, increments in_cleanup to 1 > 2) C2 executes pl_inode_setlk and removed L2 from granted list. It is now > just before grant_blocked_inode_locks() > 3) C1 starts 3rd for loop and unrefs L1, decrements in_cleanup to 0 > 4) C2 executes grant_blocked_inode_locks() and decrements the refkeepr, > sets it to NULL and unwinds. This destroys the inode so pl_
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/06/2014 10:47 AM, Pranith Kumar Karampuri wrote: On 06/06/2014 10:43 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 9:54 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: On 06/06/2014 10:02 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: This sounds a bit complicated. I think there is a much simpler solution: - First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously) - Make grant_blocked_locks() double up and do the job of update_refkeeper() internally. Something which looks like this: diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c index f6c71c1..38df385 100644 --- a/xlators/features/locks/src/common.c +++ b/xlators/features/locks/src/common.c @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) if (!list_empty (&dom->entrylk_list)) is_empty = 0; + if (!list_empty (&dom->blocked_entrylks)) +is_empty = 0; + if (!list_empty (&dom->inodelk_list)) is_empty = 0; + + if (!list_empty (&dom->blocked_inodelks)) +is_empty = 0; } return is_empty; @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) struct list_head granted_list; posix_lock_t *tmp = NULL; posix_lock_t *lock = NULL; + inode_t *unref = NULL; INIT_LIST_HEAD (&granted_list); pthread_mutex_lock (&pl_inode->mutex); { __grant_blocked_locks (this, pl_inode, &granted_list); + + if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) { + unref = pl_inode->refkeeper; + pl_inode->refkeeper = NULL; + } } pthread_mutex_unlock (&pl_inode->mutex); @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) GF_FREE (lock); } + if (unref) + inode_unref (unref); + return; } This should make pl_disconnect_cbk() pretty much race free w.r.t refkpeer. Thoughts? Lets say C1 is doing pl_inodelk_client_cleanup. After the second for-loop(All granted and blocked locks are out of the domain) if an unlock on the final granted lock on that inode from client C2 completes, refkeeper would be set to NULL and unrefed leading to zero refs on that inode i.e. pl_forget will also happen. In 3rd for-loop pl_inode is already freed and leads to free'd memory access and will crash. We also need: diff --git a/xlators/features/locks/src/inodelk.c b/xlators/features/locks/src/inodelk.c index c76cb7f..2aceb8a 100644 --- a/xlators/features/locks/src/inodelk.c +++ b/xlators/features/locks/src/inodelk.c @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t *ctx) dom = get_domain (pl_inode, l->volume); - grant_blocked_inode_locks (this, pl_inode, dom); - pthread_mutex_lock (&pl_inode->mutex); { __pl_inodelk_unref (l); } pthread_mutex_unlock (&pl_inode->mutex); + + grant_blocked_inode_locks (this, pl_inode, dom); } return 0; Missed this in the last patch. It still doesn't solve the problem I described earlier. By the time it executes this third loop refkeeper is already unreffed when C2 unlocks. Right, we will need to introduce an "in_cleanup" counter, if set pl_update_refkeeper() should not unref. Increment the in_cleanup() in the first lookup, and decrement it in the last loop, just before calling grant_blocked_locks() (along with the patches in my last 2 mails) s/first lookup/first loop/ ? Consider the following scenario: There are two granted locks L1, L2 from C1, C2 clients respectively on same inode. C1 gets disconnected. C2 issues a unlock. This is the sequence of steps: 1) C1 executes first loop, increments in_cleanup to 1 2) C2 executes pl_inode_setlk and removed L2 from granted list. It is now just before grant_blocked_inode_locks() 3) C1 starts 3rd for loop and unrefs L1, decrements in_cleanup to 0 4) C2 executes grant_blocked_inode_locks() and decrements the refkeepr, sets it to NULL and unwinds. This destroys the inode so pl_inode is freed. 5) C1 calls grant_blocked_inode_locks with pl_inode which is free'd Pranith Pranith ___ Gluster-devel mailing list Gluster-devel@glus
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/06/2014 10:43 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 9:54 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: On 06/06/2014 10:02 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: This sounds a bit complicated. I think there is a much simpler solution: - First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously) - Make grant_blocked_locks() double up and do the job of update_refkeeper() internally. Something which looks like this: diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c index f6c71c1..38df385 100644 --- a/xlators/features/locks/src/common.c +++ b/xlators/features/locks/src/common.c @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) if (!list_empty (&dom->entrylk_list)) is_empty = 0; +if (!list_empty (&dom->blocked_entrylks)) + is_empty = 0; + if (!list_empty (&dom->inodelk_list)) is_empty = 0; + +if (!list_empty (&dom->blocked_inodelks)) + is_empty = 0; } return is_empty; @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) struct list_head granted_list; posix_lock_t *tmp = NULL; posix_lock_t *lock = NULL; + inode_t *unref = NULL; INIT_LIST_HEAD (&granted_list); pthread_mutex_lock (&pl_inode->mutex); { __grant_blocked_locks (this, pl_inode, &granted_list); + + if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) { + unref = pl_inode->refkeeper; + pl_inode->refkeeper = NULL; + } } pthread_mutex_unlock (&pl_inode->mutex); @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) GF_FREE (lock); } + if (unref) + inode_unref (unref); + return; } This should make pl_disconnect_cbk() pretty much race free w.r.t refkpeer. Thoughts? Lets say C1 is doing pl_inodelk_client_cleanup. After the second for-loop(All granted and blocked locks are out of the domain) if an unlock on the final granted lock on that inode from client C2 completes, refkeeper would be set to NULL and unrefed leading to zero refs on that inode i.e. pl_forget will also happen. In 3rd for-loop pl_inode is already freed and leads to free'd memory access and will crash. We also need: diff --git a/xlators/features/locks/src/inodelk.c b/xlators/features/locks/src/inodelk.c index c76cb7f..2aceb8a 100644 --- a/xlators/features/locks/src/inodelk.c +++ b/xlators/features/locks/src/inodelk.c @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t *ctx) dom = get_domain (pl_inode, l->volume); - grant_blocked_inode_locks (this, pl_inode, dom); - pthread_mutex_lock (&pl_inode->mutex); { __pl_inodelk_unref (l); } pthread_mutex_unlock (&pl_inode->mutex); + + grant_blocked_inode_locks (this, pl_inode, dom); } return 0; Missed this in the last patch. It still doesn't solve the problem I described earlier. By the time it executes this third loop refkeeper is already unreffed when C2 unlocks. Right, we will need to introduce an "in_cleanup" counter, if set pl_update_refkeeper() should not unref. Increment the in_cleanup() in the first lookup, and decrement it in the last loop, just before calling grant_blocked_locks() (along with the patches in my last 2 mails) s/first lookup/first loop/ ? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On Thu, Jun 5, 2014 at 9:54 PM, Pranith Kumar Karampuri wrote: > > On 06/06/2014 10:02 AM, Anand Avati wrote: > > On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > >> >> This sounds a bit complicated. I think there is a much simpler solution: >> >> - First, make update_refkeeper() check for blocked locks (which I >> mentioned as "optional" previously) >> >> - Make grant_blocked_locks() double up and do the job of >> update_refkeeper() internally. >> >> Something which looks like this: >> >> diff --git a/xlators/features/locks/src/common.c >> b/xlators/features/locks/src/common.c >> index f6c71c1..38df385 100644 >> --- a/xlators/features/locks/src/common.c >> +++ b/xlators/features/locks/src/common.c >> @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) >> if (!list_empty (&dom->entrylk_list)) >> is_empty = 0; >> >> +if (!list_empty (&dom->blocked_entrylks)) >> +is_empty = 0; >> + >> if (!list_empty (&dom->inodelk_list)) >> is_empty = 0; >> + >> +if (!list_empty (&dom->blocked_inodelks)) >> +is_empty = 0; >> } >> >> return is_empty; >> @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t >> *pl_inode) >> struct list_head granted_list; >> posix_lock_t *tmp = NULL; >> posix_lock_t *lock = NULL; >> + inode_t *unref = NULL; >> >> INIT_LIST_HEAD (&granted_list); >> >> pthread_mutex_lock (&pl_inode->mutex); >> { >> __grant_blocked_locks (this, pl_inode, &granted_list); >> + >> + if (__pl_inode_is_empty (pl_inode) && >> pl_inode->refkeeper) { >> + unref = pl_inode->refkeeper; >> + pl_inode->refkeeper = NULL; >> + } >> } >> pthread_mutex_unlock (&pl_inode->mutex); >> >> @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t >> *pl_inode) >> GF_FREE (lock); >> } >> >> + if (unref) >> + inode_unref (unref); >> + >> return; >> } >> >> >> This should make pl_disconnect_cbk() pretty much race free w.r.t >> refkpeer. Thoughts? >> >> Lets say C1 is doing pl_inodelk_client_cleanup. After the second >> for-loop(All granted and blocked locks are out of the domain) if an unlock >> on the final granted lock on that inode from client C2 completes, refkeeper >> would be set to NULL and unrefed leading to zero refs on that inode i.e. >> pl_forget will also happen. In 3rd for-loop pl_inode is already freed and >> leads to free'd memory access and will crash. >> > > > We also need: > > diff --git a/xlators/features/locks/src/inodelk.c > b/xlators/features/locks/src/inodelk.c > index c76cb7f..2aceb8a 100644 > --- a/xlators/features/locks/src/inodelk.c > +++ b/xlators/features/locks/src/inodelk.c > @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t > *ctx) > > dom = get_domain (pl_inode, l->volume); > > - grant_blocked_inode_locks (this, pl_inode, dom); > - > pthread_mutex_lock (&pl_inode->mutex); > { > __pl_inodelk_unref (l); > } > pthread_mutex_unlock (&pl_inode->mutex); > + > + grant_blocked_inode_locks (this, pl_inode, dom); > } > > > return 0; > > Missed this in the last patch. > > It still doesn't solve the problem I described earlier. By the time it > executes this third loop refkeeper is already unreffed when C2 unlocks. > Right, we will need to introduce an "in_cleanup" counter, if set pl_update_refkeeper() should not unref. Increment the in_cleanup() in the first lookup, and decrement it in the last loop, just before calling grant_blocked_locks() (along with the patches in my last 2 mails) ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/06/2014 10:02 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri mailto:pkara...@redhat.com>> wrote: This sounds a bit complicated. I think there is a much simpler solution: - First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously) - Make grant_blocked_locks() double up and do the job of update_refkeeper() internally. Something which looks like this: diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c index f6c71c1..38df385 100644 --- a/xlators/features/locks/src/common.c +++ b/xlators/features/locks/src/common.c @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) if (!list_empty (&dom->entrylk_list)) is_empty = 0; +if (!list_empty (&dom->blocked_entrylks)) +is_empty = 0; + if (!list_empty (&dom->inodelk_list)) is_empty = 0; + +if (!list_empty (&dom->blocked_inodelks)) +is_empty = 0; } return is_empty; @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) struct list_head granted_list; posix_lock_t *tmp = NULL; posix_lock_t *lock = NULL; + inode_t *unref = NULL; INIT_LIST_HEAD (&granted_list); pthread_mutex_lock (&pl_inode->mutex); { __grant_blocked_locks (this, pl_inode, &granted_list); + + if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) { + unref = pl_inode->refkeeper; + pl_inode->refkeeper = NULL; + } } pthread_mutex_unlock (&pl_inode->mutex); @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode) GF_FREE (lock); } + if (unref) + inode_unref (unref); + return; } This should make pl_disconnect_cbk() pretty much race free w.r.t refkpeer. Thoughts? Lets say C1 is doing pl_inodelk_client_cleanup. After the second for-loop(All granted and blocked locks are out of the domain) if an unlock on the final granted lock on that inode from client C2 completes, refkeeper would be set to NULL and unrefed leading to zero refs on that inode i.e. pl_forget will also happen. In 3rd for-loop pl_inode is already freed and leads to free'd memory access and will crash. We also need: diff --git a/xlators/features/locks/src/inodelk.c b/xlators/features/locks/src/inodelk.c index c76cb7f..2aceb8a 100644 --- a/xlators/features/locks/src/inodelk.c +++ b/xlators/features/locks/src/inodelk.c @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t *ctx) dom = get_domain (pl_inode, l->volume); - grant_blocked_inode_locks (this, pl_inode, dom); - pthread_mutex_lock (&pl_inode->mutex); { __pl_inodelk_unref (l); } pthread_mutex_unlock (&pl_inode->mutex); + + grant_blocked_inode_locks (this, pl_inode, dom); } return 0; Missed this in the last patch. It still doesn't solve the problem I described earlier. By the time it executes this third loop refkeeper is already unreffed when C2 unlocks. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On Thu, Jun 5, 2014 at 7:52 PM, Pranith Kumar Karampuri wrote: > > This sounds a bit complicated. I think there is a much simpler solution: > > - First, make update_refkeeper() check for blocked locks (which I > mentioned as "optional" previously) > > - Make grant_blocked_locks() double up and do the job of > update_refkeeper() internally. > > Something which looks like this: > > diff --git a/xlators/features/locks/src/common.c > b/xlators/features/locks/src/common.c > index f6c71c1..38df385 100644 > --- a/xlators/features/locks/src/common.c > +++ b/xlators/features/locks/src/common.c > @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) > if (!list_empty (&dom->entrylk_list)) > is_empty = 0; > > +if (!list_empty (&dom->blocked_entrylks)) > +is_empty = 0; > + > if (!list_empty (&dom->inodelk_list)) > is_empty = 0; > + > +if (!list_empty (&dom->blocked_inodelks)) > +is_empty = 0; > } > > return is_empty; > @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t > *pl_inode) > struct list_head granted_list; > posix_lock_t *tmp = NULL; > posix_lock_t *lock = NULL; > + inode_t *unref = NULL; > > INIT_LIST_HEAD (&granted_list); > > pthread_mutex_lock (&pl_inode->mutex); > { > __grant_blocked_locks (this, pl_inode, &granted_list); > + > + if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) > { > + unref = pl_inode->refkeeper; > + pl_inode->refkeeper = NULL; > + } > } > pthread_mutex_unlock (&pl_inode->mutex); > > @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t > *pl_inode) > GF_FREE (lock); > } > > + if (unref) > + inode_unref (unref); > + > return; > } > > > This should make pl_disconnect_cbk() pretty much race free w.r.t > refkpeer. Thoughts? > > Lets say C1 is doing pl_inodelk_client_cleanup. After the second > for-loop(All granted and blocked locks are out of the domain) if an unlock > on the final granted lock on that inode from client C2 completes, refkeeper > would be set to NULL and unrefed leading to zero refs on that inode i.e. > pl_forget will also happen. In 3rd for-loop pl_inode is already freed and > leads to free'd memory access and will crash. > We also need: diff --git a/xlators/features/locks/src/inodelk.c b/xlators/features/locks/src/inodelk.c index c76cb7f..2aceb8a 100644 --- a/xlators/features/locks/src/inodelk.c +++ b/xlators/features/locks/src/inodelk.c @@ -494,13 +494,13 @@ pl_inodelk_client_cleanup (xlator_t *this, pl_ctx_t *ctx) dom = get_domain (pl_inode, l->volume); - grant_blocked_inode_locks (this, pl_inode, dom); - pthread_mutex_lock (&pl_inode->mutex); { __pl_inodelk_unref (l); } pthread_mutex_unlock (&pl_inode->mutex); + + grant_blocked_inode_locks (this, pl_inode, dom); } return 0; Missed this in the last patch. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/06/2014 12:11 AM, Anand Avati wrote: On Thu, Jun 5, 2014 at 10:46 AM, Krutika Dhananjay mailto:kdhan...@redhat.com>> wrote: To summarize, the real "problems" are: - Deref of pl_inode->refkeeper as an inode_t in the cleanup logger. It is an internal artifact of pl_update_refkeeper() working and nobody else is supposed to "touch" it. For this, the solution I have in mind is to a. have a placeholder for gfid in pl_inode_t object, b. remember the gfid of the inode at the time of pl_inode_t creation in pl_ctx_get(), and c. print pl_inode->gfid in the log message in pl_{inodelk,entrylk}_log_cleanup(). OK. - Missing call to pl_update_refkeeper() in client_disconnect_cbk(). This can result in a potential leak of inode ref (not a leak if the same inode gets any locking activity by another client in the future). As far as updates to refkeeper is concerned during DISCONNECT is concerned, Pranith and I did have discussions at length but could not find a single safe place in cleanup functions to call pl_update_refkeeper() that does not lead to illegal memory access, whether before or after the call to __pl_inodelk_unref() in the third for loop. Case #1 explained: = list_for_each_entry_safe() { ... ... pl_update_refkeeper(inode); pthread_mutex_lock (&pl_inode->mutex); __pl_inodelk_unref(l); pthread_mutex_unlock (&pl_inode->mutex); This causes the last unref in pl_update_refkeeper() to implicitly call pl_forget() where pl_inode_t object is destroyed while it is still needed in terms of having to obtain pl_inode->mutex before unrefing the lock object. Case 2 explained: = list_for_each_entry_safe() { ... ... pthread_mutex_lock (&pl_inode->mutex); __pl_inodelk_unref(l); pthread_mutex_unlock (&pl_inode->mutex); pl_update_refkeeper(inode); After the first for loop is processed, the domain's lists could have been emptied. And at this point, there could well come a second thread that could update refkeeper, triggering last unref on the inode leading to a call to pl_forget() (where pl_inode_t is freed), after which the DISCONNECT thread would be trying to perform illegal memory access in its call to pl_update_refkeeper() during its turn. So the solution Pranith and I came up with involves making sure that the inode_t object is alive for as long as there is atleast one lock on it: 1. refkeeper will not be used for inodelks and entrylks (but will continue to be used in fcntl locks). 2. Each pl_inode_t object will also host an inode_t member which is initialised during a call to pl_inode_get() in pl_common_{inodelk, entrylks}(). 3. Everytime a lock is granted/blocked, pl_inode->inode is ref'd (in pl_inode_setlk() after successful locking. 4. Everytime a lock is unlocked, pl_inode->inode is unref'd. 5. In disconnect codepath, as part of "released" list processing, the inodes associated with the client are unref'd in the same loop right after every lock object is unref'd. With all this, there is one problem with the lock object that is found to be in both blocked and granted list in the transient state, when there's a race between the unlocking thread and the disconnecting thread. This can be best explained with the following example: Consider an inode I on which there's a granted lock G and a blocked lock B, from clients C1 and C2 respectively. With this approach, at this point the number of refs taken by the locks xlator on I would be 2. C1 unlocks B, at which point I's refcount becomes 1. Now as part of granting other blocked locks in unlock codepath, B is put in granted list. Now B's client C2 sends a DISCONNECT event, which would cause I to be unref'd again. This being the last unref, pl_forget() is called on I causing its pl_inode to be destroyed At this point, the unlocking thread could try to do a mutex lock on pl_inode->mutex in grant_blocked_{inode,entry}_locks() (whereas pl_inode is now already destroyed) leading to a crash. The problem described in the example can be fixed by making sure grant_blocked_{inode,entry}_locks() refs the inode first thing and then unrefs it just before returning. This would fix illegal memory access. This sounds a bit complicated. I think there is a much simpler solution: - First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously) - Make grant_blocked_locks() double up and do the job of update_refkeeper() internally. Something which looks like this: diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c index f6c71c1..38df385 10064
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On Thu, Jun 5, 2014 at 10:46 AM, Krutika Dhananjay wrote: > To summarize, the real "problems" are: > > - Deref of pl_inode->refkeeper as an inode_t in the cleanup logger. It > is an internal artifact of pl_update_refkeeper() working and nobody else > is supposed to "touch" it. > > For this, the solution I have in mind is to > a. have a placeholder for gfid in pl_inode_t object, > b. remember the gfid of the inode at the time of pl_inode_t creation in > pl_ctx_get(), and > c. print pl_inode->gfid in the log message in > pl_{inodelk,entrylk}_log_cleanup(). > OK. - Missing call to pl_update_refkeeper() in client_disconnect_cbk(). This > can result in a potential leak of inode ref (not a leak if the same > inode gets any locking activity by another client in the future). > > As far as updates to refkeeper is concerned during DISCONNECT is > concerned, Pranith and I did have discussions at length but could > not find a single safe place in cleanup functions to call > pl_update_refkeeper() that does not lead to illegal memory access, > whether before or after the call to __pl_inodelk_unref() in the third for > loop. > > Case #1 explained: > = > > > list_for_each_entry_safe() { > ... > ... > pl_update_refkeeper(inode); > pthread_mutex_lock (&pl_inode->mutex); > __pl_inodelk_unref(l); > pthread_mutex_unlock (&pl_inode->mutex); > > > This causes the last unref in pl_update_refkeeper() to implicitly call > pl_forget() where pl_inode_t object is destroyed while it is > still needed in terms of having to obtain pl_inode->mutex before unrefing > the lock object. > > Case 2 explained: > = > > list_for_each_entry_safe() { > ... > ... > pthread_mutex_lock (&pl_inode->mutex); > __pl_inodelk_unref(l); > pthread_mutex_unlock (&pl_inode->mutex); > pl_update_refkeeper(inode); > > > After the first for loop is processed, the domain's lists could have been > emptied. And at this point, there could well come a second thread that > could update refkeeper, > triggering last unref on the inode leading to a call to pl_forget() (where > pl_inode_t is freed), after which the DISCONNECT thread would be trying to > perform illegal > memory access in its call to pl_update_refkeeper() during its turn. > > So the solution Pranith and I came up with involves making sure that the > inode_t object is alive for as long as there is atleast one lock on it: > > 1. refkeeper will not be used for inodelks and entrylks (but will continue > to be used in fcntl locks). > 2. Each pl_inode_t object will also host an inode_t member which is > initialised during a call to pl_inode_get() in pl_common_{inodelk, > entrylks}(). > 3. Everytime a lock is granted/blocked, pl_inode->inode is ref'd (in > pl_inode_setlk() after successful locking. > 4. Everytime a lock is unlocked, pl_inode->inode is unref'd. > 5. In disconnect codepath, as part of "released" list processing, the > inodes associated with the client are unref'd in the same loop right after > every lock object is unref'd. > > With all this, there is one problem with the lock object that is found to > be in both blocked and granted list in the transient state, when there's a > race between the unlocking thread > and the disconnecting thread. This can be best explained with the > following example: > > Consider an inode I on which there's a granted lock G and a blocked lock > B, from clients C1 and C2 respectively. With this approach, at this point > the number of refs taken > by the locks xlator on I would be 2. > C1 unlocks B, at which point I's refcount becomes 1. > Now as part of granting other blocked locks in unlock codepath, B is put > in granted list. > Now B's client C2 sends a DISCONNECT event, which would cause I to be > unref'd again. This being the last unref, pl_forget() is called on I > causing its pl_inode to be destroyed > At this point, the unlocking thread could try to do a mutex lock on > pl_inode->mutex in grant_blocked_{inode,entry}_locks() (whereas pl_inode is > now already destroyed) leading to a crash. > > > The problem described in the example can be fixed by making sure > grant_blocked_{inode,entry}_locks() refs the inode first thing and then > unrefs it just before returning. > This would fix illegal memory access. > This sounds a bit complicated. I think there is a much simpler solution: - First, make update_refkeeper() check for blocked locks (which I mentioned as "optional" previously) - Make grant_blocked_locks() double up and do the job of update_refkeeper() internally. Something which looks like this: diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c index f6c71c1..38df385 100644 --- a/xlators/features/locks/src/common.c +++ b/xlators/features/locks/src/common.c @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode) if (!list_empty (&dom->entrylk_list)) is_empty = 0; +if (!list_empty (&dom->blocked_entrylks)) +
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
- Original Message - > From: "Krutika Dhananjay" > To: "Anand Avati" > Cc: gluster-devel@gluster.org > Sent: Thursday, June 5, 2014 11:16:33 PM > Subject: Re: [Gluster-devel] Regarding doing away with refkeeper in locks > xlator > - Original Message - > > From: "Anand Avati" > > > To: "Krutika Dhananjay" , "Pranith Kumar Karampuri" > > > > > Cc: gluster-devel@gluster.org > > > Sent: Thursday, June 5, 2014 12:38:41 PM > > > Subject: Re: [Gluster-devel] Regarding doing away with refkeeper in locks > > xlator > > > On 6/4/14, 9:43 PM, Krutika Dhananjay wrote: > > > > > > > > > > > > > > > > > > > > *From: *"Pranith Kumar Karampuri" > > > > *To: *"Krutika Dhananjay" , "Anand Avati" > > > > > > > > *Cc: *gluster-devel@gluster.org > > > > *Sent: *Wednesday, June 4, 2014 12:23:59 PM > > > > *Subject: *Re: [Gluster-devel] Regarding doing away with refkeeper > > > > in locks xlator > > > > > > > > > > > > On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote: > > > > > > > > > > > > On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: > > > > > > > > Hi, > > > > > > > > Recently there was a crash in locks translator (BZ 1103347, > > > > BZ 1097102) with the following backtrace: > > > > (gdb) bt > > > > #0 uuid_unpack (in=0x8 , > > > > uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 > > > > #1 0x7feeba9e19d6 in uuid_unparse_x (uu= > > > optimized out>, out=0x2350fc0 > > > > "081bbc7a-7551-44ac-85c7-aad5e2633db9", > > > > fmt=0x7feebaa08e00 > > > > "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at > > > > ../../contrib/uuid/unparse.c:55 > > > > #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 > > > out of bounds>) at common-utils.c:2138 > > > > #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup > > > > (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 > > > > #4 pl_inodelk_client_cleanup (this=0x230d910, > > > > ctx=0x7fee700f0c60) at inodelk.c:428 > > > > #5 0x7feeb06ddf3a in pl_client_disconnect_cbk > > > > (this=0x230d910, client=) at posix.c:2550 > > > > #6 0x7feeba9fa2dd in gf_client_disconnect > > > > (client=0x27724a0) at client_t.c:368 > > > > #7 0x7feeab77ed48 in server_connection_cleanup > > > > (this=0x2316390, client=0x27724a0, flags= > > > out>) at server-helpers.c:354 > > > > #8 0x7feeab77ae2c in server_rpc_notify (rpc= > > > optimized out>, xl=0x2316390, event=, > > > > data=0x2bf51c0) at server.c:527 > > > > #9 0x7feeba775155 in rpcsvc_handle_disconnect > > > > (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 > > > > #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, > > > > mydata=, event=, > > > > data=0x2bf51c0) at rpcsvc.c:758 > > > > #11 0x7feeba778638 in rpc_transport_notify (this= > > > optimized out>, event=, data= > > > optimized out>) at rpc-transport.c:512 > > > > #12 0x7feeb115e971 in socket_event_poll_err (fd= > > > optimized out>, idx=, data=0x2bf51c0, > > > > poll_in=, poll_out=0, > > > > poll_err=0) at socket.c:1071 > > > > #13 socket_event_handler (fd=, > > > > idx=, data=0x2bf51c0, poll_in= > > > optimized out>, poll_out=0, poll_err=0) at socket.c:2240 > > > > #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler > > > > (event_pool=0x22e2d00) at event-epoll.c:384 > > > > #15 event_dispatch_epoll (event_pool=0x22e2d00) at > > > > event-epoll.c:445 > > > > #16 0x00407e93 in main (argc=19, > > > > argv=0x7fffea6c7f88) at glusterfsd.c:2023 > > > > (gdb) f 4 > > > > #4 pl_inodelk_client_cleanup (this=0x230d910, > > > > ctx=0x7fee700f0c60) at inodelk.c:428 > > > > 428 pl_inodelk_log_cleanup (l); > > > > (gdb) p l->pl_inode->refkeeper > > > > $1 = (inode_t *) 0x0 > > > > (gdb) > > > > > > > > pl_inode->refkeeper was found to be NULL even when there > > &
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
- Original Message - From: "Anand Avati" To: "Krutika Dhananjay" , "Pranith Kumar Karampuri" Cc: gluster-devel@gluster.org Sent: Thursday, June 5, 2014 12:38:41 PM Subject: Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator On 6/4/14, 9:43 PM, Krutika Dhananjay wrote: > > > > > *From: *"Pranith Kumar Karampuri" > *To: *"Krutika Dhananjay" , "Anand Avati" > > *Cc: *gluster-devel@gluster.org > *Sent: *Wednesday, June 4, 2014 12:23:59 PM > *Subject: *Re: [Gluster-devel] Regarding doing away with refkeeper > in locks xlator > > > On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote: > > > On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: > > Hi, > > Recently there was a crash in locks translator (BZ 1103347, > BZ 1097102) with the following backtrace: > (gdb) bt > #0 uuid_unpack (in=0x8 , > uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 > #1 0x7feeba9e19d6 in uuid_unparse_x (uu= optimized out>, out=0x2350fc0 > "081bbc7a-7551-44ac-85c7-aad5e2633db9", > fmt=0x7feebaa08e00 > "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at > ../../contrib/uuid/unparse.c:55 > #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 out of bounds>) at common-utils.c:2138 > #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup > (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 > #4 pl_inodelk_client_cleanup (this=0x230d910, > ctx=0x7fee700f0c60) at inodelk.c:428 > #5 0x7feeb06ddf3a in pl_client_disconnect_cbk > (this=0x230d910, client=) at posix.c:2550 > #6 0x7feeba9fa2dd in gf_client_disconnect > (client=0x27724a0) at client_t.c:368 > #7 0x7feeab77ed48 in server_connection_cleanup > (this=0x2316390, client=0x27724a0, flags= out>) at server-helpers.c:354 > #8 0x7feeab77ae2c in server_rpc_notify (rpc= optimized out>, xl=0x2316390, event=, > data=0x2bf51c0) at server.c:527 > #9 0x7feeba775155 in rpcsvc_handle_disconnect > (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 > #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, > mydata=, event=, > data=0x2bf51c0) at rpcsvc.c:758 > #11 0x7feeba778638 in rpc_transport_notify (this= optimized out>, event=, data= optimized out>) at rpc-transport.c:512 > #12 0x7feeb115e971 in socket_event_poll_err (fd= optimized out>, idx=, data=0x2bf51c0, > poll_in=, poll_out=0, > poll_err=0) at socket.c:1071 > #13 socket_event_handler (fd=, > idx=, data=0x2bf51c0, poll_in= optimized out>, poll_out=0, poll_err=0) at socket.c:2240 > #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler > (event_pool=0x22e2d00) at event-epoll.c:384 > #15 event_dispatch_epoll (event_pool=0x22e2d00) at > event-epoll.c:445 > #16 0x00407e93 in main (argc=19, > argv=0x7fffea6c7f88) at glusterfsd.c:2023 > (gdb) f 4 > #4 pl_inodelk_client_cleanup (this=0x230d910, > ctx=0x7fee700f0c60) at inodelk.c:428 > 428 pl_inodelk_log_cleanup (l); > (gdb) p l->pl_inode->refkeeper > $1 = (inode_t *) 0x0 > (gdb) > > pl_inode->refkeeper was found to be NULL even when there > were some blocked inodelks in a certain domain of the inode, > which when dereferenced by the epoll thread in the cleanup > codepath led to a crash. > > On inspecting the code (for want of a consistent > reproducer), three things were found: > > 1. The function where the crash happens > (pl_inodelk_log_cleanup()), makes an attempt to resolve the > inode to path as can be seen below. But the way inode_path() > itself > works is to first construct the path based on the given > inode's ancestry and place it in the buffer provided. And if > all else fails, the gfid of the inode is placed in a certain > format (""). > This eliminates the need for statements from line 4 > through 7 below, thereby "preventing" dereferencing of > pl_inode->refkeeper. > Now, although this change prevents the crash > altogether, it still does not fix the race that led to > pl_inode->refkeeper becoming NULL, and comes at the cost of > printing "(null)" in the log message on line 9 every > time pl_inode->refkeeper is found to be NULL, rendering the > logged messages somewhat useless. > > > 0 pl_inode = lock->pl_inode; > 1 > 2 inode_path (pl_inode->refkeeper, NULL, &path); > 3 > 4 if (path) > 5 file = path; > 6 else > 7 file = uuid_utoa > (pl_inode->refkeeper->gfid); > 8 > 9 gf_log (THIS->name, GF_LOG_WARNING, > 10 "releasing lock on %s held by &
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 6/4/14, 9:43 PM, Krutika Dhananjay wrote: *From: *"Pranith Kumar Karampuri" *To: *"Krutika Dhananjay" , "Anand Avati" *Cc: *gluster-devel@gluster.org *Sent: *Wednesday, June 4, 2014 12:23:59 PM *Subject: *Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote: On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: Hi, Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with the following backtrace: (gdb) bt #0 uuid_unpack (in=0x8 , uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 #1 0x7feeba9e19d6 in uuid_unparse_x (uu=, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55 #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 ) at common-utils.c:2138 #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=) at posix.c:2550 #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368 #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=) at server-helpers.c:354 #8 0x7feeab77ae2c in server_rpc_notify (rpc=, xl=0x2316390, event=, data=0x2bf51c0) at server.c:527 #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=, event=, data=0x2bf51c0) at rpcsvc.c:758 #11 0x7feeba778638 in rpc_transport_notify (this=, event=, data=) at rpc-transport.c:512 #12 0x7feeb115e971 in socket_event_poll_err (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:1071 #13 socket_event_handler (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:2240 #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384 #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023 (gdb) f 4 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 428pl_inodelk_log_cleanup (l); (gdb) p l->pl_inode->refkeeper $1 = (inode_t *) 0x0 (gdb) pl_inode->refkeeper was found to be NULL even when there were some blocked inodelks in a certain domain of the inode, which when dereferenced by the epoll thread in the cleanup codepath led to a crash. On inspecting the code (for want of a consistent reproducer), three things were found: 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes an attempt to resolve the inode to path as can be seen below. But the way inode_path() itself works is to first construct the path based on the given inode's ancestry and place it in the buffer provided. And if all else fails, the gfid of the inode is placed in a certain format (""). This eliminates the need for statements from line 4 through 7 below, thereby "preventing" dereferencing of pl_inode->refkeeper. Now, although this change prevents the crash altogether, it still does not fix the race that led to pl_inode->refkeeper becoming NULL, and comes at the cost of printing "(null)" in the log message on line 9 every time pl_inode->refkeeper is found to be NULL, rendering the logged messages somewhat useless. 0 pl_inode = lock->pl_inode; 1 2 inode_path (pl_inode->refkeeper, NULL, &path); 3 4 if (path) 5 file = path; 6 else 7 file = uuid_utoa (pl_ino
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 6/3/14, 11:32 PM, Pranith Kumar Karampuri wrote: On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: Hi, Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with the following backtrace: (gdb) bt #0 uuid_unpack (in=0x8 , uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 #1 0x7feeba9e19d6 in uuid_unparse_x (uu=, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55 #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 ) at common-utils.c:2138 #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=) at posix.c:2550 #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368 #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=) at server-helpers.c:354 #8 0x7feeab77ae2c in server_rpc_notify (rpc=, xl=0x2316390, event=, data=0x2bf51c0) at server.c:527 #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=, event=, data=0x2bf51c0) at rpcsvc.c:758 #11 0x7feeba778638 in rpc_transport_notify (this=, event=, data=) at rpc-transport.c:512 #12 0x7feeb115e971 in socket_event_poll_err (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:1071 #13 socket_event_handler (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:2240 #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384 #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023 (gdb) f 4 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 428pl_inodelk_log_cleanup (l); (gdb) p l->pl_inode->refkeeper $1 = (inode_t *) 0x0 (gdb) pl_inode->refkeeper was found to be NULL even when there were some blocked inodelks in a certain domain of the inode, which when dereferenced by the epoll thread in the cleanup codepath led to a crash. On inspecting the code (for want of a consistent reproducer), three things were found: 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes an attempt to resolve the inode to path as can be seen below. But the way inode_path() itself works is to first construct the path based on the given inode's ancestry and place it in the buffer provided. And if all else fails, the gfid of the inode is placed in a certain format (""). This eliminates the need for statements from line 4 through 7 below, thereby "preventing" dereferencing of pl_inode->refkeeper. Now, although this change prevents the crash altogether, it still does not fix the race that led to pl_inode->refkeeper becoming NULL, and comes at the cost of printing "(null)" in the log message on line 9 every time pl_inode->refkeeper is found to be NULL, rendering the logged messages somewhat useless. 0 pl_inode = lock->pl_inode; 1 2 inode_path (pl_inode->refkeeper, NULL, &path); 3 4 if (path) 5 file = path; 6 else 7 file = uuid_utoa (pl_inode->refkeeper->gfid); 8 9 gf_log (THIS->name, GF_LOG_WARNING, 10 "releasing lock on %s held by " 11 "{client=%p, pid=%"PRId64" lk-owner=%s}", 12 file, lock->client, (uint64_t) lock->client_pid, 13 lkowner_utoa (&lock->owner)); <\code> I think this logging code is from the days when gfid handle concept was not there. So it wasn't returning in cases the path is not present in the dentries. I believe the else block can be deleted safely now. Pranith 2. There is at least one codepath found that can lead to this crash: Imagine an inode on which an inodelk operation is attempted by a client and is successfully granted too. Now, between the time the lock was granted and pl_update_refkeeper() was called by this thread, the client could send a DISCONNECT event, causing cleanup codepath to be executed, where the epoll thread crashes on dereferencing pl_inode->refkeeper which is STILL NULL at this point. Besides, there are still places in locks xlator where the refkeeper is NOT updated whenever the lists are modified - for instance in the cleanup codepath from a DISCONNECT. 3. Also, pl_update_refkeeper() seems to be not taking into account blocked locks on the inode in the __pl_inode_is_empty() check, when it should, as there could still be cases where the granted list could be empty but not the blocked list at the time of udpating the refkeeper, in which case pl_inode must still take ref on the inode.
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 6/3/14, 11:07 PM, Krutika Dhananjay wrote: Hi, Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with the following backtrace: (gdb) bt #0 uuid_unpack (in=0x8 , uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 #1 0x7feeba9e19d6 in uuid_unparse_x (uu=, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55 #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 ) at common-utils.c:2138 #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=) at posix.c:2550 #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368 #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=) at server-helpers.c:354 #8 0x7feeab77ae2c in server_rpc_notify (rpc=, xl=0x2316390, event=, data=0x2bf51c0) at server.c:527 #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=, event=, data=0x2bf51c0) at rpcsvc.c:758 #11 0x7feeba778638 in rpc_transport_notify (this=, event=, data=) at rpc-transport.c:512 #12 0x7feeb115e971 in socket_event_poll_err (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:1071 #13 socket_event_handler (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:2240 #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384 #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023 (gdb) f 4 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 428pl_inodelk_log_cleanup (l); (gdb) p l->pl_inode->refkeeper $1 = (inode_t *) 0x0 (gdb) pl_inode->refkeeper was found to be NULL even when there were some blocked inodelks in a certain domain of the inode, which when dereferenced by the epoll thread in the cleanup codepath led to a crash. On inspecting the code (for want of a consistent reproducer), three things were found: 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes an attempt to resolve the inode to path as can be seen below. But the way inode_path() itself works is to first construct the path based on the given inode's ancestry and place it in the buffer provided. And if all else fails, the gfid of the inode is placed in a certain format (""). This eliminates the need for statements from line 4 through 7 below, thereby "preventing" dereferencing of pl_inode->refkeeper. Now, although this change prevents the crash altogether, it still does not fix the race that led to pl_inode->refkeeper becoming NULL, and comes at the cost of printing "(null)" in the log message on line 9 every time pl_inode->refkeeper is found to be NULL, rendering the logged messages somewhat useless. 0 pl_inode = lock->pl_inode; 1 2 inode_path (pl_inode->refkeeper, NULL, &path); 3 4 if (path) 5 file = path; 6 else 7 file = uuid_utoa (pl_inode->refkeeper->gfid); 8 9 gf_log (THIS->name, GF_LOG_WARNING, 10 "releasing lock on %s held by " 11 "{client=%p, pid=%"PRId64" lk-owner=%s}", 12 file, lock->client, (uint64_t) lock->client_pid, 13 lkowner_utoa (&lock->owner)); <\code> 2. There is at least one codepath found that can lead to this crash: Imagine an inode on which an inodelk operation is attempted by a client and is successfully granted too. Now, between the time the lock was granted and pl_update_refkeeper() was called by this thread, the client could send a DISCONNECT event, causing cleanup codepath to be executed, where the epoll thread crashes on dereferencing pl_inode->refkeeper which is STILL NULL at this point. Besides, there are still places in locks xlator where the refkeeper is NOT updated whenever the lists are modified - for instance in the cleanup codepath from a DISCONNECT. 3. Also, pl_update_refkeeper() seems to be not taking into account blocked locks on the inode in the __pl_inode_is_empty() check, when it should, as there could still be cases where the granted list could be empty but not the blocked list at the time of udpating the refkeeper, in which case pl_inode must still take ref on the inode. Proposed solution to 2/3: 1. Do away with refkeeper in pl_inode_t altogether. 2. Let every lock object (inodelk/entryllk/posix_lock) have an inode_t * member to act as a placeholder for the associated inode object on which it is locking. 3. Let each
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
- Original Message - > From: "Pranith Kumar Karampuri" > To: "Krutika Dhananjay" , "Anand Avati" > > Cc: gluster-devel@gluster.org > Sent: Wednesday, June 4, 2014 12:23:59 PM > Subject: Re: [Gluster-devel] Regarding doing away with refkeeper in locks > xlator > On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote: > > On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: > > > > Hi, > > > > > > Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) > > > with > > > the following backtrace: > > > > > > (gdb) bt > > > > > > #0 uuid_unpack (in=0x8 , uu=0x7fffea6c6a60) at > > > ../../contrib/uuid/unpack.c:44 > > > > > > #1 0x7feeba9e19d6 in uuid_unparse_x (uu=, > > > out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", > > > > > > fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at > > > ../../contrib/uuid/unparse.c:55 > > > > > > #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 ) > > > at > > > common-utils.c:2138 > > > > > > #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, > > > ctx=0x7fee700f0c60) at inodelk.c:396 > > > > > > #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at > > > inodelk.c:428 > > > > > > #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, > > > client=) at posix.c:2550 > > > > > > #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at > > > client_t.c:368 > > > > > > #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, > > > client=0x27724a0, flags=) at server-helpers.c:354 > > > > > > #8 0x7feeab77ae2c in server_rpc_notify (rpc=, > > > xl=0x2316390, event=, data=0x2bf51c0) at > > > server.c:527 > > > > > > #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, > > > trans=0x2bf51c0) at rpcsvc.c:720 > > > > > > #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata= > > optimized out>, event=, data=0x2bf51c0) at > > > rpcsvc.c:758 > > > > > > #11 0x7feeba778638 in rpc_transport_notify (this= > > out>, > > > event=, data=) at > > > rpc-transport.c:512 > > > > > > #12 0x7feeb115e971 in socket_event_poll_err (fd= > > out>, > > > idx=, data=0x2bf51c0, poll_in=, > > > poll_out=0, > > > > > > poll_err=0) at socket.c:1071 > > > > > > #13 socket_event_handler (fd=, idx= > > out>, data=0x2bf51c0, poll_in=, poll_out=0, > > > poll_err=0) > > > at socket.c:2240 > > > > > > #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler > > > (event_pool=0x22e2d00) > > > at event-epoll.c:384 > > > > > > #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 > > > > > > #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at > > > glusterfsd.c:2023 > > > > > > (gdb) f 4 > > > > > > #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at > > > inodelk.c:428 > > > > > > 428 pl_inodelk_log_cleanup (l); > > > > > > (gdb) p l->pl_inode->refkeeper > > > > > > $1 = (inode_t *) 0x0 > > > > > > (gdb) > > > > > > pl_inode->refkeeper was found to be NULL even when there were some > > > blocked > > > inodelks in a certain domain of the inode, > > > > > > which when dereferenced by the epoll thread in the cleanup codepath led > > > to > > > a > > > crash. > > > > > > On inspecting the code (for want of a consistent reproducer), three > > > things > > > were found: > > > > > > 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes > > > an > > > attempt to resolve the inode to path as can be seen below. But the way > > > inode_path() itself > > > > > > works is to first construct the path based on the given inode's ancestry > > > and > > > place it in the buffer provided. And if all else fails, the gfid of the > > > inode is placed in a certain format (""). > > > > > > This eliminates the need for statements from line 4 through 7 below, > > > thereby >
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote: On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: Hi, Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with the following backtrace: (gdb) bt #0 uuid_unpack (in=0x8 , uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 #1 0x7feeba9e19d6 in uuid_unparse_x (uu=, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55 #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 bounds>) at common-utils.c:2138 #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=) at posix.c:2550 #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368 #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=) at server-helpers.c:354 #8 0x7feeab77ae2c in server_rpc_notify (rpc=out>, xl=0x2316390, event=, data=0x2bf51c0) at server.c:527 #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=, event=, data=0x2bf51c0) at rpcsvc.c:758 #11 0x7feeba778638 in rpc_transport_notify (this=out>, event=, data=) at rpc-transport.c:512 #12 0x7feeb115e971 in socket_event_poll_err (fd=out>, idx=, data=0x2bf51c0, poll_in=optimized out>, poll_out=0, poll_err=0) at socket.c:1071 #13 socket_event_handler (fd=, idx=optimized out>, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:2240 #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384 #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023 (gdb) f 4 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 428pl_inodelk_log_cleanup (l); (gdb) p l->pl_inode->refkeeper $1 = (inode_t *) 0x0 (gdb) pl_inode->refkeeper was found to be NULL even when there were some blocked inodelks in a certain domain of the inode, which when dereferenced by the epoll thread in the cleanup codepath led to a crash. On inspecting the code (for want of a consistent reproducer), three things were found: 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes an attempt to resolve the inode to path as can be seen below. But the way inode_path() itself works is to first construct the path based on the given inode's ancestry and place it in the buffer provided. And if all else fails, the gfid of the inode is placed in a certain format (""). This eliminates the need for statements from line 4 through 7 below, thereby "preventing" dereferencing of pl_inode->refkeeper. Now, although this change prevents the crash altogether, it still does not fix the race that led to pl_inode->refkeeper becoming NULL, and comes at the cost of printing "(null)" in the log message on line 9 every time pl_inode->refkeeper is found to be NULL, rendering the logged messages somewhat useless. 0 pl_inode = lock->pl_inode; 1 2 inode_path (pl_inode->refkeeper, NULL, &path); 3 4 if (path) 5 file = path; 6 else 7 file = uuid_utoa (pl_inode->refkeeper->gfid); 8 9 gf_log (THIS->name, GF_LOG_WARNING, 10 "releasing lock on %s held by " 11 "{client=%p, pid=%"PRId64" lk-owner=%s}", 12 file, lock->client, (uint64_t) lock->client_pid, 13 lkowner_utoa (&lock->owner)); <\code> I think this logging code is from the days when gfid handle concept was not there. So it wasn't returning in cases the path is not present in the dentries. I believe the else block can be deleted safely now. Pranith 2. There is at least one codepath found that can lead to this crash: Imagine an inode on which an inodelk operation is attempted by a client and is successfully granted too. Now, between the time the lock was granted and pl_update_refkeeper() was called by this thread, the client could send a DISCONNECT event, causing cleanup codepath to be executed, where the epoll thread crashes on dereferencing pl_inode->refkeeper which is STILL NULL at this point. Besides, there are still places in locks xlator where the refkeeper is NOT updated whenever the lists are modified - for instance in the cleanup codepath from a DISCONNECT. 3. Also, pl_update_refkeeper() seems to be not taking into account blocked locks on the inode in the __pl_inode_is_empty() check, when it should, as there could still be cases where the granted list could be empty but not the blocke
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: Hi, Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with the following backtrace: (gdb) bt #0 uuid_unpack (in=0x8 , uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 #1 0x7feeba9e19d6 in uuid_unparse_x (uu=, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55 #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 bounds>) at common-utils.c:2138 #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=) at posix.c:2550 #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368 #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=) at server-helpers.c:354 #8 0x7feeab77ae2c in server_rpc_notify (rpc=out>, xl=0x2316390, event=, data=0x2bf51c0) at server.c:527 #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=, event=, data=0x2bf51c0) at rpcsvc.c:758 #11 0x7feeba778638 in rpc_transport_notify (this=out>, event=, data=) at rpc-transport.c:512 #12 0x7feeb115e971 in socket_event_poll_err (fd=out>, idx=, data=0x2bf51c0, poll_in=optimized out>, poll_out=0, poll_err=0) at socket.c:1071 #13 socket_event_handler (fd=, idx=optimized out>, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:2240 #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384 #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023 (gdb) f 4 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 428pl_inodelk_log_cleanup (l); (gdb) p l->pl_inode->refkeeper $1 = (inode_t *) 0x0 (gdb) pl_inode->refkeeper was found to be NULL even when there were some blocked inodelks in a certain domain of the inode, which when dereferenced by the epoll thread in the cleanup codepath led to a crash. On inspecting the code (for want of a consistent reproducer), three things were found: 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes an attempt to resolve the inode to path as can be seen below. But the way inode_path() itself works is to first construct the path based on the given inode's ancestry and place it in the buffer provided. And if all else fails, the gfid of the inode is placed in a certain format (""). This eliminates the need for statements from line 4 through 7 below, thereby "preventing" dereferencing of pl_inode->refkeeper. Now, although this change prevents the crash altogether, it still does not fix the race that led to pl_inode->refkeeper becoming NULL, and comes at the cost of printing "(null)" in the log message on line 9 every time pl_inode->refkeeper is found to be NULL, rendering the logged messages somewhat useless. 0 pl_inode = lock->pl_inode; 1 2 inode_path (pl_inode->refkeeper, NULL, &path); 3 4 if (path) 5 file = path; 6 else 7 file = uuid_utoa (pl_inode->refkeeper->gfid); 8 9 gf_log (THIS->name, GF_LOG_WARNING, 10 "releasing lock on %s held by " 11 "{client=%p, pid=%"PRId64" lk-owner=%s}", 12 file, lock->client, (uint64_t) lock->client_pid, 13 lkowner_utoa (&lock->owner)); <\code> I think this logging code is from the days when gfid handle concept was not there. So it wasn't returning in cases the path is not present in the dentries. I believe the else block can be deleted safely now. Pranith 2. There is at least one codepath found that can lead to this crash: Imagine an inode on which an inodelk operation is attempted by a client and is successfully granted too. Now, between the time the lock was granted and pl_update_refkeeper() was called by this thread, the client could send a DISCONNECT event, causing cleanup codepath to be executed, where the epoll thread crashes on dereferencing pl_inode->refkeeper which is STILL NULL at this point. Besides, there are still places in locks xlator where the refkeeper is NOT updated whenever the lists are modified - for instance in the cleanup codepath from a DISCONNECT. 3. Also, pl_update_refkeeper() seems to be not taking into account blocked locks on the inode in the __pl_inode_is_empty() check, when it should, as there could still be cases where the granted list could be empty but not the blocked list at the time of udpating the refkeeper, in which c
[Gluster-devel] Regarding doing away with refkeeper in locks xlator
Hi, Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with the following backtrace: (gdb) bt #0 uuid_unpack (in=0x8 , uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 #1 0x7feeba9e19d6 in uuid_unparse_x (uu=, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55 #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 ) at common-utils.c:2138 #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=) at posix.c:2550 #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368 #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=) at server-helpers.c:354 #8 0x7feeab77ae2c in server_rpc_notify (rpc=, xl=0x2316390, event=, data=0x2bf51c0) at server.c:527 #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=, event=, data=0x2bf51c0) at rpcsvc.c:758 #11 0x7feeba778638 in rpc_transport_notify (this=, event=, data=) at rpc-transport.c:512 #12 0x7feeb115e971 in socket_event_poll_err (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:1071 #13 socket_event_handler (fd=, idx=, data=0x2bf51c0, poll_in=, poll_out=0, poll_err=0) at socket.c:2240 #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384 #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023 (gdb) f 4 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 428 pl_inodelk_log_cleanup (l); (gdb) p l->pl_inode->refkeeper $1 = (inode_t *) 0x0 (gdb) pl_inode->refkeeper was found to be NULL even when there were some blocked inodelks in a certain domain of the inode, which when dereferenced by the epoll thread in the cleanup codepath led to a crash. On inspecting the code (for want of a consistent reproducer), three things were found: 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes an attempt to resolve the inode to path as can be seen below. But the way inode_path() itself works is to first construct the path based on the given inode's ancestry and place it in the buffer provided. And if all else fails, the gfid of the inode is placed in a certain format (""). This eliminates the need for statements from line 4 through 7 below, thereby "preventing" dereferencing of pl_inode->refkeeper. Now, although this change prevents the crash altogether, it still does not fix the race that led to pl_inode->refkeeper becoming NULL, and comes at the cost of printing "(null)" in the log message on line 9 every time pl_inode->refkeeper is found to be NULL, rendering the logged messages somewhat useless. 0 pl_inode = lock->pl_inode; 1 2 inode_path (pl_inode->refkeeper, NULL, &path); 3 4 if (path) 5 file = path; 6 else 7 file = uuid_utoa (pl_inode->refkeeper->gfid); 8 9 gf_log (THIS->name, GF_LOG_WARNING, 10 "releasing lock on %s held by " 11 "{client=%p, pid=%"PRId64" lk-owner=%s}", 12 file, lock->client, (uint64_t) lock->client_pid, 13 lkowner_utoa (&lock->owner)); <\code> 2. There is at least one codepath found that can lead to this crash: Imagine an inode on which an inodelk operation is attempted by a client and is successfully granted too. Now, between the time the lock was granted and pl_update_refkeeper() was called by this thread, the client could send a DISCONNECT event, causing cleanup codepath to be executed, where the epoll thread crashes on dereferencing pl_inode->refkeeper which is STILL NULL at this point. Besides, there are still places in locks xlator where the refkeeper is NOT updated whenever the lists are modified - for instance in the cleanup codepath from a DISCONNECT. 3. Also, pl_update_refkeeper() seems to be not taking into account blocked locks on the inode in the __pl_inode_is_empty() check, when it should, as there could still be cases where the granted list could be empty but not the blocked list at the time of udpating the refkeeper, in which case pl_inode must still take ref on the inode. Proposed solution to 2/3: 1. Do away with refkeeper in pl_inode_t altogether. 2. Let every lock object (inodelk/entryllk/posix_lock) have an inode_t * member to act as a placeholder for the associated inode object on which it is locking. 3. Let each lock object hold a ref on the inode at the time of its creation and unref the inode during its destruction. Please let me know what you think of the above. -Krutika ___