Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-27 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi,
Ok, I see your point: two rounds of pollin mean two different iobrefs, so the 
first pollin will not affect the second. But from the stack trace, the second 
pollin is stuck in the LOCK operation inside iobref_unref.
Do you think it is possible that, because iobref_destroy does not destroy 
iobref->lock, the lock is left holding some unsanitized value and causes this hang?
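For illustration, a minimal sketch of what adding the missing lock destroy could 
look like; the struct here is only a stand-in modelled on the fields visible in the 
gdb dump further down this thread (lock, ref, iobrefs, alloced, used), not the real 
glusterfs definition, which uses a gf_lock_t union rather than a plain pthread mutex:

#include <pthread.h>
#include <stdlib.h>

struct iobref_sketch {
        pthread_mutex_t   lock;     /* gf_lock_t in the real struct */
        int               ref;
        void            **iobrefs;
        int               alloced;
        int               used;
};

static void
iobref_destroy_sketch (struct iobref_sketch *iobref)
{
        /* Destroying the mutex before freeing gives error-checking
         * mutexes and tools like helgrind a chance to flag a late
         * LOCK() on a dead iobref, instead of it blocking forever on
         * whatever the freed memory happens to contain. */
        pthread_mutex_destroy (&iobref->lock);
        free (iobref->iobrefs);
        free (iobref);
}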

cynthia

From: Raghavendra Gowdappa 
Sent: Thursday, April 25, 2019 2:07 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 15, 2019 at 12:52 PM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi,
The reason why I moved event_handled to the end of socket_event_poll_in is 
that if event_handled is called before rpc_transport_pollin_destroy, it 
allows another round of event_dispatch_epoll_handler to happen before 
rpc_transport_pollin_destroy. In that case, when the later pollin reaches 
rpc_transport_pollin_destroy, there is a chance that its pollin->iobref has 
already been destroyed by the first one (there is no lock destroy for 
iobref->lock in iobref_destroy, by the way). That may cause the hang in “LOCK 
(&iobref->lock);”

But priv->incoming.iobref (from which pollin->iobref is initialized) is 
set to NULL in __socket_proto_state_machine:

if (in->record_state == SP_STATE_COMPLETE) {
        in->record_state = SP_STATE_NADA;
        __socket_reset_priv (priv);
}

And since pollin is an allocated object, only one instance of 
socket_event_poll_in will be aware of it. IOW, multiple instances of 
socket_event_poll_in will get different pollin objects. So, the only way 
pollin->iobref could be shared across multiple invocations of 
socket_event_poll_in is through the common shared object priv->incoming.iobref. But 
that too is sanitized by the time __socket_proto_state_machine completes, and 
__socket_proto_state_machine is executed under lock. So, I don't see how two 
different concurrent codepaths can get hold of the same iobref.
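As a minimal sketch of the hand-off described above (illustrative names only; the 
reference counting that the real iobref code also does is deliberately left out):

#include <stdlib.h>

/* Stand-in types, not the glusterfs definitions. */
struct iobref_x   { int ref; };
struct incoming_x { struct iobref_x *iobref; };
struct pollin_x   { struct iobref_x *iobref; };

/* The state machine hands the accumulated iobref to a freshly allocated
 * pollin and clears its own slot (the __socket_reset_priv step), so two
 * pollin objects can never end up reaching the same iobref through
 * priv->incoming. */
static struct pollin_x *
build_pollin_sketch (struct incoming_x *in)
{
        struct pollin_x *pollin = calloc (1, sizeof (*pollin));
        if (!pollin)
                return NULL;

        pollin->iobref = in->iobref;   /* reference moves to the pollin */
        in->iobref     = NULL;         /* shared slot is sanitized */
        return pollin;
}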

I found one recent patch,
SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket

I think its point is to avoid concurrent handling of the same socket at the 
same time, but in my test this problem still exists with that patch, so I 
think event_handled is still called too early, allowing concurrent handling of 
the same socket to happen; after moving it to the end of socket_event_poll_in, this 
glusterd stuck issue disappeared.
cynthia
From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Monday, April 15, 2019 2:36 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 15, 2019 at 11:08 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Ok, thanks for your comment!

cynthia

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Monday, April 15, 2019 11:52 AM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15

Cynthia,

On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi,
I made a patch and according to my test, this glusterd stuck issue disappear 
with my patch. Only need to move  event_handled to the end of 
socket_event_poll_in function.

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }

-if (notify_handled && (ret != -1))
-event_handled (ctx->event_pool, priv->sock, priv->idx,
-   priv->gen);
@@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }
 pthread_mutex_unlock (&priv->notify.lock);
 }
+if (notify_handled && (ret != -1))
+event_handled (ctx->event_pool, priv->sock, priv->idx,
+   priv->gen);

Thanks for this tip. Though this helps in fixing the hang, this change has a 
performance impact. Moving event_handled to the end of poll_in means the socket will 
be added back for polling of new events only _after_ the rpc msg is 
processed by higher layers (like EC), and higher layers can have significant 
latency for processing the msg. That means the socket will be out of polling for 
longer periods of time, which decreases the throughput (number of msgs read per 
second) and hurts performance. However, this experiment definitely indicates 
there is a codepath where event_handled is not called (and hence causes the 
hang). I'll go through this codepath again.
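To make the trade-off concrete: the epoll layer uses a one-shot style registration, 
so (as far as I understand) an fd is not reported again until event_handled re-arms 
it. A generic sketch of that re-arm step, not the glusterfs code itself:

#include <sys/epoll.h>

/* Re-arm a one-shot fd so epoll will report it again.  Calling this early
 * lets another epoll worker pick up the next message while the current one
 * is still being processed; calling it only after the upper layers finish
 * keeps the fd invisible to epoll for the whole processing latency. */
static int
rearm_oneshot (int epfd, int fd, void *data)
{
        struct epoll_event ev;

        ev.events   = EPOLLIN | EPOLLONESHOT;
        ev.data.ptr = data;

        return epoll_ctl (epfd, EPOLL_CTL_MOD, fd, &ev);
}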

Can you check whether patch 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-25 Thread Raghavendra Gowdappa
On Mon, Apr 15, 2019 at 12:52 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi,
>
> The reason why I move event_handled to the end of socket_event_poll_in is
> because if event_handled is called before rpc_transport_pollin_destroy, it
> allowed another round of event_dispatch_epoll_handler happen before
>  rpc_transport_pollin_destroy, in this way, when the latter poll in goes to
> rpc_transport_pollin_destroy , there is a chance that the pollin->iobref
> has already been destroyed by the first one(there is no lock destroy for
> iobref->lock in iobref_destroy by the way). That may cause stuck in “LOCK
> (&iobref->lock);”
>

But, priv->incoming.iobref (from which pollin->iobref is initialized from)
is set to NULL in __socket_proto_state_machine:

if (in->record_state == SP_STATE_COMPLETE) {
in->record_state = SP_STATE_NADA;
__socket_reset_priv (priv);
}

And since pollin is an allocated object only one instance of
socket_event_poll_in will be aware of this object. IOW, multiple instances
of socket_event_poll_in will get different pollin objects. So, the only way
pollin->iobref could be shared across multiple invocations of
socket_event_poll_in is due to common shared object priv->incoming.iobref.
But that too is sanitized by the time __socket_proto_state_machine
completes and __socket_proto_state_machine is executed under lock. So, I
don't see how two different concurrent codepaths can get hold of same
iobref.

I find the one of recent patch
>
SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> * socket: fix issue on concurrent handle of a socket
>
>
>
> I think the point is to avoid the concurrent handling of the same socket
> at the same time, but after my test with this patch this problem also
> exists, so I think event_handled is still called too early to allow
> concurrent handling of the same socket happen, and after move it to the end
> of socket_event_poll this glusterd stuck issue disappeared.
>
> cynthia
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 2:36 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 11:08 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Ok, thanks for your comment!
>
>
>
> cynthia
>
>
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 11:52 AM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
> Cynthia,
>
>
>
> On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi,
>
> I made a patch and according to my test, this glusterd stuck issue
> disappear with my patch. Only need to move  event_handled to the end of
> socket_event_poll_in function.
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
>
> +++ b/rpc/rpc-transport/socket/src/socket.c
>
> @@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>
>
> -if (notify_handled && (ret != -1))
>
> -event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> -   priv->gen);
>
> @@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>  pthread_mutex_unlock (&priv->notify.lock);
>
>  }
>
> +if (notify_handled && (ret != -1))
>
> +event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> +   priv->gen);
>
>
>
> Thanks for this tip. Though this helps in fixing the hang, this change has
> performance impact. Moving event_handled to end of poll_in means, socket
> will be added back for polling of new events only _after_ the rpc msg is
> processed by higher layers (like EC) and higher layers can have significant
> latency  for processing the msg. Which means, socket will be out of polling
> for longer periods of time which decreases the throughput (number of msgs
> read per second) affecting performance. However, this experiment definitely
> indicates there is a codepath where event_handled is not called (and hence
> causing the hang). I'll go through this codepath again.
>
>
>
> Can you check whether patch [1] fixes the issue you are seeing?
>
>
>
> [1] https://review.gluster.org/#/c/glusterfs/+/22566
>
>
>
>
>
> Thanks for that experiment :).
>
>
>
>  return ret;
>
> }
>
>
>
> cynthia
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Tuesday, April 09, 2019 3:57 PM
> *To:* 'Raghavendra Gowdappa' 
> *Cc:* gluster-devel@gluster.org
> *Subject:* RE: glusterd stuck for glusterfs with version 3.12.15
>
>
>
> Can you figure out some possible reason why iobref is corrupted, is it
> possible that thread 8 has called poll in and iobref has been relased, but
> 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-15 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Ok, I got your point, thanks for responding!

cynthia

From: Raghavendra Gowdappa 
Sent: Monday, April 15, 2019 4:36 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 15, 2019 at 12:52 PM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi,
The reason why I move event_handled to the end of socket_event_poll_in is 
because if event_handled is called before rpc_transport_pollin_destroy, it 
allowed another round of event_dispatch_epoll_handler happen before  
rpc_transport_pollin_destroy, in this way, when the latter poll in goes to 
rpc_transport_pollin_destroy , there is a chance that the pollin->iobref has 
already been destroyed by the first one(there is no lock destroy for 
iobref->lock in iobref_destroy by the way). That may cause stuck in “LOCK 
(&iobref->lock);”
I find the one of recent patch
SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket

I think the point is to avoid the concurrent handling of the same socket at the 
same time, but after my test with this patch this problem also exists, so I 
think event_handled is still called too early to allow concurrent handling of 
the same socket happen,

But concurrent handling is required for performance. So, we cannot serialize it.

and after move it to the end of socket_event_poll this glusterd stuck issue 
disappeared.

Ideally, no single data-structure instance should be shared between two instances of 
pollin handling. My initial code-reading didn't find any issues with the way iobref 
is handled, even when a concurrent read happens while the previous message has still 
not been notified. I'll continue to investigate how objects are shared across two 
instances of pollin and will post if I find anything interesting.
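For readers following along, a bare-bones sketch of the reference-counting discipline 
being relied on here (illustrative only; the real iobref_unref, visible at iobuf.c:944 
in the stack traces, does more bookkeeping):

#include <pthread.h>
#include <stdlib.h>

struct refbuf {
        pthread_mutex_t lock;
        int             ref;
};

/* Each holder (priv->incoming, each pollin) owns its own reference, so one
 * holder dropping its reference can only free the object once every other
 * holder has already dropped theirs. */
static void
refbuf_unref (struct refbuf *rb)
{
        int ref;

        pthread_mutex_lock (&rb->lock);
        ref = --rb->ref;
        pthread_mutex_unlock (&rb->lock);

        if (ref == 0) {
                pthread_mutex_destroy (&rb->lock);  /* last holder */
                free (rb);
        }
}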

cynthia
From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Monday, April 15, 2019 2:36 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 15, 2019 at 11:08 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Ok, thanks for your comment!

cynthia

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Monday, April 15, 2019 11:52 AM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15

Cynthia,

On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi,
I made a patch and according to my test, this glusterd stuck issue disappear 
with my patch. Only need to move  event_handled to the end of 
socket_event_poll_in function.

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }

-if (notify_handled && (ret != -1))
-event_handled (ctx->event_pool, priv->sock, priv->idx,
-   priv->gen);
@@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }
 pthread_mutex_unlock (&priv->notify.lock);
 }
+if (notify_handled && (ret != -1))
+event_handled (ctx->event_pool, priv->sock, priv->idx,
+   priv->gen);

Thanks for this tip. Though this helps in fixing the hang, this change has 
performance impact. Moving event_handled to end of poll_in means, socket will 
be added back for polling of new events only _after_ the rpc msg is 
processed by higher layers (like EC) and higher layers can have significant 
latency  for processing the msg. Which means, socket will be out of polling for 
longer periods of time which decreases the throughput (number of msgs read per 
second) affecting performance. However, this experiment definitely indicates 
there is a codepath where event_handled is not called (and hence causing the 
hang). I'll go through this codepath again.

Can you check whether patch [1] fixes the issue you are seeing?

[1] https://review.gluster.org/#/c/glusterfs/+/22566


Thanks for that experiment :).

 return ret;
}

cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Tuesday, April 09, 2019 3:57 PM
To: 'Raghavendra Gowdappa' <rgowd...@redhat.com>
Cc: gluster-devel@gluster.org
Subject: RE: glusterd stuck for glusterfs with version 3.12.15

Can you figure out some possible reason why iobref is corrupted, is it possible 
that thread 8 has called poll in and iobref has been released, but 
within it is not properly released (as I can not find any free lock operation 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-15 Thread Raghavendra Gowdappa
On Mon, Apr 15, 2019 at 12:52 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi,
>
> The reason why I move event_handled to the end of socket_event_poll_in is
> because if event_handled is called before rpc_transport_pollin_destroy, it
> allowed another round of event_dispatch_epoll_handler happen before
>  rpc_transport_pollin_destroy, in this way, when the latter poll in goes to
> rpc_transport_pollin_destroy , there is a chance that the pollin->iobref
> has already been destroyed by the first one(there is no lock destroy for
> iobref->lock in iobref_destroy by the way). That may cause stuck in “LOCK
> (&iobref->lock);”
>
> I find the one of recent patch
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> * socket: fix issue on concurrent handle of a socket
>
>
>
> I think the point is to avoid the concurrent handling of the same socket
> at the same time, but after my test with this patch this problem also
> exists, so I think event_handled is still called too early to allow
> concurrent handling of the same socket happen,
>

But concurrent handling is required for performance. So, we cannot
serialize it.

and after move it to the end of socket_event_poll this glusterd stuck issue
> disappeared.
>

Ideally there shouldn't be a single instance of datastructure that should
be shared between two instances of pollin handling. My initial code-reading
didn't find any issues with the way iobref is handled even when there is
concurrent reading when the previous message was still not notified. I'll
continue to investigate how objects are shared across two instances of
pollin. Will post if I find anything interesting.

cynthia
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 2:36 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 11:08 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Ok, thanks for your comment!
>
>
>
> cynthia
>
>
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 11:52 AM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
> Cynthia,
>
>
>
> On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi,
>
> I made a patch and according to my test, this glusterd stuck issue
> disappear with my patch. Only need to move  event_handled to the end of
> socket_event_poll_in function.
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
>
> +++ b/rpc/rpc-transport/socket/src/socket.c
>
> @@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>
>
> -if (notify_handled && (ret != -1))
>
> -event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> -   priv->gen);
>
> @@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>  pthread_mutex_unlock (&priv->notify.lock);
>
>  }
>
> +if (notify_handled && (ret != -1))
>
> +event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> +   priv->gen);
>
>
>
> Thanks for this tip. Though this helps in fixing the hang, this change has
> performance impact. Moving event_handled to end of poll_in means, socket
> will be added back for polling of new events only _after_ the rpc msg is
> processed by higher layers (like EC) and higher layers can have significant
> latency  for processing the msg. Which means, socket will be out of polling
> for longer periods of time which decreases the throughput (number of msgs
> read per second) affecting performance. However, this experiment definitely
> indicates there is a codepath where event_handled is not called (and hence
> causing the hang). I'll go through this codepath again.
>
>
>
> Can you check whether patch [1] fixes the issue you are seeing?
>
>
>
> [1] https://review.gluster.org/#/c/glusterfs/+/22566
>
>
>
>
>
> Thanks for that experiment :).
>
>
>
>  return ret;
>
> }
>
>
>
> cynthia
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Tuesday, April 09, 2019 3:57 PM
> *To:* 'Raghavendra Gowdappa' 
> *Cc:* gluster-devel@gluster.org
> *Subject:* RE: glusterd stuck for glusterfs with version 3.12.15
>
>
>
> Can you figure out some possible reason why iobref is corrupted, is it
> possible that thread 8 has called poll in and iobref has been released, but
> the lock within it is not properly released (as I can not find any free
> lock operation in iobref_destroy), then thread 9 called 
> rpc_transport_pollin_destroy
> again, and so stuck on this lock
>
> Also, there should not be two threads handling the same socket at the same
> time, although there has been a patch claimed to tackle 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-15 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi,
The reason why I move event_handled to the end of socket_event_poll_in is 
because if event_handled is called before rpc_transport_pollin_destroy, it 
allowed another round of event_dispatch_epoll_handler happen before  
rpc_transport_pollin_destroy, in this way, when the latter poll in goes to 
rpc_transport_pollin_destroy , there is a chance that the pollin->iobref has 
already been destroyed by the first one(there is no lock destroy for 
iobref->lock in iobref_destroy by the way). That may cause stuck in “LOCK 
(&iobref->lock);”
I find the one of recent patch
SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket

I think the point is to avoid the concurrent handling of the same socket at the 
same time, but after my test with this patch this problem also exists, so I 
think event_handled is still called too early to allow concurrent handling of 
the same socket happen, and after move it to the end of socket_event_poll this 
glusterd stuck issue disappeared.
cynthia
From: Raghavendra Gowdappa 
Sent: Monday, April 15, 2019 2:36 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 15, 2019 at 11:08 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Ok, thanks for your comment!

cynthia

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Monday, April 15, 2019 11:52 AM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15

Cynthia,

On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi,
I made a patch and according to my test, this glusterd stuck issue disappear 
with my patch. Only need to move  event_handled to the end of 
socket_event_poll_in function.

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }

-if (notify_handled && (ret != -1))
-event_handled (ctx->event_pool, priv->sock, priv->idx,
-   priv->gen);
@@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }
 pthread_mutex_unlock (&priv->notify.lock);
 }
+if (notify_handled && (ret != -1))
+event_handled (ctx->event_pool, priv->sock, priv->idx,
+   priv->gen);

Thanks for this tip. Though this helps in fixing the hang, this change has 
performance impact. Moving event_handled to end of poll_in means, socket will 
be added back for polling of new events only _after_ the rpc msg is 
processed by higher layers (like EC) and higher layers can have significant 
latency  for processing the msg. Which means, socket will be out of polling for 
longer periods of time which decreases the throughput (number of msgs read per 
second) affecting performance. However, this experiment definitely indicates 
there is a codepath where event_handled is not called (and hence causing the 
hang). I'll go through this codepath again.

Can you check whether patch [1] fixes the issue you are seeing?

[1] https://review.gluster.org/#/c/glusterfs/+/22566


Thanks for that experiment :).

 return ret;
}

cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Tuesday, April 09, 2019 3:57 PM
To: 'Raghavendra Gowdappa' <rgowd...@redhat.com>
Cc: gluster-devel@gluster.org
Subject: RE: glusterd stuck for glusterfs with version 3.12.15

Can you figure out some possible reason why iobref is corrupted, is it possible 
that thread 8 has called poll in and iobref has been released, but the lock 
within it is not properly released (as I can not find any free lock operation 
in iobref_destroy), then thread 9 called rpc_transport_pollin_destroy again, 
and so stuck on this lock
Also, there should not be two threads handling the same socket at the same time, 
although there has been a patch claimed to tackle this issue.

cynthia

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Tuesday, April 09, 2019 3:52 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 8, 2019 at 7:42 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi glusterfs experts,
Good day!
In my test env, sometimes glusterd stuck issue happened, and it is not 
responding to any gluster commands, when I checked this issue I find that 
glusterd thread 9 and thread 8 is dealing with the same socket, I thought 
following patch should be able to solve this issue, 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-14 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Ok, thanks for your comment!

cynthia

From: Raghavendra Gowdappa 
Sent: Monday, April 15, 2019 11:52 AM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15

Cynthia,

On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi,
I made a patch and according to my test, this glusterd stuck issue disappear 
with my patch. Only need to move  event_handled to the end of 
socket_event_poll_in function.

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }

-if (notify_handled && (ret != -1))
-event_handled (ctx->event_pool, priv->sock, priv->idx,
-   priv->gen);
@@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }
 pthread_mutex_unlock (&priv->notify.lock);
 }
+if (notify_handled && (ret != -1))
+event_handled (ctx->event_pool, priv->sock, priv->idx,
+   priv->gen);

Thanks for this tip. Though this helps in fixing the hang, this change has 
performance impact. Moving event_handled to end of poll_in means, socket will 
be added back for polling of new events only _after_ the rpc msg is 
processed by higher layers (like EC) and higher layers can have significant 
latency  for processing the msg. Which means, socket will be out of polling for 
longer periods of time which decreases the throughput (number of msgs read per 
second) affecting performance. However, this experiment definitely indicates 
there is a codepath where event_handled is not called (and hence causing the 
hang). I'll go through this codepath again.

Thanks for that experiment :).

 return ret;
}

cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Tuesday, April 09, 2019 3:57 PM
To: 'Raghavendra Gowdappa' <rgowd...@redhat.com>
Cc: gluster-devel@gluster.org
Subject: RE: glusterd stuck for glusterfs with version 3.12.15

Can you figure out some possible reason why iobref is corrupted, is it possible 
that thread 8 has called poll in and iobref has been released, but the lock 
within it is not properly released (as I can not find any free lock operation 
in iobref_destroy), then thread 9 called rpc_transport_pollin_destroy again, 
and so stuck on this lock
Also, there should not be two threads handling the same socket at the same time, 
although there has been a patch claimed to tackle this issue.

cynthia

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Tuesday, April 09, 2019 3:52 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 8, 2019 at 7:42 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi glusterfs experts,
Good day!
In my test env, sometimes glusterd stuck issue happened, and it is not 
responding to any gluster commands, when I checked this issue I find that 
glusterd thread 9 and thread 8 is dealing with the same socket, I thought 
following patch should be able to solve this issue, however after I merged this 
patch this issue still exist. When I looked into this code, it seems 
socket_event_poll_in called event_handled before rpc_transport_pollin_destroy, 
I think this gives the chance for another poll for the exactly the same socket. 
And caused this glusterd stuck issue, also, I find there is no   
LOCK_DESTROY(&iobref->lock)
In iobref_destroy, I think it is better to add destroy lock.
Following is the gdb info when this issue happened, I would like to know your 
opinion on this issue, thanks!

SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket



GDB INFO:
Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in 
iobref_unref, I think
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-14 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi,
I made a patch and according to my test, this glusterd stuck issue disappear 
with my patch. Only need to move  event_handled to the end of 
socket_event_poll_in function.

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }

-if (notify_handled && (ret != -1))
-event_handled (ctx->event_pool, priv->sock, priv->idx,
-   priv->gen);
@@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this, gf_boolean_t 
notify_handled)
 }
 pthread_mutex_unlock (&priv->notify.lock);
 }
+if (notify_handled && (ret != -1))
+event_handled (ctx->event_pool, priv->sock, priv->idx,
+   priv->gen);
 return ret;
}

cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Tuesday, April 09, 2019 3:57 PM
To: 'Raghavendra Gowdappa' 
Cc: gluster-devel@gluster.org
Subject: RE: glusterd stuck for glusterfs with version 3.12.15

Can you figure out some possible reason why iobref is corrupted, is it possible 
that thread 8 has called poll in and iobref has been released, but the lock 
within it is not properly released (as I can not find any free lock operation 
in iobref_destroy), then thread 9 called rpc_transport_pollin_destroy again, 
and so stuck on this lock
Also, there should not be two threads handling the same socket at the same time, 
although there has been a patch claimed to tackle this issue.

cynthia

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Tuesday, April 09, 2019 3:52 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 8, 2019 at 7:42 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi glusterfs experts,
Good day!
In my test env, sometimes glusterd stuck issue happened, and it is not 
responding to any gluster commands, when I checked this issue I find that 
glusterd thread 9 and thread 8 is dealing with the same socket, I thought 
following patch should be able to solve this issue, however after I merged this 
patch this issue still exist. When I looked into this code, it seems 
socket_event_poll_in called event_handled before rpc_transport_pollin_destroy, 
I think this gives the chance for another poll for the exactly the same socket. 
And caused this glusterd stuck issue, also, I find there is no   
LOCK_DESTROY(&iobref->lock)
In iobref_destroy, I think it is better to add destroy lock.
Following is the gdb info when this issue happened, I would like to know your 
opinion on this issue, thanks!

SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket



GDB INFO:
Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in 
iobref_unref, I think
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from 
/lib64/libpthread.so.0
#2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, 
idx=27) at socket.c:1201
#4  0x7f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edfffee84) at event-epoll.c:583
#6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at 
event-epoll.c:659
#7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
#0  0x7f9ee9fd785c in 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-09 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Can you figure out some possible reason why the iobref is corrupted? Is it possible 
that thread 8 has called poll-in and the iobref has been released, but the lock 
within it was not properly released (as I cannot find any lock destroy operation 
in iobref_destroy), and then thread 9 called rpc_transport_pollin_destroy again 
and so got stuck on this lock?
Also, there should not be two threads handling the same socket at the same time, 
although there has been a patch that claims to tackle this issue.

cynthia

From: Raghavendra Gowdappa 
Sent: Tuesday, April 09, 2019 3:52 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
Cc: gluster-devel@gluster.org
Subject: Re: glusterd stuck for glusterfs with version 3.12.15



On Mon, Apr 8, 2019 at 7:42 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi glusterfs experts,
Good day!
In my test env, sometimes glusterd stuck issue happened, and it is not 
responding to any gluster commands, when I checked this issue I find that 
glusterd thread 9 and thread 8 is dealing with the same socket, I thought 
following patch should be able to solve this issue, however after I merged this 
patch this issue still exist. When I looked into this code, it seems 
socket_event_poll_in called event_handled before rpc_transport_pollin_destroy, 
I think this gives the chance for another poll for the exactly the same socket. 
And caused this glusterd stuck issue, also, I find there is no   
LOCK_DESTROY(&iobref->lock)
In iobref_destroy, I think it is better to add destroy lock.
Following is the gdb info when this issue happened, I would like to know your 
opinion on this issue, thanks!

SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket



GDB INFO:
Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in 
iobref_unref, I think
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from 
/lib64/libpthread.so.0
#2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, 
idx=27) at socket.c:1201
#4  0x7f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edfffee84) at event-epoll.c:583
#6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at 
event-epoll.c:659
#7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
(gdb) frame 2
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
944 iobuf.c: No such file or directory.
(gdb) print *iobref
$1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-09 Thread Raghavendra Gowdappa
On Mon, Apr 8, 2019 at 7:42 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi glusterfs experts,
>
> Good day!
>
> In my test env, sometimes glusterd stuck issue happened, and it is not
> responding to any gluster commands, when I checked this issue I find that
> glusterd thread 9 and thread 8 is dealing with the same socket, I thought
> following patch should be able to solve this issue, however after I merged
> this patch this issue still exist. When I looked into this code, it seems
> socket_event_poll_in called event_handled before
> rpc_transport_pollin_destroy, I think this gives the chance for another
> poll for the exactly the same socket. And caused this glusterd stuck issue,
> also, I find there is no   LOCK_DESTROY(&iobref->lock)
>
> In iobref_destroy, I think it is better to add destroy lock.
>
> Following is the gdb info when this issue happened, I would like to know
> your opinion on this issue, thanks!
>
>
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> * socket: fix issue on concurrent handle of a socket
>
>
>
>
>
>
>
> *GDB INFO:*
>
> Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in
> iobref_unref, I think
>
> Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x7f9ee4fbf932 in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2471
>
> #6  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from
> /lib64/libpthread.so.0
>
> #2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0,
> gen=4, idx=27) at socket.c:1201
>
> #4  0x7f9ee4fbf99c in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2480
>
> #5  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
>
> #6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at
> event-epoll.c:659
>
> #7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> (gdb) thread 9
>
> [Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #6  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
> (gdb) frame 2
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> 944 iobuf.c: No such file or directory.
>
> (gdb) print *iobref
>
> $1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222,
> __owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
>
> __elision = 0, __list = {__prev = 0x4000, __next =
> 0x7f9ed00063b000}},
>
>   __size =
> "\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177",
> __align = 953482739714}}, ref = -256, iobrefs = 0x, alloced
> = -1, used = -1}
>

looks like the iobref 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-09 Thread Sanju Rakonde
> #1  0x7f9ee9fd8bac in sigwait () from /lib64/libpthread.so.0
>
> #2  0x00409ed7 in glusterfs_sigwaiter ()
>
> #3  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #4  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 2 (Thread 0x7f9ee8932700 (LWP 1758)):
>
> #0  0x7f9ee9fd83b0 in nanosleep () from /lib64/libpthread.so.0
>
> #1  0x7f9eeb224545 in gf_timer_proc (data=0x1808580) at timer.c:164
>
> #2  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #3  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 1 (Thread 0x7f9eeb707780 (LWP 1757)):
>
> #0  0x7f9ee9fcfa3d in __pthread_timedjoin_ex () from
> /lib64/libpthread.so.0
>
> #1  0x00007f9eeb282b09 in event_dispatch_epoll (event_pool=0x17feb00) at
> event-epoll.c:746
>
> #2  0x7f9eeb246786 in event_dispatch (event_pool=0x17feb00) at
> event.c:124
>
> #3  0x0040ab95 in main ()
>
> (gdb)
>
> (gdb)
>
> (gdb) q!
>
> A syntax error in expression, near `'.
>
> (gdb) quit
>
>
>
> *From:* Sanju Rakonde 
> *Sent:* Monday, April 08, 2019 4:58 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* Raghavendra Gowdappa ;
> gluster-devel@gluster.org
> *Subject:* Re: [Gluster-devel] glusterd stuck for glusterfs with version
> 3.12.15
>
>
>
> Can you please capture output of "pstack $(pidof glusterd)" and send it to
> us? We need to capture this information when glusterd is stuck.
>
>
>
> On Mon, Apr 8, 2019 at 8:05 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi glusterfs experts,
>
> Good day!
>
> In my test env, sometimes glusterd stuck issue happened, and it is not
> responding to any gluster commands, when I checked this issue I find that
> glusterd thread 9 and thread 8 is dealing with the same socket, I thought
> following patch should be able to solve this issue, however after I merged
> this patch this issue still exist. When I looked into this code, it seems
> socket_event_poll_in called event_handled before
> rpc_transport_pollin_destroy, I think this gives the chance for another
> poll for the exactly the same socket. And caused this glusterd stuck issue,
> also, I find there is no   LOCK_DESTROY(&iobref->lock)
>
> In iobref_destroy, I think it is better to add destroy lock.
>
> Following is the gdb info when this issue happened, I would like to know
> your opinion on this issue, thanks!
>
>
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> * socket: fix issue on concurrent handle of a socket
>
>
>
>
>
>
>
> *GDB INFO:*
>
> Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in
> iobref_unref, I think
>
> Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x7f9ee4fbf932 in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2471
>
> #6  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from
> /lib64/libpthread.so.0
>
> #2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0,
> gen=4, idx=27) at socket.c:1201
>
> #4  0x7f9ee4fbf99c in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2480
>
> #5  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
>
> #6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at
> event-epoll.c:659
>
> #7  0x7f9ee9fce5da in

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-08 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
gzhou) 
Cc: Raghavendra Gowdappa ; gluster-devel@gluster.org
Subject: Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

Can you please capture output of "pstack $(pidof glusterd)" and send it to us? 
We need to capture this information when glusterd is stuck.

On Mon, Apr 8, 2019 at 8:05 AM Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com> wrote:
Hi glusterfs experts,
Good day!
In my test env, sometimes glusterd stuck issue happened, and it is not 
responding to any gluster commands, when I checked this issue I find that 
glusterd thread 9 and thread 8 is dealing with the same socket, I thought 
following patch should be able to solve this issue, however after I merged this 
patch this issue still exist. When I looked into this code, it seems 
socket_event_poll_in called event_handled before rpc_transport_pollin_destroy, 
I think this gives the chance for another poll for the exactly the same socket. 
And caused this glusterd stuck issue, also, I find there is no   
LOCK_DESTROY(&iobref->lock)
In iobref_destroy, I think it is better to add destroy lock.
Following is the gdb info when this issue happened, I would like to know your 
opinion on this issue, thanks!

SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket



GDB INFO:
Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in 
iobref_unref, I think
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from 
/lib64/libpthread.so.0
#2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, 
idx=27) at socket.c:1201
#4  0x7f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edfffee84) at event-epoll.c:583
#6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at 
event-epoll.c:659
#7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
(gdb) frame 2
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
944 iobuf.c: No such file or directory.
(gdb) print *iobref
$1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222, 
__owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
__elision = 0, __list = {__prev = 0x4000, __next = 0x7f9ed00063b000}},
  __size = 
"\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177",
 __align = 953482739714}}, ref = -256, iobrefs = 0x, alloced =

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-08 Thread Sanju Rakonde
Can you please capture output of "pstack $(pidof glusterd)" and send it to
us? We need to capture this information when glusterd is stuck.

On Mon, Apr 8, 2019 at 8:05 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi glusterfs experts,
>
> Good day!
>
> In my test env, sometimes glusterd stuck issue happened, and it is not
> responding to any gluster commands, when I checked this issue I find that
> glusterd thread 9 and thread 8 is dealing with the same socket, I thought
> following patch should be able to solve this issue, however after I merged
> this patch this issue still exist. When I looked into this code, it seems
> socket_event_poll_in called event_handled before
> rpc_transport_pollin_destroy, I think this gives the chance for another
> poll for the exactly the same socket. And caused this glusterd stuck issue,
> also, I find there is no   LOCK_DESTROY(&iobref->lock)
>
> In iobref_destroy, I think it is better to add destroy lock.
>
> Following is the gdb info when this issue happened, I would like to know
> your opinion on this issue, thanks!
>
>
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> * socket: fix issue on concurrent handle of a socket
>
>
>
>
>
>
>
> *GDB INFO:*
>
> Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in
> iobref_unref, I think
>
> Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x7f9ee4fbf932 in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2471
>
> #6  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from
> /lib64/libpthread.so.0
>
> #2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0,
> gen=4, idx=27) at socket.c:1201
>
> #4  0x7f9ee4fbf99c in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2480
>
> #5  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
>
> #6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at
> event-epoll.c:659
>
> #7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> (gdb) thread 9
>
> [Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #6  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
> (gdb) frame 2
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> 944 iobuf.c: No such file or directory.
>
> (gdb) print *iobref
>
> $1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222,
> __owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
>
> __elision = 0, __list = {__prev = 0x4000, __next =
> 0x7f9ed00063b000}},
>
>   __size =
> 

[Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-07 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi glusterfs experts,
Good day!
In my test env, a glusterd stuck issue sometimes happens, and glusterd then does not 
respond to any gluster commands. When I checked this issue, I found that 
glusterd thread 9 and thread 8 are dealing with the same socket. I thought the 
following patch should be able to solve this issue; however, after I merged this 
patch, the issue still exists. When I looked into the code, it seems 
socket_event_poll_in calls event_handled before rpc_transport_pollin_destroy; 
I think this gives the chance for another poll on exactly the same socket, 
which causes this glusterd stuck issue. Also, I find there is no 
LOCK_DESTROY(&iobref->lock)
in iobref_destroy; I think it is better to add the lock destroy there.
Following is the gdb info when this issue happened. I would like to know your 
opinion on this issue, thanks!

SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket



GDB INFO:
Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in 
iobref_unref, I think
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from 
/lib64/libpthread.so.0
#2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, 
idx=27) at socket.c:1201
#4  0x7f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edfffee84) at event-epoll.c:583
#6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at 
event-epoll.c:659
#7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
(gdb) frame 2
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
944 iobuf.c: No such file or directory.
(gdb) print *iobref
$1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222, 
__owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
__elision = 0, __list = {__prev = 0x4000, __next = 0x7f9ed00063b000}},
  __size = 
"\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177",
 __align = 953482739714}}, ref = -256, iobrefs = 0x, alloced = 
-1, used = -1}
(gdb) quit
A debugging session is active.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel