Hi Ying,
Thanks for the patches. I introduced these bugs and was sick for more than 2 weeks, so you had to fix them. I have few comments for the entire series. [tipc-discussion] [net 2/5] tipc: adjust the policy of holding subscription kref<https://sourceforge.net/p/tipc/mailman/message/35676908/><https://sourceforge.net/p/tipc/mailman/message/35676908/> The above commit looks good and should fix the issue reported by John. [tipc-discussion] [net 1/5] tipc: advance the time of deleting subscription from subscriber->subscrp_list<https://sourceforge.net/p/tipc/mailman/message/35676906/> In the above commit, we miss to delete sub->subscrp_list at subscription timeout. We placed this common code in refcount cleanup to avoid code duplication. [tipc-discussion] [net 3/5] tipc: adjust policy that sub->timer holds subscription kref<https://sourceforge.net/p/tipc/mailman/message/35676904/><https://sourceforge.net/p/tipc/mailman/message/35676904/> With this patch, we will never trigger a subscription cleanup at tipc_subscrp_timeout() as the refcount will be one when we exit tipc_subscrp_timeout(). Thus the subscription is still present in nametable and hence the subscriber will receive new events for this subscription after receiving the subscription timeout until the subscriber terminates. We need to update our design policy that at subscription timeout, we remove all references to the subscription and it is no longer accessible. This conflicts with the change proposed by the above commit. If we still need to include this commit, we need to adapt tipc_subscrp_timeout(), as: 1. tipc_nametbl_unsubscribe() 2. tipc_subscrp_put() 3. tipc_subscrp_put() static void tipc_subscrp_delete(struct tipc_subscription *sub) { u32 timeout = htohl(sub->evt.s.timeout, sub->swap); - if (timeout == TIPC_WAIT_FOREVER || del_timer(&sub->timer)) + if (timeout != TIPC_WAIT_FOREVER && del_timer(&sub->timer)) { tipc_subscrp_put(sub); + tipc_subscrp_put(sub); } else { + tipc_subscrp_put(sub); } } But this makes the functions look odd as we adjust refcount twice in the same function. To get the subscription timeout cleanup working correctly, we need to sqash this commit together :[tipc-discussion] [net 4/5] tipc: advance the time of calling tipc_nametbl_unsubscribe<https://sourceforge.net/p/tipc/mailman/message/35676905/> [tipc-discussion] [net 5/5] tipc: remove unnecessary increasement of subscription refcount<https://sourceforge.net/p/tipc/mailman/message/35676917/> In this commit message, you need to edit the stacktraces as the scenario you describe will not happen due to your previous patches. As we never perform a tipc_subscrp_get() in tipc_subscrp_report_overlap(). /Partha ________________________________ From: John Thompson <thompa....@gmail.com> Sent: Tuesday, February 21, 2017 10:15 PM To: Ying Xue Cc: Jon Maloy; Parthasarathy Bhuvaragan; tipc-discussion@lists.sourceforge.net Subject: Re: [net 5/5] tipc: remove unnecessary increasement of subscription refcount Sorry, I was mistaken. You are right that you have removed the sub->lock from tipc_subscrp_kref_release() and so there is no chance of a deadlock. JT On Tue, Feb 21, 2017 at 11:22 PM, Ying Xue <ying....@windriver.com<mailto:ying....@windriver.com>> wrote: On 02/21/2017 06:13 AM, John Thompson wrote: > > > On Tue, Feb 21, 2017 at 12:39 AM, Ying Xue > <ying....@windriver.com<mailto:ying....@windriver.com> > <mailto:ying....@windriver.com<mailto:ying....@windriver.com>>> wrote: > > As the policy of holding subscription in subscriber time has been > adjusted, it's unnecessary to hold subscription refcount before > tipc_subscrp_delete() is called. As a consequence, subscriber->lock > can be safely removed to avoid the following two deadlock scenarios: > > CPU1: CPU2: > ---------- ---------------- > tipc_nametbl_publish > spin_lock_bh(&tn->nametbl_lock) > tipc_nametbl_insert_publ > tipc_nameseq_insert_publ > tipc_subscrp_report_overlap > tipc_subscrp_get > tipc_subscrp_send_event > tipc_close_conn > tipc_subscrb_release_cb > tipc_subscrb_delete > tipc_subscrp_put > tipc_subscrp_put > tipc_subscrp_kref_release > tipc_nametbl_unsubscribe > spin_lock_bh(&tn->nametbl_lock) > <<grab nametbl_lock again>> > > CPU1: CPU2: > ---------- ---------------- > tipc_nametbl_stop > spin_lock_bh(&tn->nametbl_lock) > tipc_purge_publications > tipc_nameseq_remove_publ > tipc_subscrp_report_overlap > tipc_subscrp_get > tipc_subscrp_send_event > tipc_close_conn > tipc_subscrb_release_cb > tipc_subscrb_delete > tipc_subscrp_put > tipc_subscrp_put > tipc_subscrp_kref_release > tipc_nametbl_unsubscribe > spin_lock_bh(&tn->nametbl_lock) > <<grab nametbl_lock again>> > > Reported-by: John Thompson > <thompa....@gmail.com<mailto:thompa....@gmail.com> > <mailto:thompa....@gmail.com<mailto:thompa....@gmail.com>>> > Reported-by: Jon Maloy > <jon.ma...@ericsson.com<mailto:jon.ma...@ericsson.com> > <mailto:jon.ma...@ericsson.com<mailto:jon.ma...@ericsson.com>>> > Fixes: d094c4d5f5 ("tipc: add subscription refcount to avoid invalid > delete") > Signed-off-by: Ying Xue > <ying....@windriver.com<mailto:ying....@windriver.com> > <mailto:ying....@windriver.com<mailto:ying....@windriver.com>>> > --- > net/tipc/subscr.c | 6 ------ > 1 file changed, 6 deletions(-) > > diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c > index 6624232..ea67cce 100644 > --- a/net/tipc/subscr.c > +++ b/net/tipc/subscr.c > @@ -168,9 +168,7 @@ static void tipc_subscrp_kref_release(struct > kref *kref) > struct tipc_net *tn = net_generic(sub->net, tipc_net_id); > struct tipc_subscriber *subscriber = sub->subscriber; > > - spin_lock_bh(&subscriber->lock); > atomic_dec(&tn->subscription_count); > - spin_unlock_bh(&subscriber->lock); > kfree(sub); > tipc_subscrb_put(subscriber); > } > @@ -202,11 +200,7 @@ static void tipc_subscrb_subscrp_delete(struct > tipc_subscriber *subscriber, > list_del(&sub->subscrp_list); > > tipc_nametbl_unsubscribe(sub); > - tipc_subscrp_get(sub); > - spin_unlock_bh(&subscriber->lock); > > > By removing the unlocking at this point couldn't you end up with a > deadlock on subscriber->lock > due to tipc_subscrp_delete() putting the subscrp twice (which could end > up with kref == 0) and > as a result calling tipc_subscrp_kref_release() which gets subscriber->lock? > I think there is no deadlock risk even if tipc_subscrp_put() will be called twice in tipc_subscrp_delete() because tipc_subscrp_kref_release() doesn't hold subscriber->lock after the change. Please take a look at tipc_subscrp_kref_release() code: static void tipc_subscrp_kref_release(struct kref *kref) { struct tipc_subscription *sub = container_of(kref, struct tipc_subscription, kref); struct tipc_net *tn = net_generic(sub->net, tipc_net_id); struct tipc_subscriber *subscriber = sub->subscriber; atomic_dec(&tn->subscription_count); kfree(sub); tipc_subscrb_put(subscriber); } > > > tipc_subscrp_delete(sub); > - tipc_subscrp_put(sub); > - spin_lock_bh(&subscriber->lock); > > if (s) > break; > -- > 2.7.4 > > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion