Hi Ying,
These are good design changes, that definitely should go in asap. However, I 
feel deeply uncomfortable with such a big change going into 'net', especially 
since our previous, exceptionally large, contribution now has turned out to 
have problems. I wonder if we could not get away with something simpler for 
'net'.

Looking closer at your series, it seems to me that only patches ## 1, 4, and 
the lock removal part of #5 are really needed to solve the problem we have at 
hand now. Why not merge those into one patch and deliver this to 'net', while 
reference count redesign parts can go into net-next ?

Regards
///jon


> -----Original Message-----
> From: Ying Xue [mailto:ying....@windriver.com]
> Sent: Monday, February 20, 2017 06:39 AM
> To: Jon Maloy <jon.ma...@ericsson.com>; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>; thompa....@gmail.com
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: [net 0/5] solve two deadlock issues
> 
> Commit d094c4d5f5 ("tipc: add subscription refcount to avoid invalid
> delete") accidently introduce the following deadlock scenarios:
> 
>    CPU1:                             CPU2:
> ----------                     ----------------
> tipc_nametbl_publish
> spin_lock_bh(&tn->nametbl_lock)
> tipc_nametbl_insert_publ
> tipc_nameseq_insert_publ
> tipc_subscrp_report_overlap
> tipc_subscrp_get
> tipc_subscrp_send_event
>                                tipc_close_conn
>                                tipc_subscrb_release_cb
>                                tipc_subscrb_delete
>                                tipc_subscrp_put
> tipc_subscrp_put
> tipc_subscrp_kref_release
> tipc_nametbl_unsubscribe
> spin_lock_bh(&tn->nametbl_lock)
> <<grab nametbl_lock again>>
> 
>    CPU1:                              CPU2:
> ----------                     ----------------
> tipc_nametbl_stop
> spin_lock_bh(&tn->nametbl_lock)
> tipc_purge_publications
> tipc_nameseq_remove_publ
> tipc_subscrp_report_overlap
> tipc_subscrp_get
> tipc_subscrp_send_event
>                                tipc_close_conn
>                                tipc_subscrb_release_cb
>                                tipc_subscrb_delete
>                                tipc_subscrp_put
> tipc_subscrp_put
> tipc_subscrp_kref_release
> tipc_nametbl_unsubscribe
> spin_lock_bh(&tn->nametbl_lock)
> <<grab nametbl_lock again>>
> 
> The root cause of two deadlocks is that we have to hold nametbl lock when
> subscription is freed in tipc_subscrp_kref_release(). In order to eliminate 
> the
> need of taking nametbl lock in tipc_subscrp_kref_release(), the functions
> protected by nametbl lock in tipc_subscrp_kref_release() are moved to
> other places step by step in the series.
> 
> Ying Xue (5):
>   tipc: advance the time of deleting subscription from
>     subscriber->subscrp_list
>   tipc: adjust the policy of holding subscription kref
>   tipc: adjust policy that sub->timer holds subscription kref
>   tipc: advance the time of calling tipc_nametbl_unsubscribe
>   tipc: remove unnecessary increasement of subscription refcount
> 
>  net/tipc/name_table.c |  2 ++
>  net/tipc/subscr.c     | 32 ++++++++++++++------------------
>  net/tipc/subscr.h     |  3 +++
>  3 files changed, 19 insertions(+), 18 deletions(-)
> 
> --
> 2.7.4


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to