Hi,
This is a known issue, which I discussed with Jarek Poplawski at the
netdev mailing list back in January.
This is not a real bug, since the two functions can never access the
same _instance_ of the table[i].lock at the same time.
I.e., the index "i" can never be the same in the two calls.
Consider:
tipc_createport()->create_reference() returns i, to be
used as a reference for the port instance.
tipc_deleteport() uses that reference as argument when the
port should be deleted.
Hence, it is logically impossible for tipc_deleteport() to be called
_for a port_ until tipc_createport() has returned the reference to that
port.
(Just as you can't do free() on a memory block until malloc() has
returned its pointer.)
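
To make the argument above concrete, here is a minimal userspace sketch using pthread mutexes. It is not the TIPC code: every name in it (sketch_table, sketch_create_ref(), sketch_delete_ref(), and so on) is hypothetical, and it assumes the simplification that a slot is marked free only while the table lock is held. The create path takes the table lock and then the lock of a free slot; the delete path takes the lock of an in-use slot and then the table lock. Since a slot cannot be free and in use at once, the two paths never take the two locks on the same slot in opposite order.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_TABLE_SIZE 4

/* Hypothetical stand-in for one slot of the reference table. */
struct sketch_entry {
    pthread_mutex_t lock;   /* plays the role of table[i].lock */
    int in_use;             /* nonzero while a reference to this slot exists */
};

/* Plays the role of ref_table_lock. */
static pthread_mutex_t sketch_table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_entry sketch_table[SKETCH_TABLE_SIZE];

/* Initialise every slot as free. Call once before anything else. */
void sketch_init(void)
{
    for (int i = 0; i < SKETCH_TABLE_SIZE; i++) {
        pthread_mutex_init(&sketch_table[i].lock, NULL);
        sketch_table[i].in_use = 0;
    }
}

/* Create path: table lock first, then the lock of a slot that is free. */
int sketch_create_ref(void)
{
    int ref = -1;

    pthread_mutex_lock(&sketch_table_lock);
    for (int i = 0; i < SKETCH_TABLE_SIZE; i++) {
        if (!sketch_table[i].in_use) {
            pthread_mutex_lock(&sketch_table[i].lock);
            sketch_table[i].in_use = 1;
            pthread_mutex_unlock(&sketch_table[i].lock);
            ref = i;
            break;
        }
    }
    pthread_mutex_unlock(&sketch_table_lock);

    /* Only after this return can a caller hand ref to the delete path. */
    return ref;
}

/* Delete path: slot lock first, then the table lock (the reverse order). */
int sketch_delete_ref(int ref)
{
    if (ref < 0 || ref >= SKETCH_TABLE_SIZE)
        return -1;

    pthread_mutex_lock(&sketch_table[ref].lock);
    if (!sketch_table[ref].in_use) {
        /* Stale reference: back off without touching the table lock. */
        pthread_mutex_unlock(&sketch_table[ref].lock);
        return -1;
    }
    pthread_mutex_lock(&sketch_table_lock);
    sketch_table[ref].in_use = 0;   /* slot becomes free under the table lock */
    pthread_mutex_unlock(&sketch_table_lock);
    pthread_mutex_unlock(&sketch_table[ref].lock);
    return 0;
}
```

In this sketch a deleter can wait for the table lock only while holding the lock of an in-use slot, and the creator never tries to lock an in-use slot, so neither side ends up waiting on the other.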
However, we acknowledge that this is an issue that will keep popping up
until we find some way to keep the tool from giving this warning.
Jarek had a suggestion for a rather ugly fix which I did not quite
understand, so I have been dragging my feet on this until I can come
up with something better. I will try to look into this again.
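
As a rough illustration of why the tool still complains, the sketch below records lock ordering per lock class rather than per instance, which is how lockdep groups locks (the report names the class "&table[i].lock", not a particular index). The names here (sketch_lock_class, sketch_record_order()) are hypothetical, and the check is deliberately reduced to a direct two-lock inversion instead of a full dependency-graph walk.

```c
#include <stdbool.h>

/* Hypothetical lock classes: one for the table lock, one shared by all slots. */
enum sketch_lock_class {
    SKETCH_CLASS_TABLE = 0,   /* ref_table_lock */
    SKETCH_CLASS_ENTRY = 1,   /* every table[i].lock, whatever i is */
    SKETCH_NUM_CLASSES
};

/* sketch_seen[a][b] is true once a lock of class b was taken while holding class a. */
static bool sketch_seen[SKETCH_NUM_CLASSES][SKETCH_NUM_CLASSES];

/*
 * Record the ordering "held, then wanted". Returns true if the opposite
 * ordering has been recorded, i.e. a possible inversion between the two
 * classes. The slot index is never passed in, so the checker cannot tell
 * that the two orderings involved different slots.
 */
bool sketch_record_order(enum sketch_lock_class held,
                         enum sketch_lock_class wanted)
{
    sketch_seen[held][wanted] = true;
    return sketch_seen[wanted][held];
}
```

Recording the create path (table, then entry) followed by the delete path (entry, then table) makes the second call return true, even if the two paths operated on different slots.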
Regards
///jon
Florian Westphal wrote:
> Hi.
>
> i am running a standalone tipc 1.7.2 node inside a 4-CPU qemu instance.
> When opening a tipc socket, i get this:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.20-gentoo-r3 #9
> -------------------------------------------------------
> netsend/1453 is trying to acquire lock:
> (ref_table_lock){-+..}, at: [<c029c5d1>] tipc_ref_discard+0x31/0x140
>
> but task is already holding lock:
> (&table[i].lock){-+..}, at: [<c029c579>] tipc_ref_lock+0x39/0x60
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&table[i].lock){-+..}:
> [<c0136f96>] __lock_acquire+0xcd6/0xdc0
> [<c01370d7>] lock_acquire+0x57/0x70
> [<c02a3371>] _spin_lock_bh+0x31/0x40
> [<c029c75c>] tipc_ref_acquire+0x7c/0x110
> [<c029acc2>] tipc_createport_raw+0x32/0x1a0
> [<c029b866>] tipc_createport+0x46/0xf0
> [<c02960dd>] tipc_subscr_start+0xbd/0x130
> [<c028e736>] process_signal_queue+0x56/0x90
> [<c012062a>] tasklet_action+0x5a/0xe0
> [<c01200c7>] __do_softirq+0x87/0x100
> [<c0120197>] do_softirq+0x57/0x60
> [<c012046e>] local_bh_enable_ip+0xae/0x100
> [<c02a3235>] _spin_unlock_bh+0x25/0x30
> [<c028e6b2>] tipc_k_signal+0xc2/0xf0
> [<c028e3e8>] tipc_core_start+0x98/0xc0
> [<c0353093>] tipc_init+0x83/0xdb
> [<c01004d0>] init+0x110/0x320
> [<c0103c43>] kernel_thread_helper+0x7/0x14
> [<ffffffff>] 0xffffffff
>
> -> #0 (ref_table_lock){-+..}:
> [<c0136e0a>] __lock_acquire+0xb4a/0xdc0
> [<c01370d7>] lock_acquire+0x57/0x70
> [<c02a33f1>] _write_lock_bh+0x31/0x40
> [<c029c5d1>] tipc_ref_discard+0x31/0x140
> [<c029b453>] tipc_deleteport+0x33/0x140
> [<c029e825>] release+0xa5/0x130
> [<c023a6d3>] sock_release+0x13/0x70
> [<c023a8a1>] sock_close+0x21/0x40
> [<c0159828>] __fput+0x58/0x100
> [<c0159939>] fput+0x19/0x20
> [<c0156f67>] filp_close+0x47/0x70
> [<c011d034>] put_files_struct+0xa4/0xb0
> [<c011e14e>] do_exit+0x12e/0x7d0
> [<c011e819>] do_group_exit+0x29/0x70
> [<c011e86f>] sys_exit_group+0xf/0x20
> [<c0103018>] syscall_call+0x7/0xb
> [<ffffffff>] 0xffffffff
> other info that might help us debug this:
>
> 2 locks held by netsend/1453:
> #0: (sk_lock-AF_TIPC){--..}, at: [<c029e7b1>] release+0x31/0x130
> #1: (&table[i].lock){-+..}, at: [<c029c579>] tipc_ref_lock+0x39/0x60
>
> stack backtrace:
> [<c010402a>] show_trace_log_lvl+0x1a/0x30
> [<c0104712>] show_trace+0x12/0x20
> [<c01047c6>] dump_stack+0x16/0x20
> [<c01350ef>] print_circular_bug_tail+0x6f/0x80
> [<c0136e0a>] __lock_acquire+0xb4a/0xdc0
> [<c01370d7>] lock_acquire+0x57/0x70
> [<c02a33f1>] _write_lock_bh+0x31/0x40
> [<c029c5d1>] tipc_ref_discard+0x31/0x140
> [<c029b453>] tipc_deleteport+0x33/0x140
> [<c029e825>] release+0xa5/0x130
> [<c023a6d3>] sock_release+0x13/0x70
> [<c023a8a1>] sock_close+0x21/0x40
> [<c0159828>] __fput+0x58/0x100
> [<c0159939>] fput+0x19/0x20
> [<c0156f67>] filp_close+0x47/0x70
> [<c011d034>] put_files_struct+0xa4/0xb0
> [<c011e14e>] do_exit+0x12e/0x7d0
> [<c011e819>] do_group_exit+0x29/0x70
> [<c011e86f>] sys_exit_group+0xf/0x20
> [<c0103018>] syscall_call+0x7/0xb
> =======================
>
>
> I think the warning is correct and that this is indeed a possible deadlock:
>
> tipc_deleteport():
> calls tipc_port_lock() (which is the same as tipc_ref_lock);
> tipc_ref_lock() acquires the reference spinlock:
> struct reference *r = &tipc_ref_table.entries[ref &
> tipc_ref_table.index_mask];
> spin_lock_bh(&r->lock);
> if (likely(r->data.reference == ref))
> return r->object;
> next, tipc_deleteport calls tipc_ref_discard(), which locks ref_table_lock.
>
> tipc_ref_acquire():
> locks ref_table_lock. Then the following code is executed:
> if (tipc_ref_table.first_free) {
> index = tipc_ref_table.first_free;
> entry = &(tipc_ref_table.entries[index]);
> index_mask = tipc_ref_table.index_mask;
> /* take lock in case a previous user of entry still holds it */
> spin_lock_bh(&entry->lock);
>
> and indeed: the locks are acquired in reverse order 8-/
> Any ideas of how this can be fixed?
>
> Thanks,
> Florian
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
>