Hi Ravi:
I don't think that tipc_deleteport() is really missing a port lock.
What is happening is that tipc_deleteport() is locking the port,
deregistering the port from TIPC's object registry, and then releasing
the port lock. This ensures that no other thread of control can gain
access to the port data structure (which is going to be deleted shortly)
by calling port_lock(). It is then valid for tipc_deleteport() to call
routines that normally require holding the port lock -- without actually
having to hold the lock -- since any other thread that attempts to use
the port will get a failure indication when it calls port_lock()
[because the port is no longer a registered object].
The traceback you provided tells me that the problem you encountered
occurs during the net_route_msg() call at the end of tipc_deleteport(),
but it appears this routine has been interrupted by the TIPC tasklet
which handles the execution of asynchronous TIPC events (i.e. the
routines triggered by k_signal() calls). This suggests that your
problem involves process_signal_queue() encountering corruption of the
items in TIPC's signal queue (see handler.c), rather than a problem with
the work done by proto_build_peer_abort_msg(). However, I can't see any
reason why the signal queue might be corrupt.
Regards,
Al
________________________________
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ravi
Rao
Sent: Thursday, October 25, 2007 1:36 PM
To: [email protected];
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [tipc-discussion] port_lock missing
beforeport_build_peer_abort_msg function call
Hi All,
This is about an issue I am facing when a user application
tries to delete a port.
A user program that uses TIPC-1.5.10 for communication traps,
causing a kernel panic, while trying to delete a TIPC socket.
Following is the stack dump:
[<c0103443>] dump_stack+0x1e/0x2
[<c011b5aa>] panic+0x87/0x12
[<c0103984>] die+0x18a/0x19
[<c0112602>] do_page_fault+0x594/0x8e
[<c01030bb>] error_code+0x2b/0x3
[<f92193a0>] process_signal_queue+0x1d0/0x279 [tipc
[<c0120e35>] tasklet_action+0x7e/0xd
[<c0120aed>] __do_softirq+0x89/0xf
[<c0120b90>] do_softirq+0x31/0x3
[<c0120c05>] local_bh_enable+0x73/0x7
[<f9211270>] net_route_msg+0x259/0x4a4 [tipc
[<f921397d>] tipc_deleteport+0xf4/0x156 [tipc
[<f9219a65>] release+0xf4/0x16d [tipc
[<c038eff9>] sock_release+0x6f/0xa
[<c038fafc>] sock_close+0x34/0x5
[<c0168915>] __fput+0xf3/0x12
[<c0167021>] filp_close+0x50/0x8
[<c01670f5>] sys_close+0x96/0xc
[<c0102554>] no_dpa_vsyscall_enter+0x8/0
Further debugging of the stack dump shows that
-> tipc_deleteport
-> calls port_build_peer_abort_msg
-> calls port_build_proto_msg
Before calling "port_build_proto_msg" the TIPC port must first be
locked (as mentioned in the code comments), but it seems that neither
tipc_deleteport nor port_build_peer_abort_msg takes care of locking the
port.
Looking at the raw stack, it seems that during the panic a bad
memory address is introduced by the port_build_peer_abort_msg function.
My queries are:
Could not locking the port introduce a bad address that may
cause a kernel panic?
Is this a known issue? If yes, has it been addressed?
Is there any way / demo code that can exercise the execution
path -> tipc_deleteport -> port_build_peer_abort_msg ->
port_build_proto_msg?
Thanks,
Ravi.
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion