On 2/22/24 11:02, Daniel Ding wrote:
>
>
>> 2024年2月22日 上午1:55,Ilya Maximets <[email protected]
>> <mailto:[email protected]>> 写道:
>>
>> On 2/21/24 15:27, Aaron Conole wrote:
>>> Daniel Ding <[email protected] <mailto:[email protected]>> writes:
>>>
>>>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
>>>> raise the following exception.
>>>>
>>>> Error in atexit._run_exitfuncs:
>>>> Traceback (most recent call last):
>>>> File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>>>> ovsdb = OVSDB(db_sock)
>>>> File "/usr/bin/ovs-tcpdump", line 168, in __init__
>>>> OVSDB.wait_for_db_change(self._idl_conn) # Initial Sync with DB
>>>> File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>>>> while idl.change_seqno == seq and not idl.run():
>>>>
>>>> Signed-off-by: Daniel Ding <[email protected]
>>>> <mailto:[email protected]>>
>>>> ---
>>>
>>> LGTM for the linux side - maybe Alin might check the windows side.
>>
>> The code is copied from the fatal-signla library, so it is likely fine.
>> However, I don;t think this issue should be fixed on the application
>> level (ovs-tcpdump). There seems to be adifference in how C and python
>> fatal-signla libraries behave. The C version calls the hooks in the run()
>> function and the signal handler only updates the signal number variable.
>> So, if the same signal arrives multiple times it will be handled only once
>> and the run() function will not exit until all the hooks are done, unless
>> it is a segfault.
>>
>> With python implementation we're calling hooks right from the signal
>> handler (it's not a real signal handler, so we're allowed to use not
>> really singal-safe functions there). So, if another signal arrives while
>> we're executing the hooks, the second hook execution will be skipped
>> due to 'recursion' check, but the signal will be re-raised immediately,
>> killing the process.
>
> As your suggestions, I rewrite recurse as a threading.Lock. So that protect
> calling hooks
> applications registered.
>
>>
>> There is likley a way to fix that by using a mutex instead of recursion
>> check and blocking it while executing the hooks. The signal number
>> will need to be stored in the signal handler and the signal will need
>> to be re-raised in the call_hooks() instead of signal handler.
>> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
>>
>
> To avoid breaking calling hooks, I set the signal restored in the signal
> handler to SIG_IGN in
> call_hooks.
I'm not sure this is necessary, see below.
> And move re-raised the signal in the call_hooks with locked.
>
>> Does that make sense?
>>
>> Best regards, Ilya Maximets.
>
> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
> index cb2e99e87..3aa56a166 100644
> --- a/python/ovs/fatal_signal.py
> +++ b/python/ovs/fatal_signal.py
> @@ -16,6 +16,7 @@ import atexit
> import os
> import signal
> import sys
> +import threading
>
> import ovs.vlog
>
> @@ -112,28 +113,34 @@ def _unlink(file_):
> def _signal_handler(signr, _):
> _call_hooks(signr)
>
> - # Re-raise the signal with the default handling so that the program
> - # termination status reflects that we were killed by this signal.
> - signal.signal(signr, signal.SIG_DFL)
> - os.kill(os.getpid(), signr)
> -
>
> def _atexit_handler():
> _call_hooks(0)
>
>
> -recurse = False
> +recurse = threading.Lock()
It's better to rename this to just 'mutex' or 'lock'.
>
>
> def _call_hooks(signr):
> global recurse
> - if recurse:
> + if recurse.locked():
> return
There is an awkward race window between checking that the mutex is
locked and actually attempting to take it below. It's probably not
a big deal here, but it's better to take the lock at the same time
with checking, e.g.:
if not mutex.acquire(blocking=False):
return
And since we're going to end the process anyway (it's a handler
for fatal signals after all), we don't actually need to unlock it,
AFAICT.
> - recurse = True
>
> - for hook, cancel, run_at_exit in _hooks:
> - if signr != 0 or run_at_exit:
> - hook()
> + with recurse:
> + if signr != 0:
> + signal.signal(signr, signal.SIG_IGN)
> + else:
> + signal.signal(signal.SIGINT, signal.SIG_IGN)
If another signal arrives while we're under the lock, the signal
handler will call the _call_hooks(), check that the lock is already
taken and return. There is no need to ignore it.
Also, if a different signal arrives the code above will not ignore
it anyway.
> +
> + for hook, cancel, run_at_exit in _hooks:
> + if signr != 0 or run_at_exit:
> + hook()
> +
> + if signr != 0:
> + # Re-raise the signal with the default handling so that the
> program
> + # termination status reflects that we were killed by this signal.
> + signal.signal(signr, signal.SIG_DFL)
> + os.kill(os.getpid(), signr)
We may want to place this in the try: finally: block, so we still
re-raise the signal even if one of the hooks throws an exception.
Hooks should not generally throw exceptions, but may be safer this
way.
What do you think?
>
>
> _inited = False
> @@ -165,7 +172,6 @@ def signal_alarm(timeout):
>
> if sys.platform == "win32":
> import time
> - import threading
>
> class Alarm (threading.Thread):
> def __init__(self, timeout):
> --
> 2.43.0
>
> Thanks Ilya Maximets, please help me review again.
>
> Best regards, Daniel Ding.
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev