> 2024年2月22日 下午7:20,Ilya Maximets <i.maxim...@ovn.org> 写道:
>
> On 2/22/24 11:53, Ilya Maximets wrote:
>> On 2/22/24 11:02, Daniel Ding wrote:
>>>
>>>
>>>> 2024年2月22日 上午1:55,Ilya Maximets <i.maxim...@ovn.org
>>>> <mailto:i.maxim...@ovn.org>> 写道:
>>>>
>>>> On 2/21/24 15:27, Aaron Conole wrote:
>>>>> Daniel Ding <danieldin...@gmail.com <mailto:danieldin...@gmail.com>>
>>>>> writes:
>>>>>
>>>>>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
>>>>>> raise the following exception.
>>>>>>
>>>>>> Error in atexit._run_exitfuncs:
>>>>>> Traceback (most recent call last):
>>>>>> File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>>>>>> ovsdb = OVSDB(db_sock)
>>>>>> File "/usr/bin/ovs-tcpdump", line 168, in __init__
>>>>>> OVSDB.wait_for_db_change(self._idl_conn) # Initial Sync with DB
>>>>>> File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>>>>>> while idl.change_seqno == seq and not idl.run():
>>>>>>
>>>>>> Signed-off-by: Daniel Ding <zhihui.d...@easystack.cn
>>>>>> <mailto:zhihui.d...@easystack.cn>>
>>>>>> ---
>>>>>
>>>>> LGTM for the linux side - maybe Alin might check the windows side.
>>>>
>>>> The code is copied from the fatal-signla library, so it is likely fine.
>>>> However, I don;t think this issue should be fixed on the application
>>>> level (ovs-tcpdump). There seems to be adifference in how C and python
>>>> fatal-signla libraries behave. The C version calls the hooks in the run()
>>>> function and the signal handler only updates the signal number variable.
>>>> So, if the same signal arrives multiple times it will be handled only once
>>>> and the run() function will not exit until all the hooks are done, unless
>>>> it is a segfault.
>>>>
>>>> With python implementation we're calling hooks right from the signal
>>>> handler (it's not a real signal handler, so we're allowed to use not
>>>> really singal-safe functions there). So, if another signal arrives while
>>>> we're executing the hooks, the second hook execution will be skipped
>>>> due to 'recursion' check, but the signal will be re-raised immediately,
>>>> killing the process.
>>>
>>> As your suggestions, I rewrite recurse as a threading.Lock. So that protect
>>> calling hooks
>>> applications registered.
>>>
>>>>
>>>> There is likley a way to fix that by using a mutex instead of recursion
>>>> check and blocking it while executing the hooks. The signal number
>>>> will need to be stored in the signal handler and the signal will need
>>>> to be re-raised in the call_hooks() instead of signal handler.
>>>> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
>>>>
>>>
>>> To avoid breaking calling hooks, I set the signal restored in the signal
>>> handler to SIG_IGN in
>>> call_hooks.
>>
>> I'm not sure this is necessary, see below.
The default handler of SIGINT is default_int_handler, so it not register the
signal handler
successfully. When received CTRL+C again, the program be break, and calling
hook can't
execute completely.
>>
>>> And move re-raised the signal in the call_hooks with locked.
>>>
>>>> Does that make sense?
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
>>> index cb2e99e87..3aa56a166 100644
>>> --- a/python/ovs/fatal_signal.py
>>> +++ b/python/ovs/fatal_signal.py
>>> @@ -16,6 +16,7 @@ import atexit
>>> import os
>>> import signal
>>> import sys
>>> +import threading
>>>
>>> import ovs.vlog
>>>
>>> @@ -112,28 +113,34 @@ def _unlink(file_):
>>> def _signal_handler(signr, _):
>>> _call_hooks(signr)
>>>
>>> - # Re-raise the signal with the default handling so that the program
>>> - # termination status reflects that we were killed by this signal.
>>> - signal.signal(signr, signal.SIG_DFL)
>>> - os.kill(os.getpid(), signr)
>>> -
>>>
>>> def _atexit_handler():
>>> _call_hooks(0)
>>>
>>>
>>> -recurse = False
>>> +recurse = threading.Lock()
>>
>> It's better to rename this to just 'mutex' or 'lock'.
Done
>>
>>>
>>>
>>> def _call_hooks(signr):
>>> global recurse
>>> - if recurse:
>>> + if recurse.locked():
>>> return
>>
>> There is an awkward race window between checking that the mutex is
>> locked and actually attempting to take it below. It's probably not
>> a big deal here, but it's better to take the lock at the same time
>> with checking, e.g.:
>>
>> if not mutex.acquire(blocking=False):
>> return
>>
>> And since we're going to end the process anyway (it's a handler
>> for fatal signals after all), we don't actually need to unlock it,
>> AFAICT.
Agree it.
>>
>>> - recurse = True
>>>
>>> - for hook, cancel, run_at_exit in _hooks:
>>> - if signr != 0 or run_at_exit:
>>> - hook()
>>> + with recurse:
>>> + if signr != 0:
>>> + signal.signal(signr, signal.SIG_IGN)
>>> + else:
>>> + signal.signal(signal.SIGINT, signal.SIG_IGN)
>>
>> If another signal arrives while we're under the lock, the signal
>> handler will call the _call_hooks(), check that the lock is already
>> taken and return. There is no need to ignore it.
>> Also, if a different signal arrives the code above will not ignore
>> it anyway.
>>
You’re right, but the SIGINT is exception in the _init because its default
handler is not SIG_DFL.
So I append a new condition for SIGINT to register the signal handler.
for signr in signals:
handler = signal.getsignal(signr)
if (handler == signal.SIG_DFL or
handler == signal.default_int_handler):
signal.signal(signr, _signal_handler)
>>> +
>>> + for hook, cancel, run_at_exit in _hooks:
>>> + if signr != 0 or run_at_exit:
>>> + hook()
>>> +
>>> + if signr != 0:
>>> + # Re-raise the signal with the default handling so that the
>>> program
>>> + # termination status reflects that we were killed by this
>>> signal.
>>> + signal.signal(signr, signal.SIG_DFL)
>>> + os.kill(os.getpid(), signr)
>>
>> We may want to place this in the try: finally: block, so we still
>> re-raise the signal even if one of the hooks throws an exception.
>> Hooks should not generally throw exceptions, but may be safer this
>> way.
>
> On a second thought, if we re-raise after the exception in the
> hook, we may loose the exception. So, letting the exception to
> kill the application may be better in this case then hiding it
> with re-raising a signal.
>
>>
>> What do you think?
>>
>>>
>>>
>>> _inited = False
>>> @@ -165,7 +172,6 @@ def signal_alarm(timeout):
>>>
>>> if sys.platform == "win32":
>>> import time
>>> - import threading
>>>
>>> class Alarm (threading.Thread):
>>> def __init__(self, timeout):
>>> --
>>> 2.43.0
>>>
>>> Thanks Ilya Maximets, please help me review again.
>>>
>>> Best regards, Daniel Ding.
python/ovs/fatal_signal.py | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
index cb2e99e87..077f50dd5 100644
--- a/python/ovs/fatal_signal.py
+++ b/python/ovs/fatal_signal.py
@@ -16,6 +16,7 @@ import atexit
import os
import signal
import sys
+import threading
import ovs.vlog
@@ -112,29 +113,29 @@ def _unlink(file_):
def _signal_handler(signr, _):
_call_hooks(signr)
- # Re-raise the signal with the default handling so that the program
- # termination status reflects that we were killed by this signal.
- signal.signal(signr, signal.SIG_DFL)
- os.kill(os.getpid(), signr)
-
def _atexit_handler():
_call_hooks(0)
-recurse = False
+mutex = threading.Lock()
def _call_hooks(signr):
- global recurse
- if recurse:
+ global mutex
+ if not mutex.acquire(blocking=False):
return
- recurse = True
for hook, cancel, run_at_exit in _hooks:
if signr != 0 or run_at_exit:
hook()
+ if signr != 0:
+ # Re-raise the signal with the default handling so that the program
+ # termination status reflects that we were killed by this signal.
+ signal.signal(signr, signal.SIG_DFL)
+ os.kill(os.getpid(), signr)
+
_inited = False
@@ -150,7 +151,9 @@ def _init():
signal.SIGALRM]
for signr in signals:
- if signal.getsignal(signr) == signal.SIG_DFL:
+ handler = signal.getsignal(signr)
+ if (handler == signal.SIG_DFL or
+ handler == signal.default_int_handler):
signal.signal(signr, _signal_handler)
atexit.register(_atexit_handler)
@@ -165,7 +168,6 @@ def signal_alarm(timeout):
if sys.platform == "win32":
import time
- import threading
class Alarm (threading.Thread):
def __init__(self, timeout):
Thanks for your review, Ilya Maximets.
Regards,
Daniel Ding
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev