> 2024年2月22日 下午7:20,Ilya Maximets <i.maxim...@ovn.org> 写道:
> 
> On 2/22/24 11:53, Ilya Maximets wrote:
>> On 2/22/24 11:02, Daniel Ding wrote:
>>> 
>>> 
>>>> 2024年2月22日 上午1:55,Ilya Maximets <i.maxim...@ovn.org 
>>>> <mailto:i.maxim...@ovn.org>> 写道:
>>>> 
>>>> On 2/21/24 15:27, Aaron Conole wrote:
>>>>> Daniel Ding <danieldin...@gmail.com <mailto:danieldin...@gmail.com>> 
>>>>> writes:
>>>>> 
>>>>>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
>>>>>> raise the following exception.
>>>>>> 
>>>>>> Error in atexit._run_exitfuncs:
>>>>>> Traceback (most recent call last):
>>>>>>  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>>>>>>    ovsdb = OVSDB(db_sock)
>>>>>>  File "/usr/bin/ovs-tcpdump", line 168, in __init__
>>>>>>    OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>>>>>>  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>>>>>>    while idl.change_seqno == seq and not idl.run():
>>>>>> 
>>>>>> Signed-off-by: Daniel Ding <zhihui.d...@easystack.cn 
>>>>>> <mailto:zhihui.d...@easystack.cn>>
>>>>>> ---
>>>>> 
>>>>> LGTM for the linux side - maybe Alin might check the windows side.
>>>> 
>>>> The code is copied from the fatal-signla library, so it is likely fine.
>>>> However, I don;t think this issue should be fixed on the application
>>>> level (ovs-tcpdump).   There seems to be adifference in how C and python
>>>> fatal-signla libraries behave.  The C version calls the hooks in the run()
>>>> function and the signal handler only updates the signal number variable.
>>>> So, if the same signal arrives multiple times it will be handled only once
>>>> and the run() function will not exit until all the hooks are done, unless
>>>> it is a segfault.
>>>> 
>>>> With python implementation we're calling hooks right from the signal
>>>> handler (it's not a real signal handler, so we're allowed to use not
>>>> really singal-safe functions there).  So, if another signal arrives while
>>>> we're executing the hooks, the second hook execution will be skipped
>>>> due to 'recursion' check, but the signal will be re-raised immediately,
>>>> killing the process.
>>> 
>>> As your suggestions, I rewrite recurse as a threading.Lock. So that protect 
>>> calling hooks
>>> applications registered.
>>> 
>>>> 
>>>> There is likley a way to fix that by using a mutex instead of recursion
>>>> check and blocking it while executing the hooks.  The signal number
>>>> will need to be stored in the signal handler and the signal will need
>>>> to be re-raised in the call_hooks() instead of signal handler.
>>>> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
>>>> 
>>> 
>>> To avoid breaking calling hooks, I set the signal restored in the signal 
>>> handler to SIG_IGN in 
>>> call_hooks.
>> 
>> I'm not sure this is necessary, see below.

The default handler of SIGINT is default_int_handler, so it not register the 
signal handler
successfully. When received CTRL+C again, the program be break, and calling 
hook can't
execute completely.

>> 
>>> And move re-raised the signal in the call_hooks with locked.
>>> 
>>>> Does that make sense?
>>>> 
>>>> Best regards, Ilya Maximets.
>>> 
>>> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
>>> index cb2e99e87..3aa56a166 100644
>>> --- a/python/ovs/fatal_signal.py
>>> +++ b/python/ovs/fatal_signal.py
>>> @@ -16,6 +16,7 @@ import atexit
>>>  import os
>>>  import signal
>>>  import sys
>>> +import threading
>>> 
>>>  import ovs.vlog
>>> 
>>> @@ -112,28 +113,34 @@ def _unlink(file_):
>>>  def _signal_handler(signr, _):
>>>      _call_hooks(signr)
>>> 
>>> -    # Re-raise the signal with the default handling so that the program
>>> -    # termination status reflects that we were killed by this signal.
>>> -    signal.signal(signr, signal.SIG_DFL)
>>> -    os.kill(os.getpid(), signr)
>>> -
>>> 
>>>  def _atexit_handler():
>>>      _call_hooks(0)
>>> 
>>> 
>>> -recurse = False
>>> +recurse = threading.Lock()
>> 
>> It's better to rename this to just 'mutex' or 'lock'.

Done

>> 
>>> 
>>> 
>>>  def _call_hooks(signr):
>>>      global recurse
>>> -    if recurse:
>>> +    if recurse.locked():
>>>          return
>> 
>> There is an awkward race window between checking that the mutex is
>> locked and actually attempting to take it below.  It's probably not
>> a big deal here, but it's better to take the lock at the same time
>> with checking, e.g.:
>> 
>>    if not mutex.acquire(blocking=False):
>>        return
>> 
>> And since we're going to end the process anyway (it's a handler
>> for fatal signals after all), we don't actually need to unlock it,
>> AFAICT.

Agree it.

>> 
>>> -    recurse = True
>>> 
>>> -    for hook, cancel, run_at_exit in _hooks:
>>> -        if signr != 0 or run_at_exit:
>>> -            hook()
>>> +    with recurse:
>>> +        if signr != 0:
>>> +            signal.signal(signr, signal.SIG_IGN)
>>> +        else:
>>> +            signal.signal(signal.SIGINT, signal.SIG_IGN)
>> 
>> If another signal arrives while we're under the lock, the signal
>> handler will call the _call_hooks(), check that the lock is already
>> taken and return.  There is no need to ignore it.
>> Also, if a different signal arrives the code above will not ignore
>> it anyway.
>> 

You’re right, but the SIGINT is exception in the _init because its default 
handler is not SIG_DFL.
So I append a new condition for SIGINT to register the signal handler.

        for signr in signals:
            handler = signal.getsignal(signr)
            if (handler == signal.SIG_DFL or
                handler == signal.default_int_handler):
                signal.signal(signr, _signal_handler)


>>> +
>>> +        for hook, cancel, run_at_exit in _hooks:
>>> +            if signr != 0 or run_at_exit:
>>> +                hook()
>>> +
>>> +        if signr != 0:
>>> +           # Re-raise the signal with the default handling so that the 
>>> program
>>> +           # termination status reflects that we were killed by this 
>>> signal.
>>> +           signal.signal(signr, signal.SIG_DFL)
>>> +           os.kill(os.getpid(), signr)
>> 
>> We may want to place this in the try: finally: block, so we still
>> re-raise the signal even if one of the hooks throws an exception.
>> Hooks should not generally throw exceptions, but may be safer this
>> way.
> 
> On a second thought, if we re-raise after the exception in the
> hook, we may loose the exception.  So, letting the exception to
> kill the application may be better in this case then hiding it
> with re-raising a signal.
> 
>> 
>> What do you think?
>> 
>>> 
>>> 
>>>  _inited = False
>>> @@ -165,7 +172,6 @@ def signal_alarm(timeout):
>>> 
>>>      if sys.platform == "win32":
>>>          import time
>>> -        import threading
>>> 
>>>          class Alarm (threading.Thread):
>>>              def __init__(self, timeout):
>>> --
>>> 2.43.0
>>> 
>>> Thanks Ilya Maximets, please help me review again.
>>> 
>>> Best regards, Daniel Ding.


 python/ovs/fatal_signal.py | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
index cb2e99e87..077f50dd5 100644
--- a/python/ovs/fatal_signal.py
+++ b/python/ovs/fatal_signal.py
@@ -16,6 +16,7 @@ import atexit
 import os
 import signal
 import sys
+import threading

 import ovs.vlog

@@ -112,29 +113,29 @@ def _unlink(file_):
 def _signal_handler(signr, _):
     _call_hooks(signr)

-    # Re-raise the signal with the default handling so that the program
-    # termination status reflects that we were killed by this signal.
-    signal.signal(signr, signal.SIG_DFL)
-    os.kill(os.getpid(), signr)
-

 def _atexit_handler():
     _call_hooks(0)


-recurse = False
+mutex = threading.Lock()


 def _call_hooks(signr):
-    global recurse
-    if recurse:
+    global mutex
+    if not mutex.acquire(blocking=False):
         return
-    recurse = True

     for hook, cancel, run_at_exit in _hooks:
         if signr != 0 or run_at_exit:
             hook()

+    if signr != 0:
+       # Re-raise the signal with the default handling so that the program
+       # termination status reflects that we were killed by this signal.
+       signal.signal(signr, signal.SIG_DFL)
+       os.kill(os.getpid(), signr)
+

 _inited = False

@@ -150,7 +151,9 @@ def _init():
                        signal.SIGALRM]

         for signr in signals:
-            if signal.getsignal(signr) == signal.SIG_DFL:
+            handler = signal.getsignal(signr)
+            if (handler == signal.SIG_DFL or
+                handler == signal.default_int_handler):
                 signal.signal(signr, _signal_handler)
         atexit.register(_atexit_handler)

@@ -165,7 +168,6 @@ def signal_alarm(timeout):

     if sys.platform == "win32":
         import time
-        import threading

         class Alarm (threading.Thread):
             def __init__(self, timeout):



Thanks for your review, Ilya Maximets.


Regards,
Daniel Ding


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to