On 2/22/24 11:53, Ilya Maximets wrote:
> On 2/22/24 11:02, Daniel Ding wrote:
>>
>>
>>> On Feb 22, 2024, at 1:55 AM, Ilya Maximets <[email protected]> wrote:
>>>
>>> On 2/21/24 15:27, Aaron Conole wrote:
>>>> Daniel Ding <[email protected]> writes:
>>>>
>>>>> After running ovs-tcpdump and pressing CTRL+C multiple times, the
>>>>> program will raise the following exception.
>>>>>
>>>>> Error in atexit._run_exitfuncs:
>>>>> Traceback (most recent call last):
>>>>>  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>>>>>    ovsdb = OVSDB(db_sock)
>>>>>  File "/usr/bin/ovs-tcpdump", line 168, in __init__
>>>>>    OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>>>>>  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>>>>>    while idl.change_seqno == seq and not idl.run():
>>>>>
>>>>> Signed-off-by: Daniel Ding <[email protected]>
>>>>> ---
>>>>
>>>> LGTM for the Linux side - maybe Alin might check the Windows side.
>>>
>>> The code is copied from the fatal-signal library, so it is likely fine.
>>> However, I don't think this issue should be fixed at the application
>>> level (ovs-tcpdump).  There seems to be a difference in how the C and
>>> Python fatal-signal libraries behave.  The C version calls the hooks in
>>> the run() function, and the signal handler only updates the signal
>>> number variable.  So, if the same signal arrives multiple times, it
>>> will be handled only once, and the run() function will not exit until
>>> all the hooks are done, unless it is a segfault.
>>>
>>> With the Python implementation we're calling hooks right from the
>>> signal handler (it's not a real signal handler, so we're allowed to
>>> use functions that are not really signal-safe there).  So, if another
>>> signal arrives while we're executing the hooks, the second hook
>>> execution will be skipped due to the 'recursion' check, but the signal
>>> will be re-raised immediately, killing the process.
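
To make the failure mode concrete, a minimal sketch of that pattern,
paraphrased rather than copied from fatal_signal.py:

    import os
    import signal

    _hooks = []      # Hook callables registered by the application.
    recurse = False  # Boolean guard, as in the current implementation.

    def _call_hooks(signr):
        global recurse
        if recurse:
            return   # Second invocation: the hooks are skipped here...
        recurse = True
        for hook in _hooks:
            hook()

    def _signal_handler(signr, _):
        _call_hooks(signr)
        # ...but this re-raise still runs for the second signal, so it
        # kills the process while the first invocation may still be in
        # the middle of its hooks.
        signal.signal(signr, signal.SIG_DFL)
        os.kill(os.getpid(), signr)
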
>>
>> Following your suggestions, I rewrote 'recurse' as a threading.Lock,
>> so that it protects calling the hooks that applications registered.
>>
>>>
>>> There is likely a way to fix that by using a mutex instead of a recursion
>>> check and blocking it while executing the hooks.  The signal number
>>> will need to be stored in the signal handler and the signal will need
>>> to be re-raised in the call_hooks() instead of signal handler.
>>> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
>>>
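
Something along these lines, as a rough, hypothetical sketch rather than
a patch ('_hooks' stands in for the (hook, cancel, run_at_exit) tuples
that fatal_signal.py keeps):

    import os
    import signal
    import threading

    mutex = threading.Lock()
    _hooks = []  # (hook, cancel, run_at_exit) tuples registered by apps.

    def _signal_handler(signr, _):
        # The handler only forwards the signal number; both the hooks
        # and the re-raise happen in _call_hooks().
        _call_hooks(signr)

    def _call_hooks(signr):
        # Test and take the mutex in one atomic step; a second signal
        # arriving while the hooks run is simply dropped here.
        if not mutex.acquire(blocking=False):
            return
        for hook, cancel, run_at_exit in _hooks:
            if signr != 0 or run_at_exit:
                hook()
        if signr != 0:
            # Re-raise with default handling so that the termination
            # status reflects that we were killed by this signal.
            signal.signal(signr, signal.SIG_DFL)
            os.kill(os.getpid(), signr)
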
>>
>> To avoid interrupting the running hooks, the signal that was previously
>> restored in the signal handler is now set to SIG_IGN in call_hooks.
> 
> I'm not sure this is necessary, see below.
> 
>> And the re-raising of the signal moved into call_hooks, under the lock.
>>
>>> Does that make sense?
>>>
>>> Best regards, Ilya Maximets.
>>
>> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
>> index cb2e99e87..3aa56a166 100644
>> --- a/python/ovs/fatal_signal.py
>> +++ b/python/ovs/fatal_signal.py
>> @@ -16,6 +16,7 @@ import atexit
>>  import os
>>  import signal
>>  import sys
>> +import threading
>>
>>  import ovs.vlog
>>
>> @@ -112,28 +113,34 @@ def _unlink(file_):
>>  def _signal_handler(signr, _):
>>      _call_hooks(signr)
>>
>> -    # Re-raise the signal with the default handling so that the program
>> -    # termination status reflects that we were killed by this signal.
>> -    signal.signal(signr, signal.SIG_DFL)
>> -    os.kill(os.getpid(), signr)
>> -
>>
>>  def _atexit_handler():
>>      _call_hooks(0)
>>
>>
>> -recurse = False
>> +recurse = threading.Lock()
> 
> It's better to rename this to just 'mutex' or 'lock'.
> 
>>
>>
>>  def _call_hooks(signr):
>>      global recurse
>> -    if recurse:
>> +    if recurse.locked():
>>          return
> 
> There is an awkward race window between checking that the mutex is
> locked and actually attempting to take it below.  It's probably not
> a big deal here, but it's better to take the lock at the same time
> as checking, e.g.:
> 
>     if not mutex.acquire(blocking=False):
>         return
> 
> And since we're going to end the process anyway (it's a handler
> for fatal signals after all), we don't actually need to unlock it,
> AFAICT.
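
For clarity, acquire(blocking=False) tests and takes the lock in one
atomic step, which is what closes that window; a tiny standalone demo
with hypothetical names:

    import threading

    lock = threading.Lock()

    def enter_once():
        # Returns False instead of blocking when the lock is already
        # held, so "check" and "take" cannot be separated by a signal
        # arriving in between.
        if not lock.acquire(blocking=False):
            return False  # Hooks already running; nothing to do.
        # Deliberately never released: the process terminates anyway.
        return True

    print(enter_once())  # True: the first caller takes the lock.
    print(enter_once())  # False: re-entry is rejected without blocking.
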
> 
>> -    recurse = True
>>
>> -    for hook, cancel, run_at_exit in _hooks:
>> -        if signr != 0 or run_at_exit:
>> -            hook()
>> +    with recurse:
>> +        if signr != 0:
>> +            signal.signal(signr, signal.SIG_IGN)
>> +        else:
>> +            signal.signal(signal.SIGINT, signal.SIG_IGN)
> 
> If another signal arrives while we're under the lock, the signal
> handler will call the _call_hooks(), check that the lock is already
> taken and return.  There is no need to ignore it.
> Also, if a different signal arrives the code above will not ignore
> it anyway.
> 
>> +
>> +        for hook, cancel, run_at_exit in _hooks:
>> +            if signr != 0 or run_at_exit:
>> +                hook()
>> +
>> +        if signr != 0:
>> +           # Re-raise the signal with the default handling so that the program
>> +           # termination status reflects that we were killed by this signal.
>> +           signal.signal(signr, signal.SIG_DFL)
>> +           os.kill(os.getpid(), signr)
> 
> We may want to place this in a try: finally: block, so we still
> re-raise the signal even if one of the hooks throws an exception.
> Hooks should not generally throw exceptions, but it may be safer this
> way.

On a second thought, if we re-raise after the exception in the
hook, we may lose the exception.  So, letting the exception kill
the application may be better in this case than hiding it with
re-raising a signal.
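
To illustrate with a hypothetical shape of that try/finally variant:
an exception raised by a hook starts to propagate, but the finally
block runs first and terminates the process, so the traceback is
never printed:

    import os
    import signal

    _hooks = []  # Hook callables registered by the application.

    def _call_hooks(signr):
        try:
            for hook in _hooks:
                hook()  # An exception here would normally propagate...
        finally:
            if signr != 0:
                # ...but this re-raise kills the process before the
                # exception can reach a traceback printer, hiding it.
                signal.signal(signr, signal.SIG_DFL)
                os.kill(os.getpid(), signr)
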

> 
> What do you think?
> 
>>
>>
>>  _inited = False
>> @@ -165,7 +172,6 @@ def signal_alarm(timeout):
>>
>>      if sys.platform == "win32":
>>          import time
>> -        import threading
>>
>>          class Alarm (threading.Thread):
>>              def __init__(self, timeout):
>> --
>> 2.43.0
>>
>> Thanks, Ilya Maximets; please review this again.
>>
>> Best regards, Daniel Ding.
>>
> 
