On 2/22/24 14:12, Daniel Ding wrote: > > > >> 2024年2月22日 下午7:20,Ilya Maximets <[email protected] >> <mailto:[email protected]>> 写道: >> >> On 2/22/24 11:53, Ilya Maximets wrote: >>> On 2/22/24 11:02, Daniel Ding wrote: >>>> >>>> >>>>> 2024年2月22日 上午1:55,Ilya Maximets <[email protected] >>>>> <mailto:[email protected]> <mailto:[email protected] >>>>> <mailto:[email protected]>>> 写道: >>>>> >>>>> On 2/21/24 15:27, Aaron Conole wrote: >>>>>> Daniel Ding <[email protected] <mailto:[email protected]> >>>>>> <mailto:[email protected] <mailto:[email protected]>>> writes: >>>>>> >>>>>>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will >>>>>>> raise the following exception. >>>>>>> >>>>>>> Error in atexit._run_exitfuncs: >>>>>>> Traceback (most recent call last): >>>>>>> File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror >>>>>>> ovsdb = OVSDB(db_sock) >>>>>>> File "/usr/bin/ovs-tcpdump", line 168, in __init__ >>>>>>> OVSDB.wait_for_db_change(self._idl_conn) # Initial Sync with DB >>>>>>> File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change >>>>>>> while idl.change_seqno == seq and not idl.run(): >>>>>>> >>>>>>> Signed-off-by: Daniel Ding <[email protected] >>>>>>> <mailto:[email protected]> <mailto:[email protected] >>>>>>> <mailto:[email protected]>>> >>>>>>> --- >>>>>> >>>>>> LGTM for the linux side - maybe Alin might check the windows side. >>>>> >>>>> The code is copied from the fatal-signla library, so it is likely fine. >>>>> However, I don;t think this issue should be fixed on the application >>>>> level (ovs-tcpdump). There seems to be adifference in how C and python >>>>> fatal-signla libraries behave. The C version calls the hooks in the run() >>>>> function and the signal handler only updates the signal number variable. >>>>> So, if the same signal arrives multiple times it will be handled only once >>>>> and the run() function will not exit until all the hooks are done, unless >>>>> it is a segfault. >>>>> >>>>> With python implementation we're calling hooks right from the signal >>>>> handler (it's not a real signal handler, so we're allowed to use not >>>>> really singal-safe functions there). So, if another signal arrives while >>>>> we're executing the hooks, the second hook execution will be skipped >>>>> due to 'recursion' check, but the signal will be re-raised immediately, >>>>> killing the process. >>>> >>>> As your suggestions, I rewrite recurse as a threading.Lock. So that >>>> protect calling hooks >>>> applications registered. >>>> >>>>> >>>>> There is likley a way to fix that by using a mutex instead of recursion >>>>> check and blocking it while executing the hooks. The signal number >>>>> will need to be stored in the signal handler and the signal will need >>>>> to be re-raised in the call_hooks() instead of signal handler. >>>>> We can use mutex.acquire(blocking=False) as a way to prevent recursion. >>>>> >>>> >>>> To avoid breaking calling hooks, I set the signal restored in the signal >>>> handler to SIG_IGN in >>>> call_hooks. >>> >>> I'm not sure this is necessary, see below. > > The default handler of SIGINT is default_int_handler, so it not register the > signal handler > successfully. When received CTRL+C again, the program be break, and calling > hook can't > execute completely. > >>> >>>> And move re-raised the signal in the call_hooks with locked. >>>> >>>>> Does that make sense? >>>>> >>>>> Best regards, Ilya Maximets. >>>> >>>> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py >>>> index cb2e99e87..3aa56a166 100644 >>>> --- a/python/ovs/fatal_signal.py >>>> +++ b/python/ovs/fatal_signal.py >>>> @@ -16,6 +16,7 @@ import atexit >>>> import os >>>> import signal >>>> import sys >>>> +import threading >>>> >>>> import ovs.vlog >>>> >>>> @@ -112,28 +113,34 @@ def _unlink(file_): >>>> def _signal_handler(signr, _): >>>> _call_hooks(signr) >>>> >>>> - # Re-raise the signal with the default handling so that the program >>>> - # termination status reflects that we were killed by this signal. >>>> - signal.signal(signr, signal.SIG_DFL) >>>> - os.kill(os.getpid(), signr) >>>> - >>>> >>>> def _atexit_handler(): >>>> _call_hooks(0) >>>> >>>> >>>> -recurse = False >>>> +recurse = threading.Lock() >>> >>> It's better to rename this to just 'mutex' or 'lock'. > > Done > >>> >>>> >>>> >>>> def _call_hooks(signr): >>>> global recurse >>>> - if recurse: >>>> + if recurse.locked(): >>>> return >>> >>> There is an awkward race window between checking that the mutex is >>> locked and actually attempting to take it below. It's probably not >>> a big deal here, but it's better to take the lock at the same time >>> with checking, e.g.: >>> >>> if not mutex.acquire(blocking=False): >>> return >>> >>> And since we're going to end the process anyway (it's a handler >>> for fatal signals after all), we don't actually need to unlock it, >>> AFAICT. > > Agree it. > >>> >>>> - recurse = True >>>> >>>> - for hook, cancel, run_at_exit in _hooks: >>>> - if signr != 0 or run_at_exit: >>>> - hook() >>>> + with recurse: >>>> + if signr != 0: >>>> + signal.signal(signr, signal.SIG_IGN) >>>> + else: >>>> + signal.signal(signal.SIGINT, signal.SIG_IGN) >>> >>> If another signal arrives while we're under the lock, the signal >>> handler will call the _call_hooks(), check that the lock is already >>> taken and return. There is no need to ignore it. >>> Also, if a different signal arrives the code above will not ignore >>> it anyway. >>> > > You’re right, but the SIGINT is exception in the _init because its default > handler is not SIG_DFL. > So I append a new condition for SIGINT to register the signal handler. > > for signr in signals: > handler = signal.getsignal(signr) > if (handler == signal.SIG_DFL or > handler == signal.default_int_handler): > signal.signal(signr, _signal_handler)
Hmm, yeah, makes sense. With that, do we need to catch the KeyboardInterrupt exception in the ovs-tcpdump now? > > >>>> + >>>> + for hook, cancel, run_at_exit in _hooks: >>>> + if signr != 0 or run_at_exit: >>>> + hook() >>>> + >>>> + if signr != 0: >>>> + # Re-raise the signal with the default handling so that the >>>> program >>>> + # termination status reflects that we were killed by this >>>> signal. >>>> + signal.signal(signr, signal.SIG_DFL) >>>> + os.kill(os.getpid(), signr) >>> >>> We may want to place this in the try: finally: block, so we still >>> re-raise the signal even if one of the hooks throws an exception. >>> Hooks should not generally throw exceptions, but may be safer this >>> way. >> >> On a second thought, if we re-raise after the exception in the >> hook, we may loose the exception. So, letting the exception to >> kill the application may be better in this case then hiding it >> with re-raising a signal. >> >>> >>> What do you think? >>> >>>> >>>> >>>> _inited = False >>>> @@ -165,7 +172,6 @@ def signal_alarm(timeout): >>>> >>>> if sys.platform == "win32": >>>> import time >>>> - import threading >>>> >>>> class Alarm (threading.Thread): >>>> def __init__(self, timeout): >>>> -- >>>> 2.43.0 >>>> >>>> Thanks Ilya Maximets, please help me review again. >>>> >>>> Best regards, Daniel Ding. > > > python/ovs/fatal_signal.py | 24 +++++++++++++----------- > 1 file changed, 13 insertions(+), 11 deletions(-) > > diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py > index cb2e99e87..077f50dd5 100644 > --- a/python/ovs/fatal_signal.py > +++ b/python/ovs/fatal_signal.py > @@ -16,6 +16,7 @@ import atexit > import os > import signal > import sys > +import threading > > import ovs.vlog > > @@ -112,29 +113,29 @@ def _unlink(file_): > def _signal_handler(signr, _): > _call_hooks(signr) > > - # Re-raise the signal with the default handling so that the program > - # termination status reflects that we were killed by this signal. > - signal.signal(signr, signal.SIG_DFL) > - os.kill(os.getpid(), signr) > - > > def _atexit_handler(): > _call_hooks(0) > > > -recurse = False > +mutex = threading.Lock() > > > def _call_hooks(signr): > - global recurse > - if recurse: > + global mutex > + if not mutex.acquire(blocking=False): > return > - recurse = True > > for hook, cancel, run_at_exit in _hooks: > if signr != 0 or run_at_exit: > hook() > > + if signr != 0: > + # Re-raise the signal with the default handling so that the program > + # termination status reflects that we were killed by this signal. > + signal.signal(signr, signal.SIG_DFL) > + os.kill(os.getpid(), signr) > + > > _inited = False > > @@ -150,7 +151,9 @@ def _init(): > signal.SIGALRM] > > for signr in signals: > - if signal.getsignal(signr) == signal.SIG_DFL: > + handler = signal.getsignal(signr) > + if (handler == signal.SIG_DFL or > + handler == signal.default_int_handler): > signal.signal(signr, _signal_handler) > atexit.register(_atexit_handler) > > @@ -165,7 +168,6 @@ def signal_alarm(timeout): > > if sys.platform == "win32": > import time > - import threading > > class Alarm (threading.Thread): > def __init__(self, timeout): > > > > Thanks for your review, Ilya Maximets. I left a comment above that may need to be addressed. But otherwise, this version seems mostly fine. Could you please send it as a new PATCH v2 with adjusted patch name and the commit message? It will get tested and it will be easier to continue reviews this way. Thanks! Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
