Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-22 Thread Ilya Maximets
On 2/22/24 14:12, Daniel Ding wrote:
> 
> 
> 
>> 2024年2月22日 下午7:20,Ilya Maximets > > 写道:
>>
>> On 2/22/24 11:53, Ilya Maximets wrote:
>>> On 2/22/24 11:02, Daniel Ding wrote:


> 2024年2月22日 上午1:55,Ilya Maximets    >> 写道:
>
> On 2/21/24 15:27, Aaron Conole wrote:
>> Daniel Ding mailto:danieldin...@gmail.com> 
>> >> writes:
>>
>>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
>>> raise the following exception.
>>>
>>> Error in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>>  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>>>    ovsdb = OVSDB(db_sock)
>>>  File "/usr/bin/ovs-tcpdump", line 168, in __init__
>>>    OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>>>  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>>>    while idl.change_seqno == seq and not idl.run():
>>>
>>> Signed-off-by: Daniel Ding >>  >> >>
>>> ---
>>
>> LGTM for the linux side - maybe Alin might check the windows side.
>
> The code is copied from the fatal-signla library, so it is likely fine.
> However, I don;t think this issue should be fixed on the application
> level (ovs-tcpdump).   There seems to be adifference in how C and python
> fatal-signla libraries behave.  The C version calls the hooks in the run()
> function and the signal handler only updates the signal number variable.
> So, if the same signal arrives multiple times it will be handled only once
> and the run() function will not exit until all the hooks are done, unless
> it is a segfault.
>
> With python implementation we're calling hooks right from the signal
> handler (it's not a real signal handler, so we're allowed to use not
> really singal-safe functions there).  So, if another signal arrives while
> we're executing the hooks, the second hook execution will be skipped
> due to 'recursion' check, but the signal will be re-raised immediately,
> killing the process.

 As your suggestions, I rewrite recurse as a threading.Lock. So that 
 protect calling hooks
 applications registered.

>
> There is likley a way to fix that by using a mutex instead of recursion
> check and blocking it while executing the hooks.  The signal number
> will need to be stored in the signal handler and the signal will need
> to be re-raised in the call_hooks() instead of signal handler.
> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
>

 To avoid breaking calling hooks, I set the signal restored in the signal 
 handler to SIG_IGN in 
 call_hooks.
>>>
>>> I'm not sure this is necessary, see below.
> 
> The default handler of SIGINT is default_int_handler, so it not register the 
> signal handler
> successfully. When received CTRL+C again, the program be break, and calling 
> hook can't
> execute completely.
> 
>>>
 And move re-raised the signal in the call_hooks with locked.

> Does that make sense?
>
> Best regards, Ilya Maximets.

 diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
 index cb2e99e87..3aa56a166 100644
 --- a/python/ovs/fatal_signal.py
 +++ b/python/ovs/fatal_signal.py
 @@ -16,6 +16,7 @@ import atexit
  import os
  import signal
  import sys
 +import threading

  import ovs.vlog

 @@ -112,28 +113,34 @@ def _unlink(file_):
  def _signal_handler(signr, _):
      _call_hooks(signr)

 -    # Re-raise the signal with the default handling so that the program
 -    # termination status reflects that we were killed by this signal.
 -    signal.signal(signr, signal.SIG_DFL)
 -    os.kill(os.getpid(), signr)
 -

  def _atexit_handler():
      _call_hooks(0)


 -recurse = False
 +recurse = threading.Lock()
>>>
>>> It's better to rename this to just 'mutex' or 'lock'.
> 
> Done
> 
>>>


  def _call_hooks(signr):
      global recurse
 -    if recurse:
 +    if recurse.locked():
          return
>>>
>>> There is an awkward race window between checking that the mutex is
>>> locked and actually attempting to take it below.  It's probably not
>>> a big deal here, but it's better to take the lock at the same time
>>> with checking, e.g.:
>>>
>>>    if not mutex.acquire(blocking=False):
>>>    return
>>>
>>> And since we're going to end the process anyway (it's a handler
>>> for fatal signals after all), we don't actually need to unlock it,
>>> AFAICT.
> 
> Agree it.
> 
>>>
 -    recurse = True


Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-22 Thread Daniel Ding



> 2024年2月22日 下午7:20,Ilya Maximets  写道:
> 
> On 2/22/24 11:53, Ilya Maximets wrote:
>> On 2/22/24 11:02, Daniel Ding wrote:
>>> 
>>> 
 2024年2月22日 上午1:55,Ilya Maximets >>> > 写道:
 
 On 2/21/24 15:27, Aaron Conole wrote:
> Daniel Ding mailto:danieldin...@gmail.com>> 
> writes:
> 
>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
>> raise the following exception.
>> 
>> Error in atexit._run_exitfuncs:
>> Traceback (most recent call last):
>>  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>>ovsdb = OVSDB(db_sock)
>>  File "/usr/bin/ovs-tcpdump", line 168, in __init__
>>OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>>  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>>while idl.change_seqno == seq and not idl.run():
>> 
>> Signed-off-by: Daniel Ding > >
>> ---
> 
> LGTM for the linux side - maybe Alin might check the windows side.
 
 The code is copied from the fatal-signla library, so it is likely fine.
 However, I don;t think this issue should be fixed on the application
 level (ovs-tcpdump).   There seems to be adifference in how C and python
 fatal-signla libraries behave.  The C version calls the hooks in the run()
 function and the signal handler only updates the signal number variable.
 So, if the same signal arrives multiple times it will be handled only once
 and the run() function will not exit until all the hooks are done, unless
 it is a segfault.
 
 With python implementation we're calling hooks right from the signal
 handler (it's not a real signal handler, so we're allowed to use not
 really singal-safe functions there).  So, if another signal arrives while
 we're executing the hooks, the second hook execution will be skipped
 due to 'recursion' check, but the signal will be re-raised immediately,
 killing the process.
>>> 
>>> As your suggestions, I rewrite recurse as a threading.Lock. So that protect 
>>> calling hooks
>>> applications registered.
>>> 
 
 There is likley a way to fix that by using a mutex instead of recursion
 check and blocking it while executing the hooks.  The signal number
 will need to be stored in the signal handler and the signal will need
 to be re-raised in the call_hooks() instead of signal handler.
 We can use mutex.acquire(blocking=False) as a way to prevent recursion.
 
>>> 
>>> To avoid breaking calling hooks, I set the signal restored in the signal 
>>> handler to SIG_IGN in 
>>> call_hooks.
>> 
>> I'm not sure this is necessary, see below.

The default handler of SIGINT is default_int_handler, so it not register the 
signal handler
successfully. When received CTRL+C again, the program be break, and calling 
hook can't
execute completely.

>> 
>>> And move re-raised the signal in the call_hooks with locked.
>>> 
 Does that make sense?
 
 Best regards, Ilya Maximets.
>>> 
>>> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
>>> index cb2e99e87..3aa56a166 100644
>>> --- a/python/ovs/fatal_signal.py
>>> +++ b/python/ovs/fatal_signal.py
>>> @@ -16,6 +16,7 @@ import atexit
>>>  import os
>>>  import signal
>>>  import sys
>>> +import threading
>>> 
>>>  import ovs.vlog
>>> 
>>> @@ -112,28 +113,34 @@ def _unlink(file_):
>>>  def _signal_handler(signr, _):
>>>  _call_hooks(signr)
>>> 
>>> -# Re-raise the signal with the default handling so that the program
>>> -# termination status reflects that we were killed by this signal.
>>> -signal.signal(signr, signal.SIG_DFL)
>>> -os.kill(os.getpid(), signr)
>>> -
>>> 
>>>  def _atexit_handler():
>>>  _call_hooks(0)
>>> 
>>> 
>>> -recurse = False
>>> +recurse = threading.Lock()
>> 
>> It's better to rename this to just 'mutex' or 'lock'.

Done

>> 
>>> 
>>> 
>>>  def _call_hooks(signr):
>>>  global recurse
>>> -if recurse:
>>> +if recurse.locked():
>>>  return
>> 
>> There is an awkward race window between checking that the mutex is
>> locked and actually attempting to take it below.  It's probably not
>> a big deal here, but it's better to take the lock at the same time
>> with checking, e.g.:
>> 
>>if not mutex.acquire(blocking=False):
>>return
>> 
>> And since we're going to end the process anyway (it's a handler
>> for fatal signals after all), we don't actually need to unlock it,
>> AFAICT.

Agree it.

>> 
>>> -recurse = True
>>> 
>>> -for hook, cancel, run_at_exit in _hooks:
>>> -if signr != 0 or run_at_exit:
>>> -hook()
>>> +with recurse:
>>> +if signr != 0:
>>> +signal.signal(signr, signal.SIG_IGN)
>>> +else:
>>> +signal.signal(signal.SIGINT, signal.SIG_IGN)
>> 
>> If another signal arrives while we're under the lock, the signal
>> handler will 

Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-22 Thread Ilya Maximets
On 2/22/24 11:53, Ilya Maximets wrote:
> On 2/22/24 11:02, Daniel Ding wrote:
>>
>>
>>> 2024年2月22日 上午1:55,Ilya Maximets >> > 写道:
>>>
>>> On 2/21/24 15:27, Aaron Conole wrote:
 Daniel Ding mailto:danieldin...@gmail.com>> 
 writes:

> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
> raise the following exception.
>
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
>  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>    ovsdb = OVSDB(db_sock)
>  File "/usr/bin/ovs-tcpdump", line 168, in __init__
>    OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>    while idl.change_seqno == seq and not idl.run():
>
> Signed-off-by: Daniel Ding  >
> ---

 LGTM for the linux side - maybe Alin might check the windows side.
>>>
>>> The code is copied from the fatal-signla library, so it is likely fine.
>>> However, I don;t think this issue should be fixed on the application
>>> level (ovs-tcpdump).   There seems to be adifference in how C and python
>>> fatal-signla libraries behave.  The C version calls the hooks in the run()
>>> function and the signal handler only updates the signal number variable.
>>> So, if the same signal arrives multiple times it will be handled only once
>>> and the run() function will not exit until all the hooks are done, unless
>>> it is a segfault.
>>>
>>> With python implementation we're calling hooks right from the signal
>>> handler (it's not a real signal handler, so we're allowed to use not
>>> really singal-safe functions there).  So, if another signal arrives while
>>> we're executing the hooks, the second hook execution will be skipped
>>> due to 'recursion' check, but the signal will be re-raised immediately,
>>> killing the process.
>>
>> As your suggestions, I rewrite recurse as a threading.Lock. So that protect 
>> calling hooks
>> applications registered.
>>
>>>
>>> There is likley a way to fix that by using a mutex instead of recursion
>>> check and blocking it while executing the hooks.  The signal number
>>> will need to be stored in the signal handler and the signal will need
>>> to be re-raised in the call_hooks() instead of signal handler.
>>> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
>>>
>>
>> To avoid breaking calling hooks, I set the signal restored in the signal 
>> handler to SIG_IGN in 
>> call_hooks.
> 
> I'm not sure this is necessary, see below.
> 
>> And move re-raised the signal in the call_hooks with locked.
>>
>>> Does that make sense?
>>>
>>> Best regards, Ilya Maximets.
>>
>> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
>> index cb2e99e87..3aa56a166 100644
>> --- a/python/ovs/fatal_signal.py
>> +++ b/python/ovs/fatal_signal.py
>> @@ -16,6 +16,7 @@ import atexit
>>  import os
>>  import signal
>>  import sys
>> +import threading
>>
>>  import ovs.vlog
>>
>> @@ -112,28 +113,34 @@ def _unlink(file_):
>>  def _signal_handler(signr, _):
>>      _call_hooks(signr)
>>
>> -    # Re-raise the signal with the default handling so that the program
>> -    # termination status reflects that we were killed by this signal.
>> -    signal.signal(signr, signal.SIG_DFL)
>> -    os.kill(os.getpid(), signr)
>> -
>>
>>  def _atexit_handler():
>>      _call_hooks(0)
>>
>>
>> -recurse = False
>> +recurse = threading.Lock()
> 
> It's better to rename this to just 'mutex' or 'lock'.
> 
>>
>>
>>  def _call_hooks(signr):
>>      global recurse
>> -    if recurse:
>> +    if recurse.locked():
>>          return
> 
> There is an awkward race window between checking that the mutex is
> locked and actually attempting to take it below.  It's probably not
> a big deal here, but it's better to take the lock at the same time
> with checking, e.g.:
> 
> if not mutex.acquire(blocking=False):
> return
> 
> And since we're going to end the process anyway (it's a handler
> for fatal signals after all), we don't actually need to unlock it,
> AFAICT.
> 
>> -    recurse = True
>>
>> -    for hook, cancel, run_at_exit in _hooks:
>> -        if signr != 0 or run_at_exit:
>> -            hook()
>> +    with recurse:
>> +        if signr != 0:
>> +            signal.signal(signr, signal.SIG_IGN)
>> +        else:
>> +            signal.signal(signal.SIGINT, signal.SIG_IGN)
> 
> If another signal arrives while we're under the lock, the signal
> handler will call the _call_hooks(), check that the lock is already
> taken and return.  There is no need to ignore it.
> Also, if a different signal arrives the code above will not ignore
> it anyway.
> 
>> +
>> +        for hook, cancel, run_at_exit in _hooks:
>> +            if signr != 0 or run_at_exit:
>> +                hook()
>> +
>> +        if signr != 0:
>> +           # Re-raise the signal with the default handling so that 

Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-22 Thread Ilya Maximets
On 2/22/24 11:02, Daniel Ding wrote:
> 
> 
>> 2024年2月22日 上午1:55,Ilya Maximets > > 写道:
>>
>> On 2/21/24 15:27, Aaron Conole wrote:
>>> Daniel Ding mailto:danieldin...@gmail.com>> writes:
>>>
 After running ovs-tcpdump and inputs multiple CTRL+C, the program will
 raise the following exception.

 Error in atexit._run_exitfuncs:
 Traceback (most recent call last):
  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
    ovsdb = OVSDB(db_sock)
  File "/usr/bin/ovs-tcpdump", line 168, in __init__
    OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
    while idl.change_seqno == seq and not idl.run():

 Signed-off-by: Daniel Ding >>> >
 ---
>>>
>>> LGTM for the linux side - maybe Alin might check the windows side.
>>
>> The code is copied from the fatal-signla library, so it is likely fine.
>> However, I don;t think this issue should be fixed on the application
>> level (ovs-tcpdump).   There seems to be adifference in how C and python
>> fatal-signla libraries behave.  The C version calls the hooks in the run()
>> function and the signal handler only updates the signal number variable.
>> So, if the same signal arrives multiple times it will be handled only once
>> and the run() function will not exit until all the hooks are done, unless
>> it is a segfault.
>>
>> With python implementation we're calling hooks right from the signal
>> handler (it's not a real signal handler, so we're allowed to use not
>> really singal-safe functions there).  So, if another signal arrives while
>> we're executing the hooks, the second hook execution will be skipped
>> due to 'recursion' check, but the signal will be re-raised immediately,
>> killing the process.
> 
> As your suggestions, I rewrite recurse as a threading.Lock. So that protect 
> calling hooks
> applications registered.
> 
>>
>> There is likley a way to fix that by using a mutex instead of recursion
>> check and blocking it while executing the hooks.  The signal number
>> will need to be stored in the signal handler and the signal will need
>> to be re-raised in the call_hooks() instead of signal handler.
>> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
>>
> 
> To avoid breaking calling hooks, I set the signal restored in the signal 
> handler to SIG_IGN in 
> call_hooks.

I'm not sure this is necessary, see below.

> And move re-raised the signal in the call_hooks with locked.
> 
>> Does that make sense?
>>
>> Best regards, Ilya Maximets.
> 
> diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
> index cb2e99e87..3aa56a166 100644
> --- a/python/ovs/fatal_signal.py
> +++ b/python/ovs/fatal_signal.py
> @@ -16,6 +16,7 @@ import atexit
>  import os
>  import signal
>  import sys
> +import threading
> 
>  import ovs.vlog
> 
> @@ -112,28 +113,34 @@ def _unlink(file_):
>  def _signal_handler(signr, _):
>      _call_hooks(signr)
> 
> -    # Re-raise the signal with the default handling so that the program
> -    # termination status reflects that we were killed by this signal.
> -    signal.signal(signr, signal.SIG_DFL)
> -    os.kill(os.getpid(), signr)
> -
> 
>  def _atexit_handler():
>      _call_hooks(0)
> 
> 
> -recurse = False
> +recurse = threading.Lock()

It's better to rename this to just 'mutex' or 'lock'.

> 
> 
>  def _call_hooks(signr):
>      global recurse
> -    if recurse:
> +    if recurse.locked():
>          return

There is an awkward race window between checking that the mutex is
locked and actually attempting to take it below.  It's probably not
a big deal here, but it's better to take the lock at the same time
with checking, e.g.:

if not mutex.acquire(blocking=False):
return

And since we're going to end the process anyway (it's a handler
for fatal signals after all), we don't actually need to unlock it,
AFAICT.

> -    recurse = True
> 
> -    for hook, cancel, run_at_exit in _hooks:
> -        if signr != 0 or run_at_exit:
> -            hook()
> +    with recurse:
> +        if signr != 0:
> +            signal.signal(signr, signal.SIG_IGN)
> +        else:
> +            signal.signal(signal.SIGINT, signal.SIG_IGN)

If another signal arrives while we're under the lock, the signal
handler will call the _call_hooks(), check that the lock is already
taken and return.  There is no need to ignore it.
Also, if a different signal arrives the code above will not ignore
it anyway.

> +
> +        for hook, cancel, run_at_exit in _hooks:
> +            if signr != 0 or run_at_exit:
> +                hook()
> +
> +        if signr != 0:
> +           # Re-raise the signal with the default handling so that the 
> program
> +           # termination status reflects that we were killed by this signal.
> +           signal.signal(signr, signal.SIG_DFL)
> +           os.kill(os.getpid(), signr)

We may 

Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-22 Thread Daniel Ding


> 2024年2月22日 上午1:55,Ilya Maximets  写道:
> 
> On 2/21/24 15:27, Aaron Conole wrote:
>> Daniel Ding  writes:
>> 
>>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
>>> raise the following exception.
>>> 
>>> Error in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>>  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>>>ovsdb = OVSDB(db_sock)
>>>  File "/usr/bin/ovs-tcpdump", line 168, in __init__
>>>OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>>>  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>>>while idl.change_seqno == seq and not idl.run():
>>> 
>>> Signed-off-by: Daniel Ding 
>>> ---
>> 
>> LGTM for the linux side - maybe Alin might check the windows side.
> 
> The code is copied from the fatal-signla library, so it is likely fine.
> However, I don;t think this issue should be fixed on the application
> level (ovs-tcpdump).   There seems to be adifference in how C and python
> fatal-signla libraries behave.  The C version calls the hooks in the run()
> function and the signal handler only updates the signal number variable.
> So, if the same signal arrives multiple times it will be handled only once
> and the run() function will not exit until all the hooks are done, unless
> it is a segfault.
> 
> With python implementation we're calling hooks right from the signal
> handler (it's not a real signal handler, so we're allowed to use not
> really singal-safe functions there).  So, if another signal arrives while
> we're executing the hooks, the second hook execution will be skipped
> due to 'recursion' check, but the signal will be re-raised immediately,
> killing the process.

As your suggestions, I rewrite recurse as a threading.Lock. So that protect 
calling hooks
applications registered.

> 
> There is likley a way to fix that by using a mutex instead of recursion
> check and blocking it while executing the hooks.  The signal number
> will need to be stored in the signal handler and the signal will need
> to be re-raised in the call_hooks() instead of signal handler.
> We can use mutex.acquire(blocking=False) as a way to prevent recursion.
> 

To avoid breaking calling hooks, I set the signal restored in the signal 
handler to SIG_IGN in 
call_hooks. And move re-raised the signal in the call_hooks with locked.

> Does that make sense?
> 
> Best regards, Ilya Maximets.

diff --git a/python/ovs/fatal_signal.py b/python/ovs/fatal_signal.py
index cb2e99e87..3aa56a166 100644
--- a/python/ovs/fatal_signal.py
+++ b/python/ovs/fatal_signal.py
@@ -16,6 +16,7 @@ import atexit
 import os
 import signal
 import sys
+import threading

 import ovs.vlog

@@ -112,28 +113,34 @@ def _unlink(file_):
 def _signal_handler(signr, _):
 _call_hooks(signr)

-# Re-raise the signal with the default handling so that the program
-# termination status reflects that we were killed by this signal.
-signal.signal(signr, signal.SIG_DFL)
-os.kill(os.getpid(), signr)
-

 def _atexit_handler():
 _call_hooks(0)


-recurse = False
+recurse = threading.Lock()


 def _call_hooks(signr):
 global recurse
-if recurse:
+if recurse.locked():
 return
-recurse = True

-for hook, cancel, run_at_exit in _hooks:
-if signr != 0 or run_at_exit:
-hook()
+with recurse:
+if signr != 0:
+signal.signal(signr, signal.SIG_IGN)
+else:
+signal.signal(signal.SIGINT, signal.SIG_IGN)
+
+for hook, cancel, run_at_exit in _hooks:
+if signr != 0 or run_at_exit:
+hook()
+
+if signr != 0:
+   # Re-raise the signal with the default handling so that the program
+   # termination status reflects that we were killed by this signal.
+   signal.signal(signr, signal.SIG_DFL)
+   os.kill(os.getpid(), signr)


 _inited = False
@@ -165,7 +172,6 @@ def signal_alarm(timeout):

 if sys.platform == "win32":
 import time
-import threading

 class Alarm (threading.Thread):
 def __init__(self, timeout):
--
2.43.0

Thanks Ilya Maximets, please help me review again.

Best regards, Daniel Ding.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-21 Thread Ilya Maximets
On 2/21/24 15:27, Aaron Conole wrote:
> Daniel Ding  writes:
> 
>> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
>> raise the following exception.
>>
>> Error in atexit._run_exitfuncs:
>> Traceback (most recent call last):
>>   File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
>> ovsdb = OVSDB(db_sock)
>>   File "/usr/bin/ovs-tcpdump", line 168, in __init__
>> OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>>   File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
>> while idl.change_seqno == seq and not idl.run():
>>
>> Signed-off-by: Daniel Ding 
>> ---
> 
> LGTM for the linux side - maybe Alin might check the windows side.

The code is copied from the fatal-signla library, so it is likely fine.
However, I don;t think this issue should be fixed on the application
level (ovs-tcpdump).   There seems to be adifference in how C and python
fatal-signla libraries behave.  The C version calls the hooks in the run()
function and the signal handler only updates the signal number variable.
So, if the same signal arrives multiple times it will be handled only once
and the run() function will not exit until all the hooks are done, unless
it is a segfault.

With python implementation we're calling hooks right from the signal
handler (it's not a real signal handler, so we're allowed to use not
really singal-safe functions there).  So, if another signal arrives while
we're executing the hooks, the second hook execution will be skipped
due to 'recursion' check, but the signal will be re-raised immediately,
killing the process.

There is likley a way to fix that by using a mutex instead of recursion
check and blocking it while executing the hooks.  The signal number
will need to be stored in the signal handler and the signal will need
to be re-raised in the call_hooks() instead of signal handler.
We can use mutex.acquire(blocking=False) as a way to prevent recursion.

Does that make sense?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-21 Thread Aaron Conole
Daniel Ding  writes:

> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
> raise the following exception.
>
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
>   File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
> ovsdb = OVSDB(db_sock)
>   File "/usr/bin/ovs-tcpdump", line 168, in __init__
> OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>   File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
> while idl.change_seqno == seq and not idl.run():
>
> Signed-off-by: Daniel Ding 
> ---

LGTM for the linux side - maybe Alin might check the windows side.

When you post v2 you can keep my

Reviewed-by: Aaron Conole 

>  utilities/ovs-tcpdump.in | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
> index 4cbd9a5d3..eec2262d6 100755
> --- a/utilities/ovs-tcpdump.in
> +++ b/utilities/ovs-tcpdump.in
> @@ -17,6 +17,7 @@
>  import os
>  import pwd
>  from random import randint
> +import signal
>  import subprocess
>  import sys
>  import time
> @@ -417,8 +418,22 @@ def py_which(executable):
> for path in os.environ["PATH"].split(os.pathsep))
>  
>  
> +def ignore_fatal_signals():
> +if sys.platform == "win32":
> +signals = [signal.SIGTERM, signal.SIGINT]
> +else:
> +signals = [signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
> +   signal.SIGALRM]
> +
> +for signr in signals:
> +signal.signal(signr, signal.SIG_IGN)
> +
> +

Just a nit, but it might be clearer to put the ignore_fatal_signals as a
function scoped in teardown below.  That way it is a bit clearer that
this will only be called to turn off these in the double failure case.

>  def teardown(db_sock, interface, mirror_interface, tap_created):
>  def cleanup_mirror():
> +# Ignore fatal signals, avoid it to break OVSDB operations.
> +# So that cleanup mirror and ports be finished.
> +ignore_fatal_signals()
>  try:
>  ovsdb = OVSDB(db_sock)
>  ovsdb.destroy_mirror(interface, ovsdb.port_bridge(interface))

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-20 Thread 0-day Robot
Bleep bloop.  Greetings Daniel Ding, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject summary should end with a dot.
Subject: ovs-tcpdump: Cleanup mirror failed with twice fatal signals
ERROR: Author Daniel Ding  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Daniel Ding 
Lines checked: 61, Warnings: 2, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-20 Thread Daniel Ding
After running ovs-tcpdump and inputs multiple CTRL+C, the program will
raise the following exception.

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
ovsdb = OVSDB(db_sock)
  File "/usr/bin/ovs-tcpdump", line 168, in __init__
OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
  File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
while idl.change_seqno == seq and not idl.run():

Signed-off-by: Daniel Ding 
---
 utilities/ovs-tcpdump.in | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
index 4cbd9a5d3..eec2262d6 100755
--- a/utilities/ovs-tcpdump.in
+++ b/utilities/ovs-tcpdump.in
@@ -17,6 +17,7 @@
 import os
 import pwd
 from random import randint
+import signal
 import subprocess
 import sys
 import time
@@ -417,8 +418,22 @@ def py_which(executable):
for path in os.environ["PATH"].split(os.pathsep))
 
 
+def ignore_fatal_signals():
+if sys.platform == "win32":
+signals = [signal.SIGTERM, signal.SIGINT]
+else:
+signals = [signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
+   signal.SIGALRM]
+
+for signr in signals:
+signal.signal(signr, signal.SIG_IGN)
+
+
 def teardown(db_sock, interface, mirror_interface, tap_created):
 def cleanup_mirror():
+# Ignore fatal signals, avoid it to break OVSDB operations.
+# So that cleanup mirror and ports be finished.
+ignore_fatal_signals()
 try:
 ovsdb = OVSDB(db_sock)
 ovsdb.destroy_mirror(interface, ovsdb.port_bridge(interface))
-- 
2.43.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev