Hi Hans,


As I said earlier, this patch is for cases where remote fencing is not 
enabled. Stonith is valid only when remote fencing is enabled.


Also, the ideal solution in this scenario is for CLM to take complete 
responsibility for fencing the node, and AMF should depend on CLM notification for doing role 

In that case we won't see two active SUs at the same time.

The patch is only a temporary solution, in which we try to isolate the 
faulted node immediately.
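
To make the isolation step easier to discuss, here is a minimal sketch of the selection logic from the patch. It only prints the command that would be run instead of executing it. The function name isolation_command and its two arguments are hypothetical; they stand in for the MDS_TRANSPORT and TIPC_ETH_IF variables used in the patch. The $icmd prefix from the script is left out, and the non-TIPC branch is assumed to be the TCP transport.

```shell
#!/bin/sh
# Hypothetical helper (not part of opensaf_reboot): print the isolation
# command the patch would choose, without running it.
#   $1: transport name (MDS_TRANSPORT in the patch)
#   $2: TIPC bearer interface (TIPC_ETH_IF in the patch)
isolation_command() {
    transport=$1
    tipc_if=$2
    if [ "$transport" = "TIPC" ]; then
        # TIPC: disable the bearer so cluster traffic stops on that interface.
        echo "tipc-config -bd eth:$tipc_if"
    else
        # Any other transport (assumed TCP): freeze the transport daemon.
        echo "pkill -STOP osafdtmd"
    fi
}

# Example calls; eth0 is only an illustrative interface name.
isolation_command TIPC eth0
isolation_command TCP ""
```

Printing the command keeps the sketch safe to run on a development machine; on a real node the command would be executed in place of the echo.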






Hi Ravi,


stonith is not only valid for virtualized environments; I assume stonith 
supports others, e.g. IPMI in a legacy environment. The probability of 
"flickering" may be higher in a virtualized environment,

but for redundancy there should be two interfaces configured, which is the 
normal configuration in legacy. If the problem in this ticket is solved by 
using stonith I don't see a need for adding this patch.

BTW, does this patch work when stonith is enabled?

On 04/13/2018 10:59 AM, Ravi Sekhar Reddy Konda wrote:

Hi Hans,


The use case that we are addressing here is link flickering when remote 
fencing is not enabled. Also, remote fencing using stonith is valid only in 
virtualized environments. I have not tested with stonith enabled, because the 
use case applies only when remote fencing is disabled.
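
As a sketch only: if the local isolation is meant to apply solely when remote fencing is disabled, the step could be wrapped in a guard like the one below. The function name should_isolate_locally and the convention that "1" means remote fencing is enabled are assumptions for illustration; they are not taken from the patch or from fmd.conf.

```shell
#!/bin/sh
# Hypothetical guard (not part of the patch): succeed only when remote
# fencing is NOT enabled, so that local isolation is skipped otherwise.
#   $1: "1" if remote fencing is enabled (assumed convention), else anything
should_isolate_locally() {
    [ "$1" != "1" ]
}

# Example: remote fencing disabled, so local isolation would run.
if should_isolate_locally "0"; then
    echo "remote fencing disabled: isolate locally"
fi
```

With such a guard, nodes that rely on stonith would keep their current behaviour and only the non-fenced case would take the new isolation path.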






Hi Ravi,


I think stonith, implemented in ticket #1859, handles this case. This 
"flickering" was one of the (manual) tests verifying the added stonith support.

It is important to have a separate interface for stonith, to be able to perform 
the remote fencing, similar to using a backplane.

Have you tested with stonith enabled? 


 scripts/opensaf_reboot | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi
@@ -118,7 +121,17 @@ else
                 # uncomment the following line if debugging errors that keep restarting the node
                 # exit 0
+                # If the application is using different interface for cluster communication, please
+                # add your application specific isolation commands here
                 logger -t "opensaf_reboot" "Rebooting local node; 
+                # Isolate the node
+                if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+                   tipc-config -bd eth:$TIPC_ETH_IF
+                else
+                   $icmd pkill -STOP osafdtmd
+                fi
                 # Start a reboot supervision background process. Note that a 
                 # supervision is also done in the opensaf_reboot() function in 
@@ -128,12 +141,6 @@ else
                         (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") &
-               # Stop some important opensaf processes to prevent bad things from happening
-               $icmd pkill -STOP osafamfwd
-               $icmd pkill -STOP osafamfnd
-               $icmd pkill -STOP osafamfd
-               $icmd pkill -STOP osaffmd
                 # Flush OpenSAF internal log server messages to disk.
                 $bindir/osaflog --flush

