Thanks Neel. Is this fix in 4.3.2?

On Mar 5, 2014, at 7:42 AM, Neelakanta Reddy <[email protected]> wrote:

Hi,

A similar problem is fixed in http://sourceforge.net/p/opensaf/tickets/600/.
The patch was pushed in changeset 4688 for 4.3.x.

Apply the patch and retest.

If you still see the problem, please share the following logs:

1. amfd and amfnd traces of controllers and the payload

2. syslog of controllers and payload.

3. mds.log for controllers and payload.

/Neel.


On Wednesday 05 March 2014 05:25 PM, Tony Hart wrote:

5 seconds

The payload card gets the TIPC timeout logs, but it does not reboot. This
may be timing related, since the link re-establishes quickly after going down
(you can see from the logs that the link re-established within the same second
it went down).

On Mar 5, 2014, at 6:51 AM, Neelakanta Reddy <[email protected]> wrote:

Hi,

What is the configured TIPC link tolerance time?
Depending on the tolerance time, the other node will receive a service down
event.

/Neel.

On Tuesday 04 March 2014 08:53 PM, Tony Hart wrote:

We’re seeing a problem where connectivity is lost between a payload (cmm02B)
and the controller. Connectivity returns, but it is away just long enough to
trigger a TIPC timeout. In this case the payload is dropped from the cluster
but does not restart. The payload is flagged as not being in the cluster and
its presence state is UNINSTANTIATED. It’s still running the osaf processes,
though.

Is this something that’s been fixed in the current release? (We’re running
4.3.1.)

$ immlist safNode=cmm02b,safCluster=myClmCluster
Name                                               Type         Value(s)
========================================================================
safNode                                            SA_STRING_T  safNode=cmm02b
saClmNodeLockCallbackTimeout                       SA_TIME_T    50000000000 (0xba43b7400, Thu Jan  1 00:00:50 1970)
saClmNodeIsMember                                  SA_UINT32_T  0 (0x0)
saClmNodeInitialViewNumber                         SA_UINT64_T  28 (0x1c)
saClmNodeID                                        SA_UINT32_T  73743 (0x1200f)
saClmNodeEE                                        SA_NAME_T    <Empty>
saClmNodeDisableReboot                             SA_UINT32_T  0 (0x0)
saClmNodeCurrAddressFamily                         SA_UINT32_T  <Empty>
saClmNodeCurrAddress                               SA_STRING_T  <Empty>
saClmNodeBootTimeStamp                             SA_TIME_T    1393879646000000000 (0x13580e27277a2c00, Mon Mar  3 20:47:26 2014)
saClmNodeAdminState                                SA_UINT32_T  1 (0x1)
saClmNodeAddressFamily                             SA_UINT32_T  <Empty>
saClmNodeAddress                                   SA_STRING_T  <Empty>
SaImmAttrImplementerName                           SA_STRING_T  safClmService
SaImmAttrClassName                                 SA_STRING_T  SaClmNode
SaImmAttrAdminOwnerName                            SA_STRING_T  IMMLOADER


$ immlist $(amf-find node | grep CMM02B)
Name                                               Type         Value(s)
========================================================================
safAmfNode                                         SA_STRING_T  safAmfNode=CMM02B
saAmfNodeSuFailoverMax                             SA_UINT32_T  2 (0x2)
saAmfNodeSuFailOverProb                            SA_TIME_T    1200000000000 (0x1176592e000, Thu Jan  1 00:20:00 1970)
saAmfNodeOperState                                 SA_UINT32_T  2 (0x2)
saAmfNodeFailfastOnTerminationFailure              SA_UINT32_T  0 (0x0)
saAmfNodeFailfastOnInstantiationFailure            SA_UINT32_T  0 (0x0)
saAmfNodeClmNode                                   SA_NAME_T    safNode=cmm02b,safCluster=myClmCluster (38)
saAmfNodeCapacity                                  SA_STRING_T  <Empty>
saAmfNodeAutoRepair                                SA_UINT32_T  1 (0x1)
saAmfNodeAdminState                                SA_UINT32_T  1 (0x1)
SaImmAttrImplementerName                           SA_STRING_T  safAmfService
SaImmAttrClassName                                 SA_STRING_T  SaAmfNode
SaImmAttrAdminOwnerName                            SA_STRING_T  IMMLOADER
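
For anyone who wants to check the membership flag from a script rather than by
eye, here is a minimal Python sketch that parses immlist output of the shape
shown above. The function names (parse_immlist, leading_int) are hypothetical
helpers, not part of OpenSAF, and the parser assumes only the three-column
"Name / Type / Value(s)" layout seen in this thread, including values that
wrap onto a continuation line.

```python
import re

# One attribute row: name, an SA_*_T type, then an optional value.
ROW = re.compile(r"^(\S+)\s+(SA_\w+_T)\s*(.*)$")


def parse_immlist(text):
    """Parse immlist table output into {attribute: (type, value)}.

    Lines that do not look like an attribute row are treated as the
    wrapped continuation of the previous attribute's value.
    """
    attrs = {}
    last = None
    for raw in text.splitlines():
        line = raw.rstrip()
        # Skip blanks, the column header and the '=====' separator.
        if not line or line.startswith("Name ") or set(line) == {"="}:
            continue
        match = ROW.match(line)
        if match:
            name, sa_type, value = match.groups()
            attrs[name] = (sa_type, value.strip())
            last = name
        elif last is not None:
            sa_type, value = attrs[last]
            attrs[last] = (sa_type, (value + " " + line.strip()).strip())
    return attrs


def leading_int(value):
    """Return the decimal number at the start of a value like '0 (0x0)'."""
    return int(value.split()[0])
```

With the CLM output above, leading_int(attrs["saClmNodeIsMember"][1]) gives 0,
which matches the node being flagged as not in the cluster.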


cmm02b$ ps aux | grep osaf
root      1417  0.0  0.0 225880  2028 ?        Ssl  Mar03   0:08 /usr/lib64/opensaf/osafamfnd osafamfnd
root      1429  0.0  0.0 157100  1416 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osafsmfnd osafsmfnd
opensaf   1438  0.0  0.1 174256  5764 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osafmsgnd osafmsgnd
opensaf   1454  0.0  0.0 155732  1448 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osaflcknd osaflcknd
opensaf   1463  0.0  0.0 158148  2296 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osafckptnd osafckptnd
opensaf   1472  0.0  0.0 155020  1392 ?        Ssl  Mar03   0:02 /usr/lib64/opensaf/osafamfwd osafamfwd
opensaf   4704  0.0  0.3 182240 11992 ?        Ssl  14:20   0:01 /usr/lib64/opensaf/osafimmnd osafimmnd



SCM1 (1.1.15)
-------------------
2014-03-04T14:20:18.808187+00:00 scm1 osafamfd[1771]: NO Node 'PLD0211' left the cluster
2014-03-04T14:20:18.851318+00:00 scm1 kernel: TIPC: Established link <1.1.15:eth2-1.1.27:bond0> on network plane A
2014-03-04T14:20:18.852749+00:00 scm1 osafsmfd[1965]: ER saClmClusterNodeGet failed, rc=SA_AIS_ERR_NOT_EXIST (12)
2014-03-04T14:20:18.858472+00:00 scm1 kernel: TIPC: Established link <1.1.15:eth2-1.1.23:bond0> on network plane A
2014-03-04T14:20:18.871084+00:00 scm1 osafsmfd[1965]: ER saClmClusterNodeGet failed, rc=SA_AIS_ERR_NOT_EXIST (12)
2014-03-04T14:20:18.956307+00:00 scm1 kernel: TIPC: Resetting link <1.1.15:eth2-1.1.32:eth2>, peer not responding
2014-03-04T14:20:18.956330+00:00 scm1 kernel: TIPC: Lost link <1.1.15:eth2-1.1.32:eth2> on network plane A
2014-03-04T14:20:18.956335+00:00 scm1 kernel: TIPC: Lost contact with <1.1.32>
2014-03-04T14:20:18.956340+00:00 scm1 kernel: TIPC: Established link <1.1.15:eth2-1.1.32:eth2> on network plane A
2014-03-04T14:20:18.958227+00:00 scm1 osafimmnd[1667]: NO Global discard node received for nodeId:1200f pid:1347
2014-03-04T14:20:18.958270+00:00 scm1 osafimmnd[1667]: NO Implementer disconnected 51 <0, 1200f(down)> (MsgQueueService73743)
2014-03-04T14:20:18.965240+00:00 scm1 osafimmnd[1667]: NO Implementer connected: 71 (MsgQueueService73743) <92377, 10f0f>
2014-03-04T14:20:18.968251+00:00 scm1 osafimmnd[1667]: NO Implementer locally disconnected. Marking it as doomed 71 <92377, 10f0f> (MsgQueueService73743)
2014-03-04T14:20:18.971785+00:00 scm1 osafimmnd[1667]: NO Global discard node received for nodeId:1170f pid:0
2014-03-04T14:20:18.973013+00:00 scm1 osafimmnd[1667]: NO Global discard node received for nodeId:1200f pid:0
2014-03-04T14:20:18.976586+00:00 scm1 osafimmnd[1667]: NO Implementer disconnected 71 <92377, 10f0f> (MsgQueueService73743)
2014-03-04T14:20:19.025760+00:00 scm1 osafimmd[1657]: NO Node 11e0f request sync sync-pid:23769 epoch:0
2014-03-04T14:20:19.076427+00:00 scm1 osafamfd[1771]: NO Node 'PLD0214' left the cluster
2014-03-04T14:20:19.215220+00:00 scm1 osafimmd[1657]: NO Node 11c0f request sync sync-pid:23629 epoch:0
2014-03-04T14:20:19.296817+00:00 scm1 osafamfd[1771]: WA avd_msg_sanity_chk: invalid node ID (11e0f)
2014-03-04T14:20:19.300899+00:00 scm1 osafamfd[1771]: WA avd_msg_sanity_chk: invalid node ID (11e0f)
2014-03-04T14:20:19.305377+00:00 scm1 osafamfd[1771]: NO Node 'CMM02B' left the cluster
2014-03-04T14:20:19.357458+00:00 scm1 osafimmd[1657]: NO Node 1200f request sync sync-pid:4704 epoch:0
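
A note on reading the node IDs in these logs: the hex value 1200f is the same
number as the saClmNodeID 73743 shown by immlist above, and the decimal suffix
in names such as MsgQueueService73743 appears to be that same ID. The short
Python sketch below does the conversion. The helper names are hypothetical,
and tipc_node_number rests on an assumption inferred only from the values in
this thread (0x20 = 32 for cmm02b at 1.1.32), not from OpenSAF documentation.

```python
def parse_node_id(token):
    """Accept a hex string as printed in the logs ('1200f') or an int."""
    return token if isinstance(token, int) else int(token, 16)


def split_node_id(node_id):
    """Return the three low-order bytes of a node ID, high byte first."""
    return (node_id >> 16) & 0xFF, (node_id >> 8) & 0xFF, node_id & 0xFF


def tipc_node_number(node_id):
    """ASSUMPTION: the middle byte equals the last part of the TIPC address."""
    return split_node_id(node_id)[1]


if __name__ == "__main__":
    for token in ("1200f", "10f0f", "11e0f"):
        nid = parse_node_id(token)
        print(f"{token}: decimal {nid}, "
              f"TIPC node number (guess) {tipc_node_number(nid)}")
```

If the assumption holds, this makes it easier to line up the "Global discard
node" and "request sync" messages with the TIPC link messages for the same
peer.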


cmm02B (1.1.32)
-----------------------
2014-03-04T14:20:18.495174+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.10:bond0>, peer not responding
2014-03-04T14:20:18.495203+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.10:bond0> on network plane A
2014-03-04T14:20:18.495209+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.10>
2014-03-04T14:20:18.501981+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.15:eth2>, peer not responding
2014-03-04T14:20:18.502012+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.15:eth2> on network plane A
2014-03-04T14:20:18.502016+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.15>
2014-03-04T14:20:18.502020+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.11:bond0>, peer not responding
2014-03-04T14:20:18.502023+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.11:bond0> on network plane A
2014-03-04T14:20:18.502026+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.11>
2014-03-04T14:20:18.502110+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.1:bond0>, peer not responding
2014-03-04T14:20:18.502115+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.1:bond0> on network plane A
2014-03-04T14:20:18.502118+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.1>
2014-03-04T14:20:18.549154+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.14:bond0>, peer not responding
2014-03-04T14:20:18.549180+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.14:bond0> on network plane A
2014-03-04T14:20:18.549184+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.14>
2014-03-04T14:20:18.671107+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.14:bond0> on network plane A
2014-03-04T14:20:18.743482+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.11:bond0> on network plane A
2014-03-04T14:20:18.866277+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.10:bond0> on network plane A
2014-03-04T14:20:18.869280+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.1:bond0> on network plane A
2014-03-04T14:20:18.954740+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.15:eth2> on network plane A
2014-03-04T14:20:18.959226+00:00 cmm02b osafimmnd[1347]: WA MESSAGE:38632 OUT OF ORDER my highest processed:38600, exiting
2014-03-04T14:20:18.967269+00:00 cmm02b osafamfnd[1417]: NO 'safComp=IMMND,safSu=CMM02B,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
2014-03-04T14:20:19.052569+00:00 cmm02b osafimmnd[4704]: Started
2014-03-04T14:20:19.157393+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
2014-03-04T14:20:19.257835+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
2014-03-04T14:20:19.358134+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
2014-03-04T14:20:19.358452+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_ISOLATED
2014-03-04T14:20:19.955686+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
2014-03-04T14:20:20.022473+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
2014-03-04T14:20:26.925158+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.31:eth2>, peer not responding
2014-03-04T14:20:26.925184+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.31:eth2> on network plane A
2014-03-04T14:20:26.925191+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.31>
2014-03-04T14:20:27.893115+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.27:bond0>, peer not responding
2014-03-04T14:20:27.893148+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.27:bond0> on network plane A
2014-03-04T14:20:27.893154+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.27>
2014-03-04T14:20:32.026411+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2144
2014-03-04T14:20:32.026463+00:00 cmm02b osafimmnd[4704]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
2014-03-04T14:20:32.026493+00:00 cmm02b osafimmnd[4704]: NO Epoch set to 22 in ImmModel
2014-03-04T14:20:32.031737+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 72 (MsgQueueService73743) <67, 1200f>
2014-03-04T14:20:32.035966+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 73 (MsgQueueService73231) <0, 11e0f>
2014-03-04T14:20:32.041233+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
2014-03-04T14:20:32.042213+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 74 (MsgQueueService72719) <0, 11c0f>
2014-03-04T14:20:32.047252+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 75 (MsgQueueService71439) <0, 1170f>
2014-03-04T14:20:46.911220+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 76 (MsgQueueService73487) <0, 10f0f>
2014-03-04T14:20:46.920751+00:00 cmm02b osafimmnd[4704]: NO Implementer disconnected 76 <0, 10f0f> (MsgQueueService73487)
2014-03-04T14:20:48.012244+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 77 (MsgQueueService72463) <0, 10f0f>
2014-03-04T14:20:48.014779+00:00 cmm02b osafimmnd[4704]: NO Implementer disconnected 77 <0, 10f0f> (MsgQueueService72463)
2014-03-04T14:21:13.653100+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.31:eth2> on network plane A
2014-03-04T14:21:14.052200+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2014-03-04T14:21:20.913433+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 14277
2014-03-04T14:21:20.913963+00:00 cmm02b osafimmnd[4704]: NO Epoch set to 23 in ImmModel
2014-03-04T14:21:21.419157+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 78 (MsgQueueService73487) <0, 11f0f>
2014-03-04T14:21:40.874192+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.27:bond0> on network plane A
2014-03-04T14:21:42.179625+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2014-03-04T14:21:46.871328+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 14277
2014-03-04T14:21:46.871755+00:00 cmm02b osafimmnd[4704]: NO Epoch set to 24 in ImmModel
2014-03-04T14:21:47.649858+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 79 (MsgQueueService72463) <0, 11b0f>
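
To put a number on how long each link was actually down, the kernel TIPC
lines can be paired up per link. The following is a rough Python sketch,
assuming only the syslog line format shown in this thread (ISO timestamp,
host, "kernel: TIPC: Lost link <...>" and "Established link <...>"). The
function names are hypothetical, and lines that an email client wrapped are
rejoined before matching.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T")
EVENT = re.compile(
    r"^(?P<ts>\S+) \S+ kernel: TIPC: "
    r"(?P<what>Lost|Established) link\s+<(?P<link>[^>]+)>"
)


def unwrap(text):
    """Rejoin wrapped log lines: a record starts with an ISO timestamp."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line.strip())
        else:
            records[-1] += " " + line.strip()
    return records


def link_outages(text):
    """Return [(link, seconds_down)] for each Lost/Established pair."""
    lost_at = {}
    outages = []
    for record in unwrap(text):
        match = EVENT.match(record)
        if not match:
            continue
        when = datetime.fromisoformat(match["ts"])
        link = match["link"]
        if match["what"] == "Lost":
            lost_at[link] = when
        elif link in lost_at:
            # Only count an Established line that follows a Lost line.
            delta = when - lost_at.pop(link)
            outages.append((link, delta.total_seconds()))
    return outages
```

Run against the cmm02B excerpt above, the link to the controller
(1.1.32:eth2-1.1.15:eth2) comes out at about 0.45 seconds between "Lost link"
and "Established link", which is consistent with the link coming back within
the same second.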

------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works.
Faster operations. Version large binaries.  Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-users mailing list
[email protected]<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/opensaf-users




