How will we go about changing the /etc/opensaf/nodeinit.conf.controller and /etc/opensaf/nodeinit.conf.payload TIME-OUT values, tuned based on the number of nodes, at build time?
-AVM

On 3/30/2016 3:48 PM, Anders Widell wrote:
> Ok, good that you got it working. I see that you actually generate an
> IMM configuration with 1000 nodes. Then I am not so surprised that you
> bump into scalability problems. It should work smoother if the
> configuration only contains a few nodes. I.e. use the IMM XML tools to
> generate a cluster with two controllers and two payloads. Then rename
> PL-3 to PL-998 and PL-4 to PL-999 in the generated imm.xml file.
>
> regards,
> Anders Widell
>
> On 03/30/2016 11:20 AM, A V Mahesh wrote:
>> Hi Anders Widell,
>>
>> Please find my comments.
>>
>> On 3/29/2016 2:49 PM, Anders Widell wrote:
>>> First of all, please double-check that you have applied the immtools
>>> patch, since "999" was a magic number in immxml-nodegen. Secondly,
>>> could you re-run your test with some smaller node-ids, for example
>>> PL-298 & PL-299 instead of PL-998 & PL-999? There are scalability
>>> problems in OpenSAF when you have more than approximately 300 nodes,
>>> and from what I have seen these scalability problems seem to be there
>>> to some extent even with just a few nodes if the slot numbers are not
>>> in sequence (as in your example, when you use slot ids 998 & 999).
>>> TIPC seems to scale better than TCP.
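Anders' generate-and-rename workflow can be sketched as a couple of shell commands. The immxml-clustersize invocation is taken from the thread; the sed rename and the tiny stand-in file are illustrative assumptions, since the real imm.xml produced by the IMM XML tools is much larger and the tools are not available here:

```shell
# Step 1 (from the thread): generate a small cluster configuration:
#   ./immxml-clustersize -s 2 -p 2
# Step 2 (assumed): rename the two payloads to the high slot numbers.
# Demonstrated on a fabricated stand-in for the generated imm.xml.
cat > /tmp/imm.xml <<'EOF'
<value>safAmfNode=PL-3</value>
<value>safAmfNode=PL-4</value>
EOF
sed -i -e 's/PL-3/PL-998/g' -e 's/PL-4/PL-999/g' /tmp/imm.xml
grep 'PL-99' /tmp/imm.xml
```

The point of the rename is that the IMM configuration then contains only four nodes, while still exercising the new high slot numbers 998 and 999.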
>> [AVM] I am able to bring up payloads with slot numbers up to "999"
>> without any issue on both TCP & TIPC, just by tuning the osaf-amfnd,
>> osaf-amfd, osaf-immnd and osaf-immd TIME-OUT values in the CLC-CLI
>> start-up scripts (/etc/opensaf/nodeinit.conf.controller &
>> /etc/opensaf/nodeinit.conf.payload).
>>
>> Configuration:
>>
>> 1) 4-node setup: 2 controllers & 2 payloads
>> 2) XML configuration: ./immxml-clustersize -s 2 -p 999
>> 3) Controller slot number: SC-1
>> 4) Controller slot number: SC-2
>> 5) Payload slot number: PL-998
>> 6) Payload slot number: PL-999
>>
>>> This patch is only addressing the limit for the address space, which
>>> was previously eight bits and thus limited to at most 255 nodes. The
>>> scalability issues are a separate problem.
>>
>> [AVM] After bring-up I performed some switchovers and fail-overs and
>> haven't seen any problem with slot numbers up to "999" for payloads,
>> on both the TCP & TIPC transports. So along with this patch we also
>> need a new build-time configuration option (something like ./configure
>> [OPTION]... [VAR=VALUE]... --numnodes=<777>) so that the tuned
>> TIME-OUT values in the CLC-CLI start-up scripts
>> (/etc/opensaf/nodeinit.conf.controller &
>> /etc/opensaf/nodeinit.conf.payload) are generated based on the number
>> of nodes.
>>
>> -AVM
>>>
>>> regards,
>>> Anders Widell
>>>
>>> On 03/29/2016 10:21 AM, A V Mahesh wrote:
>>>> Hi,
>>>>
>>>> Sometimes the standby never joins the cluster (see below).
>>>> We will test with TIPC and report whether the behavior is
>>>> consistent across the transports.
>>>>
>>>> ============================================
>>>> # /etc/init.d/opensafd restart
>>>> Mar 29 13:45:07 SC-2 opensafd: Stopping OpenSAF Services
>>>> Stopping OpenSAF Services: Mar 29 13:45:07 SC-2 opensafd: OpenSAF services successfully stopped
>>>> done
>>>> Mar 29 13:45:07 SC-2 opensafd: Starting OpenSAF Services(5.0.M0 - ) (Using TCP)
>>>> Starting OpenSAF Services (Using TCP):Mar 29 13:45:07 SC-2 osafdtmd[17441]: Started
>>>> Mar 29 13:45:07 SC-2 osafrded[17458]: Started
>>>> Mar 29 13:45:07 SC-2 osafdtmd[17441]: NO Established contact with 'SC-1'
>>>> Mar 29 13:45:07 SC-2 osafrded[17458]: NO Peer rde@2010f has active state => Assigning Standby role to this node
>>>> Mar 29 13:45:07 SC-2 osaffmd[17468]: Started
>>>> Mar 29 13:45:07 SC-2 osafimmd[17478]: Started
>>>> Mar 29 13:45:07 SC-2 osafimmnd[17489]: Started
>>>> Mar 29 13:45:07 SC-2 osafimmnd[17489]: NO IMMD service is UP ...
>>>> ScAbsenseAllowed?:0 introduced?:0 >>>> Mar 29 13:45:07 SC-2 osafimmnd[17489]: NO SERVER STATE: >>>> IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING >>>> Mar 29 13:45:08 SC-2 osafimmnd[17489]: NO SETTING COORD TO 0 CLOUD >>>> PROTO >>>> Mar 29 13:45:08 SC-2 osafimmnd[17489]: NO SERVER STATE: >>>> IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING >>>> Mar 29 13:45:08 SC-2 osafimmnd[17489]: NO SERVER STATE: >>>> IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING >>>> Mar 29 13:45:08 SC-2 osafimmnd[17489]: NO NODE STATE-> >>>> IMM_NODE_ISOLATED >>>> Mar 29 13:45:08 SC-2 osafdtmd[17441]: NO Established contact with >>>> 'PL-998' >>>> Mar 29 13:45:09 SC-2 osafimmd[17478]: NO SBY: Ruling epoch noted as:4 >>>> Mar 29 13:45:09 SC-2 osafimmd[17478]: NO IMMND coord at 2010f >>>> Mar 29 13:45:09 SC-2 osafimmnd[17489]: NO NODE STATE-> >>>> IMM_NODE_W_AVAILABLE >>>> Mar 29 13:45:09 SC-2 osafimmnd[17489]: NO SERVER STATE: >>>> IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT >>>> Mar 29 13:45:09 SC-2 osafdtmd[17441]: NO Established contact with >>>> 'PL-999' >>>> Mar 29 13:45:16 SC-2 osafimmd[17478]: NO SBY: New Epoch for IMMND >>>> process at node 2010f old epoch: 3 new epoch:4 >>>> Mar 29 13:45:16 SC-2 osafimmd[17478]: NO IMMND coord at 2010f >>>> Mar 29 13:45:16 SC-2 osafimmnd[17489]: NO NODE STATE-> >>>> IMM_NODE_FULLY_AVAILABLE 2729 >>>> Mar 29 13:45:16 SC-2 osafimmnd[17489]: NO RepositoryInitModeT is >>>> SA_IMM_INIT_FROM_FILE >>>> Mar 29 13:45:16 SC-2 osafimmnd[17489]: WA IMM Access Control mode >>>> is DISABLED! 
>>>> Mar 29 13:45:16 SC-2 osafimmnd[17489]: NO Epoch set to 4 in ImmModel >>>> Mar 29 13:45:16 SC-2 osafimmd[17478]: NO SBY: New Epoch for IMMND >>>> process at node 2020f old epoch: 0 new epoch:4 >>>> Mar 29 13:45:16 SC-2 osafimmnd[17489]: NO SERVER STATE: >>>> IMM_SERVER_SYNC_CLIENT --> IMM_SERVER_READY >>>> Mar 29 13:45:16 SC-2 osafimmnd[17489]: NO ImmModel received >>>> scAbsenceAllowed 0 >>>> Mar 29 13:45:16 SC-2 osaflogd[17507]: Started >>>> Mar 29 13:45:16 SC-2 osaflogd[17507]: NO LOGSV_DATA_GROUPNAME not found >>>> Mar 29 13:45:16 SC-2 osaflogd[17507]: NO LOG root directory is: >>>> "/var/log/opensaf/saflog" >>>> Mar 29 13:45:16 SC-2 osaflogd[17507]: NO LOG data group is: "" >>>> Mar 29 13:45:16 SC-2 osaflogd[17507]: NO LGS_MBCSV_VERSION = 5 >>>> Mar 29 13:45:16 SC-2 osafntfd[17518]: Started >>>> Mar 29 13:45:16 SC-2 osafclmd[17532]: Started >>>> Mar 29 13:45:16 SC-2 osafclmna[17542]: Started >>>> Mar 29 13:45:16 SC-2 osafclmna[17542]: NO >>>> safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f >>>> Mar 29 13:45:16 SC-2 osafamfd[17551]: Started >>>> Mar 29 13:45:17 SC-2 osafimmnd[17489]: NO Implementer (applier) >>>> connected: 27 (@OpenSafImmReplicatorB) <6, 2020f> >>>> Mar 29 13:45:17 SC-2 osafntfimcnd[17524]: NO Started >>>> Mar 29 13:45:20 SC-2 osafimmd[17478]: NO SBY: Ruling epoch noted as:5 >>>> Mar 29 13:45:20 SC-2 osafimmd[17478]: NO IMMND coord at 2010f >>>> Mar 29 13:45:20 SC-2 osafimmnd[17489]: NO NODE STATE-> >>>> IMM_NODE_R_AVAILABLE >>>> Mar 29 13:45:28 SC-2 osafimmd[17478]: NO SBY: New Epoch for IMMND >>>> process at node 2010f old epoch: 4 new epoch:5 >>>> Mar 29 13:45:28 SC-2 osafimmd[17478]: NO IMMND coord at 2010f >>>> Mar 29 13:45:28 SC-2 osafimmnd[17489]: NO NODE STATE-> >>>> IMM_NODE_FULLY_AVAILABLE 18093 >>>> Mar 29 13:45:28 SC-2 osafimmnd[17489]: NO Epoch set to 5 in ImmModel >>>> Mar 29 13:45:28 SC-2 osafimmd[17478]: NO SBY: New Epoch for IMMND >>>> process at node 2020f old epoch: 4 new epoch:5 >>>> Mar 29 13:45:28 
SC-2 osafimmd[17478]: NO SBY: New Epoch for IMMND process at node 2e60c old epoch: 0 new epoch:5
>>>> Mar 29 13:45:28 SC-2 osafimmd[17478]: NO SBY: New Epoch for IMMND process at node 2e70c old epoch: 0 new epoch:5
>>>> Mar 29 13:45:29 SC-2 osafimmnd[17489]: NO Implementer connected: 28 (MsgQueueService190220) <0, 2e70c>
>>>> Mar 29 13:45:29 SC-2 osafimmnd[17489]: NO Implementer connected: 29 (MsgQueueService189964) <0, 2e60c>
>>>> ============================================
>>>>
>>>> On 3/29/2016 12:24 PM, A V Mahesh wrote:
>>>>>
>>>>> My configuration is:
>>>>>
>>>>> slot 1 with 'SC-1'
>>>>> slot 2 with 'SC-2'
>>>>> slot 998 with 'PL-998'
>>>>> slot 999 with 'PL-999'
>>>>>
>>>>> -AVM
>>>>>
>>>>> On 3/29/2016 12:17 PM, A V Mahesh wrote:
>>>>>>
>>>>>> Hi Anders Widell,
>>>>>>
>>>>>> I am not able to bring up the cluster with TCP (I started with
>>>>>> TCP itself because it touches different code flows in MDS), and
>>>>>> I observed different behavior at different times.
>>>>>>
>>>>>> Is this [#1613] tested with TCP? Or am I missing anything?
>>>>>> >>>>>> My configuration are : >>>>>> >>>>>> slot 1 with 'SC-1' >>>>>> slot 1 with 'SC-2' >>>>>> slot 998 with 'PL-998' >>>>>> slot 999 with 'PL-999' >>>>>> >>>>>> ============================================================================================= >>>>>> Mar 29 11:58:57 SC-1 opensafd: Stopping OpenSAF Services >>>>>> Mar 29 11:58:58 SC-1 opensafd: OpenSAF services successfully stopped >>>>>> Mar 29 11:58:58 SC-1 opensafd: Starting OpenSAF Services(5.0.M0 - >>>>>> ) (Using TCP) >>>>>> Mar 29 11:58:58 SC-1 osafdtmd[3253]: Started >>>>>> Mar 29 11:58:58 SC-1 osafrded[3270]: Started >>>>>> Mar 29 11:59:00 SC-1 osafrded[3270]: NO No peer available => >>>>>> Setting Active role for this node >>>>>> Mar 29 11:59:00 SC-1 osaffmd[3282]: Started >>>>>> Mar 29 11:59:00 SC-1 osafimmd[3292]: Started >>>>>> Mar 29 11:59:00 SC-1 osafimmnd[3303]: Started >>>>>> Mar 29 11:59:00 SC-1 osafimmnd[3303]: NO IMMD service is UP ... >>>>>> ScAbsenseAllowed?:0 introduced?:0 >>>>>> Mar 29 11:59:00 SC-1 osafimmd[3292]: NO New IMMND process is on >>>>>> ACTIVE Controller at 2010f >>>>>> Mar 29 11:59:00 SC-1 osafimmd[3292]: NO First SC IMMND (OpenSAF >>>>>> 4.4 or later) attached 2010f >>>>>> Mar 29 11:59:00 SC-1 osafimmd[3292]: NO First IMMND at SC to >>>>>> attach is NOT configured for PBE >>>>>> Mar 29 11:59:00 SC-1 osafimmd[3292]: NO Attached Nodes:1 Accepted >>>>>> nodes:0 KnownVeteran:0 doReply:1 >>>>>> Mar 29 11:59:00 SC-1 osafimmd[3292]: NO First IMMND on SC found >>>>>> at 2010f this IMMD at 2010f. 
Cluster is loading, *not* 2PBE => >>>>>> designating that IMMND as coordinator >>>>>> Mar 29 11:59:00 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING >>>>>> Mar 29 11:59:00 SC-1 osafimmnd[3303]: NO This IMMND is now the >>>>>> NEW Coord >>>>>> Mar 29 11:59:00 SC-1 osafimmnd[3303]: NO SETTING COORD TO 1 CLOUD >>>>>> PROTO >>>>>> Mar 29 11:59:03 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING >>>>>> Mar 29 11:59:03 SC-1 osafimmd[3292]: NO Successfully announced >>>>>> loading. New ruling epoch:1 >>>>>> Mar 29 11:59:03 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_LOADING_PENDING --> IMM_SERVER_LOADING_SERVER >>>>>> Mar 29 11:59:03 SC-1 osafimmnd[3303]: NO NODE STATE-> >>>>>> IMM_NODE_LOADING >>>>>> Mar 29 11:59:03 SC-1 osafimmloadd: NO Load starting >>>>>> Mar 29 11:59:03 SC-1 osafimmloadd: NO ***** Loading from XML file >>>>>> imm.xml at /etc/opensaf ***** >>>>>> Mar 29 11:59:03 SC-1 osafimmloadd: NO The class OpensafImm has >>>>>> been created since it was missing from the imm.xml load file >>>>>> Mar 29 11:59:03 SC-1 osafimmloadd: IN Class OsafImmPbeRt created >>>>>> Mar 29 11:59:03 SC-1 osafimmloadd: NO The class OsafImmPbeRt has >>>>>> been created since it was missing from the imm.xml load file >>>>>> Mar 29 11:59:08 SC-1 osafdtmd[3253]: NO Established contact with >>>>>> 'SC-2' >>>>>> Mar 29 11:59:08 SC-1 osafimmd[3292]: NO New IMMND process is on >>>>>> STANDBY Controller at 2020f >>>>>> Mar 29 11:59:08 SC-1 osafimmd[3292]: WA IMMND on controller (not >>>>>> currently coord) requests sync >>>>>> Mar 29 11:59:08 SC-1 osafimmd[3292]: WA No coordinator IMMND >>>>>> known (case B) - ignoring sync request >>>>>> Mar 29 11:59:08 SC-1 osafimmd[3292]: NO Node 2020f request sync >>>>>> sync-pid:9043 epoch:0 >>>>>> Mar 29 11:59:12 SC-1 osafimmloadd: NO The >>>>>> opensafImm=opensafImm,safApp=safImmService object of class >>>>>> OpensafImm has been 
created since it was missing from the imm.xml >>>>>> load file >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO Ccb 1 COMMITTED (IMMLOADER) >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO Closing admin owner >>>>>> IMMLOADER id(1), loading of IMM done >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO NODE STATE-> >>>>>> IMM_NODE_FULLY_AVAILABLE 2729 >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO RepositoryInitModeT is >>>>>> SA_IMM_INIT_FROM_FILE >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: WA IMM Access Control mode >>>>>> is DISABLED! >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO opensafImmNostdFlags >>>>>> changed to: 0xf6 >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO Epoch set to 2 in ImmModel >>>>>> Mar 29 11:59:12 SC-1 osafimmd[3292]: NO ACT: New Epoch for IMMND >>>>>> process at node 2010f old epoch: 1 new epoch:2 >>>>>> Mar 29 11:59:12 SC-1 osafimmd[3292]: NO Ruling epoch changed to:2 >>>>>> Mar 29 11:59:12 SC-1 osafimmloadd: NO Load ending normally >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_LOADING_SERVER --> IMM_SERVER_READY >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO ImmModel received >>>>>> scAbsenceAllowed 0 >>>>>> Mar 29 11:59:12 SC-1 osaflogd[3327]: Started >>>>>> Mar 29 11:59:12 SC-1 osaflogd[3327]: NO LOGSV_DATA_GROUPNAME not >>>>>> found >>>>>> Mar 29 11:59:12 SC-1 osaflogd[3327]: NO LOG root directory is: >>>>>> "/var/log/opensaf/saflog" >>>>>> Mar 29 11:59:12 SC-1 osaflogd[3327]: NO LOG data group is: "" >>>>>> Mar 29 11:59:12 SC-1 osaflogd[3327]: NO LGS_MBCSV_VERSION = 5 >>>>>> Mar 29 11:59:12 SC-1 osafdtmd[3253]: NO Established contact with >>>>>> 'PL-998' >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO Implementer connected: 1 >>>>>> (safLogService) <2, 2010f> >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO implementer for class >>>>>> 'OpenSafLogConfig' is safLogService => class extent is safe. 
>>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO implementer for class >>>>>> 'SaLogStreamConfig' is safLogService => class extent is safe. >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO Implementer (applier) >>>>>> connected: 2 (@safLogService_appl) <11, 2010f> >>>>>> Mar 29 11:59:12 SC-1 osafntfd[3340]: Started >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO Implementer (applier) >>>>>> connected: 3 (@OpenSafImmReplicatorA) <17, 2010f> >>>>>> Mar 29 11:59:12 SC-1 osafntfimcnd[3348]: NO Started >>>>>> Mar 29 11:59:12 SC-1 osafclmd[3354]: Started >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO Implementer connected: 4 >>>>>> (safClmService) <19, 2010f> >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO implementer for class >>>>>> 'SaClmNode' is safClmService => class extent is safe. >>>>>> Mar 29 11:59:12 SC-1 osafimmnd[3303]: NO implementer for class >>>>>> 'SaClmCluster' is safClmService => class extent is safe. >>>>>> Mar 29 11:59:12 SC-1 osafimmd[3292]: WA No coordinator IMMND >>>>>> known (case B) - ignoring sync request >>>>>> Mar 29 11:59:12 SC-1 osafimmd[3292]: NO Node 2e60c request sync >>>>>> sync-pid:7734 epoch:0 >>>>>> Mar 29 11:59:12 SC-1 osafclmna[3364]: Started >>>>>> Mar 29 11:59:12 SC-1 osafclmna[3364]: NO >>>>>> safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f >>>>>> Mar 29 11:59:12 SC-1 osafamfd[3373]: Started >>>>>> Mar 29 11:59:16 SC-1 osafdtmd[3253]: NO Established contact with >>>>>> 'PL-999' >>>>>> Mar 29 11:59:16 SC-1 osafimmd[3292]: NO Extended intro from node >>>>>> 2e70c >>>>>> Mar 29 11:59:16 SC-1 osafimmd[3292]: WA PBE not configured at >>>>>> first attached SC-immnd, but Pbe is configured for immnd at 2e70c >>>>>> - possible upgrade from pre 4.4 >>>>>> Mar 29 11:59:16 SC-1 osafimmd[3292]: WA No coordinator IMMND >>>>>> known (case B) - ignoring sync request >>>>>> Mar 29 11:59:16 SC-1 osafimmd[3292]: NO Node 2e70c request sync >>>>>> sync-pid:7615 epoch:0 >>>>>> Mar 29 11:59:17 SC-1 osafimmnd[3303]: NO Announce sync, 
epoch:3 >>>>>> Mar 29 11:59:17 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER >>>>>> Mar 29 11:59:17 SC-1 osafimmd[3292]: NO Successfully announced >>>>>> sync. New ruling epoch:3 >>>>>> Mar 29 11:59:17 SC-1 osafimmnd[3303]: NO NODE STATE-> >>>>>> IMM_NODE_R_AVAILABLE >>>>>> Mar 29 11:59:17 SC-1 osafimmloadd: NO Sync starting >>>>>> Mar 29 12:04:18 SC-1 osafimmnd[3303]: WA STOPPING sync process >>>>>> pid 3386 after five minutes >>>>>> Mar 29 12:04:18 SC-1 osafimmnd[3303]: ER SYNC APPARENTLY FAILED >>>>>> status:0 >>>>>> Mar 29 12:04:18 SC-1 osafimmnd[3303]: NO -SERVER STATE: >>>>>> IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY >>>>>> Mar 29 12:04:18 SC-1 osafimmnd[3303]: NO NODE STATE-> >>>>>> IMM_NODE_FULLY_AVAILABLE (2624) >>>>>> Mar 29 12:04:18 SC-1 osafimmnd[3303]: NO Epoch set to 3 in ImmModel >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: NO ACT: New Epoch for IMMND >>>>>> process at node 2010f old epoch: 2 new epoch:3 >>>>>> Mar 29 12:04:18 SC-1 osafimmnd[3303]: NO Coord broadcasting >>>>>> ABORT_SYNC, epoch:3 >>>>>> Mar 29 12:04:18 SC-1 osafimmnd[3303]: WA IMMND - Client Node Get >>>>>> Failed for cli_hdl:708669735183 >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: WA Successfully aborted >>>>>> sync. 
Epoch:3 >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: WA IMMND on controller (not >>>>>> currently coord) requests sync >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: WA No coordinator IMMND >>>>>> known (case B) - ignoring sync request >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: NO Node 2020f request sync >>>>>> sync-pid:9043 epoch:0 >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: WA No coordinator IMMND >>>>>> known (case B) - ignoring sync request >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: NO Node 2e60c request sync >>>>>> sync-pid:7734 epoch:0 >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: WA No coordinator IMMND >>>>>> known (case B) - ignoring sync request >>>>>> Mar 29 12:04:18 SC-1 osafimmd[3292]: NO Node 2e70c request sync >>>>>> sync-pid:7615 epoch:0 >>>>>> Mar 29 12:04:19 SC-1 osafimmnd[3303]: WA MDS Send Failed >>>>>> Mar 29 12:04:19 SC-1 osafimmnd[3303]: WA Error code 2 returned >>>>>> for message type 17 - ignoring >>>>>> Mar 29 12:04:19 SC-1 osafimmnd[3303]: WA Global ABORT SYNC >>>>>> received for epoch 3 >>>>>> Mar 29 12:04:22 SC-1 osafimmnd[3303]: NO Announce sync, epoch:4 >>>>>> Mar 29 12:04:22 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER >>>>>> Mar 29 12:04:22 SC-1 osafimmd[3292]: NO Successfully announced >>>>>> sync. 
New ruling epoch:4 >>>>>> Mar 29 12:04:22 SC-1 osafimmnd[3303]: NO NODE STATE-> >>>>>> IMM_NODE_R_AVAILABLE >>>>>> Mar 29 12:04:22 SC-1 osafimmloadd: NO Sync starting >>>>>> Mar 29 12:07:08 SC-1 osafimmnd[3303]: NO Global discard node >>>>>> received for nodeId:2020f pid:9043 >>>>>> Mar 29 12:07:12 SC-1 osafimmnd[3303]: NO Global discard node >>>>>> received for nodeId:2e60c pid:7734 >>>>>> Mar 29 12:07:16 SC-1 osafimmnd[3303]: NO Global discard node >>>>>> received for nodeId:2e70c pid:7615 >>>>>> Mar 29 12:07:23 SC-1 osafimmd[3292]: NO New IMMND process is on >>>>>> STANDBY Controller at 2020f >>>>>> Mar 29 12:07:23 SC-1 osafimmd[3292]: WA PBE is configured at >>>>>> first attached SC-immnd, but no Pbe file is configured for immnd >>>>>> at node 2020f - rejecting node >>>>>> Mar 29 12:07:23 SC-1 osafimmd[3292]: WA Error returned from >>>>>> processing message err:2 msg-type:2 >>>>>> Mar 29 12:07:23 SC-1 osafimmnd[3303]: NO Global discard node >>>>>> received for nodeId:2020f pid:9670 >>>>>> Mar 29 12:07:27 SC-1 osafimmd[3292]: WA PBE is configured at >>>>>> first attached SC-immnd, but no Pbe file is configured for immnd >>>>>> at node 2e60c - rejecting node >>>>>> Mar 29 12:07:27 SC-1 osafimmd[3292]: WA Error returned from >>>>>> processing message err:2 msg-type:2 >>>>>> Mar 29 12:07:27 SC-1 osafimmnd[3303]: NO Global discard node >>>>>> received for nodeId:2e60c pid:8359 >>>>>> Mar 29 12:07:29 SC-1 osafimmloadd: IN Synced 33170 objects in total >>>>>> Mar 29 12:07:29 SC-1 osafimmnd[3303]: NO NODE STATE-> >>>>>> IMM_NODE_FULLY_AVAILABLE 17518 >>>>>> Mar 29 12:07:29 SC-1 osafimmloadd: NO Sync ending normally >>>>>> Mar 29 12:07:29 SC-1 osafimmnd[3303]: NO Epoch set to 4 in ImmModel >>>>>> Mar 29 12:07:29 SC-1 osafimmd[3292]: NO ACT: New Epoch for IMMND >>>>>> process at node 2010f old epoch: 3 new epoch:4 >>>>>> Mar 29 12:07:29 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY >>>>>> Mar 29 12:07:31 SC-1 osafimmd[3292]: NO 
Extended intro from node >>>>>> 2e70c >>>>>> Mar 29 12:07:31 SC-1 osafimmd[3292]: WA No coordinator IMMND >>>>>> known (case B) - ignoring sync request >>>>>> Mar 29 12:07:31 SC-1 osafimmd[3292]: NO Node 2e70c request sync >>>>>> sync-pid:8240 epoch:0 >>>>>> Mar 29 12:07:33 SC-1 osafimmnd[3303]: NO Announce sync, epoch:5 >>>>>> Mar 29 12:07:33 SC-1 osafimmnd[3303]: NO SERVER STATE: >>>>>> IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER >>>>>> Mar 29 12:07:33 SC-1 osafimmd[3292]: NO Successfully announced >>>>>> sync. New ruling epoch:5 >>>>>> Mar 29 12:07:33 SC-1 osafimmnd[3303]: NO NODE STATE-> >>>>>> IMM_NODE_R_AVAILABLE >>>>>> Mar 29 12:07:33 SC-1 osafimmloadd: NO Sync starting >>>>>> Mar 29 12:07:38 SC-1 osafimmd[3292]: NO New IMMND process is on >>>>>> STANDBY Controller at 2020f >>>>>> Mar 29 12:07:38 SC-1 osafimmd[3292]: WA PBE is configured at >>>>>> first attached SC-immnd, but no Pbe file is configured for immnd >>>>>> at node 2020f - rejecting node >>>>>> Mar 29 12:07:38 SC-1 osafimmd[3292]: WA Error returned from >>>>>> processing message err:2 msg-type:2 >>>>>> Mar 29 12:07:38 SC-1 osafimmnd[3303]: NO Global discard node >>>>>> received for nodeId:2020f pid:9709 >>>>>> Mar 29 12:07:38 SC-1 osafimmd[3292]: WA IMMD lost contact with >>>>>> peer IMMD (NCSMDS_RED_DOWN) >>>>>> Mar 29 12:07:39 SC-1 osafdtmd[3253]: NO Lost contact with 'SC-2' >>>>>> Mar 29 12:07:39 SC-1 osaffmd[3282]: NO Node Down event for node >>>>>> id 2020f: >>>>>> Mar 29 12:07:39 SC-1 osaffmd[3282]: NO Current role: ACTIVE >>>>>> Mar 29 12:07:39 SC-1 osaffmd[3282]: Rebooting OpenSAF NodeId = 0 >>>>>> EE Name = No EE Mapped, Reason: Failover occurred, but this node >>>>>> is not yet ready, OwnNodeId = 131343, SupervisionTime = 60 >>>>>> Mar 29 12:07:39 SC-1 opensaf_reboot: Rebooting local node; timeout=60 >>>>>> ============================================================================================= >>>>>> >>>>>> -AVM >>>>>> >>>>>> On 3/18/2016 9:38 PM, Anders Widell wrote: >>>>>>> 
 osaf/libs/core/mds/include/mds_dt.h |  26 +++++++++++++++++---------
 osaf/libs/core/mds/mds_c_db.c       |  26 ++++++++++++--------------
 2 files changed, 29 insertions(+), 23 deletions(-)

Support up to 4095 nodes in the flat addressing scheme for TIPC, by encoding the
slot ID in the lower eight bits and the ones' complement of the subslot ID in
bits 8 to 11 in the node identifier of the TIPC address. The reason for taking
the ones' complement of the subslot ID is backwards compatibility with existing
installations, so that this enhancement can be upgraded in-service.

diff --git a/osaf/libs/core/mds/include/mds_dt.h b/osaf/libs/core/mds/include/mds_dt.h
--- a/osaf/libs/core/mds/include/mds_dt.h
+++ b/osaf/libs/core/mds/include/mds_dt.h
@@ -237,7 +237,8 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT

 /*
  * In the default flat addressing scheme, TIPC node addresses looks like
- * 1.1.1, 1.1.2 etc.
+ * 1.1.1, 1.1.2 etc. The ones' complement of the subslot ID is shifted 8
+ * bits up and the slot ID is added in the 8 LSB.
  * In the non flat (old/legacy) addressing scheme TIPC addresses looks like
  * 1.1.31, 1.1.47. The slot ID is shifted 4 bits up and subslot ID is added
  * in the 4 LSB.
@@ -248,13 +249,20 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT

 #if (MDS_USE_SUBSLOT_ID == 0)
 #define MDS_TIPC_NODE_ID_MIN 0x01001001
-#define MDS_TIPC_NODE_ID_MAX 0x010010ff
-#define MDS_NCS_NODE_ID_MIN (MDS_NCS_CHASSIS_ID|0x0000010f)
-#define MDS_NCS_NODE_ID_MAX (MDS_NCS_CHASSIS_ID|0x0000ff0f)
-#define m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(node) \
-	(NODE_ID)( MDS_NCS_CHASSIS_ID | (((node)&0xff)<<8) | (0xf))
-#define m_MDS_GET_TIPC_NODE_ID_FROM_NCS_NODE_ID(node) \
-	(NODE_ID)( MDS_TIPC_COMMON_ID | (((node)&0xff00)>>8) )
+#define MDS_TIPC_NODE_ID_MAX 0x01001fff
+static inline NODE_ID m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(NODE_ID node) {
+	return MDS_NCS_CHASSIS_ID | ((node & 0xff) << 8) | (((node & 0xf00) >> 8) ^ 0xf);
+}
+static inline NODE_ID m_MDS_GET_TIPC_NODE_ID_FROM_NCS_NODE_ID(NODE_ID node) {
+	return MDS_TIPC_COMMON_ID | ((node & 0xff00) >> 8) | (((node & 0xf) ^ 0xf) << 8);
+}
+static inline uint32_t m_MDS_CHECK_TIPC_NODE_ID_RANGE(NODE_ID node) {
+	return node < MDS_TIPC_NODE_ID_MIN || node > MDS_TIPC_NODE_ID_MAX ?
+		NCSCC_RC_FAILURE : NCSCC_RC_SUCCESS;
+}
+static inline uint32_t m_MDS_CHECK_NCS_NODE_ID_RANGE(NODE_ID node) {
+	return m_MDS_CHECK_TIPC_NODE_ID_RANGE(m_MDS_GET_TIPC_NODE_ID_FROM_NCS_NODE_ID(node));
+}
 #else
 #define MDS_TIPC_NODE_ID_MIN 0x01001001
 #define MDS_TIPC_NODE_ID_MAX 0x0100110f
@@ -264,10 +272,10 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT
 	(NODE_ID)( MDS_NCS_CHASSIS_ID | ((node)&0xf) | (((node)&0xff0)<<4))
 #define m_MDS_GET_TIPC_NODE_ID_FROM_NCS_NODE_ID(node) \
 	(NODE_ID)( MDS_TIPC_COMMON_ID | (((node)&0xff00)>>4) | ((node)&0xf) )
-#endif

 #define m_MDS_CHECK_TIPC_NODE_ID_RANGE(node) (((((node)<MDS_TIPC_NODE_ID_MIN)||((node)>MDS_TIPC_NODE_ID_MAX))?NCSCC_RC_FAILURE:NCSCC_RC_SUCCESS))
 #define m_MDS_CHECK_NCS_NODE_ID_RANGE(node) (((((node)<MDS_NCS_NODE_ID_MIN)||((node)>MDS_NCS_NODE_ID_MAX))?NCSCC_RC_FAILURE:NCSCC_RC_SUCCESS))
+#endif

 /* ******************************************** */
 /* ******************************************** */
diff --git a/osaf/libs/core/mds/mds_c_db.c b/osaf/libs/core/mds/mds_c_db.c
--- a/osaf/libs/core/mds/mds_c_db.c
+++ b/osaf/libs/core/mds/mds_c_db.c
@@ -37,14 +37,13 @@ void get_adest_details(MDS_DEST adest, c
 	char *token, *saveptr;
 	struct stat s;
 	uint32_t process_id = 0;
-	NCS_PHY_SLOT_ID phy_slot;
-	NCS_SUB_SLOT_ID sub_slot;
+	SlotSubslotId slot_subslot_id;
 	char pid_path[1024];
 	char *pid_name = NULL;
 	char process_name[MDS_MAX_PROCESS_NAME_LEN];
 	bool remote = false;

-	m_NCS_GET_PHYINFO_FROM_NODE_ID(m_NCS_NODE_ID_FROM_MDS_DEST(adest), NULL, &phy_slot, &sub_slot);
+	slot_subslot_id = GetSlotSubslotIdFromNodeId(m_NCS_NODE_ID_FROM_MDS_DEST(adest));

 	if (!tipc_mode_enabled) {
 		process_id = m_MDS_GET_PROCESS_ID_FROM_ADEST(adest);
@@ -111,11 +110,11 @@ void get_adest_details(MDS_DEST adest, c
 	}

 	if (remote == true)
-		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<rem_nodeid[%d]:%s>",
-			 phy_slot, process_name);
+		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<rem_nodeid[%u]:%s>",
+			 slot_subslot_id, process_name);
 	else
-		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<nodeid[%d]:%s>",
-			 phy_slot, process_name);
+		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<nodeid[%u]:%s>",
+			 slot_subslot_id, process_name);

 	m_MDS_LOG_DBG("MDS:DB: adest_details: %s ", adest_details);
 	m_MDS_LEAVE();
@@ -129,8 +128,7 @@ void get_adest_details(MDS_DEST adest, c
 void get_subtn_adest_details(MDS_PWE_HDL pwe_hdl, MDS_SVC_ID svc_id, MDS_DEST adest, char* adest_details)
 {
 	uint32_t process_id = 0;
-	NCS_PHY_SLOT_ID phy_slot;
-	NCS_SUB_SLOT_ID sub_slot;
+	SlotSubslotId slot_subslot_id;
 	char process_name[MDS_MAX_PROCESS_NAME_LEN];
 	bool remote = false;
 	MDS_SVC_INFO *svc_info = NULL;
@@ -139,7 +137,7 @@ void get_subtn_adest_details(MDS_PWE_HDL
 	char *pid_name = NULL;
 	struct stat s;

-	m_NCS_GET_PHYINFO_FROM_NODE_ID(m_NCS_NODE_ID_FROM_MDS_DEST(adest), NULL, &phy_slot, &sub_slot);
+	slot_subslot_id = GetSlotSubslotIdFromNodeId(m_NCS_NODE_ID_FROM_MDS_DEST(adest));
 	process_id = m_MDS_GET_PROCESS_ID_FROM_ADEST(adest);

 	if (NCSCC_RC_SUCCESS == mds_mcm_check_intranode(adest)) {
@@ -185,11 +183,11 @@ void get_subtn_adest_details(MDS_PWE_HDL
 	}

 	if (remote == true)
-		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<rem_node[%d]:%s>",
-			 phy_slot, process_name);
+		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<rem_node[%u]:%s>",
+			 slot_subslot_id, process_name);
 	else
-		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<node[%d]:%s>",
-			 phy_slot, process_name);
+		snprintf(adest_details, MDS_MAX_PROCESS_NAME_LEN, "<node[%u]:%s>",
+			 slot_subslot_id, process_name);
 done:
 	m_MDS_LOG_DBG("MDS:DB: adest_details: %s ", adest_details);
 	m_MDS_LEAVE();

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel