Great, so you found a stupid undocumented implementation limit. Those are not 
OK...
This limit is worth a ticket and should be fixed somehow.

But you also indicated a problem when using TIPC?
We only use MDS/TIPC and don't have any scalability issues (even with clusters 
much larger than this).

Thanks,
Hans

-----Original Message-----
From: Shu Wang [mailto:[email protected]] 
Sent: den 22 oktober 2014 00:27
To: Neelakanta Reddy; [email protected]
Cc: Lisa Ann Lentz-Liddell
Subject: Re: [users] Max number of SUs/Components in a cluster?

When testing the cluster for maximum size, what configuration do you use?  Could 
we get a copy of it to understand what the cluster looks like?
What is the maximum number of:
 - service groups per cluster tested?
 - service units per cluster tested?
 - components per cluster tested?
 - service units on a node tested?
 - components on a node tested?

While continuing to investigate the problem below, we found:
  On a controller node, OpenSAF fails when the component count for that node 
reaches 78
  On a payload node, OpenSAF fails when the component count for that node 
reaches 90

In addition, each node runs OpenSAF's own components:
  On a controller node, there are 22 OpenSAF components
  On a payload node, there are 10 OpenSAF components

Total components running on a node:
Controller node:   78 + 22 = 100
Payload node:   90 + 10 = 100

We looked through the OpenSAF code for defines that had a value of 100.  In 
osaf/services/infrastructure/dtms/dtm we found:

dtm_intra.c:#define DTM_INTRANODE_MAX_PROCESSES 100

We changed that value to 115 and retried our test.  Increasing to 115 allowed 
the total component count to go above 100 and of course then OpenSAF failed 
when we unlocked the SU that pushed the component count to over 115.

Does anyone know of any ill effects if we change DTM_INTRANODE_MAX_PROCESSES 
to a larger value, e.g. 250?  dtm_intra.c uses it to size two different arrays, 
and its use is confined to that .c file.
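
To illustrate the kind of failure described above, here is a minimal sketch of how a compile-time cap that sizes a fixed array refuses registrations beyond the cap. It is not the actual dtm_intra.c code: MAX_PROCESSES, struct proc_table, proc_table_add and register_many are hypothetical names, and only the value 100 comes from this thread. Whether the real code fails on the 100th or the 101st process is not something this sketch claims.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DTM_INTRANODE_MAX_PROCESSES (value taken from this thread). */
#define MAX_PROCESSES 100

/* Hypothetical fixed-capacity table; not the real dtm_intra.c data structure. */
struct proc_table {
    int pids[MAX_PROCESSES];
    int count;
};

/* Adds one process id. Returns 0 on success, -1 when the table is already full. */
static int proc_table_add(struct proc_table *t, int pid)
{
    assert(t != NULL);
    if (t->count >= MAX_PROCESSES)
        return -1;              /* cap reached: the caller sees a failure */
    t->pids[t->count++] = pid;
    return 0;
}

/* Tries to register n processes and returns how many were accepted. */
static int register_many(struct proc_table *t, int n)
{
    int accepted = 0;
    for (int i = 0; i < n; i++) {
        if (proc_table_add(t, 1000 + i) == 0)
            accepted++;
    }
    return accepted;
}
```

With a zero-initialized table, register_many(&t, 100) accepts all 100 entries and the next proc_table_add() returns -1. Raising the define only moves the point of failure, which matches what we saw when going from 100 to 115.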

We tried this change only with OpenSAF 4.4 and TCP.

Thanks.

Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| www.NetCracker.com Proven 
Partner to Communications Service Providers


-----Original Message-----
From: Neelakanta Reddy [mailto:[email protected]]
Sent: Tuesday, October 21, 2014 2:55 AM
To: [email protected]; Shu Wang
Cc: Lisa Ann Lentz-Liddell
Subject: Re: [users] Max number of SUs/Components in a cluster?

Hi,

Comments inline.

/Neel.

On Tuesday 21 October 2014 04:41 AM, Shu Wang wrote:
> The IMM documentation states:
>
> Applications that intend to add their own imm classes and imm objects need to 
> be aware that capacity is limited. OpenSAF4.1 has been system tested with up 
> to 350 000 objects of average size 300 bytes. It is not advisable to generate 
> larger imm-contents than that.

>
> What is the definition of an object?
The 300 bytes is the average size of each object (the accumulated size of the 
attributes assigned for its class). The size of an object depends on the 
number and types of the attributes of its class.
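
As a rough worked figure for the documented limit, the total attribute payload is simply the object count times the average object size. The sketch below only does that multiplication; imm_payload_bytes is a hypothetical helper, not an IMM API, and the result ignores any internal IMM overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: object count times average object size, in bytes. */
static uint64_t imm_payload_bytes(uint64_t objects, uint64_t avg_object_bytes)
{
    /* Guard against 64-bit overflow of the product. */
    assert(avg_object_bytes == 0 || objects <= UINT64_MAX / avg_object_bytes);
    return objects * avg_object_bytes;
}
```

For the documented figures, imm_payload_bytes(350000, 300) gives 105000000 bytes, i.e. about 105 MB of attribute data.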
>
> We have a cluster defined across 6 nodes with a total of 12 SGs, a total of 
> 64 SUs, and a total of 292 components.  We can start OpenSAF successfully 
> across the nodes and unlock all SUs with no problems.
>
> The cluster definition was increased to 6 nodes, 15 SGs, a total of 56 SUs, 
> and a total of 388 components.  We are able to start OpenSAF on all nodes 
> successfully but as soon as a little over 300 components have been unlocked, 
> things start to fall apart.  The opensaf processes start to die and the 
> cluster is no longer usable.
The SUs, SGs and components are internally objects for IMM. Increasing to "6 
nodes, 15 SGs, a total of 56 SUs, and a total of 388 components"
should not have caused any IMM-related problems.

> Oct 19 16:29:06 colobus osafamfnd[3649]: NO Assigned 
> 'safSi=amfSDFSISI1.3,safApp=olcApp' ACTIVE to 
> 'safSu=amfSDFSISU1.4,safSg=amfSDFSISG1,safApp=olcApp'
> Oct 19 16:29:07 colobus osafamfnd[3649]: NO 
> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>  faulted due to 'avaDown' : Recovery is 'componentRestart'
> Oct 19 16:29:09 colobus osafamfnd[3649]: NO 
> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>  faulted due to 'avaDown' : Recovery is 'componentRestart'
> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send to ntfa failed rc: 2
> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
> ....
> Oct 19 16:33:24 colobus ntpd_initres[2608]: host name not found: 0.rhel.pool.ntp.org
> ....
> Oct 19 16:35:18 colobus osafimmnd[3549]: NO Implementer disconnected 14 <0, 22b0f> (MsgQueueService142095)
> Oct 19 16:35:19 colobus osafclmd[3602]: NO proc_initialize_msg: send failed. dest:22b0f00007a77
> Oct 19 16:35:19 colobus osafimmnd[3549]: NO Global discard node received for nodeId:22b0f pid:31333
> Oct 19 16:35:20 colobus osafamfnd[3649]: NO 
> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>  faulted due to 'avaDown' : Recovery is 'componentRestart'
> Oct 19 16:35:22 colobus osafamfnd[3649]: NO 
> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>  faulted due to 'avaDown' : Recovery is 'componentRestart'
> Oct 19 16:35:24 colobus osafamfnd[3649]: NO 
> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>  faulted due to 'avaDown' : Recovery is 'componentRestart'
> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer connected: 16 (MsgQueueService142095) <12021, 2280f>
> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer locally disconnected. Marking it as doomed 16 <12021, 2280f> (MsgQueueService142095)
> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer disconnected 16 <12021, 2280f> (MsgQueueService142095)
> Oct 19 16:35:26 colobus osafamfd[3631]: NO Node 'bedrazzas.monkey.lab' left the cluster
>
> Have we reached a max of the number of SUs/Components that can be started 
> within a single OpenSAF cluster?
OpenSAF 4.4/4.5 is tested for 70 nodes.
>
> We have tried the above with OpenSAF 4.4 and OpenSAF 4.5 and with both TCP 
> and TIPC, all fail similarly.
This is most likely an application problem or a matter of adjusting 
timeouts. Please share the syslog messages from all the nodes.
> Thank you!
>
> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| 
> www.NetCracker.com Proven Partner to Communications Service Providers
>
>
>
>
> ________________________________
> The information transmitted herein is intended only for the person or entity 
> to which it is addressed and may contain confidential, proprietary and/or 
> privileged material. Any review, retransmission, dissemination or other use 
> of, or taking of any action in reliance upon, this information by persons or 
> entities other than the intended recipient is prohibited. If you received 
> this in error, please contact the sender and delete the material from any 
> computer.
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users



