Hi,

OpenSAF currently has a hard-coded limit, DTM_INTRANODE_MAX_PROCESSES.
An enhancement ticket has been opened for it:
https://sourceforge.net/p/opensaf/tickets/1187/
Depending on the discussion there, the limit can either be increased or
made configurable.
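
To make the two options concrete, here is a minimal sketch in C. It is not
the actual dtm_intra.c code: the names parse_max_processes, proc_table and
HARD_MAX_PROCESSES are hypothetical, and only the default of 100 comes from
this thread. It shows a table whose capacity is chosen at startup instead of
being fixed by a #define, and which rejects entries once that capacity is
reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* 100 is the value quoted in this thread; all other names are hypothetical. */
#define DEFAULT_MAX_PROCESSES 100
#define HARD_MAX_PROCESSES 4096 /* assumed sanity cap, not an OpenSAF value */

/* Parse a limit from text (for example an environment variable value).
 * Returns the default for NULL, empty, non-numeric or out-of-range input. */
int parse_max_processes(const char *text)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return DEFAULT_MAX_PROCESSES;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > HARD_MAX_PROCESSES)
        return DEFAULT_MAX_PROCESSES;

    return (int)value;
}

/* A table sized at run time rather than by a compile-time constant. */
struct proc_table {
    int *pids;
    int count;
    int capacity;
};

/* Allocate room for `capacity` entries; returns 0 on success, -1 on failure. */
int proc_table_init(struct proc_table *t, int capacity)
{
    assert(t != NULL);
    if (capacity < 1)
        return -1;
    t->pids = calloc((size_t)capacity, sizeof(*t->pids));
    if (t->pids == NULL)
        return -1;
    t->count = 0;
    t->capacity = capacity;
    return 0;
}

/* Add one process; returns -1 instead of overflowing when the table is full. */
int proc_table_add(struct proc_table *t, int pid)
{
    assert(t != NULL);
    if (t->count >= t->capacity)
        return -1;
    t->pids[t->count++] = pid;
    return 0;
}

/* Release the table's storage. */
void proc_table_free(struct proc_table *t)
{
    assert(t != NULL);
    free(t->pids);
    t->pids = NULL;
    t->count = 0;
    t->capacity = 0;
}
```

A caller could pass the result of getenv() with some agreed variable name
(the name itself would be a design decision for the ticket) to
parse_max_processes() and hand the result to proc_table_init(). Whether the
real limit can be handled this way depends on how dtm_intra.c uses its arrays.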

/Neel.




On Wednesday 22 October 2014 03:56 AM, Shu Wang wrote:
> When testing the cluster for max size what configuration do you use?  Are we 
> able to get a copy of that to understand the look of the cluster?
> What is the max number of :
>   - service groups per cluster tested?
>   - service units per cluster tested?
>   - components per cluster tested?
>   - service units on a node tested?
>   - components on a node tested?
>
> While continuing to investigate the problem below, we found:
>    On a controller node, opensaf fails when the component count for that 
> node reaches 78
>    On a payload node, opensaf fails when the component count for that 
> node reaches 90
>
> In addition:
>    A controller node runs 22 opensaf components
>    A payload node runs 10 opensaf components
>
> Total components running on a node:
> Controller node:   78 + 22 = 100
> Payload node:   90 + 10 = 100
>
> We looked through the OpenSAF code for defines that had a value of 100.  In 
> osaf/services/infrastructure/dtms/dtm
>
> dtm_intra.c:#define DTM_INTRANODE_MAX_PROCESSES 100
>
> We changed that value to 115 and retried our test.  Increasing to 115 allowed 
> the total component count to go above 100 and of course then OpenSAF failed 
> when we unlocked the SU that pushed the component count to over 115.
>
> Does anyone know of any ill effect if we change 
> DTM_INTRANODE_MAX_PROCESSES to a larger value, e.g. 250?  dtm_intra.c uses 
> it as the number of items in 2 different arrays, and its use is confined to 
> that .c file.
>
> We tried this change only with OpenSAF 4.4 and TCP.
>
> Thanks.
>
> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| www.NetCracker.com
> Proven Partner to Communications Service Providers
>
>
> -----Original Message-----
> From: Neelakanta Reddy [mailto:[email protected]]
> Sent: Tuesday, October 21, 2014 2:55 AM
> To: [email protected]; Shu Wang
> Cc: Lisa Ann Lentz-Liddell
> Subject: Re: [users] Max number of SUs/Components in a cluster?
>
> Hi ,
>
> Comments inline.
>
> /Neel.
>
> On Tuesday 21 October 2014 04:41 AM, Shu Wang wrote:
>> The IMM documentation states:
>>
>> Applications that intend to add their own imm classes and imm objects need 
>> to be aware that capacity is limited. OpenSAF4.1 has been system tested with 
>> up to 350 000 objects of average size 300 bytes. It is not advisable to 
>> generate larger imm-contents than that.
>> What is the definition of an object?
> The 300 bytes is the size of each object (the accumulated size of the 
> attributes assigned for its class). The size of an object depends on the 
> number and types of attributes of its class.
>> We have a cluster defined across 6 nodes with a total of 12 SGs, a total of 
>> 64 SUs, and a total of 292 components.  We can start OpenSAF successfully 
>> across the nodes and unlock all SUs with no problems.
>>
>> The cluster definition was increased to 6 nodes, 15 SGs, a total of 56 SUs, 
>> and a total of 388 components.  We are able to start OpenSAF on all nodes 
>> successfully but as soon as a little over 300 components have been unlocked, 
>> things start to fall apart.  The opensaf processes start to die and the 
>> cluster is no longer usable.
> The SUs, SGs and components are internally objects for IMM. Increasing to "6 
> nodes, 15 SGs, a total of 56 SUs, and a total of 388 components"
> should not have caused any IMM-related problems.
>
>> Oct 19 16:29:06 colobus osafamfnd[3649]: NO Assigned 
>> 'safSi=amfSDFSISI1.3,safApp=olcApp' ACTIVE to 
>> 'safSu=amfSDFSISU1.4,safSg=amfSDFSISG1,safApp=olcApp'
>> Oct 19 16:29:07 colobus osafamfnd[3649]: NO 
>> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>>  faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:29:09 colobus osafamfnd[3649]: NO 
>> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>>  faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send to ntfa failed rc: 2
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
>> ....
>> Oct 19 16:33:24 colobus ntpd_initres[2608]: host name not found: 0.rhel.pool.ntp.org
>> ....
>> Oct 19 16:35:18 colobus osafimmnd[3549]: NO Implementer disconnected 14 <0, 22b0f> (MsgQueueService142095)
>> Oct 19 16:35:19 colobus osafclmd[3602]: NO proc_initialize_msg: send failed. dest:22b0f00007a77
>> Oct 19 16:35:19 colobus osafimmnd[3549]: NO Global discard node received for nodeId:22b0f pid:31333
>> Oct 19 16:35:20 colobus osafamfnd[3649]: NO 
>> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>>  faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:35:22 colobus osafamfnd[3649]: NO 
>> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>>  faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:35:24 colobus osafamfnd[3649]: NO 
>> 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp'
>>  faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer connected: 16 (MsgQueueService142095) <12021, 2280f>
>> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer locally disconnected. Marking it as doomed 16 <12021, 2280f> (MsgQueueService142095)
>> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer disconnected 16 <12021, 2280f> (MsgQueueService142095)
>> Oct 19 16:35:26 colobus osafamfd[3631]: NO Node 'bedrazzas.monkey.lab' left the cluster
>>
>> Have we reached a max of the number of SUs/Components that can be started 
>> within a single OpenSAF cluster?
> OpenSAF 4.4/4.5 is tested for 70 nodes.
>> We have tried the above with OpenSAF 4.4 and OpenSAF 4.5 and with both TCP 
>> and TIPC, all fail similarly.
> This is most likely an application problem or a matter of adjusting 
> timeouts. Please share the syslog messages of all the nodes.
>> Thank you!
>>
>> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917|
>> www.NetCracker.com Proven Partner to Communications Service Providers
>>
>>
>>
>>
>> ________________________________
>> The information transmitted herein is intended only for the person or entity 
>> to which it is addressed and may contain confidential, proprietary and/or 
>> privileged material. Any review, retransmission, dissemination or other use 
>> of, or taking of any action in reliance upon, this information by persons or 
>> entities other than the intended recipient is prohibited. If you received 
>> this in error, please contact the sender and delete the material from any 
>> computer.
>> ----------------------------------------------------------------------
>> -------- Comprehensive Server Monitoring with Site24x7.
>> Monitor 10 servers for $9/Month.
>> Get alerted through email, SMS, voice calls or mobile push notifications.
>> Take corrective actions from your mobile device.
>> http://p.sf.net/sfu/Zoho
>> _______________________________________________
>> Opensaf-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>
>

