I retried with sufficient disk space on the payload but got the same result:
neither of the two SCs can be promoted to controller until the payload reboots.

I also checked the network link between the SC and the payload; they can ping
each other while the issue is happening. I too suspect the problem is caused by
the IMMD/IMMND link among those nodes, but I don't know how to prove it.

From: Neelakanta Reddy [mailto:[email protected]]
Sent: Monday, October 10, 2016 8:39 PM
To: Jianfeng Dong <[email protected]>; [email protected]
Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after enable 
"headless cluster" feature

Hi,

After the "headless" state, once either controller starts, the IMMND on the
payload sends the intro message to IMMD.
It looks like this did not happen. The following is the log from the payload:

2016-10-10T11:09:18.507851+08:00 pld0101 osafimmnd[3141]: message repeated 2 times: [ logtrace: write failed, No space left on device]
2016-10-10T11:09:18.507883+08:00 pld0101 osafimmnd[3141]: NO Re-introduce-me highestProcessed:23839 highestReceived:23839
2016-10-10T11:09:18.508011+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
2016-10-10T11:09:18.508129+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
2016-10-10T11:09:18.508501+08:00 pld0101 osafimmnd[3141]: WA MDS Send Failed to service:IMMD rc:2


Please retry with sufficient disk space on the payload.
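
When repeating the test, it may help to summarize the payload syslog rather
than read it by eye. The following is only a rough sketch, not an OpenSAF tool:
it counts the three osafimmnd messages quoted above. The category names, the
summarize() helper, and the assumed line layout (timestamp, host,
process[pid]: message) are illustrative assumptions based solely on the log
excerpt in this mail.

```python
import re
from collections import Counter

# Substrings copied from the payload log quoted in this thread.
# The dictionary keys are illustrative labels, not OpenSAF terminology.
MARKERS = {
    "disk_full": "No space left on device",
    "reintroduce": "Re-introduce-me",
    "mds_send_failed": "MDS Send Failed to service:IMMD",
}

# Assumed syslog layout: "<timestamp> <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def summarize(lines, proc="osafimmnd"):
    """Count how many lines from `proc` contain each marker substring."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("proc") != proc:
            continue  # skip other processes and lines in another layout
        for name, needle in MARKERS.items():
            if needle in match.group("msg"):
                counts[name] += 1
    return counts


# The five log entries from this mail, each joined onto one line.
SAMPLE = [
    "2016-10-10T11:09:18.507851+08:00 pld0101 osafimmnd[3141]: message repeated 2 times: [ logtrace: write failed, No space left on device]",
    "2016-10-10T11:09:18.507883+08:00 pld0101 osafimmnd[3141]: NO Re-introduce-me highestProcessed:23839 highestReceived:23839",
    "2016-10-10T11:09:18.508011+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device",
    "2016-10-10T11:09:18.508129+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device",
    "2016-10-10T11:09:18.508501+08:00 pld0101 osafimmnd[3141]: WA MDS Send Failed to service:IMMD rc:2",
]

if __name__ == "__main__":
    for name, count in sorted(summarize(SAMPLE).items()):
        print(name, count)
```

To run it against a real log, pass an open file object to summarize() in place
of SAMPLE; the syslog path depends on the distribution. If the disk_full count
is still non-zero after freeing space, the retry was probably not done under
the conditions asked for above.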

/Neel.

On 2016/10/10 03:59 PM, Jianfeng Dong wrote:

Hi,



For several years we have used OpenSAF (currently 4.5.2) to provide HA service
in our product (which includes 2 SCs and several payload cards), but our
customers keep asking that the payload cards NOT reboot even if both SCs reload
or hang.



We recently learned that the new release 5.0.0 provides this feature (i.e.
"headless cluster"), so we installed 5.0.0 in our product and enabled the
"headless" feature by setting "IMMSV_SC_ABSENCE_ALLOWED" to 900 seconds. After
installation it worked fine: our system with the new OpenSAF release starts
successfully, all SCs and payload cards come "UP", and the payload card does
NOT reboot immediately after we reload both SCs.



However, we hit a problem: neither of the two SCs can be promoted to controller
after reboot until the "headless" payload reboots due to the
'IMMSV_SC_ABSENCE_ALLOWED' timeout after 900 seconds. It seems the OpenSAF
modules on both SCs just wait and do nothing until the payload reboots due to
the timeout; then OpenSAF on the SCs continues to run and the whole system
finally recovers.



We thought ticket #1828 might have resolved this issue, so we tried again with
release 5.0.1 but got the same result.



Could you please tell us why, in our case, OpenSAF on both SCs could not run
until the payload card (in "headless" state) rebooted due to the timeout?

Besides 'IMMSV_SC_ABSENCE_ALLOWED', is there any other variable or parameter
that needs to be set or modified to enable the 'headless cluster' feature? Did
we miss anything?

Attached are the syslogs of the SC and the payload card from when this problem
happened; we hope the log files help to find the root cause.



Any comments are much appreciated, thanks!



Regards,

Jianfeng Dong






------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users

