I tried again with sufficient drive space but got the same result: neither of the two SCs can be promoted to controller until the payload reboots.
I also checked the network link between the SCs and the payload; they can ping each other when this issue happens. I too suspect the problem is caused by the IMMD/IMMND link among those nodes, but I don't know how to prove it.

From: Neelakanta Reddy [mailto:[email protected]]
Sent: Monday, October 10, 2016 8:39 PM
To: Jianfeng Dong <[email protected]>; [email protected]
Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after enable "headless cluster" feature

Hi,

After a "headless" period, once any of the controllers starts, the IMMND on the payload sends an intro message to IMMD. It looks like this did not happen. The following is the log from the payload:

2016-10-10T11:09:18.507851+08:00 pld0101 osafimmnd[3141]: message repeated 2 times: [ logtrace: write failed, No space left on device]
2016-10-10T11:09:18.507883+08:00 pld0101 osafimmnd[3141]: NO Re-introduce-me highestProcessed:23839 highestReceived:23839
2016-10-10T11:09:18.508011+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
2016-10-10T11:09:18.508129+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
2016-10-10T11:09:18.508501+08:00 pld0101 osafimmnd[3141]: WA MDS Send Failed to service:IMMD rc:2

Retry again with sufficient space on the payload.

/Neel.

On 2016/10/10 03:59 PM, Jianfeng Dong wrote:

Hi,

For several years we have used OpenSAF (4.5.2 now) to provide HA service in our product (2 SCs and several payload cards), but our customers keep asking that the payload cards NOT be rebooted even if both SCs reload or hang. We recently learned that release 5.0.0 provides this feature (i.e. "headless cluster"), so we installed 5.0.0 in our product and enabled the "headless" feature by setting "IMMSV_SC_ABSENCE_ALLOWED" to 900 seconds.
After installation we found that it worked fine: our system with the new OpenSAF release starts successfully, all SC and payload cards come "UP", and the payload card does NOT reboot immediately after we reload both SCs.

However, we hit a problem: after the SCs reboot, neither of the two SCs can be promoted to controller until the "headless" payload reboots because the 'IMMSV_SC_ABSENCE_ALLOWED' timeout expires after 900 seconds. It seems the OpenSAF modules on both SCs just wait and do nothing until the payload reboots due to the timeout; then OpenSAF on the SCs continues to run and the whole system finally recovers. We thought ticket #1828 might have resolved this issue, so we tried again with release 5.0.1 but got the same result.

Could you please tell us why, in our case, OpenSAF on both SCs could not run until the payload card (in "headless" status) rebooted due to the timeout? Besides 'IMMSV_SC_ABSENCE_ALLOWED', is there any other variable or parameter that needs to be set or modified to enable the 'headless cluster' feature? Are we missing anything?

Attached are the syslogs of the SC and the payload card from when this problem happened; we hope the log files can help to find the root cause.

Any comment is much appreciated, thanks!

Regards,
Jianfeng Dong

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
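
For anyone wanting to check their own payload syslog for the symptoms Neel points out in this thread (logtrace writes failing with "No space left on device", the "Re-introduce-me" notice, and "MDS Send Failed to service:IMMD"), a rough Python sketch follows. It is not an OpenSAF tool. The function name `scan_syslog` and the symptom labels are made up for illustration, and the sample lines are the ones quoted in the thread.

```python
import re
from collections import Counter

# Symptom labels are hypothetical; the matched strings come from the
# payload log quoted in this thread.
PATTERNS = {
    "disk_full": re.compile(r"No space left on device"),
    "reintroduce": re.compile(r"Re-introduce-me"),
    "mds_send_failed": re.compile(r"MDS Send Failed to service:\w+ rc:\d+"),
}

# Matches the "daemon[pid]:" part of a syslog line, e.g. "osafimmnd[3141]:".
DAEMON_RE = re.compile(r"\s(\w+)\[\d+\]:")


def scan_syslog(lines):
    """Count how many lines show each symptom, keyed by (daemon, symptom)."""
    counts = Counter()
    for line in lines:
        match = DAEMON_RE.search(line)
        daemon = match.group(1) if match else "unknown"
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(daemon, name)] += 1
    return counts


# Log lines as quoted in the thread.
SAMPLE = [
    "2016-10-10T11:09:18.507851+08:00 pld0101 osafimmnd[3141]: message repeated 2 times: [ logtrace: write failed, No space left on device]",
    "2016-10-10T11:09:18.507883+08:00 pld0101 osafimmnd[3141]: NO Re-introduce-me highestProcessed:23839 highestReceived:23839",
    "2016-10-10T11:09:18.508011+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device",
    "2016-10-10T11:09:18.508129+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device",
    "2016-10-10T11:09:18.508501+08:00 pld0101 osafimmnd[3141]: WA MDS Send Failed to service:IMMD rc:2",
]

if __name__ == "__main__":
    for (daemon, symptom), n in sorted(scan_syslog(SAMPLE).items()):
        print(f"{daemon}: {symptom} x{n}")
```

To run it against a real file, pass the open file object instead of `SAMPLE`, for example `scan_syslog(open(path))` with whatever path your syslog uses. Note that a "message repeated N times" line is counted once, not N times. A non-zero `mds_send_failed` count after the disk-space problem is fixed would suggest the intro message from IMMND is still not reaching IMMD.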
