I can send you a patch within the next few days and let you try it out.

regards,
Anders Widell

On 10/11/2016 11:36 AM, Jianfeng Dong wrote:
> Do you have a clear plan to remove this requirement?
> We want to know: if we can't change node_id due to our architecture, when
> could we get a release without this limitation to upgrade to? After all, our
> products have been deployed to many customers, so we have to think about
> upgrade and compatibility issues.
>
> Thanks,
> Jianfeng
>
> -----Original Message-----
> From: Anders Widell [mailto:anders.wid...@ericsson.com]
> Sent: Tuesday, October 11, 2016 4:10 PM
> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy
> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after enable
> "headless cluster" feature
>
> Yes, this is required with the current implementation. It might be possible
> to remove this requirement - I will think about how it can be done.
>
> regards,
>
> Anders Widell
>
>
> On 10/11/2016 09:06 AM, Jianfeng Dong wrote:
>> Is it obligatory that the controllers have a lower slot_id than the payloads
>> if we want to enable the "headless" feature?
>> If it is obligatory, it seems to be a big change to our architecture, but I
>> will at least give it a try.
>>
>> Thanks,
>> Jianfeng
>>
>> -----Original Message-----
>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>> Sent: Tuesday, October 11, 2016 2:30 PM
>> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy
>> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>> enable "headless cluster" feature
>>
>> There is a one-to-one mapping between /etc/opensaf/slot_id and the node_id.
>> Simply make sure that all your system controller nodes have a lower slot_id
>> than any of your payloads. This file is read when the node is booted.
>> You should be able to do an in-service renumbering of your nodes - just be
>> careful that you never have two nodes with the same node_id at the same
>> time.
>>
>> Yes, the assumption is there in 5.1.0 as well.
>>
>> regards,
>>
>> Anders Widell
>>
>>
>> On 10/11/2016 04:29 AM, Jianfeng Dong wrote:
>>> Yes, in our product the payload's node_id is lower than the SCs'. Could you
>>> please tell us how to configure it?
>>>
>>> And does this assumption exist in OpenSAF 5.1.0 as well?
>>>
>>> Thanks,
>>> Jianfeng
>>>
>>> -----Original Message-----
>>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>>> Sent: Tuesday, October 11, 2016 12:55 AM
>>> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy
>>> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
>>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>>> enable "headless cluster" feature
>>>
>>> There is a (probably not so well documented :-) assumption that the system
>>> controllers are configured with a lower node_id than the payloads. From
>>> what I can see in the logs you sent, it looks like you have configured the
>>> payload with a lower node_id than the system controllers.
>>>
>>> By the way, the headless feature has been improved in OpenSAF 5.1.0, so I
>>> would suggest that you upgrade to that version if possible.
>>>
>>> regards,
>>>
>>> Anders Widell
>>>
>>>
>>> On 10/10/2016 06:04 PM, Jianfeng Dong wrote:
>>>> I tried with sufficient drive space but got the same result: neither of the
>>>> two SCs can be promoted to controller until the payload reboots.
>>>>
>>>> I also checked the network link between the SCs and the payload; they can
>>>> ping each other when this issue happens. I also suspect the problem is
>>>> caused by the IMMD/IMMND link among those nodes, but I don't know how to
>>>> prove it.
>>>>
>>>> From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
>>>> Sent: Monday, October 10, 2016 8:39 PM
>>>> To: Jianfeng Dong <jd...@juniper.net>;
>>>> opensaf-users@lists.sourceforge.net
>>>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>>>> enable "headless cluster" feature
>>>>
>>>> Hi,
>>>>
>>>> After the "headless" period, once either controller has started, the IMMND
>>>> on the payload will send the introduce message to the IMMD.
>>>> It looks like this did not happen; the following is the log from the payload:
>>>>
>>>> 2016-10-10T11:09:18.507851+08:00 pld0101 osafimmnd[3141]: message repeated 2 times: [ logtrace: write failed, No space left on device]
>>>> 2016-10-10T11:09:18.507883+08:00 pld0101 osafimmnd[3141]: NO Re-introduce-me highestProcessed:23839 highestReceived:23839
>>>> 2016-10-10T11:09:18.508011+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
>>>> 2016-10-10T11:09:18.508129+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
>>>> 2016-10-10T11:09:18.508501+08:00 pld0101 osafimmnd[3141]: WA MDS Send Failed to service:IMMD rc:2
>>>>
>>>> Retry again with sufficient space on the payload.
>>>>
>>>> /Neel.
>>>>
>>>> On 2016/10/10 03:59 PM, Jianfeng Dong wrote:
>>>>
>>>> Hi,
>>>>
>>>> For several years we have used OpenSAF (4.5.2 now) to provide HA service in
>>>> our product (including 2 SCs and several payload cards), but our customers
>>>> keep requiring that payload cards NOT be rebooted even if both SCs reload
>>>> or hang.
>>>>
>>>> We just learned that the new release 5.0.0 provides this feature (i.e.
>>>> "headless cluster"), so we installed 5.0.0 into our product and enabled the
>>>> "headless" feature by setting "IMMSV_SC_ABSENCE_ALLOWED" to 900 seconds.
>>>> After installation we found it worked fine: our system with the new OpenSAF
>>>> release started and ran successfully, all SC and payload cards came "UP",
>>>> and the payload cards did NOT reboot immediately after we reloaded both SCs.
>>>>
>>>> However, we hit a problem: neither of the two SCs can be promoted to
>>>> controller after a reboot until the "headless" payload reboots due to the
>>>> 'IMMSV_SC_ABSENCE_ALLOWED' timeout after 900 seconds. The OpenSAF modules
>>>> on both SCs seem to just wait and do nothing until the payload reboots due
>>>> to the timeout; then OpenSAF on the SCs continues to run and the whole
>>>> system finally recovers.
>>>>
>>>> We thought ticket #1828 might have resolved this issue, so we tried again
>>>> with release 5.0.1 but got the same result.
>>>>
>>>> Could you please tell us why, in our case, OpenSAF on both SCs could not run
>>>> until the payload card (in "headless" status) rebooted due to the timeout?
>>>>
>>>> Besides 'IMMSV_SC_ABSENCE_ALLOWED', is there any other variable or
>>>> parameter we need to set or modify to enable the 'headless cluster'
>>>> feature? Did we miss anything?
>>>>
>>>> Attached are the syslogs of the SCs and the payload card from when this
>>>> problem happened; we hope the log files can help find the root cause.
>>>>
>>>> Any comments are much appreciated, thanks!
>>>>
>>>> Regards,
>>>>
>>>> Jianfeng Dong
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, SlashDot.org!
>>>> http://sdm.link/slashdot
>>>> _______________________________________________
>>>> Opensaf-users mailing list
>>>> Opensaf-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
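Anders's advice in this thread comes down to two checks on the /etc/opensaf/slot_id values across the cluster: no two nodes may share a slot_id (it maps one-to-one to node_id), and every system controller must have a lower slot_id than every payload. The Python below is a minimal sketch of those checks, plus a dry run of an in-service renumbering plan that flags any step where two nodes would briefly share a slot_id. It assumes you collect each node's role and slot_id yourself; the inventory format, the node names and the helper functions are hypothetical and not part of OpenSAF, and the code does not read or write any OpenSAF files. Because the slot_id file is read at boot, each plan step is assumed to mean "edit this node's slot_id, then reboot it".

```python
from typing import Dict, List, Tuple

# Hypothetical inventory format: node name -> (role, slot_id), where role is
# "SC" (system controller) or "PL" (payload). Gathering these values from each
# node's /etc/opensaf/slot_id is left to you; nothing here reads OpenSAF files.
Inventory = Dict[str, Tuple[str, int]]


def find_duplicate_slots(nodes: Inventory) -> Dict[int, List[str]]:
    """Map each slot_id used by more than one node to the nodes using it.

    Since slot_id maps one-to-one to node_id, a duplicate slot_id means two
    nodes would come up with the same node_id.
    """
    owners: Dict[int, List[str]] = {}
    for name, (_role, slot) in sorted(nodes.items()):
        owners.setdefault(slot, []).append(name)
    return {slot: names for slot, names in owners.items() if len(names) > 1}


def sc_below_payloads(nodes: Inventory) -> bool:
    """True if every SC has a lower slot_id than every payload."""
    sc_slots = [slot for role, slot in nodes.values() if role == "SC"]
    pl_slots = [slot for role, slot in nodes.values() if role == "PL"]
    if not sc_slots or not pl_slots:
        return True  # nothing to compare
    return max(sc_slots) < min(pl_slots)


def check_renumber_plan(nodes: Inventory, steps: List[Tuple[str, int]]) -> List[str]:
    """Dry-run an in-service renumbering, one (node, new_slot_id) step at a time.

    Duplicates are checked after every step; the SC-below-payload rule is
    checked on the final layout. Returns a list of problems (empty means OK).
    The input inventory is not modified.
    """
    current = dict(nodes)
    problems: List[str] = []
    for i, (name, new_slot) in enumerate(steps, start=1):
        role, _old_slot = current[name]
        current[name] = (role, new_slot)
        dups = find_duplicate_slots(current)
        if dups:
            problems.append(f"step {i}: duplicate slot_id(s) {sorted(dups)}")
    if not sc_below_payloads(current):
        problems.append("final layout: some SC does not have a lower slot_id than every payload")
    return problems


if __name__ == "__main__":
    # Hypothetical layout resembling the one reported: payload below the SCs.
    nodes = {"pl-1": ("PL", 1), "sc-1": ("SC", 2), "sc-2": ("SC", 3)}
    print("SCs below payloads:", sc_below_payloads(nodes))

    # Naive plan: sc-1 reuses slot 1 while the payload still holds it.
    print(check_renumber_plan(nodes, [("sc-1", 1), ("pl-1", 3)]))

    # Safer plan: move the payload to a free slot before the SCs move down.
    print(check_renumber_plan(nodes, [("pl-1", 4), ("sc-1", 1), ("sc-2", 2)]))
```

In the example, the naive plan collides at every step and still ends with an SC on the same slot as the payload, while the safer plan frees the low slots first. Whether a particular slot value is valid for your hardware is outside what this sketch checks.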