Hi Lennart,

Since we need this fix urgently. I will send out V2 with update about one line 
IF.
About improve handling timer, I think you should open a new ticket to 
enhance/refactor it.
What do you think?

Best Regards,
Thuan

-----Original Message-----
From: Lennart Lund <lennart.l...@ericsson.com> 
Sent: Thursday, September 27, 2018 10:24 PM
To: Thuan Tran <thuan.t...@dektech.com.au>
Cc: Gary Lee <gary....@dektech.com.au>; opensaf-devel@lists.sourceforge.net; 
Lennart Lund <lennart.l...@ericsson.com>
Subject: RE: [PATCH 1/1] smf: campaign is executing forever until cluster reset 
[#1353]

Hi Thuan,

Yes you are right. If sleep time is 2 sec then timeout -= 2 is correct. However 
this is a bit hard to read and that is still not good. I will look a little bit 
more at this and if possible suggest a better way of handling this. For example 
by using the class Timer that can be found in src/base/

Thanks
Lennart

> -----Original Message-----
> From: Thuan TRAN <thuan.t...@dektech.com.au>
> Sent: den 27 september 2018 15:24
> To: Lennart Lund <lennart.l...@ericsson.com>
> Cc: Gary Lee <gary....@dektech.com.au>; opensaf- 
> de...@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] smf: campaign is executing forever until 
> cluster reset [#1353]
> 
> Hi Lennart,
> 
> Thanks for review.
> I will update regarding to 2nd comment about one line code of IF.
> But I still not catch up your 1st comment about change to 5 seconds.
> 
> Current:
> timeout = 10s
> sleep(2s)
> timeout--; //reduce 1
> 
> My change:
> timeout = 10s
> sleep(2s)
> timeout -= 2; //reduce 2
> 
> Your comment:
> timeout = 5s //change this, right?
> sleep(1s) //change this, right?
> timeout--;
> 
> Regards,
> Thuan (from home)
> 
> Quoting Lennart Lund <lennart.l...@ericsson.com>:
> 
> > Hi Thuan
> >
> > I have some comments, see below snippet from the code in function
> > getNodeDestination() in SmfUtils.cc
> > I have not tested
> >
> >
> >   // Try to get the node director for a while. If the nodes reboot very fast
> >   // after a cluster reboot, it could happend the rebooted nodes 
> > comes up before
> >   // the last one is rebooted. This could make the campaign to fail when the
> >   // campaign continue.
> >   if (strcmp(className, "SaClmNode") == 0) {
> >     int timeout = 10 * ONE_SECOND;
> >     while (smfnd_for_name(i_node.c_str(), o_nodeDest) == false) {
> >       if (timeout <= 0) {
> >         LOG_NO("Failed to get node dest for clm node %s", i_node.c_str());
> >         return false;
> >       }
> >       struct timespec time = {2 * ONE_SECOND, 0};
> >       osaf_nanosleep(&time);
> >       // [Lennart] This will in practice change the timeout to 5 seconds and
> >       // to do it like this is confusing and requires a reader to 
> > not miss this code line
> >       // It is likely that a reader will believe it is still 10 seconds
> >       // It is better to change the timeout setting above to 5 * ONE_SECOND
> >       //
> >       timeout -= 2;
> >       // [Lennart] Even if the following code line is not changed by 
> > you but is not
> >       // a good way to write code. First elapsedTime is not a boolean and
> >       // secondly no if statements should be written as one liner like this
> >       // Should be changed to:
> >       // if (elapsedTime != 0) {
> >       //   *elapsedTime = *elapsedTime + 2 * ONE_SECOND;
> >       // }
> >       // It is good to fix things like this if it is found in a 
> > place where other
> >       // changes are done
> >       if (elapsedTime) *elapsedTime = *elapsedTime + 2 * ONE_SECOND;
> >       if (maxWaitTime != -1) {
> >         // [Lennart] The same problem as above with elapsedTime as above
> >         // Change to (elapsedTime != 0)
> >         if ((elapsedTime) && (*elapsedTime >= maxWaitTime)) {
> >           LOG_NO("Failed to get node dest for clm node %s", i_node.c_str());
> >           return false;
> >         }
> >       }
> >     }
> >     LOG_NO("%s: className '%s'", __FUNCTION__, className);
> >     return true;
> >
> > Thanks
> > Lennart
> >
> >> -----Original Message-----
> >> From: thuan.tran <thuan.t...@dektech.com.au>
> >> Sent: den 25 september 2018 09:04
> >> To: Lennart Lund <lennart.l...@ericsson.com>; Gary Lee 
> >> <gary....@dektech.com.au>
> >> Cc: opensaf-devel@lists.sourceforge.net; Thuan Tran 
> >> <thuan.t...@dektech.com.au>
> >> Subject: [PATCH 1/1] smf: campaign is executing forever until 
> >> cluster reset [#1353]
> >>
> >> The function getNodeDestination() reset elapsedTime to zero cause 
> >> the node reboot timeout at waitForNodeDestination() never reach.
> >> If scenario that node reboot cannot come back then campaign is 
> >> stuck in executing forever until cluster reset.
> >> ---
> >>  src/smf/smfd/SmfUpgradeStep.cc |  1 +
> >>  src/smf/smfd/SmfUtils.cc       | 11 ++++-------
> >>  2 files changed, 5 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/src/smf/smfd/SmfUpgradeStep.cc 
> >> b/src/smf/smfd/SmfUpgradeStep.cc index 4c0ddd192..80da668de 100644
> >> --- a/src/smf/smfd/SmfUpgradeStep.cc
> >> +++ b/src/smf/smfd/SmfUpgradeStep.cc
> >> @@ -2399,6 +2399,7 @@ bool SmfUpgradeStep::nodeReboot() {
> >>        "SmfUpgradeStep::nodeReboot: Waiting to get node destination 
> >> with increased UP counter");
> >>
> >>    while (true) {
> >> +    elapsedTime = 0;
> >>      for (nodeIt = rebootedNodeList.begin(); nodeIt !=
> >> rebootedNodeList.end();) {
> >>        if (getNodeDestination((*nodeIt).node_name, &nodeDest, 
> >> &elapsedTime,
> >>                               -1)) { diff --git 
> >> a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc index 
> >> 915c086a5..4ac5af163 100644
> >> --- a/src/smf/smfd/SmfUtils.cc
> >> +++ b/src/smf/smfd/SmfUtils.cc
> >> @@ -95,9 +95,6 @@ bool getNodeDestination(const std::string 
> >> &i_node, SmfndNodeDest *o_nodeDest,
> >>
> >>    TRACE("Find destination for node '%s'", i_node.c_str());
> >>
> >> -  if (elapsedTime)  // Initialize elapsedTime to zero.
> >> -    *elapsedTime = 0;
> >> -
> >>    /* It seems SaAmfNode objects can be stored, but the code
> >>     * indicates that SaClmNode's are expected. Anyway an attempt
> >>     * to go for it is probably faster that examining IMM classes @@ 
> >> -133,10 +130,10 @@ bool getNodeDestination(const std::string 
> >> &i_node, SmfndNodeDest *o_nodeDest,
> >>        }
> >>        struct timespec time = {2 * ONE_SECOND, 0};
> >>        osaf_nanosleep(&time);
> >> -      timeout--;
> >> +      timeout -= 2;
> >>        if (elapsedTime) *elapsedTime = *elapsedTime + 2 * ONE_SECOND;
> >>        if (maxWaitTime != -1) {
> >> -        if (*elapsedTime >= maxWaitTime) {
> >> +        if ((elapsedTime) && (*elapsedTime >= maxWaitTime)) {
> >>            LOG_NO("Failed to get node dest for clm node %s", 
> >> i_node.c_str());
> >>            return false;
> >>          }
> >> @@ -165,11 +162,11 @@ bool getNodeDestination(const std::string 
> >> &i_node, SmfndNodeDest *o_nodeDest,
> >>        }
> >>        struct timespec time = {2 * ONE_SECOND, 0};
> >>        osaf_nanosleep(&time);
> >> -      timeout--;
> >> +      timeout -= 2;
> >>        if (elapsedTime) *elapsedTime = *elapsedTime + 2 * 
> >> ONE_SECOND;
> >>
> >>        if (maxWaitTime != -1) {
> >> -        if (*elapsedTime >= maxWaitTime) {
> >> +        if ((elapsedTime) && (*elapsedTime >= maxWaitTime)) {
> >>            LOG_NO("Failed to get node dest for clm node %s", 
> >> i_node.c_str());
> >>            free(nodeName);
> >>            return false;
> >> --
> >> 2.18.0
> 




_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to