Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-02-16 Thread minh chau
Hi Nagu,

This patch only covers a corner case, where failover happens between
AVD_INIT_DONE and AVD_APP_STATE. We still have to reboot the node if an
out-of-cold-sync situation occurs.
So I think we should keep that sentence in the Compliance Table.

Thanks,
Minh

On 16/02/17 17:24, Nagendra Kumar wrote:
>
> Hi Minh,
>
> Good catch !! Yes, please push, but as such we have documented in 
> Compliance Table that “Before the timer expiry, failover and 
> switchover are not supported.”
>
> Thanks
>
> -Nagu
>
> *From:*minh chau [mailto:minh.c...@dektech.com.au]
> *Sent:* 16 February 2017 07:36
> *To:* Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; 
> gary@dektech.com.au
> *Cc:* opensaf-devel@lists.sourceforge.net
> *Subject:* Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless 
> sync before standby AMFD comes up [#2162]
>
> Hi Nagu,
>
> Thanks for reminding, there's one change in the patch that could 
> affect on upgrade too, it is:
>
> +  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
> +  // If AVD_INIT_DONE, there was a SC failover during cluster
> +  // instantiation phase in cluster (after all NCS SU is assigned)
> +  // If AVD_APP_STATE, this should be come from 2N-MW SI swap
> *+  if (cb->init_state >= AVD_INIT_DONE) {*
> +    if (cluster_su_instantiation_done(cb, nullptr) == true) {
> +      cluster_startup_expiry_event_generate(cb);
> +    } else {
> +      m_AVD_CLINIT_TMR_START(cb);
> +    }
> +  }
>
> So, I would like to make it for AVD_INIT_DONE only, it looks like
>
> +  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
> +  // If AVD_INIT_DONE, there was a SC failover during cluster
> +  // instantiation phase in cluster (after all NCS SU is assigned)
> *+  if (cb->init_state == AVD_INIT_DONE) {*
> +    if (cluster_su_instantiation_done(cb, nullptr) == true) {
> +      cluster_startup_expiry_event_generate(cb);
> +    } else {
> +      m_AVD_CLINIT_TMR_START(cb);
> +    }
> +  }
>
> If you agree, I can push the patches with new change.
>
> Thanks,
> Minh

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-02-15 Thread Nagendra Kumar
Hi Minh,

Good catch! Yes, please push. Note, however, that we have documented in the
Compliance Table that "Before the timer expiry, failover and switchover are
not supported."

Thanks

-Nagu


From: minh chau [mailto:minh.c...@dektech.com.au] 
Sent: 16 February 2017 07:36
To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before 
standby AMFD comes up [#2162]

 

Hi Nagu,

Thanks for reminding, there's one change in the patch that could affect on 
upgrade too, it is:

+  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
+  // If AVD_INIT_DONE, there was a SC failover during cluster
+  // instantiation phase in cluster (after all NCS SU is assigned)
+  // If AVD_APP_STATE, this should be come from 2N-MW SI swap
+  if (cb->init_state >= AVD_INIT_DONE) {
+    if (cluster_su_instantiation_done(cb, nullptr) == true) {
+      cluster_startup_expiry_event_generate(cb);
+    } else {
+      m_AVD_CLINIT_TMR_START(cb);
+    }
+  }

So, I would like to make it for AVD_INIT_DONE only, it looks like

+  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
+  // If AVD_INIT_DONE, there was a SC failover during cluster
+  // instantiation phase in cluster (after all NCS SU is assigned)
+  if (cb->init_state == AVD_INIT_DONE) {
+    if (cluster_su_instantiation_done(cb, nullptr) == true) {
+      cluster_startup_expiry_event_generate(cb);
+    } else {
+      m_AVD_CLINIT_TMR_START(cb);
+    }
+  }

If you agree, I can push the patches with new change.

Thanks,
Minh

On 15/02/17 15:13, Nagendra Kumar wrote:

Yes, ack for both the patches. I assume you would have tested upgrade scenarios.
 
 
Thanks
-Nagu
 


Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-02-15 Thread minh chau
Hi Nagu,

Thanks for the reminder. There is one change in the patch that could also
affect upgrade:

+  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
+  // If AVD_INIT_DONE, there was a SC failover during cluster
+  // instantiation phase in cluster (after all NCS SU is assigned)
+  // If AVD_APP_STATE, this should be come from 2N-MW SI swap
+  if (cb->init_state >= AVD_INIT_DONE) {
+    if (cluster_su_instantiation_done(cb, nullptr) == true) {
+      cluster_startup_expiry_event_generate(cb);
+    } else {
+      m_AVD_CLINIT_TMR_START(cb);
+    }
+  }

So I would like to restrict it to AVD_INIT_DONE only, changing ">=" to "=="
in the condition. It would look like this:

+  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
+  // If AVD_INIT_DONE, there was a SC failover during cluster
+  // instantiation phase in cluster (after all NCS SU is assigned)
+  if (cb->init_state == AVD_INIT_DONE) {
+    if (cluster_su_instantiation_done(cb, nullptr) == true) {
+      cluster_startup_expiry_event_generate(cb);
+    } else {
+      m_AVD_CLINIT_TMR_START(cb);
+    }
+  }

If you agree, I can push the patches with this new change.
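
To make the difference between the two conditions concrete, here is a minimal standalone sketch. It is not the AMFD code: the enum is a hypothetical stand-in for the init states, and the only thing it assumes is what the patch comment implies, namely that AVD_APP_STATE orders after AVD_INIT_DONE. The helper names handles_with_ge, handles_with_eq and check_init_state_conditions are made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the AMFD init states. Only the relative order of
// AVD_INIT_DONE and AVD_APP_STATE matters for this illustration.
enum AvdInitState {
  AVD_INIT_BGN = 1,
  AVD_CFG_READY,
  AVD_CFG_DONE,
  AVD_INIT_DONE,
  AVD_APP_STATE
};

// Condition as first posted: also true once the cluster has reached
// AVD_APP_STATE (for example after a 2N-MW SI swap).
constexpr bool handles_with_ge(AvdInitState s) { return s >= AVD_INIT_DONE; }

// Narrowed condition: true only while still in AVD_INIT_DONE.
constexpr bool handles_with_eq(AvdInitState s) { return s == AVD_INIT_DONE; }

// Runtime self-check showing where the two conditions agree and differ.
inline void check_init_state_conditions() {
  assert(handles_with_ge(AVD_INIT_DONE) && handles_with_eq(AVD_INIT_DONE));
  assert(handles_with_ge(AVD_APP_STATE) && !handles_with_eq(AVD_APP_STATE));
  assert(!handles_with_ge(AVD_CFG_DONE) && !handles_with_eq(AVD_CFG_DONE));
}
```

The two conditions differ only for AVD_APP_STATE, which is the case that an upgrade or a middleware SI swap would hit.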

Thanks,
Minh

On 15/02/17 15:13, Nagendra Kumar wrote:
> Yes, ack for both the patches. I assume you would have tested upgrade 
> scenarios.
>
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 15 February 2017 08:52
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> The #2162 has two patches. I think your ack is for [PATCH 2 of 2] AMFND:
>> Fix SC failover during headless sync before standby AMFD comes up [#2162].
>> Does the other one ([PATCH 1 of 2] AMFD: Fix SC failover during headless
>> sync at INIT_DONE state [#2162]) look ok?
>>
>> Thanks,
>> Minh
>> On 14/02/17 20:40, Nagendra Kumar wrote:
>>> Ack.
>>> Tested the scenarios.
>>>
>>> Thanks
>>> -Nagu
>>>

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-02-14 Thread Nagendra Kumar
Yes, ack for both the patches. I assume you would have tested upgrade scenarios.


Thanks
-Nagu

> -Original Message-
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: 15 February 2017 08:52
> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
> before standby AMFD comes up [#2162]
> 
> Hi Nagu,
> 
> The #2162 has two patches. I think your ack is for [PATCH 2 of 2] AMFND:
> Fix SC failover during headless sync before standby AMFD comes up [#2162].
> Does the other one ([PATCH 1 of 2] AMFD: Fix SC failover during headless
> sync at INIT_DONE state [#2162]) look ok?
> 
> Thanks,
> Minh
> On 14/02/17 20:40, Nagendra Kumar wrote:
> > Ack.
> > Tested the scenarios.
> >
> > Thanks
> > -Nagu
> >
> >> -Original Message-
> >> From: minh chau [mailto:minh.c...@dektech.com.au]
> >> Sent: 23 January 2017 16:24
> >> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> >> gary@dektech.com.au
> >> Cc: opensaf-devel@lists.sourceforge.net
> >> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> >> sync before standby AMFD comes up [#2162]
> >>
> >> Hi Nagu,
> >>
> >> I am checking the logs now.
> >>
> >> Thanks, Minh
> >>
> >> On 23/01/17 17:47, Nagendra Kumar wrote:
> >>> The logs (Logs-tc.rar) attached in the ticket.
> >>>
> >>> Thanks
> >>> -Nagu
> >>>
>  -Original Message-
>  From: minh chau [mailto:minh.c...@dektech.com.au]
>  Sent: 16 January 2017 05:47
>  To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>  gary@dektech.com.au
>  Cc: opensaf-devel@lists.sourceforge.net
>  Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>  sync before standby AMFD comes up [#2162]
> 
>  Hi Nagu,
> 
>  I misunderstood your point, and now I get it.
>  In my test I see it works as expected - SU2 becomes Act and no
>  assignment for SU1 I guess in your test some how the cluster
>  initiation timer has not been started on SC2 (new active), there
>  could be a
> >> missing case in the patch.
>  Could you please share me the trace?
> 
>  Thanks,
>  Minh
> 
>  On 13/01/17 21:48, Nagendra Kumar wrote:
> > Hi Minh,
> > Please check my response inlined with [Nagu].
> >
> > Thanks
> > -Nagu
> >> -Original Message-
> >> From: minh chau [mailto:minh.c...@dektech.com.au]
> >> Sent: 13 January 2017 03:53
> >> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen
> Malviya;
> >> gary@dektech.com.au
> >> Cc: opensaf-devel@lists.sourceforge.net
> >> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during
> >> headless sync before standby AMFD comes up [#2162]
> >>
> >> Hi Nagu,
> >>
> >> Thanks for reviewing, please see comments inline.
> >>
> >> Thanks,
> >> Minh
> >>
> >> On 12/01/17 21:48, Nagendra Kumar wrote:
> >>> Hi Minh,
> >>>Though I am not able to simulate the problem, I tested as
> below:
> >>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act
> >>> and
> >>> SU2 on
> >> PL-4 as Standby.
> >>> 2. Stop SC1 and SC2 and then stop PL-3.
> >>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete,
> >>> stop SC1. SC2
> >> becomes Act.
> >> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped,
> >> then only
> >> SU2 has active assignment
> > [Nagu]: PL-3 is stopped in step #2.
> >>> In this case, SC-2 contains both SU1(Act) and SU2(Standby)
> >> assignments.
> >>> Ideally, SU2 assignments should have been Act and there
> >>> shouldn't be
> >>> SU1
> >> assignment.
> >> [M]: This seems to be another test where SU1 and SU2 are hosted
> >> on SC2, then both SU1 and SU2 should get assignment
> > [Nagu]: I mean to say command 'amf-state siass' run on SC-1
> > displays both
>  SU1 and SU2 assignments.
> >SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> > This is similar test case, which is mentioned in the ticket?
> >>
> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
> >> mo,safApp=AmfDemo1
> >>> saAmfSISUHAState=ACTIVE(1)
> >>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>
> >>
> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
> >> mo,safApp=AmfDemo1
> >>> saAmfSISUHAState=STANDBY(2)
> >>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>
> >>> Please check.
> >>>
> >>> Thanks
> >>> -Nagu
> >>>
>  -Original Message-
>  From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>  Sent: 08 November 2016 08:53
>  To: hans.nordeb...@ericsson.com; 

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-02-14 Thread Nagendra Kumar
Ack. 
Tested the scenarios.

Thanks
-Nagu

> -Original Message-
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: 23 January 2017 16:24
> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
> before standby AMFD comes up [#2162]
> 
> Hi Nagu,
> 
> I am checking the logs now.
> 
> Thanks, Minh
> 
> On 23/01/17 17:47, Nagendra Kumar wrote:
> > The logs (Logs-tc.rar) attached in the ticket.
> >
> > Thanks
> > -Nagu
> >
> >> -Original Message-
> >> From: minh chau [mailto:minh.c...@dektech.com.au]
> >> Sent: 16 January 2017 05:47
> >> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> >> gary@dektech.com.au
> >> Cc: opensaf-devel@lists.sourceforge.net
> >> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> >> sync before standby AMFD comes up [#2162]
> >>
> >> Hi Nagu,
> >>
> >> I misunderstood your point, and now I get it.
> >> In my test I see it works as expected - SU2 becomes Act and no
> >> assignment for SU1 I guess in your test some how the cluster
> >> initiation timer has not been started on SC2 (new active), there could be a
> missing case in the patch.
> >> Could you please share me the trace?
> >>
> >> Thanks,
> >> Minh
> >>
> >> On 13/01/17 21:48, Nagendra Kumar wrote:
> >>> Hi Minh,
> >>>   Please check my response inlined with [Nagu].
> >>>
> >>> Thanks
> >>> -Nagu
> >>>> -Original Message-
> >>>> From: minh chau [mailto:minh.c...@dektech.com.au]
> >>>> Sent: 13 January 2017 03:53
> >>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> >>>> gary@dektech.com.au
> >>>> Cc: opensaf-devel@lists.sourceforge.net
> >>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> >>>> sync before standby AMFD comes up [#2162]
> >>>>
> >>>> Hi Nagu,
> >>>>
> >>>> Thanks for reviewing, please see comments inline.
> >>>>
> >>>> Thanks,
> >>>> Minh
> >>>>
> >>>> On 12/01/17 21:48, Nagendra Kumar wrote:
> >>>>> Hi Minh,
> >>>>>   Though I am not able to simulate the problem, I tested as below:
> >>>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
> >>>>> SU2 on PL-4 as Standby.
> >>>>> 2. Stop SC1 and SC2 and then stop PL-3.
> >>>>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop
> >>>>> SC1. SC2 becomes Act.
> >>>> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then
> >>>> only SU2 has active assignment
> >>> [Nagu]: PL-3 is stopped in step #2.
> >>>>> In this case, SC-2 contains both SU1(Act) and SU2(Standby)
> >>>>> assignments.
> >>>>> Ideally, SU2 assignments should have been Act and there shouldn't
> >>>>> be SU1 assignment.
> >>>> [M]: This seems to be another test where SU1 and SU2 are hosted on
> >>>> SC2, then both SU1 and SU2 should get assignment
> >>> [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays
> >>> both SU1 and SU2 assignments.
> >>>   SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> >>> This is similar test case, which is mentioned in the ticket?
> >>>>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
> >>>>>    saAmfSISUHAState=ACTIVE(1)
> >>>>>    saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>>>
> >>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
> >>>>>    saAmfSISUHAState=STANDBY(2)
> >>>>>    saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>>>
> >>>>> Please check.
> >>>>>
> >>>>> Thanks
> >>>>> -Nagu
> >>>>>
> >>>>>> -Original Message-
> >>>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> >>>>>> Sent: 08 November 2016 08:53
> >>>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
> >>>>>> gary@dektech.com.au; minh.c...@dektech.com.au
> >>>>>> Cc: opensaf-devel@lists.sourceforge.net
> >>>>>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> >>>>>> sync before standby AMFD comes up [#2162]
> >>>>>>
> >>>>>> osaf/services/saf/amf/amfnd/di.cc   |  7 +--
> >>>>>> osaf/services/saf/amf/amfnd/susm.cc |  6 ++
> >>>>>> 2 files changed, 11 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>>
> >>>>>> This case of SC failover causes new active AMFD getting stuck in
> >>>>>> node_up messages
> >>>>>>
> >>>>>> Say first active controller is SC1, which goes down during headless
> >>>>>> sync.
> >>>>>> Therefore, the amfnd on SC2 receives mds_down of AVD, then both
> >>>>>> is_avd_down and amfd_sync_required are set to true. When SC2
> >>>>>> takes over active role, amfnd on SC2 receives mds_up, but only
> >>>>>> is_avd_down is set to false and the variable amfd_sync_required
> >>>>>> remains true.
> >>>>>> When amfnd-SC2 finishes initiating middleware SU, it needs to
> >>>>>> send su_oper message to
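
A minimal sketch of the flag handling described in the patch summary above, up to the point where amfd_sync_required remains true. It only models the two flags named there. The struct AmfndFlags and the functions on_avd_mds_down, on_avd_mds_up and check_flags_after_failover are hypothetical names for illustration, not the amfnd code, and the sketch does not show how the patch resolves the situation.

```cpp
#include <cassert>

// Hypothetical model of the two amfnd flags named in the patch summary.
struct AmfndFlags {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
};

// mds_down of AVD during headless sync: both flags are set to true.
inline void on_avd_mds_down(AmfndFlags& f) {
  f.is_avd_down = true;
  f.amfd_sync_required = true;
}

// mds_up once the new active AMFD takes over: as described in the summary,
// only is_avd_down is cleared.
inline void on_avd_mds_up(AmfndFlags& f) { f.is_avd_down = false; }

// Replays the described sequence and checks the resulting state.
inline void check_flags_after_failover() {
  AmfndFlags f;
  on_avd_mds_down(f);
  on_avd_mds_up(f);
  assert(!f.is_avd_down);
  assert(f.amfd_sync_required);  // still true after the failover
}
```

After the mds_down/mds_up pair the node director sees the director as up again while still believing a sync is required, which is the state the summary describes.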

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-30 Thread minh chau
Hi Nagu,

I sent you my findings a week ago; I am copying them here again.

Thanks,
Minh

"
It looks like you are testing the N-way model; I haven't tested any headless
cases for the N-way and N+M models.

Jan 23 12:02:55.047416 osafamfd [8625:sg_nway_fsm.cc:0215] >> su_insvc: 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1', 0

However, after the failover of SC1, the cluster init timer was activated
again on SC2 to fail over the absent assignment:
Jan 23 12:04:38.113581 osafamfd [9935:cluster.cc:0055] >> avd_cluster_tmr_init_evh

The failover of SU1's absent assignment started here:
Jan 23 12:04:38.114051 osafamfd [9935:sg.cc:2270] >> failover_absent_assignment: SG:'safSg=AmfDemo_2N,safApp=AmfDemo1'
Jan 23 12:04:38.114055 osafamfd [9935:su.cc:2451] >> any_susi_fsm_in: SU:'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1', check_fsm:1
Jan 23 12:04:38.114060 osafamfd [9935:su.cc:2456] TR SUSI:'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1', fsm:'1'
Jan 23 12:04:38.114064 osafamfd [9935:su.cc:2459] TR Found
Jan 23 12:04:38.114068 osafamfd [9935:su.cc:2462] << any_susi_fsm_in
Jan 23 12:04:38.114073 osafamfd [9935:sg_nway_fsm.cc:0474] >> node_fail: 0

SU2's assignment moved to active:
Jan 23 12:04:38.114123 osafamfd [9935:siass.cc:0753] >> avd_susi_mod_send: SI 'safSi=AmfDemo,safApp=AmfDemo1', SU 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1' ha_state:1

The assignment of SU1 was deleted:
Jan 23 12:04:38.117784 osafamfd [9935:siass.cc:0586] >> avd_susi_delete: safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1
...
Jan 23 12:04:38.120109 osafamfd [9935:imm.cc:0275] >> exec: Delete safCSIComp=safComp=AmfDemo\,safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1
...
Jan 23 12:04:38.120440 osafamfd [9935:imm.cc:0275] >> exec: Delete safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1

So maybe "amf-state siass" was issued before the failover of the absent
assignment had finished on SC2?
"


On 30/01/17 16:04, Nagendra Kumar wrote:
> Any update ??
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: Nagendra Kumar
>> Sent: 23 January 2017 12:18
>> To: minh chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless
>> sync before standby AMFD comes up [#2162]
>>
>> The logs (Logs-tc.rar) attached in the ticket.
>>
>> Thanks
>> -Nagu
>>
>>> -Original Message-
>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>> Sent: 16 January 2017 05:47
>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>> gary@dektech.com.au
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>> sync before standby AMFD comes up [#2162]
>>>
>>> Hi Nagu,
>>>
>>> I misunderstood your point, and now I get it.
>>> In my test I see it works as expected - SU2 becomes Act and no
>>> assignment for SU1 I guess in your test some how the cluster
>>> initiation timer has not been started on SC2 (new active), there could be a
>> missing case in the patch.
>>> Could you please share me the trace?
>>>
>>> Thanks,
>>> Minh
>>>
>>> On 13/01/17 21:48, Nagendra Kumar wrote:
>>>> Hi Minh,
>>>>Please check my response inlined with [Nagu].
>>>>
>>>> Thanks
>>>> -Nagu
>>>>> -Original Message-
>>>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>>>> Sent: 13 January 2017 03:53
>>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>> gary@dektech.com.au
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>>> sync before standby AMFD comes up [#2162]
>>>>>
>>>>> Hi Nagu,
>>>>>
>>>>> Thanks for reviewing, please see comments inline.
>>>>>
>>>>> Thanks,
>>>>> Minh
>>>>>
>>>>> On 12/01/17 21:48, Nagendra Kumar wrote:
>>>>>> Hi Minh,
>>>>>>   Though I am not able to simulate the problem, I tested as below:
>>>>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
>>>>>> SU2 on
>>>>> PL-4 as Standby.
>>>>>> 2. Stop SC1 and SC2 and then stop PL-3.
>>

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-29 Thread Nagendra Kumar
Any update ??

Thanks
-Nagu

> -Original Message-
> From: Nagendra Kumar
> Sent: 23 January 2017 12:18
> To: minh chau; hans.nordeb...@ericsson.com; Praveen Malviya;
> gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless
> sync before standby AMFD comes up [#2162]
> 
> The logs (Logs-tc.rar) attached in the ticket.
> 
> Thanks
> -Nagu
> 
> > -Original Message-
> > From: minh chau [mailto:minh.c...@dektech.com.au]
> > Sent: 16 January 2017 05:47
> > To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> > gary@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> > sync before standby AMFD comes up [#2162]
> >
> > Hi Nagu,
> >
> > I misunderstood your point, and now I get it.
> > In my test I see it works as expected - SU2 becomes Act and no
> > assignment for SU1 I guess in your test some how the cluster
> > initiation timer has not been started on SC2 (new active), there could be a
> missing case in the patch.
> > Could you please share me the trace?
> >
> > Thanks,
> > Minh
> >
> > On 13/01/17 21:48, Nagendra Kumar wrote:
> > > Hi Minh,
> > >   Please check my response inlined with [Nagu].
> > >
> > > Thanks
> > > -Nagu
> > >> -Original Message-
> > >> From: minh chau [mailto:minh.c...@dektech.com.au]
> > >> Sent: 13 January 2017 03:53
> > >> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> > >> gary@dektech.com.au
> > >> Cc: opensaf-devel@lists.sourceforge.net
> > >> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> > >> sync before standby AMFD comes up [#2162]
> > >>
> > >> Hi Nagu,
> > >>
> > >> Thanks for reviewing, please see comments inline.
> > >>
> > >> Thanks,
> > >> Minh
> > >>
> > >> On 12/01/17 21:48, Nagendra Kumar wrote:
> > >>> Hi Minh,
> > >>>  Though I am not able to simulate the problem, I tested as 
> > >>> below:
> > >>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
> > >>> SU2 on
> > >> PL-4 as Standby.
> > >>> 2. Stop SC1 and SC2 and then stop PL-3.
> > >>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop
> > >>> SC1. SC2
> > >> becomes Act.
> > >> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then
> > >> only
> > >> SU2 has active assignment
> > > [Nagu]: PL-3 is stopped in step #2.
> > >>> In this case, SC-2 contains both SU1(Act) and SU2(Standby)
> assignments.
> > >>> Ideally, SU2 assignments should have been Act and there shouldn't
> > >>> be
> > >>> SU1
> > >> assignment.
> > >> [M]: This seems to be another test where SU1 and SU2 are hosted on
> > >> SC2, then both SU1 and SU2 should get assignment
> > > [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays
> > > both
> > SU1 and SU2 assignments.
> > >  SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> > > This is similar test case, which is mentioned in the ticket?
> > >>>
> > >>
> >
> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
> > >> mo,safApp=AmfDemo1
> > >>>   saAmfSISUHAState=ACTIVE(1)
> > >>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> > >>>
> > >>
> >
> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
> > >> mo,safApp=AmfDemo1
> > >>>   saAmfSISUHAState=STANDBY(2)
> > >>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> > >>>
> > >>> Please check.
> > >>>
> > >>> Thanks
> > >>> -Nagu
> > >>>
> > >>>> -Original Message-
> > >>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> > >>>> Sent: 08 November 2016 08:53
> > >>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen
> Malviya;
> > >>>> gary@dektech.com.au; minh.c...@dektech.com.au
> > >>>> Cc: opensaf-devel@lists.sourceforge

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-23 Thread minh chau
Hi Nagu,

Looks like you are testing Nway model, I haven't tested any headless 
cases for Nway and NpM model

Jan 23 12:02:55.047416 osafamfd [8625:sg_nway_fsm.cc:0215] >> su_insvc: 
'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1', 0

However, after failover SC1, in SC2 the cluster init timer has been 
activated again to failover absent assignment
Jan 23 12:04:38.113581 osafamfd [9935:cluster.cc:0055] >> 
avd_cluster_tmr_init_evh

failover absent assignment of SU1 started here
Jan 23 12:04:38.114051 osafamfd [9935:sg.cc:2270] >> 
failover_absent_assignment: SG:'safSg=AmfDemo_2N,safApp=AmfDemo1'
Jan 23 12:04:38.114055 osafamfd [9935:su.cc:2451] >> any_susi_fsm_in: 
SU:'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1', check_fsm:1
Jan 23 12:04:38.114060 osafamfd [9935:su.cc:2456] TR 
SUSI:'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1',
 
fsm:'1'
Jan 23 12:04:38.114064 osafamfd [9935:su.cc:2459] TR Found
Jan 23 12:04:38.114068 osafamfd [9935:su.cc:2462] << any_susi_fsm_in
Jan 23 12:04:38.114073 osafamfd [9935:sg_nway_fsm.cc:0474] >> node_fail: 0

SU2's assignment has moved to active
Jan 23 12:04:38.114123 osafamfd [9935:siass.cc:0753] >> 
avd_susi_mod_send: SI 'safSi=AmfDemo,safApp=AmfDemo1', SU 
'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1' ha_state:1

assignment of SU1 was deleted
Jan 23 12:04:38.117784 osafamfd [9935:siass.cc:0586] >> avd_susi_delete: 
safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1
...
Jan 23 12:04:38.120109 osafamfd [9935:imm.cc:0275] >> exec: Delete 
safCSIComp=safComp=AmfDemo\,safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1
...
Jan 23 12:04:38.120440 osafamfd [9935:imm.cc:0275] >> exec: Delete 
safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1

So maybe "amf-state siass" had been issued before the failover absent 
assignment finished in SC2?
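
For reference, the sequence seen in the trace above (the absent SU1 loses its
assignment and SU2 is moved to active) can be sketched roughly as below. This
is only a simplified model, not the AMFD code: HaState, Assignments,
FailoverAbsentAssignments and ReplayTraceScenario are hypothetical names, and
it assumes a single SI with at most one active and one standby assignment.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-ins; these are not the real AMFD types.
enum class HaState { kActive = 1, kStandby = 2 };

// SU name -> HA state of its assignment for one SI.
using Assignments = std::map<std::string, HaState>;

// Drop the assignments of SUs that are absent after headless. If the active
// assignment was among them, promote a remaining standby to active.
void FailoverAbsentAssignments(Assignments& susi,
                               const std::set<std::string>& absent_sus) {
  bool lost_active = false;
  for (auto it = susi.begin(); it != susi.end();) {
    if (absent_sus.count(it->first) != 0) {
      lost_active = lost_active || (it->second == HaState::kActive);
      it = susi.erase(it);
    } else {
      ++it;
    }
  }
  if (!lost_active) return;
  for (auto& entry : susi) {
    if (entry.second == HaState::kStandby) {
      entry.second = HaState::kActive;
      break;
    }
  }
}

// Replays the situation from the trace: SU1 (active) is on the stopped node.
void ReplayTraceScenario() {
  Assignments susi = {{"SU1", HaState::kActive}, {"SU2", HaState::kStandby}};
  FailoverAbsentAssignments(susi, {"SU1"});
  assert(susi.count("SU1") == 0);              // SU1 assignment deleted
  assert(susi.at("SU2") == HaState::kActive);  // SU2 moved to active
}
```

If "amf-state siass" is read before this step has run on the new active, both
assignments would still be listed, which would match the output reported.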

Thanks,
Minh

On 23/01/17 17:47, Nagendra Kumar wrote:
> The logs (Logs-tc.rar) attached in the ticket.
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 16 January 2017 05:47
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> I misunderstood your point, and now I get it.
>> In my test I see it works as expected - SU2 becomes Act and no assignment
>> for SU1 I guess in your test some how the cluster initiation timer has not
>> been started on SC2 (new active), there could be a missing case in the patch.
>> Could you please share me the trace?
>>
>> Thanks,
>> Minh
>>
>> On 13/01/17 21:48, Nagendra Kumar wrote:
>>> Hi Minh,
>>> Please check my response inlined with [Nagu].
>>>
>>> Thanks
>>> -Nagu
 -Original Message-
 From: minh chau [mailto:minh.c...@dektech.com.au]
 Sent: 13 January 2017 03:53
 To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
 gary@dektech.com.au
 Cc: opensaf-devel@lists.sourceforge.net
 Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
 sync before standby AMFD comes up [#2162]

 Hi Nagu,

 Thanks for reviewing, please see comments inline.

 Thanks,
 Minh

 On 12/01/17 21:48, Nagendra Kumar wrote:
> Hi Minh,
>Though I am not able to simulate the problem, I tested as below:
> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
> SU2 on
 PL-4 as Standby.
> 2. Stop SC1 and SC2 and then stop PL-3.
> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop
> SC1. SC2
 becomes Act.
 [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then
 only
 SU2 has active assignment
>>> [Nagu]: PL-3 is stopped in step #2.
> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
> Ideally, SU2 assignments should have been Act and there shouldn't be
> SU1
 assignment.
 [M]: This seems to be another test where SU1 and SU2 are hosted on
 SC2, then both SU1 and SU2 should get assignment
>>> [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both
>> SU1 and SU2 assignments.
>>>   SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
>>> This is similar test case, which is mentioned in the ticket?
>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
 mo,safApp=AmfDemo1
>saAmfSISUHAState=ACTIVE(1)
>saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
 mo,safApp=AmfDemo1
>saAmfSISUHAState=STANDBY(2)
>saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> Please check.
>
> Thanks
> -Nagu
>
>> 

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-23 Thread minh chau
Hi Nagu,

I am checking the logs now.

Thanks, Minh

On 23/01/17 17:47, Nagendra Kumar wrote:
> The logs (Logs-tc.rar) attached in the ticket.
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 16 January 2017 05:47
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> I misunderstood your point, and now I get it.
>> In my test I see it works as expected - SU2 becomes Act and no assignment
>> for SU1 I guess in your test some how the cluster initiation timer has not
>> been started on SC2 (new active), there could be a missing case in the patch.
>> Could you please share me the trace?
>>
>> Thanks,
>> Minh
>>
>> On 13/01/17 21:48, Nagendra Kumar wrote:
>>> Hi Minh,
>>> Please check my response inlined with [Nagu].
>>>
>>> Thanks
>>> -Nagu
 -Original Message-
 From: minh chau [mailto:minh.c...@dektech.com.au]
 Sent: 13 January 2017 03:53
 To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
 gary@dektech.com.au
 Cc: opensaf-devel@lists.sourceforge.net
 Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
 sync before standby AMFD comes up [#2162]

 Hi Nagu,

 Thanks for reviewing, please see comments inline.

 Thanks,
 Minh

 On 12/01/17 21:48, Nagendra Kumar wrote:
> Hi Minh,
>Though I am not able to simulate the problem, I tested as below:
> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
> SU2 on
 PL-4 as Standby.
> 2. Stop SC1 and SC2 and then stop PL-3.
> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop
> SC1. SC2
 becomes Act.
 [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then
 only
 SU2 has active assignment
>>> [Nagu]: PL-3 is stopped in step #2.
> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
> Ideally, SU2 assignments should have been Act and there shouldn't be
> SU1
 assignment.
 [M]: This seems to be another test where SU1 and SU2 are hosted on
 SC2, then both SU1 and SU2 should get assignment
>>> [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both
>> SU1 and SU2 assignments.
>>>   SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
>>> This is similar test case, which is mentioned in the ticket?
>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
 mo,safApp=AmfDemo1
>saAmfSISUHAState=ACTIVE(1)
>saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
 mo,safApp=AmfDemo1
>saAmfSISUHAState=STANDBY(2)
>saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> Please check.
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>> Sent: 08 November 2016 08:53
>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>> gary@dektech.com.au; minh.c...@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> osaf/services/saf/amf/amfnd/di.cc   |  7 +--
>> osaf/services/saf/amf/amfnd/susm.cc |  6 ++
>> 2 files changed, 11 insertions(+), 2 deletions(-)
>>
>>
>> This case of SC failover causes new active AMFD getting stuck in
>> node_up messages
>>
>> Say first active controller is SC1, which goes down during headless sync.
>> Therefore, the amfnd on SC2 receives mds_down of AVD, then both
>> is_avd_down and amfd_sync_required are set to true. When SC2 takes
>> over active role, amfnd on SC2 receives mds_up, but only
>> is_avd_down is set to false and the variable amfd_sync_required
>> remains true.
>> When amfnd-SC2 finishes initiating middleware SU, it needs to send
>> su_oper message to AMFD, but it is failed to send out due to
 amfd_sync_required.
>> In this scenario of SC failover, amfd_sync_required needs to set to
>> false when amfnd on SC2 receives su_pres message on middleware
>> SUs.
>> That means amfnd on active controller does not need to wait for
>> set_leds message, to be informed that cluster initiation is done,
>> so that amfnd can sen su_oper messages to AMFD. This logic also
>> aligns with normal headless scenario, where amfnd on active
>> controller has amfd_sync_required initially marked as false because
>> no middleware SUs are initiated. When amfd_sync_required is true
>> that means amfnd all 

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-22 Thread Nagendra Kumar
The logs (Logs-tc.rar) attached in the ticket.

Thanks
-Nagu

> -Original Message-
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: 16 January 2017 05:47
> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
> before standby AMFD comes up [#2162]
> 
> Hi Nagu,
> 
> I misunderstood your point, and now I get it.
> In my test I see it works as expected - SU2 becomes Act and no assignment
> for SU1 I guess in your test some how the cluster initiation timer has not
> been started on SC2 (new active), there could be a missing case in the patch.
> Could you please share me the trace?
> 
> Thanks,
> Minh
> 
> On 13/01/17 21:48, Nagendra Kumar wrote:
> > Hi Minh,
> > Please check my response inlined with [Nagu].
> >
> > Thanks
> > -Nagu
> >> -Original Message-
> >> From: minh chau [mailto:minh.c...@dektech.com.au]
> >> Sent: 13 January 2017 03:53
> >> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> >> gary@dektech.com.au
> >> Cc: opensaf-devel@lists.sourceforge.net
> >> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> >> sync before standby AMFD comes up [#2162]
> >>
> >> Hi Nagu,
> >>
> >> Thanks for reviewing, please see comments inline.
> >>
> >> Thanks,
> >> Minh
> >>
> >> On 12/01/17 21:48, Nagendra Kumar wrote:
> >>> Hi Minh,
> >>>Though I am not able to simulate the problem, I tested as below:
> >>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
> >>> SU2 on
> >> PL-4 as Standby.
> >>> 2. Stop SC1 and SC2 and then stop PL-3.
> >>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop
> >>> SC1. SC2
> >> becomes Act.
> >> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then
> >> only
> >> SU2 has active assignment
> > [Nagu]: PL-3 is stopped in step #2.
> >>> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
> >>> Ideally, SU2 assignments should have been Act and there shouldn't be
> >>> SU1
> >> assignment.
> >> [M]: This seems to be another test where SU1 and SU2 are hosted on
> >> SC2, then both SU1 and SU2 should get assignment
> > [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both
> SU1 and SU2 assignments.
> >  SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> > This is similar test case, which is mentioned in the ticket?
> >>>
> >>
> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
> >> mo,safApp=AmfDemo1
> >>>   saAmfSISUHAState=ACTIVE(1)
> >>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>
> >>
> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
> >> mo,safApp=AmfDemo1
> >>>   saAmfSISUHAState=STANDBY(2)
> >>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>
> >>> Please check.
> >>>
> >>> Thanks
> >>> -Nagu
> >>>
>  -Original Message-
>  From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>  Sent: 08 November 2016 08:53
>  To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>  gary@dektech.com.au; minh.c...@dektech.com.au
>  Cc: opensaf-devel@lists.sourceforge.net
>  Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>  before standby AMFD comes up [#2162]
> 
> osaf/services/saf/amf/amfnd/di.cc   |  7 +--
> osaf/services/saf/amf/amfnd/susm.cc |  6 ++
> 2 files changed, 11 insertions(+), 2 deletions(-)
> 
> 
>  This case of SC failover causes new active AMFD getting stuck in
>  node_up messages
> 
>  Say first active controller is SC1, which goes down during headless sync.
>  Therefore, the amfnd on SC2 receives mds_down of AVD, then both
>  is_avd_down and amfd_sync_required are set to true. When SC2 takes
>  over active role, amfnd on SC2 receives mds_up, but only
>  is_avd_down is set to false and the variable amfd_sync_required
> remains true.
>  When amfnd-SC2 finishes initiating middleware SU, it needs to send
>  su_oper message to AMFD, but it is failed to send out due to
> >> amfd_sync_required.
>  In this scenario of SC failover, amfd_sync_required needs to set to
>  false when amfnd on SC2 receives su_pres message on middleware
> SUs.
>  That means amfnd on active controller does not need to wait for
>  set_leds message, to be informed that cluster initiation is done,
>  so that amfnd can sen su_oper messages to AMFD. This logic also
>  aligns with normal headless scenario, where amfnd on active
>  controller has amfd_sync_required initially marked as false because
>  no middleware SUs are initiated. When amfd_sync_required is true
>  that means amfnd all middleware SUs are initiated and assigned
>  before headless, thus amfnd needs to wait for cluster initiation after

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-15 Thread minh chau
Hi Nagu,

I misunderstood your point, and now I get it.
In my test it works as expected: SU2 becomes Act and there is no
assignment for SU1.
I guess that in your test the cluster initiation timer somehow was not
started on SC2 (the new active), so there could be a missing case in the patch.
Could you please share the trace with me?
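
To make the suspected missing case concrete: the check discussed later in this
thread decides, when a controller becomes active in the AVD_INIT_DONE state,
between generating the cluster startup expiry event and starting the cluster
init timer. Below is a minimal sketch of that decision only. InitState, Action
and OnBecomeActive are hypothetical names for illustration, not the AMFD API,
and the all_su_instantiated flag stands in for the real instantiation check.

```cpp
#include <cassert>

// Hypothetical stand-ins for the director init states named in this thread.
enum class InitState { kBeforeInitDone, kInitDone, kAppState };

// What the new active director would do about cluster startup.
enum class Action { kNone, kGenerateStartupExpiry, kStartClusterInitTimer };

// Assumption: only a failover that happens in the "init done" state needs
// handling; all_su_instantiated models the cluster-wide instantiation check.
Action OnBecomeActive(InitState state, bool all_su_instantiated) {
  if (state != InitState::kInitDone) return Action::kNone;
  return all_su_instantiated ? Action::kGenerateStartupExpiry
                             : Action::kStartClusterInitTimer;
}
```

If neither action is taken on the new active, the failover of absent
assignments is never triggered, which would explain a stale SU1 assignment.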

Thanks,
Minh

On 13/01/17 21:48, Nagendra Kumar wrote:
> Hi Minh,
>   Please check my response inlined with [Nagu].
>
> Thanks
> -Nagu
>> -Original Message-
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 13 January 2017 03:53
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> Thanks for reviewing, please see comments inline.
>>
>> Thanks,
>> Minh
>>
>> On 12/01/17 21:48, Nagendra Kumar wrote:
>>> Hi Minh,
>>>  Though I am not able to simulate the problem, I tested as below:
>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on
>> PL-4 as Standby.
>>> 2. Stop SC1 and SC2 and then stop PL-3.
>>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2
>> becomes Act.
>> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then only
>> SU2 has active assignment
> [Nagu]: PL-3 is stopped in step #2.
>>> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
>>> Ideally, SU2 assignments should have been Act and there shouldn't be SU1
>> assignment.
>> [M]: This seems to be another test where SU1 and SU2 are hosted on SC2,
>> then both SU1 and SU2 should get assignment
> [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both SU1 
> and SU2 assignments.
>  SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> This is similar test case, which is mentioned in the ticket?
>>>
>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
>> mo,safApp=AmfDemo1
>>>   saAmfSISUHAState=ACTIVE(1)
>>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
>> mo,safApp=AmfDemo1
>>>   saAmfSISUHAState=STANDBY(2)
>>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>>> Please check.
>>>
>>> Thanks
>>> -Nagu
>>>
 -Original Message-
 From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
 Sent: 08 November 2016 08:53
 To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
 gary@dektech.com.au; minh.c...@dektech.com.au
 Cc: opensaf-devel@lists.sourceforge.net
 Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
 before standby AMFD comes up [#2162]

osaf/services/saf/amf/amfnd/di.cc   |  7 +--
osaf/services/saf/amf/amfnd/susm.cc |  6 ++
2 files changed, 11 insertions(+), 2 deletions(-)


 This case of SC failover causes new active AMFD getting stuck in
 node_up messages

 Say first active controller is SC1, which goes down during headless sync.
 Therefore, the amfnd on SC2 receives mds_down of AVD, then both
 is_avd_down and amfd_sync_required are set to true. When SC2 takes
 over active role, amfnd on SC2 receives mds_up, but only is_avd_down
 is set to false and the variable amfd_sync_required remains true.
 When amfnd-SC2 finishes initiating middleware SU, it needs to send
 su_oper message to AMFD, but it is failed to send out due to
>> amfd_sync_required.
 In this scenario of SC failover, amfd_sync_required needs to set to
 false when amfnd on SC2 receives su_pres message on middleware SUs.
 That means amfnd on active controller does not need to wait for
 set_leds message, to be informed that cluster initiation is done, so
 that amfnd can sen su_oper messages to AMFD. This logic also aligns
 with normal headless scenario, where amfnd on active controller has
 amfd_sync_required initially marked as false because no middleware
 SUs are initiated. When amfd_sync_required is true that means amfnd
 all middleware SUs are initiated and assigned before headless, thus
 amfnd needs to wait for cluster initiation after headless.

 diff --git a/osaf/services/saf/amf/amfnd/di.cc
 b/osaf/services/saf/amf/amfnd/di.cc
 --- a/osaf/services/saf/amf/amfnd/di.cc
 +++ b/osaf/services/saf/amf/amfnd/di.cc
 @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
if (avnd_diq_rec_add(cb, ) == nullptr) {
rc = NCSCC_RC_FAILURE;
}
 -  LOG_NO("avnd_di_oper_send() deferred as AMF director is
 offline");
 +  LOG_NO("avnd_di_oper_send() deferred as AMF director is
 offline(%d),"
 +  " or sync is required(%d)", 

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-12 Thread minh chau
Hi Nagu,

Thanks for reviewing, please see comments inline.

Thanks,
Minh

On 12/01/17 21:48, Nagendra Kumar wrote:
> Hi Minh,
>Though I am not able to simulate the problem, I tested as below:
> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on 
> PL-4 as Standby.
> 2. Stop SC1 and SC2 and then stop PL-3.
> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2 
> becomes Act.
[M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then only 
SU2 has active assignment
>
> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
> Ideally, SU2 assignments should have been Act and there shouldn't be SU1 
> assignment.
[M]: This seems to be another test where SU1 and SU2 are hosted on SC2, 
then both SU1 and SU2 should get assignment
>
> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>  saAmfSISUHAState=ACTIVE(1)
>  saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>  saAmfSISUHAState=STANDBY(2)
>  saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> Please check.
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>> Sent: 08 November 2016 08:53
>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>> gary@dektech.com.au; minh.c...@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before
>> standby AMFD comes up [#2162]
>>
>>   osaf/services/saf/amf/amfnd/di.cc   |  7 +--
>>   osaf/services/saf/amf/amfnd/susm.cc |  6 ++
>>   2 files changed, 11 insertions(+), 2 deletions(-)
>>
>>
>> This case of SC failover causes new active AMFD getting stuck in node_up
>> messages
>>
>> Say first active controller is SC1, which goes down during headless sync.
>> Therefore, the amfnd on SC2 receives mds_down of AVD, then both
>> is_avd_down and amfd_sync_required are set to true. When SC2 takes over
>> active role, amfnd on SC2 receives mds_up, but only is_avd_down is set to
>> false and the variable amfd_sync_required remains true.
>> When amfnd-SC2 finishes initiating middleware SU, it needs to send su_oper
>> message to AMFD, but it is failed to send out due to amfd_sync_required.
>>
>> In this scenario of SC failover, amfd_sync_required needs to set to false
>> when amfnd on SC2 receives su_pres message on middleware SUs. That
>> means amfnd on active controller does not need to wait for set_leds
>> message, to be informed that cluster initiation is done, so that amfnd can
>> sen su_oper messages to AMFD. This logic also aligns with normal headless
>> scenario, where amfnd on active controller has amfd_sync_required initially
>> marked as false because no middleware SUs are initiated. When
>> amfd_sync_required is true that means amfnd all middleware SUs are
>> initiated and assigned before headless, thus amfnd needs to wait for cluster
>> initiation after headless.
>>
>> diff --git a/osaf/services/saf/amf/amfnd/di.cc
>> b/osaf/services/saf/amf/amfnd/di.cc
>> --- a/osaf/services/saf/amf/amfnd/di.cc
>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>>  if (avnd_diq_rec_add(cb, ) == nullptr) {
>>  rc = NCSCC_RC_FAILURE;
>>  }
>> -LOG_NO("avnd_di_oper_send() deferred as AMF director is
>> offline");
>> +LOG_NO("avnd_di_oper_send() deferred as AMF director is
>> offline(%d),"
>> +" or sync is required(%d)", cb->is_avd_down,
>> +cb->amfd_sync_required);
>>  } else {
>>  // We are in normal cluster, send msg to director
>>  msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb-
>>> snd_msg_id); @@ -881,7 +882,9 @@ uint32_t
>> avnd_di_susi_resp_send(AVND_CB
>>  rc = NCSCC_RC_FAILURE;
>>  }
>>  m_AVND_SU_ALL_SI_RESET(su);
>> -LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is
>> offline");
>> +LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is
>> offline(%d),"
>> +" or sync is required(%d)", cb->is_avd_down,
>> + cb->amfd_sync_required);
>> +
>>   } else {
>>  // We are in normal cluster, send msg to director
>>  msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb-
>>> snd_msg_id); diff --git a/osaf/services/saf/amf/amfnd/susm.cc
>> b/osaf/services/saf/amf/amfnd/susm.cc
>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>>  goto done;
>>  }
>>  } else { /* => instantiate the su */
>> +// Do not need to wait for headless sync if there is no
>> application SUs
>> +  

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-12 Thread Nagendra Kumar
Hi Minh,
 Though I am not able to simulate the problem, I tested as below:
1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on PL-4 
as Standby.
2. Stop SC1 and SC2 and then stop PL-3.
3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2 
becomes Act.

In this case, SC-2 shows both the SU1 (Act) and SU2 (Standby) assignments.
Ideally, the SU2 assignment should have become Act and there should be no SU1
assignment.

safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Please check.

Thanks
-Nagu

> -Original Message-
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: 08 November 2016 08:53
> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
> gary@dektech.com.au; minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before
> standby AMFD comes up [#2162]
> 
>  osaf/services/saf/amf/amfnd/di.cc   |  7 +--
>  osaf/services/saf/amf/amfnd/susm.cc |  6 ++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> 
> This case of SC failover causes new active AMFD getting stuck in node_up
> messages
> 
> Say first active controller is SC1, which goes down during headless sync.
> Therefore, the amfnd on SC2 receives mds_down of AVD, then both
> is_avd_down and amfd_sync_required are set to true. When SC2 takes over
> active role, amfnd on SC2 receives mds_up, but only is_avd_down is set to
> false and the variable amfd_sync_required remains true.
> When amfnd-SC2 finishes initiating middleware SU, it needs to send su_oper
> message to AMFD, but it is failed to send out due to amfd_sync_required.
> 
> In this scenario of SC failover, amfd_sync_required needs to set to false
> when amfnd on SC2 receives su_pres message on middleware SUs. That
> means amfnd on active controller does not need to wait for set_leds
> message, to be informed that cluster initiation is done, so that amfnd can
> sen su_oper messages to AMFD. This logic also aligns with normal headless
> scenario, where amfnd on active controller has amfd_sync_required initially
> marked as false because no middleware SUs are initiated. When
> amfd_sync_required is true that means amfnd all middleware SUs are
> initiated and assigned before headless, thus amfnd needs to wait for cluster
> initiation after headless.
> 
> diff --git a/osaf/services/saf/amf/amfnd/di.cc
> b/osaf/services/saf/amf/amfnd/di.cc
> --- a/osaf/services/saf/amf/amfnd/di.cc
> +++ b/osaf/services/saf/amf/amfnd/di.cc
> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>   if (avnd_diq_rec_add(cb, ) == nullptr) {
>   rc = NCSCC_RC_FAILURE;
>   }
> - LOG_NO("avnd_di_oper_send() deferred as AMF director is
> offline");
> + LOG_NO("avnd_di_oper_send() deferred as AMF director is
> offline(%d),"
> + " or sync is required(%d)", cb->is_avd_down,
> +cb->amfd_sync_required);
>   } else {
>   // We are in normal cluster, send msg to director
>   msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb-
> >snd_msg_id); @@ -881,7 +882,9 @@ uint32_t
> avnd_di_susi_resp_send(AVND_CB
>   rc = NCSCC_RC_FAILURE;
>   }
>   m_AVND_SU_ALL_SI_RESET(su);
> - LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is
> offline");
> +LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is
> offline(%d),"
> +" or sync is required(%d)", cb->is_avd_down,
> + cb->amfd_sync_required);
> +
>  } else {
>   // We are in normal cluster, send msg to director
>   msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb-
> >snd_msg_id); diff --git a/osaf/services/saf/amf/amfnd/susm.cc
> b/osaf/services/saf/amf/amfnd/susm.cc
> --- a/osaf/services/saf/amf/amfnd/susm.cc
> +++ b/osaf/services/saf/amf/amfnd/susm.cc
> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>   goto done;
>   }
>   } else { /* => instantiate the su */
> + // Do not need to wait for headless sync if there is no
> application SUs
> + // initiated. This is known because here we are receiving
> su_pres message
> + // for NCS SUs
> + if (su->is_ncs == true)
> + cb->amfd_sync_required = false;
> +
>   AVND_EVT *evt_ir = 0;
>   TRACE("Sending to Imm thread.");
>   evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, 
> >su_name, 0, 0);
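
The flag handling described in the patch text quoted above can be modelled
roughly as follows. This is a sketch under assumptions, not the amfnd code:
NodeDirectorState and the handler names are hypothetical, and the defer
condition (director down OR sync required) is inferred from the new log
message in the diff rather than taken from the source file.

```cpp
#include <cassert>

// Hypothetical model of the two amfnd flags named in the patch description.
struct NodeDirectorState {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
};

// mds_down of the director: both flags are raised.
void OnAvdMdsDown(NodeDirectorState& s) {
  s.is_avd_down = true;
  s.amfd_sync_required = true;
}

// mds_up of the director: only is_avd_down is cleared.
void OnAvdMdsUp(NodeDirectorState& s) { s.is_avd_down = false; }

// su_pres (instantiate) from the director. With the patch, a message for a
// middleware (NCS) SU clears amfd_sync_required.
void OnSuPresInstantiate(NodeDirectorState& s, bool su_is_ncs) {
  if (su_is_ncs) s.amfd_sync_required = false;
}

// Assumed condition under which su_oper and susi_resp messages are queued.
bool MustDeferSend(const NodeDirectorState& s) {
  return s.is_avd_down || s.amfd_sync_required;
}

// Replays the SC failover during headless sync, as seen by amfnd on SC2.
void ReplayScFailover() {
  NodeDirectorState sc2;
  OnAvdMdsDown(sc2);            // SC1 goes down during headless sync
  OnAvdMdsUp(sc2);              // SC2 takes over the active role
  assert(MustDeferSend(sc2));   // without the fix, su_oper stays queued here
  OnSuPresInstantiate(sc2, /*su_is_ncs=*/true);
  assert(!MustDeferSend(sc2));  // with the fix, su_oper can be sent
}
```

The second assert is the point of the patch: without clearing the flag on
su_pres for NCS SUs, the new active AMFD never receives su_oper from its own
amfnd and stays stuck in node_up handling.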


Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2016-11-13 Thread minh chau

Hi,

It's not a problem on the sender's side; the mail server log shows that all
messages were delivered successfully. So I am resending the patches as
attachments and hope that you receive them this time.


Thanks,
Minh

On 08/11/16 19:43, Nagendra Kumar wrote:

Hi Minh,

I have resent to you the missing patches that I received, as I put myself in  
reception list. Anyway, I'm checking this problem.

We didn't get that also.
I think, you have said it before also in #1725 patch context, at that time 
also, we had not got.

Please correct the problem now so that we need not take the patches from source 
forge.
When needed, it is easier to search in mail box, going to source forge is not 
an easy way.

Please re-float the patches again after correcting the problem and let us know.

Thanks
-Nagu

-Original Message-
From: minh chau [mailto:minh.c...@dektech.com.au]
Sent: 08 November 2016 13:28
To: Nagendra Kumar; Praveen Malviya; hans.nordeb...@ericsson.com;
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
before standby AMFD comes up [#2162]

Hi Nagu, Praveen

Sorry if any inconvenience. All patches have reached sourceforge and those
can be seen in devel mailing list.
I have resent to you the missing patches that I received, as I put myself in
reception list. Anyway, I'm checking this problem.

Thanks,
Minh

On 08/11/16 18:26, Nagendra Kumar wrote:

Hi Minh,
This is common problem in your patches that only few are

received. Can you please correct it.

Thanks
-Nagu


-Original Message-
From: praveen malviya
Sent: 08 November 2016 12:50
To: Minh Hon Chau; hans.nordeb...@ericsson.com; Nagendra Kumar;
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
sync before standby AMFD comes up [#2162]

Hi Minh,

Patch 0 and 1 are not received. Please resend.


Thanks,
Praveen

On 08-Nov-16 8:53 AM, Minh Hon Chau wrote:

   osaf/services/saf/amf/amfnd/di.cc   |  7 +--
   osaf/services/saf/amf/amfnd/susm.cc |  6 ++
   2 files changed, 11 insertions(+), 2 deletions(-)


This case of SC failover causes the new active AMFD to get stuck in
node_up messages.

Say the first active controller is SC1, which goes down during headless
sync. Therefore, the amfnd on SC2 receives mds_down of AVD, then
both is_avd_down and amfd_sync_required are set to true. When SC2
takes over the active role, amfnd on SC2 receives mds_up, but only
is_avd_down is set to false and the variable amfd_sync_required
remains true.
When amfnd-SC2 finishes initiating the middleware SU, it needs to send
a su_oper message to AMFD, but it fails to send it due to
amfd_sync_required.

In this scenario of SC failover, amfd_sync_required needs to be set to
false when amfnd on SC2 receives the su_pres message for middleware SUs.
That means amfnd on the active controller does not need to wait for the
set_leds message to be informed that cluster initiation is done, so
that amfnd can send su_oper messages to AMFD. This logic also aligns
with the normal headless scenario, where amfnd on the active controller
has amfd_sync_required initially marked as false because no middleware
SUs are initiated. When amfd_sync_required is true, that means all
middleware SUs were initiated and assigned before headless, thus amfnd
needs to wait for cluster initiation after headless.

diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
--- a/osaf/services/saf/amf/amfnd/di.cc
+++ b/osaf/services/saf/amf/amfnd/di.cc
@@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
if (avnd_diq_rec_add(cb, ) == nullptr) {
rc = NCSCC_RC_FAILURE;
}
-   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
+   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
+   " or sync is required(%d)", cb->is_avd_down,
+cb->amfd_sync_required);
} else {
// We are in normal cluster, send msg to director
msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
@@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
rc = NCSCC_RC_FAILURE;
}
m_AVND_SU_ALL_SI_RESET(su);
-   LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
+LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
+" or sync is required(%d)", cb->is_avd_down,
+ cb->amfd_sync_required);
+
   } else {
// We are in normal cluster, send msg to director
msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
--- a/osaf/services/saf/amf/amfnd/susm.cc
+++ 

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2016-11-08 Thread Nagendra Kumar
Hi Minh,
> I have resent to you the missing patches that I received, as I put myself in  
> reception list. Anyway, I'm checking this problem.
We didn't get those either.
I think you said the same before in the context of the #1725 patch, and at
that time we did not get them either.

Please correct the problem now so that we need not take the patches from
SourceForge.
When needed, it is easier to search in the mailbox; going to SourceForge is
not an easy way.

Please resend the patches after correcting the problem and let us know.

Thanks
-Nagu
> -Original Message-
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: 08 November 2016 13:28
> To: Nagendra Kumar; Praveen Malviya; hans.nordeb...@ericsson.com;
> gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
> before standby AMFD comes up [#2162]
> 
> Hi Nagu, Praveen
> 
> Sorry if any inconvenience. All patches have reached sourceforge and those
> can be seen in devel mailing list.
> I have resent to you the missing patches that I received, as I put myself in
> reception list. Anyway, I'm checking this problem.
> 
> Thanks,
> Minh
> 
> On 08/11/16 18:26, Nagendra Kumar wrote:
> > Hi Minh,
> > This is common problem in your patches that only few are
> received. Can you please correct it.
> >
> > Thanks
> > -Nagu
> >
> >> -Original Message-
> >> From: praveen malviya
> >> Sent: 08 November 2016 12:50
> >> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Nagendra Kumar;
> >> gary@dektech.com.au
> >> Cc: opensaf-devel@lists.sourceforge.net
> >> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> >> sync before standby AMFD comes up [#2162]
> >>
> >> Hi Minh,
> >>
> >> Patch 0 and 1 are not received. Please resend.
> >>
> >>
> >> Thanks,
> >> Praveen
> >>
> >> On 08-Nov-16 8:53 AM, Minh Hon Chau wrote:
> >>>   osaf/services/saf/amf/amfnd/di.cc   |  7 +--
> >>>   osaf/services/saf/amf/amfnd/susm.cc |  6 ++
> >>>   2 files changed, 11 insertions(+), 2 deletions(-)
> >>>
> >>>
> >>> This case of SC failover causes the new active AMFD to get stuck in
> >>> node_up messages.
> >>>
> >>> Say the first active controller is SC1, which goes down during
> >>> headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
> >>> and both is_avd_down and amfd_sync_required are set to true. When SC2
> >>> takes over the active role, amfnd on SC2 receives mds_up, but only
> >>> is_avd_down is set to false; amfd_sync_required remains true. When
> >>> amfnd-SC2 finishes initiating the middleware SUs, it needs to send
> >>> the su_oper message to AMFD, but the message cannot be sent because
> >>> amfd_sync_required is still true.
> >>>
> >>> In this SC failover scenario, amfd_sync_required needs to be set to
> >>> false when amfnd on SC2 receives the su_pres message for middleware
> >>> SUs. That way, amfnd on the active controller does not need to wait
> >>> for the set_leds message (which tells it that cluster initiation is
> >>> done) before it can send su_oper messages to AMFD. This logic also
> >>> aligns with the normal headless scenario, where amfnd on the active
> >>> controller has amfd_sync_required initially marked as false because
> >>> no middleware SUs are initiated. When amfd_sync_required is true, it
> >>> means all middleware SUs on that amfnd were initiated and assigned
> >>> before headless, so amfnd needs to wait for cluster initiation after
> >>> headless.
> >>>
> >>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
> >>> --- a/osaf/services/saf/amf/amfnd/di.cc
> >>> +++ b/osaf/services/saf/amf/amfnd/di.cc
> >>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
> >>>   if (avnd_diq_rec_add(cb, ) == nullptr) {
> >>>   rc = NCSCC_RC_FAILURE;
> >>>   }
> >>> - LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
> >>> + LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
> >>> + " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
> >>>   } else {
> >>>   // We are in normal cluster, send msg to director
> >>>   msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
> >>> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
> >>>   rc = NCSCC_RC_FAILURE;
> >>>   }
> >>>   m_AVND_SU_ALL_SI_RESET(su);
> >>> - LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
> >>> +LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
> >>> +" or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
> >>> +
> >>>   } else {
> >>>   // We are in normal cluster, send msg to director
> >>>   

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2016-11-07 Thread minh chau
Hi Nagu, Praveen

Sorry for any inconvenience. All patches have reached SourceForge and 
can be seen on the devel mailing list.
I have resent you the missing patches from the copies I received, as I had 
put myself on the recipient list. Anyway, I'm checking this problem.

Thanks,
Minh

On 08/11/16 18:26, Nagendra Kumar wrote:
> Hi Minh,
>   This is common problem in your patches that only few are 
> received. Can you please correct it.
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: praveen malviya
>> Sent: 08 November 2016 12:50
>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Nagendra Kumar;
>> gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Minh,
>>
>> Patch 0 and 1 are not received. Please resend.
>>
>>
>> Thanks,
>> Praveen
>>
>> On 08-Nov-16 8:53 AM, Minh Hon Chau wrote:
>>>   osaf/services/saf/amf/amfnd/di.cc   |  7 +--
>>>   osaf/services/saf/amf/amfnd/susm.cc |  6 ++
>>>   2 files changed, 11 insertions(+), 2 deletions(-)
>>>
>>>
>>> This case of SC failover causes the new active AMFD to get stuck in
>>> node_up messages.
>>>
>>> Say the first active controller is SC1, which goes down during
>>> headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
>>> and both is_avd_down and amfd_sync_required are set to true. When SC2
>>> takes over the active role, amfnd on SC2 receives mds_up, but only
>>> is_avd_down is set to false; amfd_sync_required remains true. When
>>> amfnd-SC2 finishes initiating the middleware SUs, it needs to send
>>> the su_oper message to AMFD, but the message cannot be sent because
>>> amfd_sync_required is still true.
>>>
>>> In this SC failover scenario, amfd_sync_required needs to be set to
>>> false when amfnd on SC2 receives the su_pres message for middleware
>>> SUs. That way, amfnd on the active controller does not need to wait
>>> for the set_leds message (which tells it that cluster initiation is
>>> done) before it can send su_oper messages to AMFD. This logic also
>>> aligns with the normal headless scenario, where amfnd on the active
>>> controller has amfd_sync_required initially marked as false because
>>> no middleware SUs are initiated. When amfd_sync_required is true, it
>>> means all middleware SUs on that amfnd were initiated and assigned
>>> before headless, so amfnd needs to wait for cluster initiation after
>>> headless.
>>>
>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>>> --- a/osaf/services/saf/amf/amfnd/di.cc
>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>>> if (avnd_diq_rec_add(cb, ) == nullptr) {
>>> rc = NCSCC_RC_FAILURE;
>>> }
>>> -   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
>>> +   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
>>> +   " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>> } else {
>>> // We are in normal cluster, send msg to director
>>> msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
>>> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>> rc = NCSCC_RC_FAILURE;
>>> }
>>> m_AVND_SU_ALL_SI_RESET(su);
>>> -   LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>>> +LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
>>> +" or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>> +
>>>   } else {
>>> // We are in normal cluster, send msg to director
>>> msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
>>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>>> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>>> goto done;
>>> }
>>> } else { /* => instantiate the su */
>>> +   // Do not need to wait for headless sync if there is no application SUs
>>> +   // initiated. This is known because here we are receiving su_pres message
>>> +   // for NCS SUs
>>> +   if (su->is_ncs == true)
>>> +   cb->amfd_sync_required = false;
>>> +
>>> AVND_EVT *evt_ir = 0;
>>> TRACE("Sending to Imm thread.");
>>> evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, >su_name, 0, 0);
>>>


--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.

Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2016-11-07 Thread Nagendra Kumar
Hi Minh,
This is a common problem with your patches: only a few of them are 
received. Can you please correct it?

Thanks
-Nagu

> -Original Message-
> From: praveen malviya
> Sent: 08 November 2016 12:50
> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Nagendra Kumar;
> gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
> before standby AMFD comes up [#2162]
> 
> Hi Minh,
> 
> Patch 0 and 1 are not received. Please resend.
> 
> 
> Thanks,
> Praveen
> 
> On 08-Nov-16 8:53 AM, Minh Hon Chau wrote:
> >  osaf/services/saf/amf/amfnd/di.cc   |  7 +--
> >  osaf/services/saf/amf/amfnd/susm.cc |  6 ++
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> >
> >
> > This case of SC failover causes the new active AMFD to get stuck in
> > node_up messages.
> >
> > Say the first active controller is SC1, which goes down during
> > headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
> > and both is_avd_down and amfd_sync_required are set to true. When SC2
> > takes over the active role, amfnd on SC2 receives mds_up, but only
> > is_avd_down is set to false; amfd_sync_required remains true. When
> > amfnd-SC2 finishes initiating the middleware SUs, it needs to send
> > the su_oper message to AMFD, but the message cannot be sent because
> > amfd_sync_required is still true.
> >
> > In this SC failover scenario, amfd_sync_required needs to be set to
> > false when amfnd on SC2 receives the su_pres message for middleware
> > SUs. That way, amfnd on the active controller does not need to wait
> > for the set_leds message (which tells it that cluster initiation is
> > done) before it can send su_oper messages to AMFD. This logic also
> > aligns with the normal headless scenario, where amfnd on the active
> > controller has amfd_sync_required initially marked as false because
> > no middleware SUs are initiated. When amfd_sync_required is true, it
> > means all middleware SUs on that amfnd were initiated and assigned
> > before headless, so amfnd needs to wait for cluster initiation after
> > headless.
> >
> > diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
> > --- a/osaf/services/saf/amf/amfnd/di.cc
> > +++ b/osaf/services/saf/amf/amfnd/di.cc
> > @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
> > if (avnd_diq_rec_add(cb, ) == nullptr) {
> > rc = NCSCC_RC_FAILURE;
> > }
> > -   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
> > +   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
> > +   " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
> > } else {
> > // We are in normal cluster, send msg to director
> > msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
> > @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
> > rc = NCSCC_RC_FAILURE;
> > }
> > m_AVND_SU_ALL_SI_RESET(su);
> > -   LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
> > +LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
> > +" or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
> > +
> >  } else {
> > // We are in normal cluster, send msg to director
> > msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
> > diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
> > --- a/osaf/services/saf/amf/amfnd/susm.cc
> > +++ b/osaf/services/saf/amf/amfnd/susm.cc
> > @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
> > goto done;
> > }
> > } else { /* => instantiate the su */
> > +   // Do not need to wait for headless sync if there is no application SUs
> > +   // initiated. This is known because here we are receiving su_pres message
> > +   // for NCS SUs
> > +   if (su->is_ncs == true)
> > +   cb->amfd_sync_required = false;
> > +
> > AVND_EVT *evt_ir = 0;
> > TRACE("Sending to Imm thread.");
> > evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, >su_name, 0, 0);
> >



Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2016-11-07 Thread praveen malviya
Hi Minh,

Patch 0 and 1 are not received. Please resend.


Thanks,
Praveen

On 08-Nov-16 8:53 AM, Minh Hon Chau wrote:
>  osaf/services/saf/amf/amfnd/di.cc   |  7 +--
>  osaf/services/saf/amf/amfnd/susm.cc |  6 ++
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
>
> This case of SC failover causes the new active AMFD to get stuck in
> node_up messages.
>
> Say the first active controller is SC1, which goes down during
> headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
> and both is_avd_down and amfd_sync_required are set to true. When SC2
> takes over the active role, amfnd on SC2 receives mds_up, but only
> is_avd_down is set to false; amfd_sync_required remains true. When
> amfnd-SC2 finishes initiating the middleware SUs, it needs to send
> the su_oper message to AMFD, but the message cannot be sent because
> amfd_sync_required is still true.
>
> In this SC failover scenario, amfd_sync_required needs to be set to
> false when amfnd on SC2 receives the su_pres message for middleware
> SUs. That way, amfnd on the active controller does not need to wait
> for the set_leds message (which tells it that cluster initiation is
> done) before it can send su_oper messages to AMFD. This logic also
> aligns with the normal headless scenario, where amfnd on the active
> controller has amfd_sync_required initially marked as false because
> no middleware SUs are initiated. When amfd_sync_required is true, it
> means all middleware SUs on that amfnd were initiated and assigned
> before headless, so amfnd needs to wait for cluster initiation after
> headless.
>
> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
> --- a/osaf/services/saf/amf/amfnd/di.cc
> +++ b/osaf/services/saf/amf/amfnd/di.cc
> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>   if (avnd_diq_rec_add(cb, ) == nullptr) {
>   rc = NCSCC_RC_FAILURE;
>   }
> - LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
> + LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
> + " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>   } else {
>   // We are in normal cluster, send msg to director
>   msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>   rc = NCSCC_RC_FAILURE;
>   }
>   m_AVND_SU_ALL_SI_RESET(su);
> - LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
> +LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
> +" or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
> +
>  } else {
>   // We are in normal cluster, send msg to director
>   msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
> --- a/osaf/services/saf/amf/amfnd/susm.cc
> +++ b/osaf/services/saf/amf/amfnd/susm.cc
> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>   goto done;
>   }
>   } else { /* => instantiate the su */
> + // Do not need to wait for headless sync if there is no application SUs
> + // initiated. This is known because here we are receiving su_pres message
> + // for NCS SUs
> + if (su->is_ncs == true)
> + cb->amfd_sync_required = false;
> +
>   AVND_EVT *evt_ir = 0;
>   TRACE("Sending to Imm thread.");
>   evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, >su_name, 0, 0);
>



[devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2016-11-07 Thread Minh Hon Chau
 osaf/services/saf/amf/amfnd/di.cc   |  7 +--
 osaf/services/saf/amf/amfnd/susm.cc |  6 ++
 2 files changed, 11 insertions(+), 2 deletions(-)


This case of SC failover causes the new active AMFD to get stuck in
node_up messages.

Say the first active controller is SC1, which goes down during
headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
and both is_avd_down and amfd_sync_required are set to true. When SC2
takes over the active role, amfnd on SC2 receives mds_up, but only
is_avd_down is set to false; amfd_sync_required remains true. When
amfnd-SC2 finishes initiating the middleware SUs, it needs to send
the su_oper message to AMFD, but the message cannot be sent because
amfd_sync_required is still true.

In this SC failover scenario, amfd_sync_required needs to be set to
false when amfnd on SC2 receives the su_pres message for middleware
SUs. That way, amfnd on the active controller does not need to wait
for the set_leds message (which tells it that cluster initiation is
done) before it can send su_oper messages to AMFD. This logic also
aligns with the normal headless scenario, where amfnd on the active
controller has amfd_sync_required initially marked as false because
no middleware SUs are initiated. When amfd_sync_required is true, it
means all middleware SUs on that amfnd were initiated and assigned
before headless, so amfnd needs to wait for cluster initiation after
headless.
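
The flag handling described above can be sketched as a small standalone
model. This is only an illustration, not amfnd code: NodeDirectorFlags, its
methods, and the "patched" switch are hypothetical names, and the rule
"send only when neither flag is set" is an assumption inferred from the log
message changed in this patch.

```cpp
// Hypothetical, simplified stand-in for the two AVND_CB flags in this patch.
struct NodeDirectorFlags {
  bool is_avd_down = false;
  bool amfd_sync_required = false;

  // The director (AMFD) disappears: both flags are raised.
  void on_mds_down() {
    is_avd_down = true;
    amfd_sync_required = true;
  }

  // A director becomes reachable again: only is_avd_down is cleared.
  void on_mds_up() { is_avd_down = false; }

  // su_pres received. With the patch, a middleware (NCS) SU clears the
  // sync flag; without it, the flag is left untouched here.
  void on_su_pres(bool su_is_ncs, bool patched) {
    if (patched && su_is_ncs) amfd_sync_required = false;
  }

  // Assumed rule: a message goes out only when neither flag is set,
  // otherwise it is deferred.
  bool can_send() const { return !is_avd_down && !amfd_sync_required; }
};

// Replays the SC failover sequence from the description and reports whether
// amfnd on SC2 would be able to send su_oper afterwards.
bool su_oper_sendable_after_failover(bool patched) {
  NodeDirectorFlags flags;
  flags.on_mds_down();              // SC1 goes down during headless sync
  flags.on_mds_up();                // SC2 takes over the active role
  flags.on_su_pres(true, patched);  // su_pres for a middleware SU
  return flags.can_send();
}
```

In this model, su_oper_sendable_after_failover(false) stays false because
amfd_sync_required is never cleared, which corresponds to the stuck case;
with patched set to true the su_pres for the NCS SU clears the flag and the
message can go out.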

diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
--- a/osaf/services/saf/amf/amfnd/di.cc
+++ b/osaf/services/saf/amf/amfnd/di.cc
@@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
 if (avnd_diq_rec_add(cb, ) == nullptr) {
 rc = NCSCC_RC_FAILURE;
 }
-   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
+   LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
+   " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
 } else {
 // We are in normal cluster, send msg to director
 msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
@@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
 rc = NCSCC_RC_FAILURE;
 }
 m_AVND_SU_ALL_SI_RESET(su);
-   LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
+LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
+" or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
+
 } else {
 // We are in normal cluster, send msg to director
 msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
--- a/osaf/services/saf/amf/amfnd/susm.cc
+++ b/osaf/services/saf/amf/amfnd/susm.cc
@@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
 goto done;
 }
 } else { /* => instantiate the su */
+   // Do not need to wait for headless sync if there is no application SUs
+   // initiated. This is known because here we are receiving su_pres message
+   // for NCS SUs
+   if (su->is_ncs == true)
+   cb->amfd_sync_required = false;
+
 AVND_EVT *evt_ir = 0;
 TRACE("Sending to Imm thread.");
 evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, >su_name, 0, 0);
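
For readers without the amfnd sources at hand, the "send or defer" branch
touched in di.cc can be approximated as below. This is a sketch under stated
assumptions: MiniCb, deferral_log and send_or_defer are hypothetical names,
std::deque stands in for the real deferred-message queue, and the condition
(either flag set means defer) is inferred from the new log text rather than
quoted from the source.

```cpp
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical miniature of the control block fields used by the patch.
struct MiniCb {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
  unsigned snd_msg_id = 0;
  std::deque<std::string> deferred;  // stand-in for the real deferral queue
};

// Builds the deferral text in the same shape as the patched LOG_NO() call,
// so that both flags are visible in the log.
std::string deferral_log(const char* func, const MiniCb& cb) {
  char buf[160];
  std::snprintf(buf, sizeof(buf),
                "%s() deferred as AMF director is offline(%d),"
                " or sync is required(%d)",
                func, cb.is_avd_down, cb.amfd_sync_required);
  return buf;
}

// Queues the message and fills *log when either flag is set (returns false);
// otherwise bumps the send message id as the normal path does (returns true).
bool send_or_defer(MiniCb& cb, const std::string& msg, std::string* log) {
  if (cb.is_avd_down || cb.amfd_sync_required) {
    cb.deferred.push_back(msg);
    if (log != nullptr) *log = deferral_log("avnd_di_oper_send", cb);
    return false;
  }
  ++cb.snd_msg_id;
  return true;
}
```

Printing both flags is what makes the stuck case diagnosable: a log entry
showing offline(0) together with sync is required(1) points at
amfd_sync_required rather than at a missing director.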
