Re: [devel] [PATCH 1 of 1] smfd: fix race condition when detecting async failures [#2413]

2017-04-07 Thread Neelakanta Reddy
Hi Alex,

got it.
Ack from me.

Thanks,
Neel.

On 2017/04/08 03:25 AM, Alex Jones wrote:
>
> Hi Neel,
>
>   Replies to your comments:
>
> (1)I don’t see this. With this patch can you get the system into a 
> state where the suMaintenanceCampaign is set in AMF, the component 
> crashes right before the SMF commit clears the suMaintenanceCampaign, 
> and thecmgnError is not set? If so, that’s a bug that we should 
> address. The SMF-AIS spec states that the suMaintenanceCampaign 
> attribute is cleared when the campaign is committed. Therefore it is 
> entirely possible that an SU could become disabled right before the 
> attribute is cleared, and the campaign still transitions to COMMITTED. 
> But, cmpnError should still be set in this case. So, as I read the 
> spec the operator needs to check this after committing, and may need 
> to repair an SU after committing a campaign, if this corner case occurs.
>
> (2)Well, the SMF-AIS spec says that asyncFailures are considered in 
> EVERY state while the suMaintenanceCampaign attribute is set. A 
> suspend is only done in the states you mention, but cmpnError should 
> be set in all states. This is already handled by the campaign state 
> machine. Look at SmfCampState.cc. The ::asyncFailure virtual function 
> only suspends the campaign in the states that you mention. If an 
> asyncFailure occurs in any other state, the base class virtual 
> function is called which does nothing.
>
> Alex
>
> *From:*Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
> *Sent:* Friday, April 7, 2017 5:05 AM
> *To:* Alex Jones ; lennart.l...@ericsson.com; 
> rafael.odza...@ericsson.com
> *Cc:* opensaf-devel@lists.sourceforge.net
> *Subject:* Re: [PATCH 1 of 1] smfd: fix race condition when detecting 
> async failures [#2413]
>
> 
>
> NOTICE: This email was received from an EXTERNAL sender
>
> 
>
>
> Hi Alex,
>
> Reviewed and tested the patch.
> The patch avoids, the crash in smfd.
>
> I have the following Questions on the implementation(which was
> overlooked at the time of #2145 review):
>
> 1. If the component is killed when the campaign is in
> SA_SMF_CMPG_EXECUTION_COMPLETED state(as in the defect) and
> the amfnd does not re-start the component because,
> saAmfSUMaintenanceCampaign is set. The saSmfCmpgError indication is not
> given in SMF, because the campaign is in the process of committing.
>
> who does this justify Asynchronous Failures of AMF Entities?
>
> 2.
> according to SMF AIS, the async failures has to be considered Executing,
> Suspending, or RollingBack only.
> Then how do we prevent in other states? And the AMFND still will not
> re-start(correct) because of saAmfSUMaintenanceCampaign is set.
>
> Thanks,
> Neel.
>
> On 2017/04/06 11:56 PM, Alex Jones wrote:
> > src/smf/smfd/SmfCampaignThread.cc | 16 ++--
> > 1 files changed, 10 insertions(+), 6 deletions(-)
> >
> >
> > smfd core dumps during commit of campaign.
> >
> > If an AMF SU under maintenance fails right as the campaign commit is 
> done, there
> > is a race condition present. Before SMF clears the suMaintenaceCampaign
> > attribute of the SU, if the SU fails, it will send a notification. 
> Meanwhile,
> > the commit deletes upgrade campaign pointer inside smfd. If the 
> deletion of the
> > campaign pointer inside smfd occurs before it receives the NTF event 
> a crash
> > will occur because the campaign pointer is gone.
> >
> > Solution is to always process NTF events before processing the 
> termination of
> > the campaign. The campaign termination code sets "m_running" to 
> false, and
> > deletes the pointer. This should always be last in the poll loop so 
> that the
> > loop will exit immediately without processing any NTF events (or 
> other future
> > events.)
> >
> > diff --git a/src/smf/smfd/SmfCampaignThread.cc 
> b/src/smf/smfd/SmfCampaignThread.cc
> > --- a/src/smf/smfd/SmfCampaignThread.cc
> > +++ b/src/smf/smfd/SmfCampaignThread.cc
> > @@ -907,12 +907,10 @@ int SmfCampaignThread::handleEvents(void
> > break;
> > }
> >
> > - /* Process the Mail box events */
> > - if (fds[0].revents & POLLIN) {
> > - /* dispatch MBX events */
> > - processEvt();
> > - }
> > -
> > + /*
> > + * Handle NTF events first because processEvt may delete and 
> terminate the
> > + * campaign thread.
> > + */
> > if (fds[1].revents & POLLIN) {
> > // dispatch NTF events
> > rc = saNtfDispatch(m_ntfHandle, SA_DISPATCH_ALL);
> > @@ -922,6 +920,12 @@ int SmfCampaignThread::handleEvents(void
> > }
> > }
> >
> > + /* Process the Mail box events */
> > + if (fds[0].revents & POLLIN) {
> > + /* dispatch MBX events */
> > + processEvt();
> > + }
> > +
> > m_campaign->updateElapsedTime();
> > }
> > TRACE_LEAVE();
> >
>

--
Check out the vibrant tech community on one of 

Re: [devel] [PATCH 1 of 1] smfd: fix race condition when detecting async failures [#2413]

2017-04-07 Thread Lennart Lund
Hi Alex

Ack. Legacy tested

Thanks
Lennart

> -Original Message-
> From: Alex Jones [mailto:ajo...@genband.com]
> Sent: den 6 april 2017 20:26
> To: reddy.neelaka...@oracle.com; Lennart Lund
> ; Rafael Odzakow
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] smfd: fix race condition when detecting async failures
> [#2413]
> 
>  src/smf/smfd/SmfCampaignThread.cc |  16 ++--
>  1 files changed, 10 insertions(+), 6 deletions(-)
> 
> 
> smfd core dumps during commit of campaign.
> 
> If an AMF SU under maintenance fails right as the campaign commit is done,
> there
> is a race condition present. Before SMF clears the suMaintenaceCampaign
> attribute of the SU, if the SU fails, it will send a notification. Meanwhile,
> the commit deletes upgrade campaign pointer inside smfd. If the deletion of
> the
> campaign pointer inside smfd occurs before it receives the NTF event a crash
> will occur because the campaign pointer is gone.
> 
> Solution is to always process NTF events before processing the termination
> of
> the campaign. The campaign termination code sets "m_running" to false, and
> deletes the pointer. This should always be last in the poll loop so that the
> loop will exit immediately without processing any NTF events (or other
> future
> events.)
> 
> diff --git a/src/smf/smfd/SmfCampaignThread.cc
> b/src/smf/smfd/SmfCampaignThread.cc
> --- a/src/smf/smfd/SmfCampaignThread.cc
> +++ b/src/smf/smfd/SmfCampaignThread.cc
> @@ -907,12 +907,10 @@ int SmfCampaignThread::handleEvents(void
>   break;
>   }
> 
> -/* Process the Mail box events */
> - if (fds[0].revents & POLLIN) {
> - /* dispatch MBX events */
> - processEvt();
> - }
> -
> +/*
> + * Handle NTF events first because processEvt may delete and terminate
> the
> + * campaign thread.
> + */
>   if (fds[1].revents & POLLIN) {
>   // dispatch NTF events
>   rc = saNtfDispatch(m_ntfHandle, SA_DISPATCH_ALL);
> @@ -922,6 +920,12 @@ int SmfCampaignThread::handleEvents(void
>   }
>   }
> 
> +/* Process the Mail box events */
> + if (fds[0].revents & POLLIN) {
> + /* dispatch MBX events */
> + processEvt();
> + }
> +
>   m_campaign->updateElapsedTime();
>   }
>   TRACE_LEAVE();


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] smfd: fix race condition when detecting async failures [#2413]

2017-04-07 Thread Rafael Odzakow
ACK


On 04/06/2017 08:26 PM, Alex Jones wrote:
>   src/smf/smfd/SmfCampaignThread.cc |  16 ++--
>   1 files changed, 10 insertions(+), 6 deletions(-)
>
>
> smfd core dumps during commit of campaign.
>
> If an AMF SU under maintenance fails right as the campaign commit is done, 
> there
> is a race condition present. Before SMF clears the suMaintenaceCampaign
> attribute of the SU, if the SU fails, it will send a notification. Meanwhile,
> the commit deletes upgrade campaign pointer inside smfd. If the deletion of 
> the
> campaign pointer inside smfd occurs before it receives the NTF event a crash
> will occur because the campaign pointer is gone.
>
> Solution is to always process NTF events before processing the termination of
> the campaign. The campaign termination code sets "m_running" to false, and
> deletes the pointer. This should always be last in the poll loop so that the
> loop will exit immediately without processing any NTF events (or other future
> events.)
>
> diff --git a/src/smf/smfd/SmfCampaignThread.cc 
> b/src/smf/smfd/SmfCampaignThread.cc
> --- a/src/smf/smfd/SmfCampaignThread.cc
> +++ b/src/smf/smfd/SmfCampaignThread.cc
> @@ -907,12 +907,10 @@ int SmfCampaignThread::handleEvents(void
>   break;
>   }
>   
> -/* Process the Mail box events */
> - if (fds[0].revents & POLLIN) {
> - /* dispatch MBX events */
> - processEvt();
> - }
> -
> +/*
> + * Handle NTF events first because processEvt may delete and terminate 
> the
> + * campaign thread.
> + */
>   if (fds[1].revents & POLLIN) {
>   // dispatch NTF events
>   rc = saNtfDispatch(m_ntfHandle, SA_DISPATCH_ALL);
> @@ -922,6 +920,12 @@ int SmfCampaignThread::handleEvents(void
>   }
>   }
>   
> +/* Process the Mail box events */
> + if (fds[0].revents & POLLIN) {
> + /* dispatch MBX events */
> + processEvt();
> + }
> +
>   m_campaign->updateElapsedTime();
>   }
>   TRACE_LEAVE();
>


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] smfd: fix race condition when detecting async failures [#2413]

2017-04-07 Thread Neelakanta Reddy
Hi Alex,

Reviewed and tested the patch.
The patch avoids, the crash in smfd.

I have the following Questions on the implementation(which was 
overlooked at the time of #2145 review):

1. If the component is killed when the campaign is in 
SA_SMF_CMPG_EXECUTION_COMPLETED state(as in the defect) and
the amfnd does not re-start the component because, 
saAmfSUMaintenanceCampaign  is set. The saSmfCmpgError indication is not
given in SMF, because the campaign is  in the process of committing.

who does this justify Asynchronous Failures of AMF Entities?

2.
according to SMF AIS, the async failures has to be considered Executing, 
Suspending, or RollingBack only.
Then how do we prevent in other states? And the AMFND still will not 
re-start(correct) because of saAmfSUMaintenanceCampaign  is set.

Thanks,
Neel.

On 2017/04/06 11:56 PM, Alex Jones wrote:
>   src/smf/smfd/SmfCampaignThread.cc |  16 ++--
>   1 files changed, 10 insertions(+), 6 deletions(-)
>
>
> smfd core dumps during commit of campaign.
>
> If an AMF SU under maintenance fails right as the campaign commit is done, 
> there
> is a race condition present. Before SMF clears the suMaintenaceCampaign
> attribute of the SU, if the SU fails, it will send a notification. Meanwhile,
> the commit deletes upgrade campaign pointer inside smfd. If the deletion of 
> the
> campaign pointer inside smfd occurs before it receives the NTF event a crash
> will occur because the campaign pointer is gone.
>
> Solution is to always process NTF events before processing the termination of
> the campaign. The campaign termination code sets "m_running" to false, and
> deletes the pointer. This should always be last in the poll loop so that the
> loop will exit immediately without processing any NTF events (or other future
> events.)
>
> diff --git a/src/smf/smfd/SmfCampaignThread.cc 
> b/src/smf/smfd/SmfCampaignThread.cc
> --- a/src/smf/smfd/SmfCampaignThread.cc
> +++ b/src/smf/smfd/SmfCampaignThread.cc
> @@ -907,12 +907,10 @@ int SmfCampaignThread::handleEvents(void
>   break;
>   }
>   
> -/* Process the Mail box events */
> - if (fds[0].revents & POLLIN) {
> - /* dispatch MBX events */
> - processEvt();
> - }
> -
> +/*
> + * Handle NTF events first because processEvt may delete and terminate 
> the
> + * campaign thread.
> + */
>   if (fds[1].revents & POLLIN) {
>   // dispatch NTF events
>   rc = saNtfDispatch(m_ntfHandle, SA_DISPATCH_ALL);
> @@ -922,6 +920,12 @@ int SmfCampaignThread::handleEvents(void
>   }
>   }
>   
> +/* Process the Mail box events */
> + if (fds[0].revents & POLLIN) {
> + /* dispatch MBX events */
> + processEvt();
> + }
> +
>   m_campaign->updateElapsedTime();
>   }
>   TRACE_LEAVE();
>


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] ckpt: Add option OSAF_CKPT_SHM_ALLOC_GUARANTEE=2 for backwards compatibility [#2395]

2017-04-07 Thread A V Mahesh
Ok,  ACK.

-AVM

On 4/7/2017 1:05 PM, Anders Widell wrote:
> See reply below.
>
> thanks,
>
> Anders Widell
>
>
> On 04/07/2017 05:17 AM, A V Mahesh wrote:
>> Hi Anders Widell,
>>
>> Ack with following comments :
>>
>> On 4/6/2017 4:55 PM, Anders Widell wrote:
>>> The setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=0 results in increased 
>>> memory usage,
>>> whereas the setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 results in lower
>>> performance. Thus, there was no way to configure CKPT to avoid a 
>>> characteristics
>>> regression.
>> Change the commit comment to other way  :
>>
>> OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 results in increased memory usage
>> OSAF_CKPT_SHM_ALLOC_GUARANTEE=0 results in lower performance
>>
>> Change the  following in /etc/opensaf/ckptnd.conf :
>>
>> -  Add comment `The Default configuration is 
>> OSAF_CKPT_SHM_ALLOC_GUARANTEE=2 (Neither per-allocated nor check if 
>> memory is available)
>>and in this configuration  it is responsibility of application to 
>> make sure the required SHM if application is using  high memory usage. `
>>
>> -  Change  #OSAF_CKPT_SHM_ALLOC_GUARANTEE=2  (default) to 
>> #OSAF_CKPT_SHM_ALLOC_GUARANTEE=0
>
>
> Anders W> I would prefer to keep OSAF_CKPT_SHM_ALLOC_GUARANTEE=2 to 
> show the default setting.
>
>>
>> -AVM
>>
>>
>> On 4/6/2017 4:55 PM, Anders Widell wrote:
>>>   src/ckpt/ckptnd/ckptnd.conf |  6 ++
>>>   src/ckpt/ckptnd/cpnd_cb.h   |  3 ++-
>>>   src/ckpt/ckptnd/cpnd_init.c |  3 ++-
>>>   src/ckpt/ckptnd/cpnd_proc.c |  9 +
>>>   src/ckpt/ckptnd/cpnd_res.c  |  9 +
>>>   5 files changed, 20 insertions(+), 10 deletions(-)
>>>
>>>
>>> The setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=0 results in increased 
>>> memory usage,
>>> whereas the setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 results in lower
>>> performance. Thus, there was no way to configure CKPT to avoid a 
>>> characteristics
>>> regression.
>>>
>>> Fix this by adding a third option: OSAF_CKPT_SHM_ALLOC_GUARANTEE=2. 
>>> This setting
>>> will be the default, and result in no degradation of performance or 
>>> memory
>>> usage.
>>>
>>> diff --git a/src/ckpt/ckptnd/ckptnd.conf b/src/ckpt/ckptnd/ckptnd.conf
>>> --- a/src/ckpt/ckptnd/ckptnd.conf
>>> +++ b/src/ckpt/ckptnd/ckptnd.conf
>>> @@ -7,5 +7,11 @@
>>>   # Healthcheck keys
>>>   export CPSV_ENV_HEALTHCHECK_KEY="Default"
>>>   +# Controls how shared memory is allocated:
>>> +#  0 - No pre-allocation, but check if memory is available before 
>>> writing
>>> +#  1 - Pre allocated
>>> +#  2 - Neither pre-allocated nor check if memory is available
>>> +#export OSAF_CKPT_SHM_ALLOC_GUARANTEE=2
>>> +
>>>   # Uncomment the next line to enable info level logging
>>>   #args="--loglevel=info"
>>> diff --git a/src/ckpt/ckptnd/cpnd_cb.h b/src/ckpt/ckptnd/cpnd_cb.h
>>> --- a/src/ckpt/ckptnd/cpnd_cb.h
>>> +++ b/src/ckpt/ckptnd/cpnd_cb.h
>>> @@ -1,6 +1,7 @@
>>>   /*  -*- OpenSAF  -*-
>>>*
>>>* (C) Copyright 2008 The OpenSAF Foundation
>>> + * Copyright Ericsson AB 2017 - All Rights Reserved.
>>>*
>>>* This program is distributed in the hope that it will be useful, 
>>> but
>>>* WITHOUT ANY WARRANTY; without even the implied warranty of 
>>> MERCHANTABILITY
>>> @@ -322,7 +323,7 @@ typedef struct cpnd_cb_tag {
>>>   NCS_QUEUE cpnd_cpd_deferred_reqs_list;/* Queue for storing 
>>> CPD timeout requests  */
>>> bool scAbsenceAllowed;
>>> -bool shm_alloc_guaranteed;
>>> +int shm_alloc_guaranteed;
>>> NCS_SEL_OBJ clm_updated_sel_obj; /* The CLM select object 
>>> updated event */
>>>   diff --git a/src/ckpt/ckptnd/cpnd_init.c 
>>> b/src/ckpt/ckptnd/cpnd_init.c
>>> --- a/src/ckpt/ckptnd/cpnd_init.c
>>> +++ b/src/ckpt/ckptnd/cpnd_init.c
>>> @@ -1,6 +1,7 @@
>>>   /*  -*- OpenSAF  -*-
>>>*
>>>* (C) Copyright 2008 The OpenSAF Foundation
>>> + * Copyright Ericsson AB 2017 - All Rights Reserved.
>>>*
>>>* This program is distributed in the hope that it will be useful, 
>>> but
>>>* WITHOUT ANY WARRANTY; without even the implied warranty of 
>>> MERCHANTABILITY
>>> @@ -223,7 +224,7 @@ static uint32_t cpnd_lib_init(CPND_CREAT
>>>   if ((ptr = getenv("OSAF_CKPT_SHM_ALLOC_GUARANTEE")) != NULL) {
>>>   cb->shm_alloc_guaranteed = atoi(ptr);
>>>   } else {
>>> -cb->shm_alloc_guaranteed = false;
>>> +cb->shm_alloc_guaranteed = 2;
>>>   }
>>> /* create a mail box */
>>> diff --git a/src/ckpt/ckptnd/cpnd_proc.c b/src/ckpt/ckptnd/cpnd_proc.c
>>> --- a/src/ckpt/ckptnd/cpnd_proc.c
>>> +++ b/src/ckpt/ckptnd/cpnd_proc.c
>>> @@ -1,6 +1,7 @@
>>>   /*  -*- OpenSAF  -*-
>>>*
>>>* (C) Copyright 2008 The OpenSAF Foundation
>>> + * Copyright Ericsson AB 2017 - All Rights Reserved.
>>>*
>>>* This program is distributed in the hope that it will be useful, 
>>> but
>>>* WITHOUT ANY WARRANTY; without even the implied warranty of 
>>> MERCHANTABILITY
>>> @@ -478,7 +479,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C
>>>   

Re: [devel] [PATCH 1 of 1] ckpt: Add option OSAF_CKPT_SHM_ALLOC_GUARANTEE=2 for backwards compatibility [#2395]

2017-04-07 Thread Anders Widell
See reply below.

thanks,

Anders Widell


On 04/07/2017 05:17 AM, A V Mahesh wrote:
> Hi Anders Widell,
>
> Ack with following comments :
>
> On 4/6/2017 4:55 PM, Anders Widell wrote:
>> The setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=0 results in increased 
>> memory usage,
>> whereas the setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 results in lower
>> performance. Thus, there was no way to configure CKPT to avoid a 
>> characteristics
>> regression.
> Change the commit comment to other way  :
>
> OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 results in increased memory usage
> OSAF_CKPT_SHM_ALLOC_GUARANTEE=0 results in lower performance
>
> Change the  following in /etc/opensaf/ckptnd.conf :
>
> -  Add comment `The Default configuration is 
> OSAF_CKPT_SHM_ALLOC_GUARANTEE=2 (Neither per-allocated nor check if 
> memory is available)
>and in this configuration  it is responsibility of  application to 
> make sure the required SHM if application is using  high memory usage. `
>
> -  Change  #OSAF_CKPT_SHM_ALLOC_GUARANTEE=2  (default) to 
> #OSAF_CKPT_SHM_ALLOC_GUARANTEE=0


Anders W> I would prefer to keep OSAF_CKPT_SHM_ALLOC_GUARANTEE=2 to show 
the default setting.

>
> -AVM
>
>
> On 4/6/2017 4:55 PM, Anders Widell wrote:
>>   src/ckpt/ckptnd/ckptnd.conf |  6 ++
>>   src/ckpt/ckptnd/cpnd_cb.h   |  3 ++-
>>   src/ckpt/ckptnd/cpnd_init.c |  3 ++-
>>   src/ckpt/ckptnd/cpnd_proc.c |  9 +
>>   src/ckpt/ckptnd/cpnd_res.c  |  9 +
>>   5 files changed, 20 insertions(+), 10 deletions(-)
>>
>>
>> The setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=0 results in increased 
>> memory usage,
>> whereas the setting OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 results in lower
>> performance. Thus, there was no way to configure CKPT to avoid a 
>> characteristics
>> regression.
>>
>> Fix this by adding a third option: OSAF_CKPT_SHM_ALLOC_GUARANTEE=2. 
>> This setting
>> will be the default, and result in no degradation of performance or 
>> memory
>> usage.
>>
>> diff --git a/src/ckpt/ckptnd/ckptnd.conf b/src/ckpt/ckptnd/ckptnd.conf
>> --- a/src/ckpt/ckptnd/ckptnd.conf
>> +++ b/src/ckpt/ckptnd/ckptnd.conf
>> @@ -7,5 +7,11 @@
>>   # Healthcheck keys
>>   export CPSV_ENV_HEALTHCHECK_KEY="Default"
>>   +# Controls how shared memory is allocated:
>> +#  0 - No pre-allocation, but check if memory is available before 
>> writing
>> +#  1 - Pre allocated
>> +#  2 - Neither pre-allocated nor check if memory is available
>> +#export OSAF_CKPT_SHM_ALLOC_GUARANTEE=2
>> +
>>   # Uncomment the next line to enable info level logging
>>   #args="--loglevel=info"
>> diff --git a/src/ckpt/ckptnd/cpnd_cb.h b/src/ckpt/ckptnd/cpnd_cb.h
>> --- a/src/ckpt/ckptnd/cpnd_cb.h
>> +++ b/src/ckpt/ckptnd/cpnd_cb.h
>> @@ -1,6 +1,7 @@
>>   /*  -*- OpenSAF  -*-
>>*
>>* (C) Copyright 2008 The OpenSAF Foundation
>> + * Copyright Ericsson AB 2017 - All Rights Reserved.
>>*
>>* This program is distributed in the hope that it will be useful, but
>>* WITHOUT ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY
>> @@ -322,7 +323,7 @@ typedef struct cpnd_cb_tag {
>>   NCS_QUEUE cpnd_cpd_deferred_reqs_list;/* Queue for storing 
>> CPD timeout requests  */
>> bool scAbsenceAllowed;
>> -bool shm_alloc_guaranteed;
>> +int shm_alloc_guaranteed;
>> NCS_SEL_OBJ clm_updated_sel_obj; /* The CLM select object 
>> updated event */
>>   diff --git a/src/ckpt/ckptnd/cpnd_init.c b/src/ckpt/ckptnd/cpnd_init.c
>> --- a/src/ckpt/ckptnd/cpnd_init.c
>> +++ b/src/ckpt/ckptnd/cpnd_init.c
>> @@ -1,6 +1,7 @@
>>   /*  -*- OpenSAF  -*-
>>*
>>* (C) Copyright 2008 The OpenSAF Foundation
>> + * Copyright Ericsson AB 2017 - All Rights Reserved.
>>*
>>* This program is distributed in the hope that it will be useful, but
>>* WITHOUT ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY
>> @@ -223,7 +224,7 @@ static uint32_t cpnd_lib_init(CPND_CREAT
>>   if ((ptr = getenv("OSAF_CKPT_SHM_ALLOC_GUARANTEE")) != NULL) {
>>   cb->shm_alloc_guaranteed = atoi(ptr);
>>   } else {
>> -cb->shm_alloc_guaranteed = false;
>> +cb->shm_alloc_guaranteed = 2;
>>   }
>> /* create a mail box */
>> diff --git a/src/ckpt/ckptnd/cpnd_proc.c b/src/ckpt/ckptnd/cpnd_proc.c
>> --- a/src/ckpt/ckptnd/cpnd_proc.c
>> +++ b/src/ckpt/ckptnd/cpnd_proc.c
>> @@ -1,6 +1,7 @@
>>   /*  -*- OpenSAF  -*-
>>*
>>* (C) Copyright 2008 The OpenSAF Foundation
>> + * Copyright Ericsson AB 2017 - All Rights Reserved.
>>*
>>* This program is distributed in the hope that it will be useful, but
>>* WITHOUT ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY
>> @@ -478,7 +479,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C
>>   cp_node->replica_info.open.info.open.i_size =
>>   sizeof(CPSV_CKPT_HDR) + cp_node->create_attrib.maxSections 
>> * (sizeof(CPSV_SECT_HDR) +
>> cp_node->create_attrib.maxSectionSize);
>> -cp_node->replica_info.open.ensures_space 

Re: [devel] [PATCH 1 of 1] smfd: fix race condition when detecting async failures [#2413]

2017-04-07 Thread Lennart Lund
Hi Alex

No comments on code.
I am running tests. Please do not push until test is done. I will come back 
with result

Thanks
Lennart

> -Original Message-
> From: Alex Jones [mailto:ajo...@genband.com]
> Sent: den 6 april 2017 20:26
> To: reddy.neelaka...@oracle.com; Lennart Lund
> ; Rafael Odzakow
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] smfd: fix race condition when detecting async failures
> [#2413]
> 
>  src/smf/smfd/SmfCampaignThread.cc |  16 ++--
>  1 files changed, 10 insertions(+), 6 deletions(-)
> 
> 
> smfd core dumps during commit of campaign.
> 
> If an AMF SU under maintenance fails right as the campaign commit is done,
> there
> is a race condition present. Before SMF clears the suMaintenaceCampaign
> attribute of the SU, if the SU fails, it will send a notification. Meanwhile,
> the commit deletes upgrade campaign pointer inside smfd. If the deletion of
> the
> campaign pointer inside smfd occurs before it receives the NTF event a crash
> will occur because the campaign pointer is gone.
> 
> Solution is to always process NTF events before processing the termination
> of
> the campaign. The campaign termination code sets "m_running" to false, and
> deletes the pointer. This should always be last in the poll loop so that the
> loop will exit immediately without processing any NTF events (or other
> future
> events.)
> 
> diff --git a/src/smf/smfd/SmfCampaignThread.cc
> b/src/smf/smfd/SmfCampaignThread.cc
> --- a/src/smf/smfd/SmfCampaignThread.cc
> +++ b/src/smf/smfd/SmfCampaignThread.cc
> @@ -907,12 +907,10 @@ int SmfCampaignThread::handleEvents(void
>   break;
>   }
> 
> -/* Process the Mail box events */
> - if (fds[0].revents & POLLIN) {
> - /* dispatch MBX events */
> - processEvt();
> - }
> -
> +/*
> + * Handle NTF events first because processEvt may delete and terminate
> the
> + * campaign thread.
> + */
>   if (fds[1].revents & POLLIN) {
>   // dispatch NTF events
>   rc = saNtfDispatch(m_ntfHandle, SA_DISPATCH_ALL);
> @@ -922,6 +920,12 @@ int SmfCampaignThread::handleEvents(void
>   }
>   }
> 
> +/* Process the Mail box events */
> + if (fds[0].revents & POLLIN) {
> + /* dispatch MBX events */
> + processEvt();
> + }
> +
>   m_campaign->updateElapsedTime();
>   }
>   TRACE_LEAVE();


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel