Re: [devel] [PATCH 1/1] ntf: restart ntfimcnd if it fails to get operation invoke name [#3178]

Thang Duc Nguyen Tue, 21 Apr 2020 03:51:41 -0700

I updated the solution and sent out V2.
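
The V2 patch is not reproduced in this message. As a rough sketch of the
fallback discussed in the quoted thread (remember the invoker from the first
callback, fall back to "unknown" if that record is gone after a restart, and
free it on apply), something like the following could work. It is a standalone
illustration, not OpenSAF code: `ccb_entry`, `on_modify`, `on_apply`,
`simulate_restart` and `run_demo` are hypothetical names, and the fixed-size
table is only an assumed stand-in for the real CCB user data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-CCB user data kept in process memory. */
#define MAX_CCBS 8
#define UNKNOWN_INVOKER "unknown"

struct ccb_entry {
    unsigned long long ccb_id;
    char *invoke_name; /* heap copy, released in on_apply() */
    int in_use;
};

static struct ccb_entry ccb_table[MAX_CCBS];

static struct ccb_entry *ccb_find(unsigned long long ccb_id)
{
    for (int i = 0; i < MAX_CCBS; i++)
        if (ccb_table[i].in_use && ccb_table[i].ccb_id == ccb_id)
            return &ccb_table[i];
    return NULL;
}

static struct ccb_entry *ccb_add(unsigned long long ccb_id, const char *name)
{
    for (int i = 0; i < MAX_CCBS; i++) {
        if (!ccb_table[i].in_use) {
            size_t len = strlen(name) + 1;
            char *copy = malloc(len);
            if (copy == NULL)
                return NULL;
            memcpy(copy, name, len);
            ccb_table[i].ccb_id = ccb_id;
            ccb_table[i].invoke_name = copy;
            ccb_table[i].in_use = 1;
            return &ccb_table[i];
        }
    }
    return NULL; /* table full */
}

/*
 * Modify callback. name_in_callback is assumed to be non-NULL only for the
 * first operation of a CCB. If no record exists and no name is supplied
 * (for example after a restart in the middle of the CCB), the invoker is
 * recorded as "unknown" instead of asserting or exiting.
 */
const char *on_modify(unsigned long long ccb_id, const char *name_in_callback)
{
    struct ccb_entry *e = ccb_find(ccb_id);
    if (e == NULL)
        e = ccb_add(ccb_id, name_in_callback ? name_in_callback
                                             : UNKNOWN_INVOKER);
    return e ? e->invoke_name : UNKNOWN_INVOKER;
}

/* Apply callback: the CCB is finished, so release the stored name. */
void on_apply(unsigned long long ccb_id)
{
    struct ccb_entry *e = ccb_find(ccb_id);
    if (e == NULL)
        return;
    free(e->invoke_name);
    e->invoke_name = NULL;
    e->in_use = 0;
}

/* Simulates a process restart: all in-memory CCB user data is lost. */
void simulate_restart(void)
{
    for (int i = 0; i < MAX_CCBS; i++) {
        free(ccb_table[i].invoke_name);
        ccb_table[i].invoke_name = NULL;
        ccb_table[i].in_use = 0;
    }
}

/* Walks through the scenario from the thread for one CCB. */
void run_demo(void)
{
    simulate_restart(); /* start from an empty table */
    assert(strcmp(on_modify(42, "adminOwnerA"), "adminOwnerA") == 0);
    assert(strcmp(on_modify(42, NULL), "adminOwnerA") == 0);
    simulate_restart(); /* restart in the middle of the CCB */
    assert(strcmp(on_modify(42, NULL), UNKNOWN_INVOKER) == 0);
    on_apply(42); /* stored name is freed here */
}
```

Calling `run_demo()` from a `main` walks through the case from the thread: the
second operation of a CCB arrives after a restart and is attributed to
"unknown" rather than stopping the process again.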

-----Original Message-----
From: Minh Hon Chau <minh.c...@dektech.com.au> 
Sent: Tuesday, April 21, 2020 2:23 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Thang Duc Nguyen 
<thang.d.ngu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] ntf: restart ntfimcnd if it fails to get operation 
invoke name [#3178]


Agree.

On 21/4/20 12:24 pm, Thuan Tran wrote:
> Hi,
>
> If there is no way to get the admin owner or object implementer in the
> middle of a CCB with many operations, then an "unknown" invoker is better
> than restarting on every operation of that CCB.
>
> Best Regards,
> ThuanTr
>
> -----Original Message-----
> From: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
> Sent: Tuesday, April 21, 2020 8:39 AM
> To: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>; Minh Hon Chau 
> <minh.c...@dektech.com.au>; Thuan Tran <thuan.t...@dektech.com.au>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1/1] ntf: restart ntfimcnd if it fails to get 
> operation invoke name [#3178]
>
> Update.
>
> If we agree to avoid the coredump, does @operation_invoke_name need to be
> freed before exit?
> [Thang]: As above, we can fill invoke_name with "unknown" in this case to
> avoid the coredump, and free it in applyccbcb.
>
> -----Original Message-----
> From: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
> Sent: Tuesday, April 21, 2020 8:29 AM
> To: Minh Hon Chau <minh.c...@dektech.com.au>; Thuan Tran 
> <thuan.t...@dektech.com.au>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 1/1] ntf: restart ntfimcnd if it fails to 
> get operation invoke name [#3178]
>
> Hi Minh,
> See my comments inline.
>
> -----Original Message-----
> From: Minh Hon Chau <minh.c...@dektech.com.au>
> Sent: Monday, April 20, 2020 5:24 PM
> To: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>; Thuan Tran 
> <thuan.t...@dektech.com.au>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] ntf: restart ntfimcnd if it fails to get 
> operation invoke name [#3178]
>
> Hi Thang,
>
> I understand the invoke_name is only present in the first callback, thus
> ntfimcn must memorize it in the userdata. My question is: is it OK for this
> userdata to be lost because ntfimcn restarts? I think it is, since the CCB
> has not been committed.
> [Thang]: We can accept it and fill invoke_name with "unknown" instead of
> doing nothing.
>
> If we accept the userdata being lost, then we can look at avoiding the
> coredump; otherwise Thuan can give an idea of whether it is an IMM issue
> that causes the userdata to be lost.
>
> If we agree to avoid the coredump, does @operation_invoke_name need to be
> freed before exit?
> [Thang]: As above, we can fill invoke_name with "unknown" in this case to
> avoid the coredump.
>
>
> thanks
>
> Minh
>
> On 20/4/20 6:30 pm, Thang Duc Nguyen wrote:
>> Hi Minh,
>>
>> See my comments inline.
>>
>> -----Original Message-----
>> From: Minh Hon Chau <minh.c...@dektech.com.au>
>> Sent: Monday, April 20, 2020 11:51 AM
>> To: Thuan Tran <thuan.t...@dektech.com.au>; Thang Duc Nguyen 
>> <thang.d.ngu...@dektech.com.au>
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1/1] ntf: restart ntfimcnd if it fails to get 
>> operation invoke name [#3178]
>>
>> Hi,
>>
>> One similarity to #2859 is that the invoke_name is only present in the
>> first callback, so ntfimcn must memorize it in the ccb userdata.
>>
>> But after ntfimcn calls ccbutil_ccbAddModifyOperation, is this userdata not
>> written to immnd and synced across the other immnd(s)?
>> Meaning the userdata is only stored in the imm agent? So after switchover,
>> the next ccb callback does not have the invoke_name, and ntfimcn has lost
>> its user data because of the restart.
>>
>> [Thang]: With a CCB that has multiple ops, only the first op contains the
>> invoke_name (in this case the adminOwnername). After ntfimcnd restarts, it
>> receives the modify callback for the second or a later op, and that
>> callback no longer carries the invoke_name.
>> Maybe we can retrieve the invoke_name from the IMM db, but we cannot get
>> the info about all ops in that CCB.
>>
>> Thanks
>>
>> Minh
>>
>> On 16/4/20 3:32 pm, Thuan Tran wrote:
>>> Hi,
>>>
>>> I think this is just an enhancement, not an urgent fix,
>>> so we should make it better if possible.
>>>
>>> About #2859, I was not a reviewer at that time.
>>> But I would not have agreed to that solution, as we can see the service
>>> keeps restarting if it starts in the middle of a CCB with many operations.
>>>
>>> Best Regards,
>>> ThuanTr
>>>
>>> -----Original Message-----
>>> From: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
>>> Sent: Thursday, April 16, 2020 10:51 AM
>>> To: Thuan Tran <thuan.t...@dektech.com.au>; Minh Hon Chau 
>>> <minh.c...@dektech.com.au>
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: RE: [PATCH 1/1] ntf: restart ntfimcnd if it fails to get 
>>> operation invoke name [#3178]
>>>
>>> Hi Thuan,
>>>
>>> Thanks for your comment.
>>> First, this issue happens only in a specific situation, and I think a
>>> restart does not cause a big issue.
>>> Also, the CCB data is internal data managed by ntf/ntfimcnd. After
>>> ntfimcnd restarts, it reinitializes CcbUtilCcbData and the operation
>>> invoke name is empty.
>>>
>>> Moreover, in the current code in ntfimcn_imm.c, there are many places that
>>> use imcn_exit(EXIT_FAILURE) when an error is detected. An example of this
>>> is #2859.
>>> We will consider opening a new ticket for your suggestion, to
>>> refactor/change the current behavior of ntfimcnd.
>>>
>>> B.R/Thang
>>>
>>> -----Original Message-----
>>> From: Thuan Tran <thuan.t...@dektech.com.au>
>>> Sent: Thursday, April 16, 2020 10:16 AM
>>> To: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>; Minh Hon Chau 
>>> <minh.c...@dektech.com.au>
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: RE: [PATCH 1/1] ntf: restart ntfimcnd if it fails to get 
>>> operation invoke name [#3178]
>>>
>>> Hi Thang,
>>>
>>> From the reproduction method, with this solution the service exits
>>> (instead of crashing), and if the user continues with another operation
>>> the service exits again.
>>> The point is: why can't we get the admin owner or object implementer in
>>> the 2nd IMM modify callback in this scenario?
>>> Is it an IMM limitation that the admin owner or object implementer is not
>>> included from the 2nd modify callback onward?
>>>
>>> If it is a limitation, can we use another way to get the admin owner or
>>> object implementer based on the object name?
>>> That way, we can avoid continuous exits if the user keeps issuing
>>> operations in the same CCB.
>>>
>>> Best Regards,
>>> ThuanTr
>>>
>>> -----Original Message-----
>>> From: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
>>> Sent: Wednesday, April 15, 2020 3:43 PM
>>> To: Minh Hon Chau <minh.c...@dektech.com.au>; Thuan Tran 
>>> <thuan.t...@dektech.com.au>
>>> Cc: opensaf-devel@lists.sourceforge.net; Thang Duc Nguyen 
>>> <thang.d.ngu...@dektech.com.au>
>>> Subject: [PATCH 1/1] ntf: restart ntfimcnd if it fails to get 
>>> operation invoke name [#3178]
>>>
>>> If ntfimcnd is restarted during a ccb modify, it initializes a
>>> ccbUtilCcbData that does not contain the operation invoke name.
>>> This causes ntfimcnd to crash because the operation invoke name does not
>>> exist.
>>>
>>> The fix is to restart ntfimcnd instead of raising the coredump.
>>> ---
>>>     src/ntf/ntfimcnd/ntfimcn_imm.c | 4 ++--
>>>     1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/ntf/ntfimcnd/ntfimcn_imm.c b/src/ntf/ntfimcnd/ntfimcn_imm.c
>>> index 3c0a8c02a..3563a2264 100644
>>> --- a/src/ntf/ntfimcnd/ntfimcn_imm.c
>>> +++ b/src/ntf/ntfimcnd/ntfimcn_imm.c
>>> @@ -376,9 +376,9 @@ get_operation_invoke_name_modify(SaImmOiCcbIdT ccbId,
>>>                             goto done;
>>>                     }
>>>             }
>>> -   /* If we get here no name is found! */
>>> +   /* ntfimcnd was restarted during ccb modify */
>>>             LOG_ER("%s no name was found", __FUNCTION__);
>>> -   osafassert(0);
>>> +   imcn_exit(EXIT_FAILURE);
>>>     
>>>     done:
>>>             TRACE_LEAVE();
>>> --
>>> 2.17.1
>>>
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel

