Re: [devel] [PATCH 1/1] imm: return try-again on write requests while fs is unavailable [#3019]

2019-04-10 Thread Vu Minh Nguyen
Hi Anders,

See my responses inline. Thanks.

Regards, Vu

> -----Original Message-----
> From: and...@acm.org 
> Sent: Wednesday, April 10, 2019 2:16 PM
> To: lennart.l...@ericsson.com; vu.m.ngu...@dektech.com.au;
> hans.nordeb...@ericsson.com; gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 1/1] imm: return try-again on write requests
> while fs is unavailable [#3019]
> 
> OK, I think I get it then.
> 
> The ticket slogan is misleading.
> It should say something like: return try-again on requests that imply
> fs-access when fs is configured to be administratively unavailable.
[Vu] Thanks for your suggestion.

> 
> 
> The idea then is that the server (IMMND) detects the logical unavailability of
> the file system through this variable and returns TRY_AGAIN on file-system-
> accessing requests, such as apply of a CCB, updates to persistent runtime
> attributes, creation/deletion of classes, or creation/deletion of persistent
> runtime objects.
[Vu] Absolutely right.
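A minimal C sketch of the IMMND-side check described above: read-only requests pass, while requests that imply file-system access get TRY_AGAIN when the file system has been marked unavailable. All names (req_kind_t, fs_write_gate, the AIS_* codes) are hypothetical stand-ins, not taken from the patch or from saAis.h, and the assumption that the check only applies when PBE is enabled is labeled in the code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SaAisErrorT codes; the real type and its
   numeric values are defined in saAis.h and are not reproduced here. */
typedef enum { AIS_OK, AIS_ERR_TRY_AGAIN } ais_rc_t;

/* Hypothetical classification of incoming requests (not from the patch). */
typedef enum {
  REQ_READ,                /* e.g. accessor get or search: no fs access */
  REQ_CCB_APPLY,           /* apply of a CCB */
  REQ_PRT_ATTR_UPDATE,     /* update of persistent runtime attributes */
  REQ_CLASS_CREATE_DELETE, /* class creation or deletion */
  REQ_PRTO_CREATE_DELETE   /* persistent runtime object create/delete */
} req_kind_t;

/* True for requests whose result the PBE must write to the file system. */
static bool req_needs_fs(req_kind_t kind) {
  switch (kind) {
    case REQ_CCB_APPLY:
    case REQ_PRT_ATTR_UPDATE:
    case REQ_CLASS_CREATE_DELETE:
    case REQ_PRTO_CREATE_DELETE:
      return true;
    default:
      return false;
  }
}

/* Early decision at the local IMMND. Assumption: the check only matters
   when PBE is enabled, since otherwise nothing is written to the fs. */
ais_rc_t fs_write_gate(req_kind_t kind, bool pbe_enabled, bool fs_available) {
  if (pbe_enabled && !fs_available && req_needs_fs(kind))
    return AIS_ERR_TRY_AGAIN; /* rejected before any work is started */
  return AIS_OK;              /* continue with normal processing */
}
```

Because the rejection happens before any work starts, a client receiving this TRY_AGAIN knows the request was not executed.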

> 
> But now you probably need to tweak that TRY_AGAIN logic on the client
> library side for handling ERR_TIMEOUT on ccb-apply,
> if it's still there...
[Vu] I don't see any issue if we keep that logic. Can you explain your concern
here a bit more?

> 
> I would also suggest that the operation of setting this new admin state
> should ideally be two-phased, with a delayed response, so as not to
> interfere with already accepted and ongoing applies. An apply can generate
> many completed callbacks to OIs and be drawn out in time.
> By two-phased I mean rejecting new apply requests from clients with
> TRY_AGAIN but letting already started apply requests finish.
[Vu] Yes, I agree. The patch already complies with that point.
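The two-phase behaviour discussed above can be sketched as a small state machine: phase 1 stops accepting new applies but does not answer the admin operation yet; phase 2 completes the admin operation when the last already accepted apply finishes. This is a minimal sketch under assumed names (fs_gate_t, apply_begin, apply_end, admin_set_fs_unavailable are hypothetical), not the patch's implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SaAisErrorT codes (real ones are in saAis.h). */
typedef enum { AIS_OK, AIS_ERR_TRY_AGAIN } ais_rc_t;

/* Hypothetical file-system admin state kept by the server. */
typedef enum { FS_AVAILABLE, FS_DRAINING, FS_UNAVAILABLE } fs_state_t;

typedef struct {
  fs_state_t state;
  unsigned applies_in_progress; /* applies accepted before the admin op */
  bool admin_reply_pending;     /* admin op still waiting for its reply */
} fs_gate_t;

/* Phase 1: reject new applies from now on, but delay the admin reply
   until all already accepted applies have finished. */
void admin_set_fs_unavailable(fs_gate_t *g) {
  g->state = FS_DRAINING;
  g->admin_reply_pending = true;
  if (g->applies_in_progress == 0) { /* nothing to wait for */
    g->state = FS_UNAVAILABLE;
    g->admin_reply_pending = false; /* reply to the admin op now */
  }
}

/* Called for every new apply request from a client. */
ais_rc_t apply_begin(fs_gate_t *g) {
  if (g->state != FS_AVAILABLE)
    return AIS_ERR_TRY_AGAIN; /* rejected: the apply was never started */
  g->applies_in_progress++;
  return AIS_OK;
}

/* Phase 2: the last finishing apply completes the pending admin op. */
void apply_end(fs_gate_t *g) {
  if (g->applies_in_progress > 0) g->applies_in_progress--;
  if (g->state == FS_DRAINING && g->applies_in_progress == 0) {
    g->state = FS_UNAVAILABLE;
    g->admin_reply_pending = false;
  }
}
```

An apply accepted before the admin operation is never interrupted; only applies arriving afterwards see TRY_AGAIN.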
> 
> 
> /Anders
> 
> /AndersBj
> 
> 

Re: [devel] [PATCH 1/1] imm: return try-again on write requests while fs is unavailable [#3019]

2019-04-09 Thread Vu Minh Nguyen
Hi Anders,

Here is the text that I have put into the README file for this patch. Please 
give it a look.

---
Return try-again on write requests if file system is unresponsive
===
When the underlying file system is unresponsive to PBE write requests, all IMM
requests whose changes need to be persistent, such as creating an IMM
object or creating an IMM class, will likely get SA_AIS_ERR_TIMEOUT.

And if the PBE database is placed on a network file system, there is a high
possibility that the network file system server is unresponsive for quite a
long time, which will result in timeouts for all write requests on the IMM
library side.

With this patch, IMM introduces two new administrative operations
which are used to inform IMM in advance whether or not the file system is
unavailable/unresponsive.
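The operation IDs (400 and 401), the saImmFileSystemStatus values (0 and 1) and the bit-11 requirement come from the README hunk of this patch. A minimal C sketch of how a handler could tie them together follows; the handler name, its result codes, and the assumption that the operations are refused while bit 11 is off are hypothetical. It also shows that bits in opensafImmNostdFlags are numbered from 1, so bit n has the value 1 << (n - 1).

```c
#include <stdint.h>

/* Admin operation IDs as given in the patch README. */
enum { OP_FS_UNAVAILABLE = 400, OP_FS_AVAILABLE = 401 };

/* Bits are numbered from 1: bit 10 has value 512, bit 11 has value 1024. */
static uint32_t nostd_bit(unsigned n) { return UINT32_C(1) << (n - 1); }

/* Hypothetical result codes for the admin-op handler. */
typedef enum { ADM_OK, ADM_INVALID_OP, ADM_NOT_ENABLED } adm_rc_t;

/* Hypothetical handler: *fs_status mirrors saImmFileSystemStatus
   (0 = file system unavailable, 1 = available for writing). */
adm_rc_t handle_fs_admin_op(uint32_t nostd_flags, int op_id,
                            uint32_t *fs_status) {
  if (!(nostd_flags & nostd_bit(11)))
    return ADM_NOT_ENABLED; /* assumption: refused while bit 11 is off */
  switch (op_id) {
    case OP_FS_UNAVAILABLE:
      *fs_status = 0;
      return ADM_OK;
    case OP_FS_AVAILABLE:
      *fs_status = 1;
      return ADM_OK;
    default:
      return ADM_INVALID_OP;
  }
}
```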

If IMM is informed that the file system is not available for writing, IMM
will return SA_AIS_ERR_TRY_AGAIN *earlier*, from the local IMMND, to the
caller  [...]
--
In my opinion, the idea of this ticket is quite similar to what IMM already
does in the case where *PBE mode is enabled but the PBE implementer is not yet
attached/PBE process is not started yet*.

Please note that the try-again is only returned to callers when IMM has been
informed that the underlying file system is currently not writable.
And once the file system is back, IMM must be informed of that status, and
then IMM will operate normally.
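On the caller side, a bounded retry loop is the usual way to consume SA_AIS_ERR_TRY_AGAIN. The sketch below assumes hypothetical names (retry_on_try_again, write_attempt_fn, and a fake_write used only for illustration) and local AIS_* stand-ins for SaAisErrorT; it relies on the semantics that TRY_AGAIN means the request was not executed, so stopping after the last attempt leaves nothing half-done.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical stand-ins for SaAisErrorT codes (real ones are in saAis.h). */
typedef enum { AIS_OK, AIS_ERR_TRY_AGAIN, AIS_ERR_TIMEOUT } ais_rc_t;

/* Hypothetical signature for one attempt of a write request. */
typedef ais_rc_t (*write_attempt_fn)(void *ctx);

/* Retry while the server answers TRY_AGAIN. Any other code, including
   TIMEOUT, is returned at once, since its outcome must not be guessed. */
ais_rc_t retry_on_try_again(write_attempt_fn attempt, void *ctx,
                            int max_tries, long delay_ms) {
  struct timespec pause = {.tv_sec = delay_ms / 1000,
                           .tv_nsec = (delay_ms % 1000) * 1000000L};
  ais_rc_t rc = AIS_ERR_TRY_AGAIN;
  for (int i = 0; i < max_tries; i++) {
    rc = attempt(ctx);
    if (rc != AIS_ERR_TRY_AGAIN) break;
    if (i + 1 < max_tries) nanosleep(&pause, NULL); /* back off */
  }
  return rc;
}

/* Hypothetical fake server for illustration: TRY_AGAIN while *busy > 0. */
static ais_rc_t fake_write(void *ctx) {
  int *busy = ctx;
  if (*busy > 0) {
    (*busy)--;
    return AIS_ERR_TRY_AGAIN;
  }
  return AIS_OK;
}
```

For example, with the file system marked unavailable for two attempts, retry_on_try_again(fake_write, &busy, 5, 100) succeeds on the third attempt.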

Regards, Vu

> -----Original Message-----
> From: and...@acm.org 
> Sent: Wednesday, April 10, 2019 7:54 AM
> To: gary@dektech.com.au; hans.nordeb...@ericsson.com;
> lennart.l...@ericsson.com; vu.m.ngu...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 1/1] imm: return try-again on write requests
> while fs is unavailable [#3019]
> 
> Hi again
> 
> I have an additional objection/comment to this ticket.
> It concerns the semantics of the return code ERR_TRY_AGAIN.
> 
> The meaning of this error code is that the operation could NOT be performed
> and the server has NOT executed the request.
> 
> That is, the server currently rejects the request but invites the client to
> try again.
> 
> Crucial here is that if the client decides NOT to retry the request, then the
> client *knows* that the request was not (and will not be) done!!
> This is the original and only sensible semantics of ERR_TRY_AGAIN.
> 
> In essence, a client that gets TRY_AGAIN is in control of whether or not the
> operation shall be retried and may decide NOT to.
> 
> Do you see how glaringly mismatched this is in the proposed ticket?
> This ticket proposes to return TRY_AGAIN for a case where the operation
> may *still* be ongoing.
> This violates the semantics of TRY_AGAIN.
> 
> ERR_TIMEOUT is very different.
> It simply tells the user that his client-side timer expired and it is unknown
> at the client what happened with the request in the server.
> It may have succeeded, it may have failed, or it may still be stuck in
> processing.
> 
> Receiving ERR_TIMEOUT is a nuisance since the client cannot know what
> happened.
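The distinction drawn above can be written down as a mapping from return code to what the client is entitled to conclude. This is a hedged sketch restricted to three codes; the AIS_* names are local stand-ins for SaAisErrorT values, and outcome_of and may_abandon_without_probing are hypothetical helpers.

```c
/* Hypothetical stand-ins for SaAisErrorT codes (real ones are in saAis.h). */
typedef enum { AIS_OK, AIS_ERR_TRY_AGAIN, AIS_ERR_TIMEOUT } ais_rc_t;

/* What a client may conclude about the request on the server side. */
typedef enum { OUTCOME_DONE, OUTCOME_NOT_DONE, OUTCOME_UNKNOWN } outcome_t;

outcome_t outcome_of(ais_rc_t rc) {
  switch (rc) {
    case AIS_OK:
      return OUTCOME_DONE;     /* the server executed the request */
    case AIS_ERR_TRY_AGAIN:
      return OUTCOME_NOT_DONE; /* rejected and not executed: retry optional */
    case AIS_ERR_TIMEOUT:
    default:
      return OUTCOME_UNKNOWN;  /* may have succeeded, failed or still run */
  }
}

/* A client may drop a request without probing only if the outcome is known. */
int may_abandon_without_probing(ais_rc_t rc) {
  return outcome_of(rc) != OUTCOME_UNKNOWN;
}
```

Returning TRY_AGAIN while the operation is still ongoing would turn an OUTCOME_UNKNOWN case into a false OUTCOME_NOT_DONE, which is the mismatch described above.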
> 
> But the solution must be to provide tools for the client to find out what
> happened.
> You need something like a new SaAisCcbDidItCommit API to get around this
> problem at the API level.
> 
> But it gets complicated since a CCB handle is a handle to a chain of
> actual ccb-ids (actual transactions).
> The ccb-ids are not visible/exposed over the IMM SAF API.
> It is truly a flawed API in its design.
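To make the handle-versus-ccb-id problem concrete, here is a minimal C model under stated assumptions: a CCB handle is modeled as a chain of internal ccb-ids and the commit log as a list of committed ids. The types and ccb_did_it_commit are hypothetical; SaAisCcbDidItCommit in the text above is itself only a suggested name, not an existing API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one CCB handle stands for a chain of internal
   ccb-ids, one per transaction started on that handle. */
typedef struct {
  uint32_t ids[8]; /* fixed size only to keep the sketch short */
  size_t count;
} ccb_handle_t;

/* Hypothetical view of the server-side commit log. */
typedef struct {
  const uint32_t *committed; /* ids of committed CCBs */
  size_t count;
} commit_log_t;

static bool log_contains(const commit_log_t *log, uint32_t id) {
  for (size_t i = 0; i < log->count; i++)
    if (log->committed[i] == id) return true;
  return false;
}

/* Hypothetical query for the latest transaction on the handle. Note that
   false only means "not in the log (yet)": the apply may still be in
   progress, so a real API would need a third answer. */
bool ccb_did_it_commit(const ccb_handle_t *h, const commit_log_t *log) {
  if (h->count == 0) return false;
  return log_contains(log, h->ids[h->count - 1]);
}
```

The query needs the internal ccb-id, which is exactly what the SAF API does not expose to the client.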
> 
> I don't see any really good solution.
> 
> The workaround that was in place when I departed was hidden logic on the
> CCB client side that tried to recover the result in case of timeout by
> actually probing the IMM CCB commit log in the database.
> But that *hidden* client-side retry mechanism had to have a meta timeout,
> since we could not block the actual client (CCB API user) forever.
> And it was (is?) ugly since it overrides the normal/original/general
> client-side timeout concept on handles.
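The recovery workaround described above amounts to polling for the outcome until it is known or a meta timeout expires. A minimal C sketch follows; the probe callback stands in for the commit-log lookup, whose real mechanism is not shown in this thread, and all names (probe_fn, recover_after_timeout, fake_probe) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

typedef enum { PROBE_COMMITTED, PROBE_ABORTED, PROBE_PENDING } probe_t;

/* Hypothetical lookup of the CCB outcome in the commit log. */
typedef probe_t (*probe_fn)(void *ctx);

static double now_sec(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* After an apply timed out, poll until the outcome is known or the meta
   timeout expires; PROBE_PENDING means the outcome is still unknown. */
probe_t recover_after_timeout(probe_fn probe, void *ctx,
                              double meta_timeout_s, long poll_ms) {
  double deadline = now_sec() + meta_timeout_s;
  struct timespec pause = {.tv_sec = poll_ms / 1000,
                           .tv_nsec = (poll_ms % 1000) * 1000000L};
  for (;;) {
    probe_t p = probe(ctx);
    if (p != PROBE_PENDING) return p;               /* outcome is known */
    if (now_sec() >= deadline) return PROBE_PENDING; /* give up */
    nanosleep(&pause, NULL);
  }
}

/* Hypothetical fake probe for illustration: pending *n times, then done. */
static probe_t fake_probe(void *ctx) {
  int *n = ctx;
  if (*n > 0) {
    (*n)--;
    return PROBE_PENDING;
  }
  return PROBE_COMMITTED;
}
```

The meta timeout bounds how long the API user is blocked, at the price of still returning "unknown" when it expires, which is the trade-off criticized above.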
> 
> /Anders
> 
> 
> >Original message
> >From : anders.bjornerst...@telia.com
> >Date : 2019-04-09 - 10:40 (CEST)
> >To : hans.nordeb...@ericsson.com, vu.m.ngu...@dektech.com.au,
> lennart.l...@ericsson.com, gary@dektech.com.au
> >Cc : opensaf-devel@lists.sourceforge.net
> >Subject : Re: [devel] [PATCH 1/1] imm: return try-again on write requests
> while fs is unavailable [#3019]
> >
> >Hi Vu,
> >
> >Well, I think all three of these cases could cause problems.
> >The key issue here is the perceived reliability of the IMM service from the
> perspective of the user.
> >
> >If the 

Re: [devel] [PATCH 1/1] imm: return try-again on write requests while fs is unavailable [#3019]

2019-04-08 Thread Vu Minh Nguyen
Hi AndersBj,

Thanks for your comments. 

However, I need more information on your concern so that I can fully
understand it. Which of the use cases below describes your concern about
"additional Apply requests"?

Here are some use cases I can think of for the "additional apply requests".

1) "Additional Apply requests" are invoked after the first apply has returned 
with SA_AIS_OK (the transaction has been committed).
The additional apply should get FAILED_OPERATION, as the CCB has already been
applied.

2) "Additional Apply requests" are invoked after the first apply has returned
due to a timeout, even though the transaction commit is in progress.
The additional apply request should get BAD_HANDLE, as the CCB has been cleaned
up as a result of the timeout of the first one.

3) "Additional Apply requests" are invoked concurrently in another thread while
the first one is ongoing.
The additional apply request should get TRY_AGAIN because, in the legacy code,
the first call is blocked waiting for the reply. Such code is *not allowed*,
though: users must make sure not to use the same handle concurrently from
different threads.

4) Other cases
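Cases 1 to 3 can be summarized as a mapping from the CCB state at the time of the additional saImmOmCcbApply() call to the expected return code. A hedged C sketch, with local AIS_* stand-ins for SaAisErrorT and hypothetical state names; case 4 is deliberately left open.

```c
/* Hypothetical stand-ins for SaAisErrorT codes (real ones are in saAis.h). */
typedef enum {
  AIS_ERR_TRY_AGAIN,
  AIS_ERR_BAD_HANDLE,
  AIS_ERR_FAILED_OPERATION,
  AIS_UNSPECIFIED /* case 4, "other cases", is left open */
} ais_rc_t;

/* Hypothetical CCB state when another apply arrives on the same handle. */
typedef enum {
  CCB_COMMITTED,                /* case 1: first apply returned SA_AIS_OK */
  CCB_CLEANED_UP_AFTER_TIMEOUT, /* case 2: first apply timed out */
  CCB_APPLY_IN_PROGRESS,        /* case 3: concurrent call, another thread */
  CCB_OTHER                     /* case 4 */
} ccb_state_t;

ais_rc_t expected_rc_for_additional_apply(ccb_state_t s) {
  switch (s) {
    case CCB_COMMITTED:                return AIS_ERR_FAILED_OPERATION;
    case CCB_CLEANED_UP_AFTER_TIMEOUT: return AIS_ERR_BAD_HANDLE;
    case CCB_APPLY_IN_PROGRESS:        return AIS_ERR_TRY_AGAIN;
    default:                           return AIS_UNSPECIFIED;
  }
}
```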

Regards, Vu

> -----Original Message-----
> From: and...@acm.org 
> Sent: Monday, April 8, 2019 6:15 PM
> To: gary@dektech.com.au; vu.m.ngu...@dektech.com.au;
> lennart.l...@ericsson.com; hans.nordeb...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 1/1] imm: return try-again on write requests
> while fs is unavailable [#3019]
> 
> Hi,
> 
> I think this ticket (#3019) is erroneous and dangerous.
> If you currently have an apply in progress (a commit of a transaction), that
> means you DON'T KNOW if the operation will succeed/has succeeded.
> What you DO know is that the write/commit will most likely succeed or fail
> at some time.
> If, during the uncertainty, you generate additional Apply requests on the same
> CCB, then you can be pretty sure that those requests
> do not "bypass" the blocked request. Instead you can be pretty sure that the
> redundant apply request(s) will return an error code
> which most likely will cause the user to conclude that the CCB failed to apply,
> when in fact this is uncertain and likely to be false.
> 
> A disaster in other words.
> 
> /AndersBj
> 
> 

Re: [devel] [PATCH 1/1] imm: return try-again on write requests while fs is unavailable [#3019]

2019-04-07 Thread Vu Minh Nguyen
Hi all,

Have you had time to review the patch? I will push this ticket by this
Wednesday if there is no comment/feedback.

Regards, Vu

[devel] [PATCH 1/1] imm: return try-again on write requests while fs is unavailable [#3019]

2019-03-26 Thread Vu Minh Nguyen
When the underlying file system is unresponsive to PBE write requests, all IMM
requests that are required to be persistent, such as creating an IMM object or
creating an IMM class, will likely get SA_AIS_ERR_TIMEOUT.

This changeset introduces two administrative operations which let the user
inform IMM whether or not the file system is unavailable. If the file system
is not available, IMM will return SA_AIS_ERR_TRY_AGAIN earlier to the caller
instead of SA_AIS_ERR_TIMEOUT.

In addition, a new IMM attribute, saImmFileSystemStatus, is added to the
SaImmMngt class; its value shows the current status of the file system
from IMM's point of view.
---
 src/imm/README|  49 ++-
 .../management/test_saImmOmClassCreate_2.c| 136 ++
 src/imm/common/immsv_api.h|   5 +-
 src/imm/config/immsv_classes.xml  |   7 +
 src/imm/immnd/ImmModel.cc |  91 ++--
 src/imm/immnd/ImmModel.h  |   1 +
 src/imm/immnd/immnd_evt.c |  11 +-
 src/imm/immnd/immnd_init.h|   1 +
 8 files changed, 288 insertions(+), 13 deletions(-)

diff --git a/src/imm/README b/src/imm/README
index 132ee0ac0..8e91b534c 100644
--- a/src/imm/README
+++ b/src/imm/README
@@ -2969,7 +2969,7 @@ attribute in the object:
 
 The following is the shell command:
 
-immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:1024 \
+immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:512 \
opensafImm=opensafImm,safApp=safImmService
 
 This will set bit 10 of the 'opensafImmNostdFlags' runtime attribute inside the immsv.
@@ -3027,7 +3027,6 @@ expires.
 To be possible to use this new feature, bit 10 must be set in
 opensafImmNostdFlags attribute in IMM object.
 
-
 Provide an admin-operation for re-generating backend database from one in RAM
 =
 https://sourceforge.net/p/opensaf/tickets/2940/
@@ -3046,6 +3045,52 @@ back-end database from one in memory to keep them both consistent.
 
 immadm -o 303 safRdn=immManagement,safApp=safImmService
 
+Return try-again on write requests while file system is unavailable
+===
+https://sourceforge.net/p/opensaf/tickets/3019
+
+When the underlying file system is unresponsive to PBE write requests, all IMM
+requests that are required to be persistent, such as creating an IMM object or
+creating an IMM class, will likely get SA_AIS_ERR_TIMEOUT.
+
+Since OpenSAF version 5.19.06, IMM introduces two new administrative operations
+which are used to inform IMM whether or not the file system is unavailable. If
+the file system is not available for writing, IMM will return
+SA_AIS_ERR_TRY_AGAIN earlier to the caller instead of SA_AIS_ERR_TIMEOUT.
+
+Operation ID 400 is used to inform IMM that the file system is unavailable:
+ immadm -o 400 safRdn=immManagement,safApp=safImmService
+
+and operation ID 401 is used to inform IMM that the file system is back:
+ immadm -o 401 safRdn=immManagement,safApp=safImmService
+
+In addition, a new runtime IMM attribute, saImmFileSystemStatus, is added to
+the SaImmMngt class; its value shows the current status of the file system:
+saImmFileSystemStatus = 0 means the file system is unavailable and
+saImmFileSystemStatus = 1 means the file system is available for writing.
+
+By reading the value of that attribute, other services or applications that
+read/write data from/to the file system may benefit as well.
+
+Note that to use this new feature, bit 11 must be set in opensafImmNostdFlags.
+
+The following is the shell command to set the 11th bit:
+ immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:1024 \
+   opensafImm=opensafImm,safApp=safImmService
+
+In summary:
+ Bit 1 controls schema (imm class) changes allowed or not (normally off/0).
+ Bit 2 controls OpenSAF4.1 protocols allowed or not (normally on/1).
+ Bit 3 controls OpenSAF4.3 protocols allowed or not (normally on/1).
+ Bit 4 controls 2PBE oneSafe2PBE, see 2PBE feature in OpenSAF4.4 above (normally off/0).
+ Bit 5 controls OpenSAF4.5 protocols allowed or not (normally on/1).
+ Bit 6 controls OpenSAF4.6 protocols allowed or not (normally on/1).
+ Bit 7 controls OpenSAF4.7 protocols allowed or not (normally on/1).
+ Bit 8 controls OpenSAF5.0 protocols allowed or not (normally on/1).
+ Bit 9 controls OpenSAF5.1 protocols allowed or not (normally on/1).
+ Bit 10 controls OpenSAF5.17.11 protocols allowed or not (normally on/1).
+ Bit 11 controls OpenSAF5.19.06 protocols allowed or not (normally on/1).
+
 
 DEPENDENCIES
 
diff --git a/src/imm/apitest/management/test_saImmOmClassCreate_2.c b/src/imm/apitest/management/test_saImmOmClassCreate_2.c
index f43884e1b..9e12f4f61 100644
--- a/src/imm/apitest/management/test_saImmOmClassCreate_2.c
+++ b/src/imm/apitest/management/test_saImmOmClassCreate_2.c
@@ -16,6 +16,7 @@
  */