Re: [devel] [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

2019-12-23 Thread Nguyen Minh Vu

Thanks Gary.

I will update the README according to your comments.

Thanks, Vu

On 12/24/19 6:11 AM, Gary Lee wrote:

Hi Vu

Very minor comments with [GL].

Gary

-----Original Message-----
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
Sent: Thursday, 28 November 2019 7:25 PM
To: lennart.l...@ericsson.com; Gary Lee; Minh Hon Chau
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Subject: [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

---
  src/log/README | 38 ++
  1 file changed, 38 insertions(+)

diff --git a/src/log/README b/src/log/README
index b83d472e4..ab96a8157 100644
--- a/src/log/README
+++ b/src/log/README
@@ -764,3 +764,41 @@ on AMF role is unnecessary delay the CLM state of a Node
 (CLM state will available as soon as CLM started), so LGS is a taking
 AVD Up event as trigger to do CLM initialize.
   
+

+4. Improve the resilience of OpenSAF LOG service (#3116)
+-
+When the file system is unresponsive, log client gets try-again from
+write callback very shortly after I/O timeout reaches the setting; the

[GL] "reaches the setting" sounds confusing. What setting?

+value of I/O timeout is configurable via the attribute logFileIoTimeout
+within this valid range [500ms – 5000ms]. This is legacy behavior.
+
+This ticket improves the resilience of LOG service, so that log service

+can cache async write requests up to an configurable time that is

[GL] a configurable

+around 15-30 seconds before returning status to log client via write async callback.
+
+The cache size is configurable via a new attribute `logMaxPendingWriteRequests`.
+Default value is zero (0) - means this feature is disabled. The valid
+range is [current queue size - 1000]. To know what is the current size
+of the queue, fetching the value of pure runtime attribute

[GL] To find the current size of the queue, fetch the ...

+`logCurrentPendingWriteRequests` of `OpenSafLogCurrentConfig` class.
+When the cache size reaches the limit, all coming requests will get
+acknowledgement right away with SA_AIS_ERR_TRY_AGAIN.
+
+The resilient timeout can also be configurable via a new attribute
+`logResilienceTimeout`. The valid range is [15-30] seconds. When a
+pending write async can be dropped and removed from the queue in cases:
+a) Stays in the queue longer than the given resilient timeout.
+b) The targeting stream has been closed.
+
+The queue is always kept in sync with standby.
+
+Besides, log agent has a light list keeping track all invocations which
+not yet get acknowledgements from log server. If cluster goes to
+headless; in other words, log server is disappeared and all cached data
+has been lost, log agent (library) will notify all lost invocations to
+log client via write async callback with SA_AIS_ERR_TRY_AGAIN error code.
+
+To test this feature, a gcc flag is added during compile time to
+simulate the case the underlying file system is unresponsive, and it
+only takes affect when the cache size is given to an non-zero value.

[GL] it only takes effect when the cache size is set to a non-zero value

+With that, the I/O thread will sleep *16 seconds* every 02 write requests.
\ No newline at end of file
--
2.17.1





___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

2019-12-23 Thread Gary Lee
Hi Vu

Very minor comments with [GL].

Gary

-----Original Message-----
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Thursday, 28 November 2019 7:25 PM
To: lennart.l...@ericsson.com; Gary Lee; Minh Hon Chau
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Subject: [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

---
 src/log/README | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/log/README b/src/log/README
index b83d472e4..ab96a8157 100644
--- a/src/log/README
+++ b/src/log/README
@@ -764,3 +764,41 @@ on AMF role is unnecessary delay the CLM state of a Node
 (CLM state will available as soon as CLM started), so LGS is a taking
 AVD Up event as trigger to do CLM initialize.
  
+
+4. Improve the resilience of OpenSAF LOG service (#3116)
+-
+When the file system is unresponsive, log client gets try-again from 
+write callback very shortly after I/O timeout reaches the setting; the 

[GL] "reaches the setting" sounds confusing. What setting?

+value of I/O timeout is configurable via the attribute logFileIoTimeout 
+within this valid range [500ms – 5000ms]. This is legacy behavior.
+
+This ticket improves the resilience of LOG service, so that log service 

+can cache async write requests up to an configurable time that is 

[GL] a configurable

+around 15-30 seconds before returning status to log client via write async callback.
+
+The cache size is configurable via a new attribute `logMaxPendingWriteRequests`.
+Default value is zero (0) - means this feature is disabled. The valid 
+range is [current queue size - 1000]. To know what is the current size 
+of the queue, fetching the value of pure runtime attribute 

[GL] To find the current size of the queue, fetch the ...

+`logCurrentPendingWriteRequests` of `OpenSafLogCurrentConfig` class. 
+When the cache size reaches the limit, all coming requests will get 
+acknowledgement right away with SA_AIS_ERR_TRY_AGAIN.
+
+The resilient timeout can also be configurable via a new attribute 
+`logResilienceTimeout`. The valid range is [15-30] seconds. When a 
+pending write async can be dropped and removed from the queue in cases:
+a) Stays in the queue longer than the given resilient timeout.
+b) The targeting stream has been closed.
+
+The queue is always kept in sync with standby.
+
+Besides, log agent has a light list keeping track all invocations which 
+not yet get acknowledgements from log server. If cluster goes to 
+headless; in other words, log server is disappeared and all cached data 
+has been lost, log agent (library) will notify all lost invocations to 
+log client via write async callback with SA_AIS_ERR_TRY_AGAIN error code.
+
+To test this feature, a gcc flag is added during compile time to 
+simulate the case the underlying file system is unresponsive, and it 
+only takes affect when the cache size is given to an non-zero value. 

[GL] it only takes effect when the cache size is set to a non-zero value

+With that, the I/O thread will sleep *16 seconds* every 02 write requests.
\ No newline at end of file
--
2.17.1





[devel] [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

2019-11-28 Thread Vu Minh Nguyen
---
 src/log/README | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/log/README b/src/log/README
index b83d472e4..ab96a8157 100644
--- a/src/log/README
+++ b/src/log/README
@@ -764,3 +764,41 @@ on AMF role is unnecessary delay the CLM state of a Node
 (CLM state will available as soon as CLM started), so LGS is a taking
 AVD Up event as trigger to do CLM initialize.
  
+
+4. Improve the resilience of OpenSAF LOG service (#3116)
+-
+When the file system is unresponsive, log client gets try-again from write
+callback very shortly after I/O timeout reaches the setting; the value of I/O
+timeout is configurable via the attribute logFileIoTimeout within this valid
+range [500ms – 5000ms]. This is legacy behavior.
+
+This ticket improves the resilience of LOG service, so that log service can
+cache async write requests up to an configurable time that is around 15-30
+seconds before returning status to log client via write async callback.
+
+The cache size is configurable via a new attribute `logMaxPendingWriteRequests`.
+Default value is zero (0) - means this feature is disabled. The valid range is
+[current queue size - 1000]. To know what is the current size of the queue,
+fetching the value of pure runtime attribute `logCurrentPendingWriteRequests`
+of `OpenSafLogCurrentConfig` class. When the cache size reaches the limit,
+all coming requests will get acknowledgement right away with
+SA_AIS_ERR_TRY_AGAIN.
+
+The resilient timeout can also be configurable via a new attribute
+`logResilienceTimeout`. The valid range is [15-30] seconds. When a pending write
+async can be dropped and removed from the queue in cases:
+a) Stays in the queue longer than the given resilient timeout.
+b) The targeting stream has been closed.
+
+The queue is always kept in sync with standby.
+
+Besides, log agent has a light list keeping track all invocations which not yet
+get acknowledgements from log server. If cluster goes to headless; in other
+words, log server is disappeared and all cached data has been lost,
+log agent (library) will notify all lost invocations to log client via write
+async callback with SA_AIS_ERR_TRY_AGAIN error code.
+
+To test this feature, a gcc flag is added during compile time to simulate
+the case the underlying file system is unresponsive, and it only takes
+affect when the cache size is given to an non-zero value. With that,
+the I/O thread will sleep *16 seconds* every 02 write requests.
\ No newline at end of file
-- 
2.17.1
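
For readers of the thread, the queueing rules described in the README text above can be sketched roughly as follows. This is only an illustration in Python, not the log server code: the class and method names (`PendingWriteQueue`, `enqueue`, `drop_stale`) are hypothetical, the clock is passed in by the caller, and treating a zero cache size as an immediate try-again is a simplification of the legacy I/O-timeout behavior.

```python
from collections import deque
from dataclasses import dataclass

TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"


@dataclass
class PendingWrite:
    invocation: int    # id the client uses to match the write callback
    stream: str        # name of the target log stream
    queued_at: float   # time (seconds) the request entered the queue


class PendingWriteQueue:
    """Hypothetical model of the cache sized by logMaxPendingWriteRequests."""

    def __init__(self, max_pending, resilience_timeout):
        self.max_pending = max_pending                # 0 means feature disabled
        self.resilience_timeout = resilience_timeout  # seconds, 15-30 per README
        self.items = deque()

    def enqueue(self, invocation, stream, now):
        """Cache a request; return TRY_AGAIN if it cannot be cached."""
        # Assumption: a disabled cache is modelled as an immediate rejection.
        if self.max_pending == 0 or len(self.items) >= self.max_pending:
            return TRY_AGAIN
        self.items.append(PendingWrite(invocation, stream, now))
        return None

    def drop_stale(self, now, open_streams):
        """Drop requests that (a) outlived the resilience timeout or
        (b) target a closed stream. Returns the dropped invocation ids."""
        kept, dropped = deque(), []
        for req in self.items:
            expired = now - req.queued_at > self.resilience_timeout
            closed = req.stream not in open_streams
            if expired or closed:
                dropped.append(req.invocation)
            else:
                kept.append(req)
        self.items = kept
        return dropped
```

With `max_pending=2` and `resilience_timeout=15`, a third `enqueue` call is rejected with try-again, and a later `drop_stale` call returns the invocations that aged out or lost their stream. What status the server reports for dropped requests is not modelled here.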


