Re: [devel] [PATCH 4/5] log: update README file for improvement of log resilience [#3116]
Thanks Gary. I will update the README according to your comments.

Thanks, Vu

On 12/24/19 6:11 AM, Gary Lee wrote:
Hi Vu

Very minor comments with [GL].

Gary

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
Sent: Thursday, 28 November 2019 7:25 PM
To: lennart.l...@ericsson.com; Gary Lee ; Minh Hon Chau
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Subject: [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

---
 src/log/README | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/log/README b/src/log/README
index b83d472e4..ab96a8157 100644
--- a/src/log/README
+++ b/src/log/README
@@ -764,3 +764,41 @@ on AMF role is unnecessary delay the CLM state of a Node (CLM state will available as soon as CLM started), so LGS is a taking AVD Up event as trigger to do CLM initialize.
+
+4. Improve the resilience of OpenSAF LOG service (#3116)
+-
+When the file system is unresponsive, log client gets try-again from
+write callback very shortly after I/O timeout reaches the setting; the
[GL] "reaches the setting" sounds confusing. What setting?
+value of I/O timeout is configurable via the attribute logFileIoTimeout
+within this valid range [500ms – 5000ms]. This is legacy behavior.
+
+This ticket improves the resilience of LOG service, so that log service
+can cache async write requests up to an configurable time that is
[GL] a configurable
+around 15-30 seconds before returning status to log client via write async callback.
+
+The cache size is configurable via a new attribute `logMaxPendingWriteRequests`.
+Default value is zero (0) - means this feature is disabled. The valid
+range is [current queue size - 1000]. To know what is the current size
+of the queue, fetching the value of pure runtime attribute
[GL] To find the current size of the queue, fetch the ...
+`logCurrentPendingWriteRequests` of `OpenSafLogCurrentConfig` class.
+When the cache size reaches the limit, all coming requests will get
+acknowledgement right away with SA_AIS_ERR_TRY_AGAIN.
+
+The resilient timeout can also be configurable via a new attribute
+`logResilienceTimeout`. The valid range is [15-30] seconds. When a
+pending write async can be dropped and removed from the queue in cases:
+a) Stays in the queue longer than the given resilient timeout.
+b) The targeting stream has been closed.
+
+The queue is always kept in sync with standby.
+
+Besides, log agent has a light list keeping track all invocations which
+not yet get acknowledgements from log server. If cluster goes to
+headless; in other words, log server is disappeared and all cached data
+has been lost, log agent (library) will notify all lost invocations to
+log client via write async callback with SA_AIS_ERR_TRY_AGAIN error code.
+
+To test this feature, a gcc flag is added during compile time to
+simulate the case the underlying file system is unresponsive, and it
+only takes affect when the cache size is given to an non-zero value.
[GL] it only takes effect when the cache size is set to a non-zero value
+With that, the I/O thread will sleep *16 seconds* every 02 write requests.
\ No newline at end of file
--
2.17.1

___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
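The admission rule in the quoted README text (caching is off while `logMaxPendingWriteRequests` is 0, the limit may only be set within [current queue size - 1000], and once the queue is full new requests are acknowledged at once with SA_AIS_ERR_TRY_AGAIN) can be modelled in a few lines. This is a minimal Python sketch, not the OpenSAF implementation: `PendingWriteQueue`, `submit` and `set_max_pending` are hypothetical names, and the model assumes every async write the server cannot complete right away passes through `submit`, so a limit of 0 simply falls back to the try-again reply.

```python
from collections import deque

SA_AIS_ERR_TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"
MAX_PENDING_UPPER_BOUND = 1000  # upper end of the documented range


class PendingWriteQueue:
    """Hypothetical model of the log server's cache of async write requests."""

    def __init__(self, max_pending=0):
        # Stands in for logMaxPendingWriteRequests; 0 (the default)
        # disables caching, so every request is rejected by submit().
        self.max_pending = max_pending
        self.items = deque()

    def current_size(self):
        # Stands in for the runtime attribute logCurrentPendingWriteRequests.
        return len(self.items)

    def set_max_pending(self, new_value):
        """Accept a new limit only inside [current queue size - 1000]."""
        if not self.current_size() <= new_value <= MAX_PENDING_UPPER_BOUND:
            return False
        self.max_pending = new_value
        return True

    def submit(self, invocation):
        """Cache a request, or reject it immediately.

        Returns None when the request is cached (its status reaches the
        client later through the write callback), or SA_AIS_ERR_TRY_AGAIN
        when the queue is full or caching is disabled.
        """
        if len(self.items) >= self.max_pending:
            return SA_AIS_ERR_TRY_AGAIN
        self.items.append(invocation)
        return None


if __name__ == "__main__":
    q = PendingWriteQueue(max_pending=2)
    print([q.submit(inv) for inv in (1, 2, 3)])
    # prints [None, None, 'SA_AIS_ERR_TRY_AGAIN']
```

Checking the lower bound against the current queue size means a shrinking limit can never strand requests that are already cached.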
Re: [devel] [PATCH 4/5] log: update README file for improvement of log resilience [#3116]
Hi Vu

Very minor comments with [GL].

Gary

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
Sent: Thursday, 28 November 2019 7:25 PM
To: lennart.l...@ericsson.com; Gary Lee ; Minh Hon Chau
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Subject: [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

---
 src/log/README | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/log/README b/src/log/README
index b83d472e4..ab96a8157 100644
--- a/src/log/README
+++ b/src/log/README
@@ -764,3 +764,41 @@ on AMF role is unnecessary delay the CLM state of a Node (CLM state will available as soon as CLM started), so LGS is a taking AVD Up event as trigger to do CLM initialize.
+
+4. Improve the resilience of OpenSAF LOG service (#3116)
+-
+When the file system is unresponsive, log client gets try-again from
+write callback very shortly after I/O timeout reaches the setting; the
[GL] "reaches the setting" sounds confusing. What setting?
+value of I/O timeout is configurable via the attribute logFileIoTimeout
+within this valid range [500ms – 5000ms]. This is legacy behavior.
+
+This ticket improves the resilience of LOG service, so that log service
+can cache async write requests up to an configurable time that is
[GL] a configurable
+around 15-30 seconds before returning status to log client via write async callback.
+
+The cache size is configurable via a new attribute `logMaxPendingWriteRequests`.
+Default value is zero (0) - means this feature is disabled. The valid
+range is [current queue size - 1000]. To know what is the current size
+of the queue, fetching the value of pure runtime attribute
[GL] To find the current size of the queue, fetch the ...
+`logCurrentPendingWriteRequests` of `OpenSafLogCurrentConfig` class.
+When the cache size reaches the limit, all coming requests will get
+acknowledgement right away with SA_AIS_ERR_TRY_AGAIN.
+
+The resilient timeout can also be configurable via a new attribute
+`logResilienceTimeout`. The valid range is [15-30] seconds. When a
+pending write async can be dropped and removed from the queue in cases:
+a) Stays in the queue longer than the given resilient timeout.
+b) The targeting stream has been closed.
+
+The queue is always kept in sync with standby.
+
+Besides, log agent has a light list keeping track all invocations which
+not yet get acknowledgements from log server. If cluster goes to
+headless; in other words, log server is disappeared and all cached data
+has been lost, log agent (library) will notify all lost invocations to
+log client via write async callback with SA_AIS_ERR_TRY_AGAIN error code.
+
+To test this feature, a gcc flag is added during compile time to
+simulate the case the underlying file system is unresponsive, and it
+only takes affect when the cache size is given to an non-zero value.
[GL] it only takes effect when the cache size is set to a non-zero value
+With that, the I/O thread will sleep *16 seconds* every 02 write requests.
\ No newline at end of file
--
2.17.1
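The two drop conditions listed in the README (a request stays queued longer than `logResilienceTimeout`, whose valid range is [15-30] seconds, or its target stream has been closed) amount to a periodic sweep over the queue. The following is a hypothetical Python model, not OpenSAF code: `PendingWrite`, `drop_stale` and the `open_streams` set are illustrative names, and timestamps are plain seconds. The README does not say which status a dropped request is answered with, so this sketch only returns the dropped invocations and leaves that choice to the caller.

```python
from dataclasses import dataclass

RESILIENCE_TIMEOUT_RANGE_S = (15, 30)  # documented range of logResilienceTimeout


def valid_resilience_timeout(seconds):
    """Range check for a new logResilienceTimeout value."""
    low, high = RESILIENCE_TIMEOUT_RANGE_S
    return low <= seconds <= high


@dataclass
class PendingWrite:
    invocation: int     # echoed back to the client in its write callback
    stream_id: int      # stream the record is meant for
    enqueued_at: float  # time the request entered the queue, in seconds


def drop_stale(queue, now, timeout_s, open_streams):
    """Remove requests that (a) stayed queued longer than timeout_s or
    (b) target a stream that is no longer open.

    Mutates `queue` (a list, oldest first) in place and returns the
    invocations that were dropped, in queue order.
    """
    kept, dropped = [], []
    for req in queue:
        expired = now - req.enqueued_at > timeout_s
        stream_closed = req.stream_id not in open_streams
        if expired or stream_closed:
            dropped.append(req.invocation)
        else:
            kept.append(req)
    queue[:] = kept  # surviving requests keep their FIFO order
    return dropped


if __name__ == "__main__":
    queue = [
        PendingWrite(invocation=1, stream_id=10, enqueued_at=0.0),
        PendingWrite(invocation=2, stream_id=11, enqueued_at=10.0),
        PendingWrite(invocation=3, stream_id=10, enqueued_at=20.0),
    ]
    # At t=20 s with a 15 s timeout: #1 has expired, #2's stream is closed.
    print(drop_stale(queue, now=20.0, timeout_s=15, open_streams={10}))  # [1, 2]
    print([req.invocation for req in queue])  # [3]
```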
[devel] [PATCH 4/5] log: update README file for improvement of log resilience [#3116]
---
 src/log/README | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/log/README b/src/log/README
index b83d472e4..ab96a8157 100644
--- a/src/log/README
+++ b/src/log/README
@@ -764,3 +764,41 @@ on AMF role is unnecessary delay the CLM state of a Node (CLM state will available as soon as CLM started), so LGS is a taking AVD Up event as trigger to do CLM initialize.
+
+4. Improve the resilience of OpenSAF LOG service (#3116)
+-
+When the file system is unresponsive, log client gets try-again from write
+callback very shortly after I/O timeout reaches the setting; the value of I/O
+timeout is configurable via the attribute logFileIoTimeout within this valid
+range [500ms – 5000ms]. This is legacy behavior.
+
+This ticket improves the resilience of LOG service, so that log service can
+cache async write requests up to an configurable time that is around 15-30
+seconds before returning status to log client via write async callback.
+
+The cache size is configurable via a new attribute `logMaxPendingWriteRequests`.
+Default value is zero (0) - means this feature is disabled. The valid range is
+[current queue size - 1000]. To know what is the current size of the queue,
+fetching the value of pure runtime attribute `logCurrentPendingWriteRequests`
+of `OpenSafLogCurrentConfig` class. When the cache size reaches the limit,
+all coming requests will get acknowledgement right away with
+SA_AIS_ERR_TRY_AGAIN.
+
+The resilient timeout can also be configurable via a new attribute
+`logResilienceTimeout`. The valid range is [15-30] seconds. When a pending write
+async can be dropped and removed from the queue in cases:
+a) Stays in the queue longer than the given resilient timeout.
+b) The targeting stream has been closed.
+
+The queue is always kept in sync with standby.
+
+Besides, log agent has a light list keeping track all invocations which not yet
+get acknowledgements from log server. If cluster goes to headless; in other
+words, log server is disappeared and all cached data has been lost,
+log agent (library) will notify all lost invocations to log client via write
+async callback with SA_AIS_ERR_TRY_AGAIN error code.
+
+To test this feature, a gcc flag is added during compile time to simulate
+the case the underlying file system is unresponsive, and it only takes
+affect when the cache size is given to an non-zero value. With that,
+the I/O thread will sleep *16 seconds* every 02 write requests.
\ No newline at end of file
--
2.17.1
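The agent-side bookkeeping described in the patch (a light list of invocations not yet acknowledged by the log server, all of which are reported to the client with SA_AIS_ERR_TRY_AGAIN if the cluster goes headless) could look roughly like this. It is a hypothetical Python model: `AgentInvocationTracker` and its method names are not taken from the OpenSAF log agent, and `write_callback` stands in for the client's write async callback.

```python
SA_AIS_OK = "SA_AIS_OK"
SA_AIS_ERR_TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"


class AgentInvocationTracker:
    """Hypothetical model of the log agent's list of unacknowledged
    async-write invocations."""

    def __init__(self, write_callback):
        # write_callback(invocation, error) plays the role of the client's
        # write async callback.
        self._callback = write_callback
        # A dict keeps insertion order, so lost invocations are reported
        # in the order they were sent.
        self._pending = {}

    def on_request_sent(self, invocation):
        self._pending[invocation] = True

    def on_ack(self, invocation, error):
        """The log server answered: stop tracking and forward the result."""
        if invocation in self._pending:
            del self._pending[invocation]
            self._callback(invocation, error)

    def on_headless(self):
        """Log server gone and its cache lost: report every outstanding
        invocation with SA_AIS_ERR_TRY_AGAIN, then forget them."""
        lost = list(self._pending)
        self._pending.clear()
        for invocation in lost:
            self._callback(invocation, SA_AIS_ERR_TRY_AGAIN)
        return lost


if __name__ == "__main__":
    calls = []
    tracker = AgentInvocationTracker(lambda inv, err: calls.append((inv, err)))
    for inv in (1, 2, 3):
        tracker.on_request_sent(inv)
    tracker.on_ack(2, SA_AIS_OK)
    print(tracker.on_headless())  # [1, 3]
    print(calls)
```

In this sketch an acknowledgement for an invocation that is no longer tracked is ignored, so a client never receives two callbacks for the same invocation.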