- **status**: review --> fixed
- **Comment**:
commit de3ab24682dde57a48db6bf7f769477524b1afe4 (HEAD -> develop,
origin/develop, ticket-3116)
Author: Vu Minh Nguyen <[email protected]>
Date: Wed Dec 25 13:10:57 2019 +0700
log: add test cases of improving the log resilience [#3116]
Adds eight new test cases to two suites:
1) Suite 20, with seven test cases, including:
- Test changing the queue size and the resilience timeout;
- Test that an async write request is dropped once its timeout expires,
also verifying that the log server kept the request for the expected time;
- Test that the write callback is received immediately when the cache is full;
- Test that the cache is fully and correctly synchronized with the standby.
2) Suite 21, with one test case:
Test that the LOG agent notifies the log client of all lost invocations.
As suite 21 requires manual interaction, it is placed among the
'extended' tests, which run only with the option '-e'.
Most of these test cases require the log service and logtest to be
compiled with the flag SIMULATE_NFS_UNRESPONSE enabled.
commit 0da79ed4da30f0106b1bd9adfc506ace7912242f
Author: Vu Minh Nguyen <[email protected]>
Date: Wed Dec 25 13:10:57 2019 +0700
log: update README file for improvement of log resilience [#3116]
commit 39601b676772d8d7ee0f983e83722ed7fd59dac8
Author: Vu Minh Nguyen <[email protected]>
Date: Wed Dec 25 13:10:57 2019 +0700
saflogger: make timeout waiting for getting acknowledgment configurable [#3116]
Introduces a new option, `-t second` or `--timeout=second`, that lets the user
set the desired timeout for waiting for the write async acknowledgment.
The default timeout is 20 seconds, which keeps saflogger backward compatible.
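The option handling described above can be sketched as follows. This is a minimal C++ sketch, not saflogger's actual code: the helper names `ParseAckTimeout` and `ParseSeconds` are hypothetical, other saflogger options are ignored, and rejecting zero or non-numeric values is an assumed validation policy.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Default used when no timeout option is given (20 s, as in the commit).
constexpr int kDefaultAckTimeoutSec = 20;

// Hypothetical helper: converts a decimal string to a positive number of
// seconds. At most 6 digits are accepted so std::stoi cannot overflow.
std::optional<int> ParseSeconds(const std::string& text) {
  if (text.empty() || text.size() > 6) return std::nullopt;
  for (char c : text) {
    if (c < '0' || c > '9') return std::nullopt;
  }
  const int seconds = std::stoi(text);
  if (seconds <= 0) return std::nullopt;  // assumed: zero is rejected
  return seconds;
}

// Hypothetical helper: scans the arguments for `-t N` or `--timeout=N`.
// Returns the timeout in seconds, or std::nullopt on a missing/bad value.
std::optional<int> ParseAckTimeout(const std::vector<std::string>& args) {
  const std::string long_prefix = "--timeout=";
  int timeout = kDefaultAckTimeoutSec;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    std::optional<int> value;
    if (arg == "-t") {
      if (i + 1 >= args.size()) return std::nullopt;  // `-t` without value
      value = ParseSeconds(args[++i]);
    } else if (arg.compare(0, long_prefix.size(), long_prefix) == 0) {
      value = ParseSeconds(arg.substr(long_prefix.size()));
    } else {
      continue;  // other saflogger options are not handled in this sketch
    }
    if (!value) return std::nullopt;
    timeout = *value;  // the last occurrence wins
  }
  assert(timeout > 0);
  return timeout;
}
```

For example, `ParseAckTimeout({"-t", "45"})` yields 45, while an empty argument list falls back to the 20-second default.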
commit 829a023dc42944c8509f09595b89c03feb5b1f86
Author: Vu Minh Nguyen <[email protected]>
Date: Wed Dec 25 13:10:57 2019 +0700
log: notify all lost log records when cluster goes to headless [#3116]
This change introduces a lightweight list that keeps all invocations not yet
acknowledged by the log server. If the server disappears because the cluster
goes headless, the log agent notifies the log client of all lost invocations
with the error code SA_AIS_ERR_TRY_AGAIN.
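A minimal C++ sketch of this bookkeeping might look like the following. The class `PendingInvocations`, its methods, and the `AisStatus` enum are hypothetical stand-ins (the real agent works with `SaInvocationT` and `SaAisErrorT`). A hash set is used here in place of the list mentioned in the commit so that acknowledgments can be removed in constant time, at the cost of not preserving notification order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Stand-ins for SaInvocationT and SaAisErrorT from the AIS headers.
using Invocation = std::uint64_t;
enum class AisStatus { kOk, kTryAgain };  // kTryAgain ~ SA_AIS_ERR_TRY_AGAIN

// Hypothetical tracker of write async invocations that the log server
// has not acknowledged yet.
class PendingInvocations {
 public:
  using WriteCallback = std::function<void(Invocation, AisStatus)>;

  // Record an invocation when its write async request is sent.
  void Add(Invocation inv) {
    const bool inserted = pending_.insert(inv).second;
    assert(inserted && "invocation ids are expected to be unique");
    (void)inserted;  // avoid an unused-variable warning under NDEBUG
  }

  // Forget an invocation once the server acknowledges it. Returns false
  // if it was unknown, e.g. already reported as lost.
  bool Acknowledge(Invocation inv) { return pending_.erase(inv) == 1; }

  // Called when the server disappears (cluster goes headless): report
  // every outstanding invocation to the client with try-again.
  std::size_t NotifyAllLost(const WriteCallback& callback) {
    std::unordered_set<Invocation> lost;
    lost.swap(pending_);  // detach first so the callback may call Add()
    for (Invocation inv : lost) callback(inv, AisStatus::kTryAgain);
    return lost.size();
  }

  std::size_t size() const { return pending_.size(); }

 private:
  std::unordered_set<Invocation> pending_;
};
```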
commit 711145472366fd071a84f7c8a9eb4e82e30fc1fb
Author: Vu Minh Nguyen <[email protected]>
Date: Wed Dec 25 13:10:57 2019 +0700
log: improve the resilience of log service [#3116]
To improve the resilience of the OpenSAF LOG service when the underlying
file system is unresponsive, a queue is introduced to hold async write
requests for a configurable time of around 15 to 30 seconds.
The readiness of the I/O thread is checked periodically; once the thread is
ready, the front element of the queue is processed first. SA_AIS_ERR_TRY_AGAIN
is returned to the client if an element stays in the queue longer than the
configured time.
The queue capacity and the resilience timeout are configurable via the
attributes `logMaxPendingWriteRequests` and `logResilienceTimeout`.
By default, this feature is disabled to keep the log server backward compatible.
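The queueing behaviour can be illustrated with a small C++ sketch. Everything here is an assumption for illustration: `PendingWriteQueue` and its methods are hypothetical, `capacity` stands in for `logMaxPendingWriteRequests`, `timeout` for `logResilienceTimeout`, a capacity of 0 models the disabled default, and time is passed in explicitly so the logic can be tested without a real clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>

using Clock = std::chrono::steady_clock;
enum class AisStatus { kOk, kTryAgain };  // kTryAgain ~ SA_AIS_ERR_TRY_AGAIN

// Hypothetical bounded FIFO of async write requests waiting for the I/O
// thread. capacity ~ logMaxPendingWriteRequests, timeout ~
// logResilienceTimeout; capacity 0 models the disabled default.
class PendingWriteQueue {
 public:
  using Reply = std::function<void(std::uint64_t, AisStatus)>;
  using Write = std::function<void(std::uint64_t)>;

  PendingWriteQueue(std::size_t capacity, std::chrono::seconds timeout)
      : capacity_(capacity), timeout_(timeout) {
    assert(timeout.count() >= 0);
  }

  // Returns false if the request cannot be cached (disabled or full);
  // the caller then has to answer the client at once.
  // `now` is expected to be non-decreasing across calls.
  bool Push(std::uint64_t invocation, Clock::time_point now) {
    if (queue_.size() >= capacity_) return false;
    queue_.push_back({invocation, now});
    return true;
  }

  // Periodic tick. Because the queue is FIFO, the oldest request is at the
  // front, so expired requests are dropped from the front with try-again.
  // If the I/O thread is ready, the front request is then handed over.
  void Poll(Clock::time_point now, bool io_ready, const Reply& reply,
            const Write& write) {
    while (!queue_.empty() && now - queue_.front().enqueued_at > timeout_) {
      const std::uint64_t expired = queue_.front().invocation;
      queue_.pop_front();
      reply(expired, AisStatus::kTryAgain);
    }
    if (io_ready && !queue_.empty()) {
      const std::uint64_t next = queue_.front().invocation;
      queue_.pop_front();
      write(next);
    }
  }

  std::size_t size() const { return queue_.size(); }

 private:
  struct Request {
    std::uint64_t invocation;
    Clock::time_point enqueued_at;
  };
  std::size_t capacity_;
  std::chrono::seconds timeout_;
  std::deque<Request> queue_;
};
```

Handing over only one request per tick keeps the sketch close to the description that the front element goes first; a real implementation might drain several requests per tick.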
---
**[tickets:#3116] log: improve the resilience of log service**
**Status:** fixed
**Milestone:** 5.20.01
**Created:** Tue Nov 05, 2019 07:22 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu Nov 28, 2019 08:25 AM UTC
**Owner:** Vu Minh Nguyen
When the file system is unresponsive, the log client gets a try-again status
in the write callback shortly after the I/O timeout expires; the I/O timeout is
configurable via the attribute `logFileIoTimeout` within the valid range
[500 ms, 5000 ms].
This ticket improves the resilience of the LOG service so that it can cache
write requests for up to about 30 seconds before giving up and returning a
status to the caller.
---
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets