On 8/18/20 2:27 PM, Lukas Straub wrote:
> On Tue, 4 Aug 2020 12:46:29 +0200
> Lukas Straub <lukasstra...@web.de> wrote:
>
>> Hello Everyone,
>> So here is v3. Patch 1 can already be merged independently of the others.
>> Please review.
>>
>> Regards,
>> Lukas Straub
>>
>> Based-on: <cover.1596528468.git.lukasstra...@web.de>
>> "Introduce 'yank' oob qmp command to recover from hanging qemu"
>>
>> Changes:
>>
>> v3:
>>  -resource-agent: Don't determine local qemu state by remote master-score,
>>   query directly via qmp instead
>>  -resource-agent: Add max_queue_size parameter for colo-compare
>>  -resource-agent: Fix monitor action on secondary returning error during
>>   clean shutdown
>>  -resource-agent: Fix stop action setting master-score to 0 on primary on
>>   clean shutdown
>>
>> v2:
>>  -use new yank api
>>  -drop disk_size parameter
>>  -introduce pick_qemu_util function and use it
>>
>> Overview:
>>
>> Hello Everyone,
>> These patches introduce a resource agent for fully automatic management of
>> colo and a test suite building upon the resource agent to extensively test
>> colo.
>>
>> Test suite features:
>>  -Tests failover with peer crashing and hanging and failover during checkpoint
>>  -Tests network using ssh and iperf3
>>  -Quick test requires no special configuration
>>  -Network test for testing colo-compare
>>  -Stress test: failover all the time with network load
>>
>> Resource agent features:
>>  -Fully automatic management of colo
>>  -Handles many failures: hanging/crashing qemu, replication error, disk
>>   error, ...
>>  -Recovers from hanging qemu by using the "yank" oob command
>>  -Tracks which node has up-to-date data
>>  -Works well in clusters with more than 2 nodes
>>
>> Run times on my laptop:
>>  Quick test: 200s
>>  Network test: 800s (tagged as slow)
>>  Stress test: 1300s (tagged as slow)
>>
>> For the last two tests, the test suite needs access to a network bridge to
>> properly test the network, so some parameters need to be given to the test
>> run. See tests/acceptance/colo.py for more information.
>>
>> Regards,
>> Lukas Straub
>>
>> Lukas Straub (7):
>>   block/quorum.c: stable children names
>>   avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries
>>   boot_linux.py: Use pick_qemu_util
>>   colo: Introduce resource agent
>>   colo: Introduce high-level test suite
>>   configure,Makefile: Install colo resource-agent
>>   MAINTAINERS: Add myself as maintainer for COLO resource agent
>>
>>  MAINTAINERS                               |    6 +
>>  Makefile                                  |    5 +
>>  block/quorum.c                            |   20 +-
>>  configure                                 |   10 +
>>  scripts/colo-resource-agent/colo          | 1501 +++++++++++++++++++++
>>  scripts/colo-resource-agent/crm_master    |   44 +
>>  scripts/colo-resource-agent/crm_resource  |   12 +
>>  tests/acceptance/avocado_qemu/__init__.py |   15 +
>>  tests/acceptance/boot_linux.py            |   11 +-
>>  tests/acceptance/colo.py                  |  677 ++++++++++
>>  10 files changed, 2286 insertions(+), 15 deletions(-)
>>  create mode 100755 scripts/colo-resource-agent/colo
>>  create mode 100755 scripts/colo-resource-agent/crm_master
>>  create mode 100755 scripts/colo-resource-agent/crm_resource
>>  create mode 100644 tests/acceptance/colo.py
>>
>> --
>> 2.20.1
>
> Ping...
>

Cleber, Wainer, can you have a look at tests/acceptance/colo.py please?
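
For anyone reviewing the v3 item "query directly via qmp instead": below is a
minimal sketch of what asking a local QEMU for its run state over a QMP unix
socket can look like. It is not taken from scripts/colo-resource-agent/colo.
The helper names (qmp_command, first_reply, query_local_status) and the socket
path are hypothetical, and it assumes the monitor sends one JSON message per
line (i.e. not in pretty-printing mode).

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Serialize one QMP command as a single JSON line."""
    cmd = {"execute": name}
    if arguments is not None:
        cmd["arguments"] = arguments
    return json.dumps(cmd) + "\n"


def first_reply(lines):
    """Return the payload of the first command reply found in `lines`.

    The QMP greeting and asynchronous events carry neither a "return" nor an
    "error" key, so they are skipped here.
    """
    for line in lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        if "error" in msg:
            raise RuntimeError(msg["error"].get("desc", "unknown QMP error"))
        if "return" in msg:
            return msg["return"]
    raise ConnectionError("QMP connection closed before a reply arrived")


def query_local_status(socket_path, timeout=5.0):
    """Ask the local QEMU for its run state, e.g. "running" or "paused".

    `socket_path` is a hypothetical path to a QMP unix socket.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        reader = sock.makefile("r", encoding="utf-8")
        writer = sock.makefile("w", encoding="utf-8")
        # Capabilities negotiation must happen before any other command.
        for name in ("qmp_capabilities", "query-status"):
            writer.write(qmp_command(name))
            writer.flush()
            result = first_reply(reader)
        return result["status"]
```

The point of the sketch is only that the answer comes from the local QEMU
process itself, rather than being inferred from the peer's master-score. A
call such as query_local_status("/run/colo/qmp.sock") would need a QEMU
started with a QMP monitor on that (made-up) path.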