Hi Wainer,

As Cleber is busy with Gating CI, can you review tests/acceptance/colo.py, please?
On 8/27/20 10:40 AM, Lukas Straub wrote:
> On Tue, 18 Aug 2020 14:27:01 +0200
> Lukas Straub <lukasstra...@web.de> wrote:
>
>> On Tue, 4 Aug 2020 12:46:29 +0200
>> Lukas Straub <lukasstra...@web.de> wrote:
>>
>>> Hello Everyone,
>>> So here is v3. Patch 1 can already be merged independently of the others.
>>> Please review.
>>>
>>> Regards,
>>> Lukas Straub
>>>
>>> Based-on: <cover.1596528468.git.lukasstra...@web.de>
>>> "Introduce 'yank' oob qmp command to recover from hanging qemu"
>>>
>>> Changes:
>>>
>>> v3:
>>> -resource-agent: Don't determine local qemu state by remote master-score,
>>>  query directly via qmp instead
>>> -resource-agent: Add max_queue_size parameter for colo-compare
>>> -resource-agent: Fix monitor action on secondary returning error during
>>>  clean shutdown
>>> -resource-agent: Fix stop action setting master-score to 0 on primary on
>>>  clean shutdown
>>>
>>> v2:
>>> -use new yank api
>>> -drop disk_size parameter
>>> -introduce pick_qemu_util function and use it
>>>
>>> Overview:
>>>
>>> Hello Everyone,
>>> These patches introduce a resource agent for fully automatic management of colo
>>> and a test suite building upon the resource agent to extensively test colo.
>>>
>>> Test suite features:
>>> -Tests failover with peer crashing and hanging and failover during checkpoint
>>> -Tests network using ssh and iperf3
>>> -Quick test requires no special configuration
>>> -Network test for testing colo-compare
>>> -Stress test: failover all the time with network load
>>>
>>> Resource agent features:
>>> -Fully automatic management of colo
>>> -Handles many failures: hanging/crashing qemu, replication error, disk error, ...
>>> -Recovers from hanging qemu by using the "yank" oob command
>>> -Tracks which node has up-to-date data
>>> -Works well in clusters with more than 2 nodes
>>>
>>> Run times on my laptop:
>>> Quick test: 200s
>>> Network test: 800s (tagged as slow)
>>> Stress test: 1300s (tagged as slow)
>>>
>>> For the last two tests, the test suite needs access to a network bridge to
>>> properly test the network, so some parameters need to be given to the test
>>> run. See tests/acceptance/colo.py for more information.
>>>
>>> Regards,
>>> Lukas Straub
>>>
>>> Lukas Straub (7):
>>>   block/quorum.c: stable children names
>>>   avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries
>>>   boot_linux.py: Use pick_qemu_util
>>>   colo: Introduce resource agent
>>>   colo: Introduce high-level test suite
>>>   configure,Makefile: Install colo resource-agent
>>>   MAINTAINERS: Add myself as maintainer for COLO resource agent
>>>
>>>  MAINTAINERS                               |    6 +
>>>  Makefile                                  |    5 +
>>>  block/quorum.c                            |   20 +-
>>>  configure                                 |   10 +
>>>  scripts/colo-resource-agent/colo          | 1501 +++++++++++++++++++++
>>>  scripts/colo-resource-agent/crm_master    |   44 +
>>>  scripts/colo-resource-agent/crm_resource  |   12 +
>>>  tests/acceptance/avocado_qemu/__init__.py |   15 +
>>>  tests/acceptance/boot_linux.py            |   11 +-
>>>  tests/acceptance/colo.py                  |  677 ++++++++++
>>>  10 files changed, 2286 insertions(+), 15 deletions(-)
>>>  create mode 100755 scripts/colo-resource-agent/colo
>>>  create mode 100755 scripts/colo-resource-agent/crm_master
>>>  create mode 100755 scripts/colo-resource-agent/crm_resource
>>>  create mode 100644 tests/acceptance/colo.py
>>>
>>> --
>>> 2.20.1
>>
>> Ping...
>
> Ping 2...
>
> Kevin, can you already apply patch 1 "block/quorum.c: stable children names"?
> It resolves the following bug: https://bugs.launchpad.net/qemu/+bug/1881231
>
> Regards,
> Lukas Straub