On Wed, Sep 18, 2019 at 05:16:54PM +1000, David Gibson wrote:
> Hi,
>
> I'm finding make check-acceptance is currently useless for me as a
> pre-pull test, because a bunch of the tests are not at all reliable.
> There are a bunch which I'm still investigating, but for now I'm
> looking at the MIPS Malta SSH tests.
>
> There seem to be at least two problems here.  First, the test includes
> a download of a pretty big guest disk image.  This can easily exhaust
> the 2m30 timeout on its own.
>

You're correct that success or failure of those tests depends largely
on bandwidth.  On a shared environment I used for tests, the download
of those images takes roughly 400 seconds, resulting in failures.  On
my own machine it takes around 60 seconds, and the tests pass.

There's an underlying conceptual problem here: ideally, the environment
in which tests run should be prepared beforehand.  The conflicting
solutions to that are:

 * extensive bootstrapping of the test execution environment, such as
   the installation of guests from ISOs or installation trees, or the
   download of "default" images whether the tests will use them or not
   (this is what Avocado-VT does/requires)

 * keeping test assets in the tree (Avocado allows this if you have a
   your_test.py.data/ directory), which is not practical for large
   files or for files that can't or shouldn't be redistributed

> Even without the timeout, it makes the test really slow, even on
> repeated runs.  Is there some way we can make the image download part
> of "building" the tests rather than actually running the testsuite, so
> that a) the test themselves go faster and b) we don't include the
> download in the test timeout - obviously the download speed is hugely
> dependent on factors that aren't really related to what we're testing
> here.
>

In Avocado version 72.0 we attempted to minimize the issue by
implementing a "vmimage" command.  So, if you expect to use Fedora 30
aarch64 images, you can run before your tests:

  $ avocado vmimage get --distro fedora --distro-version 30 --arch aarch64

And to list the images in your cache:

  $ avocado vmimage list

Unfortunately, this test doesn't use the vmimage API.  That is actually
fine, because not all test assets map nicely to the vmimage goal; those
tests should keep using the more generic (and lower level)
fetch_asset().
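For reference, here is a minimal sketch of how a test typically uses
fetch_asset(); the URL, hash and class name below are placeholders, not
taken from linux_ssh_mips_malta.py:

  from avocado import Test

  class Example(Test):
      def test_boot(self):
          # fetch_asset() downloads the file into the Avocado cache on
          # first use; later runs reuse the cached copy.  Today the
          # download runs inside the test itself, so it also counts
          # against the test timeout -- hence the failures on slow links.
          image_url = 'https://example.org/path/to/guest.img'  # placeholder
          image_hash = '0123456789abcdef0123456789abcdef01234567'  # placeholder sha1
          image_path = self.fetch_asset(image_url, asset_hash=image_hash)
          self.log.info('image cached at %s', image_path)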
We're now working on various "asset fetcher" improvements that should
allow us to check/cache all assets before a test is executed.  Also,
we're adding a mode in which the fetch_asset() API will default to
cancel (aka SKIP) a test if the asset could not be downloaded.  If
you're interested, this is the card we're using to track that new
feature:

  https://trello.com/c/T3SC1sZs/1521-implement-fetch-assets-command-line-parameter

Another possibility that we've prototyped, and will be working on
further, is to have a specific part of the "test" code (really a
pre-test phase) execute without a timeout, and even be retried a number
of times before bailing out and skipping the test.

> In the meantime, I tried hacking it by just increasing the timeout to
> 10m.  That got several of the tests working for me, but one still
> failed.  Specifically 'LinuxSSH.test_mips_malta32eb_kernel3_2_0' still
> timed out for me, but now after booting the guest, rather than during
> the image download.  Looking at the avocado log file I'm seeing a
> bunch of soft lockup messages from the guest console, AFAICT.  So it
> looks like we have a real bug here, which I suspect has been
> overlooked precisely because the download problems mean this test
> isn't reliable.
>

I've scheduled 100 executions of `make check-acceptance`, with the
linux_ssh_mips_malta.py tests given a 1500 second timeout.
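For the record, the timeout in these tests is just a class attribute,
so raising it is a one-line change.  A sketch, assuming the
avocado_qemu.Test base class the acceptance tests use (the test methods
themselves are elided):

  from avocado_qemu import Test

  class LinuxSSH(Test):
      # Avocado enforces this as the wall-clock limit for the whole
      # test, including fetch_asset() downloads, which is why a slow
      # link can push an otherwise healthy test over the limit.
      timeout = 1500

      # ... test methods elided ...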
The very first execution already brought interesting results:

  ...
  (15/39) /home/cleber/src/qemu/tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta32eb_kernel3_2_0: PASS (198.38 s)
  (16/39) /home/cleber/src/qemu/tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta64el_kernel3_2_0: FAIL: Failure message found in console: Oops (22.83 s)

I'll let you know about my full results.  This should also serve as a
starting point for a discussion about the reliability of the other
tests, as you mentioned before.  In my experience, and backed by the
executions on Travis, most tests have been really stable on x86_64
hosts.  Last week I worked on ppc64 and aarch64 hosts, and posted a
number of patches addressing the failures I found.  I'll compile a list
of the posted patches and their status.

Thanks for reporting those issues.

- Cleber.

> Any thoughts on how to improve the situation?
>
> --
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson