On Sat, 11 Sep 2021 15:15:16 +0200 Ichthyostega <p...@ichthyostega.de> wrote:
>Hi Will,
>Hi Kristian,
>
>when we run a testcase, we get a timing measurement as a by-product,
>and this sparks our hopes for a convenient safety net, allowing us to
>spot consequences of ongoing code changes early.
>
>However, there is a catch: timings are highly platform dependent,
>and they tend to fluctuate randomly. Naively raised timing factors
>wouldn't be comparable, and we'd have to throw away all past data
>on each environment change. To alleviate this complication, we may
>seek ways to factor out influences of platform and environment.
>
>This is possible by discerning between changes within the software
>and timing variations due to environmental changes. In the latter
>case, when we're sure the software as such is "the same", we may
>then re-calibrate the test setup -- ideally making all tests pass
>again within limits.
>
>Now the question is: how elaborate does this need to be?
>If we use an oversimplified model, some tests will still fail
>after calibration within a different run environment. We'd then have
>to raise the tolerances to cover the shortfall of our simplistic
>model, thereby hampering our ability to spot a change trend due to
>code reworking. On the other hand, some overengineered, excessively
>elaborate model would trace ephemeral patterns of behaviour, and
>blind our ability to recognise what is essentially unchanged.
>
>And this whole dilemma is somewhat sobering, insofar as we cannot
>know, up front, what is the proper middle ground. Does it suffice to
>use just a simple /platform factor/? Do we need linear regression, or
>even multiple factors (speed per sample and a socket per note)?
>Will we have to face tangible quadratic growth of expenses?
>
>All we know now is that we'll need some experimentation and tweaking
>until timing measurements are smooth, painless and comparable.
>
>-- Hermann

Interesting (ha!) points. However... should we perhaps first define
our reference conditions as far as possible?
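As an aside, the simplest of the models Hermann mentions could look something like the following. This is only a hypothetical sketch (none of these names exist in the Yoshimi test suite): it fits a single platform factor from paired timings and then re-checks each test against a relative tolerance.

```python
# Hypothetical sketch of the "simple platform factor" idea
# (not actual Yoshimi test-suite code). Each testcase is assumed
# to have a stored reference timing and a freshly measured one.

def calibrate_platform_factor(reference, measured):
    """Least-squares fit of measured ~= factor * reference
    (regression through the origin, one factor for all tests)."""
    num = sum(r * m for r, m in zip(reference, measured))
    den = sum(r * r for r in reference)
    return num / den

def within_tolerance(reference, measured, factor, tolerance=0.2):
    """Per-test flags: True where the measured timing deviates by
    less than `tolerance` (relative) from factor * reference."""
    return [abs(m - factor * r) <= tolerance * factor * r
            for r, m in zip(reference, measured)]

ref = [0.10, 0.40, 1.20]   # baseline timings (seconds), old platform
cur = [0.15, 0.61, 1.82]   # same tests on a roughly 1.5x slower box
f = calibrate_platform_factor(ref, cur)
flags = within_tolerance(ref, cur, f)
```

If a platform change leaves some tests failing even after this re-calibration, that would be evidence that a single factor is too coarse and a richer model (per-test regression, or separate per-sample and per-note terms) is needed.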
My own broad-brush 'best' performance is with a sample rate of 48kHz
and a buffer size of either 64 or 128 frames (at 48kHz that's roughly
1.3ms or 2.7ms per buffer). As we are working internally almost
entirely with standard floating point or integers, I would guess bit
depth would only be relevant for ALSA audio -- and in the test suite
we are not using any audio output.

Some thoughts on time & Yoshimi.

Does Yoshimi actually know anything about time? It knows about
samples, the steps it has to make between them (whether they be
floats or ints), the size of these steps and the number of them. The
test suite is recording these samples. Does it actually matter if one
processor performs a background task between steps 20 and 21, while
another does the same between 40 and 41, or even does two different
tasks? Surely the test result will remain the same, although if the
processor were especially over-stressed there might be audible errors
in how that recording is presented.

We know there is a problem with envelopes and increasing buffer
sizes, but that is down to a miscalculation on our part (and the
attack truncation has been present right back to Zyn 2.2.1). Also,
though I can usually work at 32 frames, I don't, because I then
notice a small variation on some sounds.

--
Will J Godfrey
https://willgodfrey.bandcamp.com/
http://yoshimi.github.io
Say you have a poem and I have a tune.
Exchange them and we can both have a poem, a tune, and a song.

_______________________________________________
Yoshimi-devel mailing list
Yoshimi-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/yoshimi-devel