Hi Will,
Hi Kristian,

when we run a testcase, we get a timing measurement as a by-product,
and this sparks our hopes for a convenient safety net, allowing us
to spot the consequences of ongoing code changes early.

However, there is a catch: timings are highly platform dependent,
and they tend to fluctuate randomly. Naively recorded timing figures
wouldn't be comparable, and we'd have to throw away all past data
on each environment change. To alleviate this complication, we may
look for ways to factor out the influence of platform and environment.

This is possible by distinguishing between changes within the software
and timing variations due to environmental changes. In the latter
case, when we're sure the software as such is "the same", we may
then re-calibrate the test setup -- ideally making all tests pass
again within limits.
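To make the re-calibration idea concrete, here is a minimal sketch of
the simplest variant: derive a single platform speed factor from a
calibration run where the software is known to be unchanged, then scale
all stored expectations by that factor. All names are hypothetical, not
anything existing in the Yoshimi test setup:

```python
def platform_factor(reference, measured):
    """Median ratio of current timings to stored baseline timings.

    reference, measured: dicts mapping test name -> seconds.
    The median makes the factor robust against a few outliers.
    """
    ratios = sorted(measured[name] / reference[name] for name in reference)
    mid = len(ratios) // 2
    if len(ratios) % 2:
        return ratios[mid]
    return (ratios[mid - 1] + ratios[mid]) / 2

def within_tolerance(expected, actual, factor, tolerance=0.15):
    """Pass if the scaled expectation matches within +/- tolerance."""
    limit = expected * factor * tolerance
    return abs(actual - expected * factor) <= limit
```

After calibration, each test compares its fresh measurement against its
stored baseline scaled by the one shared factor; the tolerance then only
has to absorb random fluctuation, not platform differences.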

Now the question is: how elaborate does this need to be?
If we use an oversimplified model, some tests will still fail
after calibration in a different run environment. We'd then have to
raise the tolerances to cover the shortfall of our simplistic
model, thereby hampering our ability to spot a trend caused by
code reworking. On the other hand, an overengineered, excessively
elaborate model would trace ephemeral patterns of behaviour, and
blind us to what is essentially unchanged.

And this whole dilemma is somewhat sobering, insofar as we cannot
know up front what the proper middle ground is. Does it suffice to
use just a simple /platform factor/? Do we need linear regression,
or even multiple factors (a speed per sample and a fixed base cost
per note)? Will we have to face tangible quadratic growth of expenses?
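For illustration, the "multiple factors" end of that spectrum could look
like the following sketch: fit predicted time = speed_per_sample * samples
+ base_cost_per_note * notes by ordinary least squares over a set of
calibration runs, solving the 2x2 normal equations directly. This is my
own toy formulation, not an existing part of the test setup:

```python
def fit_two_factors(runs):
    """runs: list of (samples, notes, seconds) calibration measurements.

    Returns (speed_per_sample, base_cost_per_note), the least-squares
    solution of  t ~= a*samples + b*notes  (no intercept term).
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for x, y, t in runs:
        sxx += x * x   # sum of samples^2
        sxy += x * y   # cross term samples*notes
        syy += y * y   # sum of notes^2
        sxt += x * t   # samples against observed time
        syt += y * t   # notes against observed time
    det = sxx * syy - sxy * sxy   # nonzero unless runs are degenerate
    return ((sxt * syy - syt * sxy) / det,
            (syt * sxx - sxt * sxy) / det)
```

The appeal is that the two fitted coefficients are properties of the
platform alone, so a baseline recorded elsewhere stays comparable; the
price is that calibration now needs several runs with varied sample
counts and note counts instead of a single reference measurement.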

All we know now is that we'll need some experimentation and tweaking
until timing measurements are smooth, painless and comparable.

-- Hermann





_______________________________________________
Yoshimi-devel mailing list
Yoshimi-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/yoshimi-devel
