Hello Yoshimi-devs,

during the last days I made a rather nasty observation -- unfortunately without reaching any conclusive results.

It started when I noticed a small numeric difference between the waveform as generated before and after the "padthread" refactoring. While in itself this difference is certainly too small to be noticeable (-70dB), I still wanted to find out the reason, because the "padthread" branch brings many rather deep refactorings regarding memory management, and moreover I attempted to make the usage of overtone index numbers coherent, and thus I could have introduced a subtle bug somewhere.

So I drilled into the simplest test showing those differences: BasicADD.test. I applied my changeset for dumping intermediary computation results, and it was immediately obvious that the differences seem to emerge from our old friend, the AnalogFilter. However, I did /not/ change anything of relevance there, so I added detailed dumping of the filter's internal pipelines. This showed that a smallish difference is injected into the filter's *input* line, intermittently and irregularly (but totally reproducibly). After about 5 buffers of computation, those differences have accumulated sufficiently within the filter's feedback line to become noticeable on the global output, and over time the differences build up more and more.

Now this observation raised some concerns, since the input of the filter must be drawn from the wavetable, right? However, a complete dump of the generated spectra and wavetables showed absolutely no difference -- which in itself is comforting and resolves some of my apprehension. But what causes those damn differences then?

The only relevant part in between is the *interpolation* applied when reading the wavetable. This is a piece of code where the computation spends a considerable fraction of the overall synth generation time, and it is highly optimised:
    int poshi = oscposhi[nvoice][k];
    int poslo = oscposlo[nvoice][k] * (1<<24);
    int freqhi = oscfreqhi[nvoice][k];
    int freqlo = oscfreqlo[nvoice][k] * (1<<24);
    float *smps = NoteVoicePar[nvoice].OscilSmp;
    float *tw = tmpwave_unison[k];
    for (int i = 0; i < synth->sent_buffersize; ++i)
    {
        tw[i] = (smps[poshi] * ((1<<24) - poslo) + smps[poshi + 1] * poslo)
                / (1.0f*(1<<24));
        poslo += freqlo;
        poshi += freqhi + (poslo>>24);
        poslo &= 0xffffff;
        poshi &= synth->oscilsize - 1;
    }
    oscposhi[nvoice][k] = poshi;
    oscposlo[nvoice][k] = poslo/(1.0f*(1<<24));
As you can see, the fractional distance between samples in the wavetable is quantised to an integer with 24 bits, which is reasonable, since the floating point mantissa is known to have a maximum resolution between 23 and 24 bits:

* the smallest float number above 1.0 is 1.0 + 2^-23
* the largest float number below 1.0 is 1.0 - 2^-24

Btw, a step of 2^-23 relative to 1.0 corresponds to an attenuation of about -138dB (FS). So this is the finest step which can be represented on a waveform rendered in float at maximum amplitude (the resolution for smaller values is finer, since they are represented using a smaller exponent; thus the step at maximum amplitude is the weak spot of float samples).

And indeed, it turned out that the code quoted above produces intermittent numeric glitches when compiled with optimisation. And those glitches are *much larger than a flip of the last bit*: I saw flips of the 5th and the 6th last bit of the mantissa. Here I used the "forThisCPU" setting, which translates into
-O3 -march=native -mtune=native
NOTE: this setting does not use -ffast-math, and thus the reason must be the well known "impedance mismatch" between the normal processor floating point engine and the SSE extensions. At least that is my conclusion.

Now, what triggered those differences? The code isn't directly changed by "padthread", but the meaning of the access operator was changed:

* in the old code, tw[i] and smps[poshi] did an indexed access via float*
* in the new code, these are overloaded inline operators

And, seemingly, the introduction of that changed memory access prevented the more aggressive optimisation by the compiler. The generated assembly in fact looks quite different (while understanding the details beyond some landmarks surpasses my knowledge of assembly and CPU internals). However, I have verified the numbers at several incidents of the difference, both with a standalone C++ program using floating point numbers and with the calculator desktop application "speedcrunch". In all cases, the new code produced the more accurate numbers. However, when I add dumping of intermediary results, both the old and the new code produce identical (and more accurate) results. So this thing qualifies as a "Heisenbug".

So what can we do? Nothing, it seems! We are at the mercy of the compilers/optimisers plus the innards of the CPU, which just happen to flip some minor bits in the mantissa if they feel like it. Whew. In the end we should call ourselves happy when all we have to worry about are some minor bits in the mantissa within sound synthesis table interpolation.

-- Hermann

_______________________________________________
Yoshimi-devel mailing list
Yoshimi-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/yoshimi-devel