On 2022-09-13, Fons Adriaensen wrote:
> Even in that case it isn't as simple as you seem to think.
Obviously I simplify. I know a thing or two already.
> Any set of measured HRIR will need some non trivial preprocessing
> before it can be used. One reason is low-frequency errors. Accurate IR
> measurements below say 200 Hz are difficult (unless you have a very
> big and good anechoic room). OTOH we know that HRIR in that frequency
> range are very low order and can be synthesised quite easily.
As such, we put a priori knowledge into the model, and/or somehow
repeat the measurement, coherently adding the resulting signals so as
to bring down the noise floor.
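A quick numeric sanity check of that averaging claim (my own sketch, not anybody's measurement code): the deterministic IR adds coherently over repeats while uncorrelated noise only adds in power, so N repeats buy roughly a factor of sqrt(N) on the noise floor.

```python
import numpy as np

rng = np.random.default_rng(0)
ir = np.zeros(256)
ir[10] = 1.0                      # toy "true" impulse response

N = 16
# N repeated measurements, each with additive uncorrelated noise
noisy = ir + 0.1 * rng.standard_normal((N, ir.size))
avg = noisy.mean(axis=0)          # coherent average

noise_single = np.std(noisy[0] - ir)
noise_avg = np.std(avg - ir)
print(noise_single / noise_avg)   # close to sqrt(16) = 4
```

That's about 10*log10(N) dB of noise floor per N repeats, assuming the system and the head stay put between sweeps.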
> Another reason is that you can't reduce a set of HRIR to low order
> (the order of the content you want to render) without introducing
> significant new errors.
I believe the systematic way to talk about this is that reducing
directional sources to a central frame is the Fourier-Bessel
decomposition, which doesn't translate easily into the rectilinear
Fourier decomposition. Even their truncation orders aren't comparable:
a field which is of low, finite order in one frame, and looks
perfectly even/in-quadrature there, has an infinite order
decomposition in the other.
But they work pretty well against each other for most outside-of-the-rig
sources. The higher order cross-terms in the transform fall off fast, so
you can approximate pretty well either way. That's where the
Daniel, Nicol & Moreau NFC-HOA paper comes from. (They also did Hankel
functions, for outward-going energy transfer. Their solution is exact,
and they've talked about the connection to rectangular WFS, but even
they didn't quantify it all fully.)
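The falloff is easy to see numerically (my illustration, not from the paper): in the plane-wave expansion exp(ik.r) = sum_n i^n (2n+1) j_n(kr) P_n(cos theta), the spherical Bessel terms collapse once the order n exceeds kr, which is why a finite, low-order central expansion still approximates outside-of-the-rig sources well.

```python
from scipy.special import spherical_jn

# kr = 2*pi*f*r/c; about 5 for f = 1 kHz at r = 0.27 m, c = 343 m/s
kr = 5.0
for n in (2, 5, 8, 12, 20):
    print(n, abs(spherical_jn(n, kr)))
# the radial terms with n > kr die off superexponentially
```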
> One way to reduce these is to reduce or even fully remove ITD at mid
> and high frequencies, again depending on the order the renderer is
> supposed to support.
This is all implicit in the order of reconstruction. ITD is just a
derivative of the soundfield across your ears. Of course Gerzon took
Makita's theory, but the latter is derivable first from the acoustic
wave equation, and then from its ecologically minded, reduced
psychoacoustics. Once you go to third order ambisonic or beyond, no
psychoacoustics are necessary.
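For concreteness, here's one way the ITD-removal step could be done (a sketch under my own assumptions, not Fons's method): fold each HRIR to minimum phase via the real cepstrum, which keeps |H|, and so the ILD material, exactly, while discarding the excess, delay-carrying phase; a frequency-independent per-ear delay can then be reintroduced to taste.

```python
import numpy as np

def minimum_phase(h, nfft=4096):
    """Minimum-phase version of impulse response h via the real cepstrum:
    keep the magnitude spectrum, discard the excess (ITD-carrying) phase."""
    H = np.fft.fft(h, nfft)
    log_mag = np.log(np.maximum(np.abs(H), 1e-12))
    cep = np.fft.ifft(log_mag).real
    # fold the cepstrum: double the causal part, zero the anticausal part
    w = np.zeros(nfft)
    w[0] = 1.0
    w[1:nfft // 2] = 2.0
    w[nfft // 2] = 1.0
    h_min = np.fft.ifft(np.exp(np.fft.fft(cep * w))).real
    return h_min[:len(h)]

# demo: a 5-sample delay (a stand-in for ITD) is removed, while the
# magnitude response stays intact
h = np.zeros(64)
h[5] = 1.0
hm = minimum_phase(h)
print(hm[:3])
```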
> Getting the magnitudes (and hence ILD) accurate requires much lower
> order than if you also want to keep the delays.
> My point is that in binaural work, especially if head tracked, you can
> easily get to order twenty or so. No ITD/ILD-analysis needed, because
> it'll mimic physical reality.
If we can just do it right, that is. How do we, from a sparse and
anisotropic measurement set?
> Compared to these and some other issues, not having a set on a regular
> grid (e.g. t-design or Lebedev) is the least of problems you will
> encounter.
Tell me more? I don't recognize these ones just yet.
> There are other considerations. For best results you need head
> tracking and a plausible room sound (even if the content already
> includes its own).
On plausible room reverberation, I might have some ideas as well. :)
> The practical solutions do not depend on such concepts and are much
> more ad-hoc. Some members of my team and myself worked on them for the
> last three years. Most of the results are confidential, although
> others (e.g. IEM) have arrived at some similar results and published
> them.
IEM is serious shit. If the results are confidential, so be it, but at
the same time, if I had something to contribute, I'd happily sign an
NDA. Just to be in the know, and to contribute; I've done it before,
and would do it again.
It'd just be fun to actually solve, or at least quantify the limits
of, this particular mathematical problem. How well can you actually go
from, say, a KEMAR set to an ambisonic-to-binaural rendition?
Isotropically? If you don't really have the bottomwards set? How do
you interpolate, and what's your overall error metric? And so on?!? My
problem.
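One way to pose that interpolation problem concretely (my sketch, not a solved method; the direction set, order, and lambda are all made up for illustration): fit spherical-harmonic coefficients to the sparse, anisotropic set by Tikhonov-regularized least squares, and use the residual on held-out directions, including the missing bottom cap, as the error metric.

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, az, pol):
    """Complex SH basis: one row per direction, one column per (n, m)."""
    cols = [sph_harm(m, n, az, pol)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(1)
K = 60
az = rng.uniform(0, 2 * np.pi, K)
pol = rng.uniform(0, 0.75 * np.pi, K)   # anisotropic: bottom cap missing

# toy target: a known order-2 field, standing in for one HRTF frequency bin
order = 2
c_true = np.zeros((order + 1) ** 2, complex)
c_true[0], c_true[4] = 1.0, 0.5
Y = sh_matrix(order, az, pol)
h = Y @ c_true

# Tikhonov-regularized least squares; lam keeps the missing cap
# from blowing up the fit
lam = 1e-3
c = np.linalg.solve(Y.conj().T @ Y + lam * np.eye(Y.shape[1]),
                    Y.conj().T @ h)

# held-out error over the full sphere, bottom cap included
az_t = rng.uniform(0, 2 * np.pi, 200)
pol_t = rng.uniform(0, np.pi, 200)
Y_t = sh_matrix(order, az_t, pol_t)
h_t = Y_t @ c_true
err = np.linalg.norm(Y_t @ c - h_t) / np.linalg.norm(h_t)
print(err)
```

With a real, measured set the target isn't exactly low order, and that truncation error is precisely the quantity I'd want to pin down.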
> Another question is if for high quality binaural rendering, starting
> from Ambisonic content is a good idea at all.
Obviously it isn't, in all cases. For example, if your content is such
that the listener mostly looks forward, say towards a movie screen, a
fully isotropic ambisonic signal of any order wastes bandwidth/entropy.
A lot of it. Even at first order, probably something like 2+epsilon
channels' worth of it. If they used just pantophonic (horizontal-only)
ambisonic, they'd get more typical theatrical sound. In fact, if they
really optimized the thing for frontal sound, plus maybe a
supplemental, theatrically minded surround track, as in Dolby
Surround, maybe it'd need even less.
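For scale, the standard channel counts behind that estimate: full-sphere (periphonic) order-N ambisonics carries (N+1)^2 channels, horizontal-only (pantophonic) just 2N+1.

```python
def periphonic(n):
    """Channel count of full-sphere order-n ambisonics."""
    return (n + 1) ** 2

def pantophonic(n):
    """Channel count of horizontal-only order-n ambisonics."""
    return 2 * n + 1

for n in (1, 3, 20):
    print(n, periphonic(n), pantophonic(n))
# at first order: 4 vs 3 channels; at order 20: 441 vs 41
```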
But the thing is, ambisonic has always been an exercise in generality,
regularity, and so in virtual auditory reality, with mathematical
certainty. Above instant efficiency or cheapness. It's never been about
what is easy, but about being able to look all around, and to perceive
the same isotropic soundfield even if you look up or down. The auditory
idea of what we now call