Re: [Sursound] about principled rendering of ambisonic to binaural

2022-10-16 Thread Sampo Syreeni

On 2022-09-17, Ralph Jones wrote:

But the subject is of real concern for me, because I am currently 
working in 5.1.4 surround format (channel-based, not Atmos) and I 
would dearly love to find a mac-compatible VST plugin that would 
convincingly render my work in binaural.


Hmm. Sursound has been a bit of a mathematicians' list for some time. 
Ambisonic and later on WFS, averse to the more usual 
x.y.z sound systems.


What if we now finally did some usable code or examples? Us fiends?

So, is there a plugin that does what Fons describes here? (i.e., given 
azimuth and elevation for each channel, render the signals to binaural 
convincingly, including an impression of elevation for height 
channels.)


You gave a channel arrangement, or a speaker arrangement. That 5.1.4 
arrangement sort of tells us what you have, and roughly where, but not 
*precisely*. It doesn't tell us the precise angles or the precise 
distances.
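To make the point concrete, here is a sketch (my own assumption, not anything from this thread) of *nominal* 5.1.4 loudspeaker angles, loosely after the ITU-R BS.2051 conventions. Any real room will deviate from these numbers, which is exactly the problem:

```python
# Hypothetical nominal angles for a 5.1.4 layout, loosely following
# ITU-R BS.2051 conventions. Azimuth in degrees, positive to the left,
# 0 = straight ahead; elevation in degrees above the horizontal plane.
# Real installations will differ from every one of these values.
NOMINAL_514 = {
    "L":   (+30.0, 0.0),
    "R":   (-30.0, 0.0),
    "C":   (0.0, 0.0),
    "LFE": (None, None),    # no meaningful direction
    "Ls":  (+110.0, 0.0),
    "Rs":  (-110.0, 0.0),
    "Ltf": (+30.0, 35.0),   # top-front pair
    "Rtf": (-30.0, 35.0),
    "Ltr": (+110.0, 35.0),  # top-rear pair
    "Rtr": (-110.0, 35.0),
}
```

A binaural renderer fed only the label "5.1.4" has to assume some table like this, and the assumption is wrong to the degree the actual rig differs.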


As such, it doesn't tell us how those various channels *sound* to a 
listener. Not precisely. So it's impossible to even start to render them 
into binaural. Also, you'd need to specify a model of your ears, which 
you didn't give. (The KEMAR set is a model of ears, so I'd probably 
start with that. But they are not *your* ears, just a make-do, 
average-sounding pair. Plus the set is symmetrized, unlike anybody's 
real ears.)

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
___
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit 
account or options, view archives and so on.


Re: [Sursound] about principled rendering of ambisonic to binaural

2022-10-16 Thread Sampo Syreeni

On 2022-09-13, Fons Adriaensen wrote:


Even in that case it isn't as simple as you seem to think.


Obviously I simplify. I know a thing or two already.

Any set of measured HRIRs will need some non-trivial preprocessing 
before it can be used. One reason is low-frequency errors. Accurate IR 
measurements below, say, 200 Hz are difficult (unless you have a very 
big and good anechoic room). OTOH we know that HRIRs in that frequency 
range are very low order and can be synthesised quite easily.


As such, we put a priori knowledge into the model, and/or somehow 
repeat the measurement, coherently adding the resulting signals so as to 
bring down the noise floor.
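A toy numerical sketch of that coherent averaging (synthetic data, obviously, not a real measurement): the signal adds linearly over N repeats while uncorrelated noise adds only as sqrt(N), so SNR improves by roughly sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(0)
true_ir = np.exp(-np.arange(256) / 20.0)   # toy stand-in for an HRIR
noise_std = 0.5

def measure():
    # one noisy measurement: true response plus uncorrelated noise
    return true_ir + rng.normal(0.0, noise_std, true_ir.shape)

# Coherent averaging of N repeats: signal adds linearly, uncorrelated
# noise only as sqrt(N), so SNR improves by about sqrt(N).
N = 64
avg = np.mean([measure() for _ in range(N)], axis=0)

err_single = np.std(measure() - true_ir)
err_avg = np.std(avg - true_ir)
print(err_single / err_avg)   # roughly sqrt(64) = 8
```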


Another reason is that you can't reduce a set of HRIR to low order 
(the order of the content you want to render) without introducing 
significant new errors.


I believe the systematic way to talk about this is that reducing 
directional sources to a central frame is a Fourier-Bessel reduction, 
which doesn't translate easily into the rectilinear Fourier 
decomposition. Even their reduced orders aren't comparable: a finite, 
low-order field which looks perfectly even/in-quadrature in one frame 
has an infinite-order decomposition in the other.


But they work pretty well against each other for most outside-of-the-rig 
sources. The higher-order cross-terms in the transform fall off fast, so 
you can approximate pretty well either way. That's where the 
Daniel, Nicol & Moreau NFC-HOA paper comes from. (They also did Hankel 
functions, for outward-going energy transfer. Their solution is exact, 
and they've talked about the connection to rectangular WFS, but even 
they didn't quantify it all fully.)
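A quick illustration of that falloff, using the small-argument form j_n(x) ≈ x^n/(2n+1)!! of the spherical Bessel terms (my sketch; the approximation holds when kr is well below n, which is exactly the regime where truncation is justified):

```python
import math

# Small-argument magnitude of the spherical Bessel term,
#     j_n(x) ~ x^n / (2n+1)!!   for x << n.
# This is why an interior-field (Fourier-Bessel) expansion can be
# truncated near order n ~ kr: terms above that die off very fast.
def jn_small(n, x):
    dfact = 1.0
    for k in range(1, 2 * n + 2, 2):   # (2n+1)!! = 1 * 3 * ... * (2n+1)
        dfact *= k
    return x ** n / dfact

kr = 2.0   # roughly 1 kHz at a head-sized radius of ~11 cm
for n in (4, 8, 12):
    print(n, jn_small(n, kr))
```

At kr = 2 the order-8 term is already below 1e-5 and the order-12 term below 1e-9, so the cross-terms beyond a few orders contribute essentially nothing.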


One way to reduce these is to reduce or even fully remove ITD at mid 
and high frequencies, again depending on the order the renderer is 
supposed to support.


This is all implicit in the order of reconstruction. ITD is just a 
derivative of the soundfield over your ears. Of course Gerzon took 
Makita theory, but the latter is derivable first from the acoustic wave 
equation, and then from its ecologically minded, reduced 
psychoacoustics. Once you go to third-order ambisonic or beyond, no 
psychoacoustics are necessary.
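As a toy illustration that ITD follows from the field geometry itself, here is Woodworth's classic rigid-sphere approximation (a standard textbook model, not anything specific to this thread):

```python
import math

# Woodworth's rigid-sphere ITD approximation: for head radius a,
# speed of sound c, and source azimuth theta (radians, 0 = dead ahead),
#     ITD(theta) ~ (a / c) * (theta + sin(theta)).
# Offered only to show that ITD is determined by the geometry of the
# incident field, not something that needs separate psychoacoustic input.
def itd_woodworth(theta_rad, a=0.0875, c=343.0):
    return (a / c) * (theta_rad + math.sin(theta_rad))

print(itd_woodworth(math.pi / 2) * 1e6)   # on the order of 650 microseconds
```

Reconstruct the field accurately at both ears and this delay comes out for free, which is the point about high orders above.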


Getting the magnitudes (and hence ILD) accurate requires much lower 
order than if you also want to keep the delays.


My point is that in binaural work, especially if head-tracked, you can 
easily get to order twenty or so. No ITD/ILD analysis needed, because 
it'll mimic physical reality.


If we can just do it right. How do we, from a sparse and anisotropic 
measurement set?


Compared to these and some other issues, not having a set on a regular 
grid (e.g. t-design or Lebedev) is the least of problems you will 
encounter.


Tell me more? I don't recognize these ones just yet.

There are other considerations. For best results you need head 
tracking and a plausible room sound (even if the content already 
includes its own).


On plausible room reverberation, I might have some ideas as well. :)

The practical solutions do not depend on such concepts and are much 
more ad-hoc. Some members of my team and myself worked on them for the 
last three years. Most of the results are confidential, although 
others (e.g. IEM) have arrived at some similar results and published 
them.


IEM is serious shit. If the results are confidential, so be it, but at 
the same time, if I had something to contribute, I'd happily sign an 
NDA. Just to be in the know, and contribute; I've done it before, and 
would do so again.


It'd just be fun to actually solve, or at least quantify the limits of, 
this particular mathematical problem. How well can you actually go from, 
say, a KEMAR set to an ambisonic-to-binaural rendition? Isotropically? 
If you don't really have the bottomwards set? How do you interpolate, 
and where's your overall error metric? And so on?!? My problem.
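On the interpolation question, a deliberately naive sketch (entirely my own, not a claim about how it *should* be done): inverse-great-circle-distance weighting of the measured directions. Real pipelines use much better machinery (spherical-harmonic fits, barycentric weights on a triangulation), and note this dodges the error-metric question rather than answering it:

```python
import math

# Great-circle distance between two (azimuth, elevation) directions, radians.
def gc_dist(a, b):
    (az1, el1), (az2, el2) = a, b
    s = (math.sin(el1) * math.sin(el2)
         + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.acos(max(-1.0, min(1.0, s)))

# Naive inverse-distance-weighted interpolation of a scalar HRTF-like
# value at an unmeasured direction. A placeholder method, not a proposal.
def interp(target, measured, p=2.0, eps=1e-9):
    wsum, vsum = 0.0, 0.0
    for d, v in measured.items():
        w = 1.0 / (gc_dist(target, d) ** p + eps)
        wsum += w
        vsum += w * v
    return vsum / wsum

data = {(0.0, 0.0): 1.0, (math.pi / 2, 0.0): 0.0}
print(interp((math.pi / 4, 0.0), data))   # midway between the two, so ~0.5
```

With a sparse, anisotropic set, say one missing the bottom hemisphere, any such scheme extrapolates blindly there, which is exactly the open problem.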


Another question is if for high quality binaural rendering, starting 
from Ambisonic content is a good idea at all.


Obviously it isn't, in all cases. For example, if your content is such 
that the listener mostly looks forward, as say towards a movie screen, a 
fully isotropic ambisonic signal of any order wastes bandwidth/entropy. 
A lot of it. Even at first order, probably something like 2+epsilon 
channels' worth of it. If they used just pantophonic (horizontal-only) 
ambisonic, they'd get more typical theatrical sound. In fact, if they 
really optimized the thing for frontal sound, with maybe a supplemental, 
theatrically minded surround track, such as in Dolby Surround, maybe 
it'd need even less.
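The channel-count arithmetic behind that waste is simple: a full-sphere (periphonic) order-N ambisonic signal carries (N+1)^2 channels, while a horizontal-only (pantophonic) one needs just 2N+1:

```python
# Channel counts per ambisonic order: full-sphere vs horizontal-only.
def periphonic(n):   # full-sphere, (N+1)^2 spherical harmonics
    return (n + 1) ** 2

def pantophonic(n):  # horizontal-only, 2N+1 circular harmonics
    return 2 * n + 1

for n in (1, 3, 5):
    print(n, periphonic(n), pantophonic(n))
```

So already at first order a full-sphere signal spends 4 channels where 3 would carry the horizontal field, and the gap widens quadratically with order.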


But the thing is, ambisonic has always been an exercise in generality, 
regularity, and so virtual auditory reality, with mathematical 
certainty. Above instant efficiency or cheapness. It's never been about 
what is easy, but about being able to look all around, and to perceive 
the same isotropic soundfield, even if you look up or down. The auditory 
idea of what we now call 

Re: [Sursound] about principled rendering of ambisonic to binaural

2022-10-16 Thread Sampo Syreeni

On 2022-09-11, Picinali, Lorenzo wrote:


https://acta-acustica.edpsciences.org/articles/aacus/abs/2022/01/aacus210029/aacus210029.html


Thank you, will read.