[Sursound] is it *necessary* to mix orders together?

2023-06-11 Thread Sampo Syreeni

On 2023-06-12, Sampo Syreeni wrote:

But my emphasis is on the question whether a decode of 3rd *and* 7th 
order information - ending up in one encoded file - would be 
mathematically correct when it comes to decoding the higher order 
content. Would something be missing (maybe an overall lower amplitude of 
the third order content)?


If you do it *wrong*, you'll get spatial aliasing. This is a big part of 
getting the original first order decoding equation right. It won't sound 
right even in the quadraphonic LTI Makita framework which the founders 
of ambisonics were aiming at.


It's even more difficult to do active decoding from there.

As for the higher order content, let's talk about it in private first, 
and then in public, because I already know a bit about this. Note: it's 
probably about directional or locational interference.

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2


Re: [Sursound] Is it possible to mix ambisonic encoded information of different order?

2023-06-11 Thread Sampo Syreeni

On 2023-06-01, Jan Jacob Hofmann wrote:

is it possible/reasonable to mix ambisonic encoded information of 
different order?


It's possible and it's reasonable, and as Fons Adriaensen said above, at 
the rather high orders you're talking about it's not far below optimality 
either. This has also been discussed here in the past, with the (granted, 
somewhat shocking) revelation to me and some others that orders mixed 
this way do *not* automatically decode optimally on either kind of 
decoder.


But theoretically, this ought to be purely a decoding-side issue. When 
you're mixing into or within B-format, you're essentially dealing with an 
isotropic approximation of a soundfield around a central point. That 
approximation is always a physical one, and in ambisonic work it's going 
to be orthogonal by the basic math. If you want to add extra directional 
accuracy, you add orders to your directional decomposition. If you can't 
or won't, then you don't. But in the end, the fact that the (3D) 
Fourier-Bessel series, properly normalized, also preserves the power of 
point sources and is an isotropic decomposition of an inbound far field 
guarantees that the *only* thing you lose at lower order is directional 
accuracy. In B-format, the representation meant to capture the physics, 
mixing two orders cannot lose anything.
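
To make the mixing step concrete, here is a minimal sketch (my own 
illustration in Python, with an ACN/SN3D channel layout assumed, not 
anything from the original posts): mixing a lower order stream into a 
higher order one is just zero-padding the missing channels and summing.

import numpy as np

def mix_orders(low, high, low_order, high_order):
    # low:  (n_samples, (low_order + 1) ** 2)
    # high: (n_samples, (high_order + 1) ** 2)
    # both assumed to share the same channel ordering and normalization
    n_low = (low_order + 1) ** 2
    padded = np.zeros_like(high)
    padded[:, :n_low] = low   # the extra channels of the low-order material stay zero
    return padded + high

# e.g. third order reverb mixed into a seventh order dry signal:
# mixed = mix_orders(reverb_3rd, dry_7th, 3, 7)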


So the real trouble comes when decoding B-format into D-format, the 
speaker feeds. If you have a set of first order, POA signals, there is 
one particular, optimal equation set for how you'd lay the sound out over 
your speakers. If you have a second order HOA signal going into something 
like 5.1, the optimal set differs quite a lot, especially at the higher 
frequencies, since there the theory doesn't work by easy interference 
principles but by second order psychoacoustical ones, coming from 
Makita's stereo work. Solving the problem optimally becomes rather 
finicky.
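
As a point of reference, the plain first order case can be sketched as a 
naive mode-matching decode via a pseudo-inverse. This is my own 
illustration (ACN/SN3D assumed), with none of the dual-band or 
Makita/energy-vector optimization discussed above.

import numpy as np

def sh_first_order(az, el):
    # real first-order spherical harmonics, ACN order (W, Y, Z, X), SN3D
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el)
    return np.array([1.0, y, z, x])

def decode_matrix(speaker_az, speaker_el):
    # re-encoding matrix: one column of spherical harmonics per speaker
    C = np.array([sh_first_order(a, e)
                  for a, e in zip(speaker_az, speaker_el)]).T   # (4, n_spk)
    return np.linalg.pinv(C)                                    # (n_spk, 4)

# a horizontal square rig at +-45 and +-135 degrees:
# D = decode_matrix(np.radians([45, 135, -135, -45]), np.zeros(4))
# speaker_feeds = b_format @ D.T   # b_format shaped (n_samples, 4)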


Then, solving it for mixed orders ("mixed order" isn't usually the term 
for this situation, but for leaving out certain spherical harmonics, e.g. 
in horizontal, pantophonic work) is even messier. How could we know, in 
decoding alone, blindly, that we have a superposition of say first and 
second (arbitrary?) order signals, so that we could apply the optimum 
decoding rule to them all at the same time?


I've been toying around with this problem for a decade or so, and 
haven't found a satisfactory solution to it all. My intuition says 
this has something to do with non-negative matrix factorization and 
convex optimization, but even if that's it, I'm not quite there yet.


On the Dolby Surround and HARPEX-like side, I've been toying around with 
doing them in the pure spherical harmonic domain to arbitrary order: a 
generalizable, infinite order decoder. On the DirAC side, I've been 
toying around with tensoring the STFT/MDCT domain with the directional 
Fourier domain, in the complex domain, and then layering some classical 
LTI DSP, statistical learning and information/compression/rate-distortion 
theory on top, in an effort to solve the problem of how to make full 
spatial audio pack well.


And then there was NFC-HOA. I was already making some progress, but that 
totally stopped me. In that one, you can mix several orders of signals, 
but suddenly you can't mix ones at separate radii. Fuck, back to the 
drawing board for me as well. :/


The sound-information (synthesized) is encoded in Ambisonic 7th order 
while the spatial reverberation of that very sound is encoded „only“ 
to third order.


In fact Fons already asked you: why go to such a high order? You'd need 
an extraordinary number of speakers to make use of such a signal, plus 
extraordinary computing power and a lot of real-life measurement of your 
speaker rig to even align your decoding solution optimally. Whereas at a 
low, matched order, you can do it right within a day's computation time.
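
The channel-count arithmetic behind that remark: a full-sphere signal of 
order N carries (N + 1)^2 channels, and a regular decode wants at least 
that many speakers. A quick check (Python):

for N in (1, 3, 7):
    print(N, (N + 1) ** 2)   # 1 -> 4, 3 -> 16, 7 -> 64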


Reason for doing so: my reverberant information comes from several 
directions in space. If these did not all have to be encoded up to 7th 
order, it would save some calculation time and computational effort.


They really don't have to. Take a look at Ville Pulkki's DirAC work, 
here in Finland. The gist of it is that it reconstructs specular sources 
and reverberation separately. The first part is identified via time 
coherence and averaging, much like Dolby Surround does it in its four 
constrained channels, and like HARPEX does it better in the ambisonic 
setting.


Ville's work, however, is fully general and frequency dependent in its 
source recognition. And it goes further: it actually tries to identify 
reverberant modes from a SoundField recording, using the imaginary axis 
of the Fourier transform in time. This was also discussed years ago 
on-list, when Angelo (I think) talked about his car interiors.
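
For the curious, the direction/diffuseness analysis at the heart of DirAC 
can be sketched roughly as follows. This is my own simplification (first 
order B-format in the STFT domain, conventions and normalization assumed, 
temporal smoothing of the estimates omitted), not Ville's actual code.

import numpy as np

def dirac_analysis(W, X, Y, Z, eps=1e-12):
    # W, X, Y, Z: complex STFT bins of the first order B-format channels
    # active intensity: real part of the pressure/velocity cross-spectra
    I = np.stack([np.real(np.conj(W) * X),
                  np.real(np.conj(W) * Y),
                  np.real(np.conj(W) * Z)])
    energy = 0.5 * (np.abs(W) ** 2
                    + np.abs(X) ** 2 + np.abs(Y) ** 2 + np.abs(Z) ** 2)
    norm_I = np.sqrt((I ** 2).sum(axis=0))
    doa = -I / (norm_I + eps)          # the source lies opposite the energy flow
    azimuth = np.arctan2(doa[1], doa[0])
    elevation = np.arcsin(np.clip(doa[2], -1.0, 1.0))
    diffuseness = 1.0 - norm_I / (energy + eps)   # 0: single plane wave, 1: fully diffuse
    return azimuth, elevation, diffuseness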


Also the reverberant information may well be more „blurry“ with respect 
to the actual sound, as it may stay in the background of perception