On 2012-03-06, Fons Adriaensen wrote:

I'd agree with what Sampo wrote in a previous post: it should be possible to find out the encoding by analysing the recordings. This is just a bit of 'applied cryptology', and since everything is assumed to be linear in this case, rather easy cryptology.

So what are the algorithms -- preferably simple ones -- which would allow us to do this? As a preservation freak, I'd like to get at some actual, programmatic, blind solutions here, so that simply digitizing whatever people have and feeding it through a program (one that wouldn't be too difficult to develop) would, with high accuracy, identify and tag the encoding, in some cases also decode it, and in any case transcode it losslessly into a preservable format.

Personally, I would approach this sort of thing as a statistical classification problem. There's neat, powerful math to be used there, ready-made code, and it meshes well with digital signal processing even in its simplest forms (audio signals tend to be Gaussian thanks to the central limit theorem, and so on).

The simplest approach, I think, would be to build a) a model of a generic encoder which covers all of the formats we care about, b) a collection of source material, preferably including at least one pair of pre-encoding and post-encoding signals per format, and c) some machinery which plots out how the signals behave, e.g. on the Scheiber sphere. After all, if the different formats distinguish themselves, long term, in a single analysis like that on the sphere, there's no need to do anything more sophisticated: we just aggregate all of the loci which the signals reach and see which complete encoding locus fits the resulting trajectory best.
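
To make a) concrete, here's a minimal sketch of such a generic encoder model in Python/NumPy: every phase-amplitude matrix boils down to a table of complex coefficients mixing the source channels into Lt/Rt, with the imaginary part realized as a 90-degree (Hilbert) phase shift. The coefficients below are illustrative placeholders in a roughly Dolby-MP-like style, not the published specs.

    import numpy as np
    from scipy.signal import hilbert

    # Illustrative placeholder coefficients, NOT published specifications.
    # Source channel order assumed here: (L, C, R, S).
    FORMATS = {
        "dolby_mp_like": {
            "Lt": [1.0, 0.707, 0.0, 0.707j],    # surround shifted +90 deg
            "Rt": [0.0, 0.707, 1.0, -0.707j],   # surround shifted -90 deg
        },
        # ...further candidate formats would be added here
    }

    def encode_track(channels, coeffs):
        """Mix real source channels into one encoded track.

        A complex coefficient c is applied as c.real * x + c.imag * H{x},
        where H{x} (the imaginary part of the analytic signal) is a
        90-degree phase-shifted copy of x.
        """
        out = np.zeros(len(channels[0]))
        for x, c in zip(channels, coeffs):
            x = np.asarray(x, dtype=float)
            out += c.real * x + c.imag * np.imag(hilbert(x))
        return out

    def encode(channels, fmt):
        coeffs = FORMATS[fmt]
        return (encode_track(channels, coeffs["Lt"]),
                encode_track(channels, coeffs["Rt"]))

Parts b) and c) then amount to running known material through each of these and watching where the resulting Lt/Rt pair lands on the sphere.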

The code to do that involves two Hilbert transformers, a bit of complex modulus and binning arithmetic, and a big enough three-dimensional (hash?) array. (The combined channel amplitude can be normalized away.)
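
A minimal sketch of that analysis side, assuming the usual Stokes-parameter-style mapping onto the sphere (with total power normalized away, as noted):

    import numpy as np
    from scipy.signal import hilbert

    def scheiber_histogram(lt, rt, n_bins=64, eps=1e-12):
        """Accumulate a histogram of where a two-channel recording sits on
        the Scheiber sphere.

        The analytic (Hilbert) signals of Lt/Rt give an instantaneous
        amplitude ratio and phase difference; Stokes-style parameters
        normalized by total power map each sample onto the unit sphere,
        so the combined channel amplitude drops out.
        """
        al = hilbert(np.asarray(lt, dtype=float))
        ar = hilbert(np.asarray(rt, dtype=float))
        s0 = np.abs(al) ** 2 + np.abs(ar) ** 2 + eps
        s1 = (np.abs(al) ** 2 - np.abs(ar) ** 2) / s0
        s2 = 2.0 * np.real(al * np.conj(ar)) / s0
        s3 = 2.0 * np.imag(al * np.conj(ar)) / s0
        hist, edges = np.histogramdd(np.column_stack([s1, s2, s3]),
                                     bins=n_bins, range=[(-1.0, 1.0)] * 3)
        return hist, edges

The long-term mass of that histogram should trace out the trajectory; the classification step is then just scoring it against the locus each candidate encoder predicts, nearest-locus distance being the crudest possible fit.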

If that doesn't work, then we might have to build a source model and do some cross-covariance analysis against it. (Long-term averaging of the recording-side background noise alone ought to bring up the locus.)
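
The simplest instance of that I can think of -- not a full source model yet, just its degenerate case of roughly uncorrelated, comparable-power source channels -- is a long-term averaged, normalized cross-spectrum between the two encoded channels; under that assumption each candidate matrix predicts a characteristic complex inter-channel correlation. A sketch:

    import numpy as np
    from scipy.signal import csd, welch

    def interchannel_signature(lt, rt, fs, nperseg=4096):
        """Long-term averaged, normalized cross-spectrum between Lt and Rt.

        The complex result (magnitude <= 1) summarizes the amplitude and
        phase relation the encoding matrix imposes on average; quiet or
        background-noise passages should make it stand out most clearly.
        """
        f, s_lr = csd(lt, rt, fs=fs, nperseg=nperseg)
        _, s_ll = welch(lt, fs=fs, nperseg=nperseg)
        _, s_rr = welch(rt, fs=fs, nperseg=nperseg)
        return f, s_lr / np.sqrt(s_ll * s_rr)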

Having a source model ought to let us identify the center front and any diffuse channels, as in Dolby MP, while L/R separation would probably remain an issue unless the orientation was given as metadata. Up-down, and vertical vs. horizontal, ought to be easy enough with just about any material. So, in theory, arbitrary channel orders (with more than two channels as well) ought to be doable to a degree.
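
A crude heuristic sketch of that identification step (my assumption, not an established method): rank the decoded channels by how correlated they are with everything else -- the most-correlated one is a decent center-front candidate, the least-correlated ones behave like diffuse feeds.

    import numpy as np

    def rank_channel_roles(channels):
        """Guess channel roles from inter-channel correlations.

        `channels` is an (n_channels, n_samples) array. Returns the index
        of the best center-front candidate, the indices of the two least
        correlated (diffuse-looking) channels, and the raw scores.
        """
        x = np.asarray(channels, dtype=float)
        c = np.corrcoef(x)
        # average absolute correlation with the other channels
        score = (np.abs(c).sum(axis=1) - 1.0) / (len(c) - 1)
        center = int(np.argmax(score))
        diffuse = np.argsort(score)[:2].tolist()
        return center, diffuse, score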

Next in line, it'd seem that at least separation of diffuse sound from direct sources, and other DirAC-like processing, could help track the prominent sources and so let us see which kinds of loci they follow. Averaged over time, they again ought to plot out the average encoding locus, and combined with some kind of (possibly adaptive and frequency-sensitive) source model, they could help us finally tell even the evilest, most actively encoded pieces apart.
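
A rough sketch of that separation step, using short-term inter-channel coherence in the STFT domain as the diffuseness proxy (an approximation on my part -- DirAC proper works on B-format intensity and energy, which we don't have here):

    import numpy as np
    from scipy.signal import stft

    def directness_map(lt, rt, fs, nperseg=1024, avg_frames=8):
        """Time-frequency map of how 'direct' the encoded pair is.

        Coherence near 1 marks coherent (direct-source) bins whose
        phase/amplitude relation can feed the sphere histogram above;
        coherence near 0 marks the diffuse remainder.
        """
        _, _, sl = stft(lt, fs=fs, nperseg=nperseg)
        _, _, sr = stft(rt, fs=fs, nperseg=nperseg)
        k = np.ones(avg_frames) / avg_frames

        def smooth(z):
            # moving average over a few STFT frames, per frequency bin
            return np.array([np.convolve(row, k, mode="same") for row in z])

        s_ll = smooth(np.abs(sl) ** 2)
        s_rr = smooth(np.abs(sr) ** 2)
        s_lr = smooth(sl * np.conj(sr))
        return np.abs(s_lr) ** 2 / (s_ll * s_rr + 1e-12)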

And if nothing else really works, perhaps we need to build an actual encoder for each format we want to discern, plus some backwards-projecting code through each of them, and just run them all at the same time to see which one produces the best simulacrum of what we actually have, starting from the best common, adaptive source model? That's already evil, but it does work: I once used that approach to blindly identify languages behind unknown character encodings, for example. (The basic framework is simple as hell, but then your Bayesian arithmetic easily bumps into precision trouble. If we have to go this far, we'll have to recruit a numerical analyst as well.)
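
For what it's worth, most of that precision trouble goes away if the whole comparison stays in the log domain. A minimal sketch, assuming each candidate encoder/decoder pipeline already spits out per-frame log-likelihoods for the recording:

    import numpy as np

    def best_format(log_likelihoods, log_priors=None):
        """Rank candidate formats by posterior probability.

        `log_likelihoods` maps format name -> array of per-frame
        log p(data | format). Working with sums of logs and normalizing
        via logaddexp avoids the underflow a naive product would hit.
        """
        names = list(log_likelihoods)
        if log_priors is None:
            log_priors = {n: -np.log(len(names)) for n in names}  # uniform
        log_post = np.array([log_priors[n] + np.sum(log_likelihoods[n])
                             for n in names])
        log_post -= np.logaddexp.reduce(log_post)   # normalize in log domain
        order = np.argsort(log_post)[::-1]
        return [(names[i], float(np.exp(log_post[i]))) for i in order]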

Fons, does this seem like a research outline, already? :)
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2