Hi Stefan,

No, it is not only you :-). I thought I was clear that these references are entirely synthetic. We just try to make sure that they are reproducible, that they sound natural, that they are physically based, and that they do not rely on any perceptual spatialization method themselves (Ambisonic decoding, panning or anything else); then we consider that a plausible reference. As I said, we do not have access to an original soundfield, so synthetic is our next best option.
This relates, I believe, to the difficult question of what is the best way to assess transparency in a reproduction method for spatial recordings (compared, for example, to assessing transparency of spatial audio coding by playing back 5.0 material against its spatially compressed version, which is a much easier task since there is a clear reference). In most cases transparency is not of interest, and overall perceptual quality matters more. However, we have done these comparisons in the way I described and published the results, so anybody interested can draw their own conclusions. And if they are good for DirAC decoding, then maybe they are good for other decoding approaches too.

Regards,
Archontis

> On 05 Jul 2016, at 21:23, Stefan Schreiber <st...@mail.telepac.pt> wrote:
>
> Politis Archontis wrote:
>
>> We start by setting up a large, dense 3D loudspeaker setup in a fully
>> anechoic chamber (usually 25-35 speakers at a distance of ~2.5 m), so
>> that there is no additional room effect at reproduction. Then we decide on
>> the composition of the sound scene (e.g. band, speakers, environmental
>> sources), their directions of arrival and the surrounding room
>> specifications. We then generate room impulse responses (RIRs) using a
>> physical room simulator for the specified room and source positions. We end
>> up with one RIR for each speaker and for each source in the scene.
>> Convolving these with our test signals and combining the results, we end up
>> with an auralization of the intended scene. This part uses no spatial sound
>> method at all, no panning for example - if a reflection falls between
>> loudspeakers, it is quantized to the closest one. The final loudspeaker
>> signals we consider as the reference case (after listening to them and
>> checking that they sound OK).
>
> Is it only me, or do these "original scenes" look highly synthetic?
>
> Maybe good for DirAC encoding/decoding, but a natural recording this is not...
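For readers who want to experiment with this kind of reference generation, the quoted procedure (one simulated RIR per source/loudspeaker pair, convolution with dry test signals, summation, nearest-loudspeaker quantization of reflections) could be sketched roughly as below. This is illustrative code of my own, not the authors' actual tooling; the array layout and function names are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def nearest_speaker(direction, speaker_dirs):
    """Quantize an arrival direction (unit vector) to the closest
    loudspeaker: no panning, just nearest-neighbour assignment.
    The largest dot product between unit vectors = the smallest angle."""
    return int(np.argmax(speaker_dirs @ direction))

def auralize(source_signals, rirs):
    """Render the reference scene on the loudspeaker array.

    source_signals: list of 1-D arrays, one dry signal per source.
    rirs: array of shape (n_sources, n_speakers, rir_len), one room
          impulse response per (source, loudspeaker) pair.
    Returns an (n_speakers, n_samples) array: one feed per loudspeaker.
    """
    n_src, n_spk, rir_len = rirs.shape
    n_out = max(len(s) for s in source_signals) + rir_len - 1
    out = np.zeros((n_spk, n_out))
    for i, sig in enumerate(source_signals):
        for j in range(n_spk):
            y = fftconvolve(sig, rirs[i, j])
            out[j, :len(y)] += y   # sum contributions of all sources
    return out
```

The nearest-loudspeaker step would be applied inside the room simulator when writing each reflection into the per-speaker RIRs; `auralize` then just convolves and sums.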
> BR
> Stefan
>
> P.S.: (Richard Lee)
>
>> Some good examples of 'natural' soundfield recordings with loadsa stuff
>> happening from all round are Paul Doornbusch's Hampi, JH Roy's schoolyard &
>> John Leonard's Aran music.
>
> --------------------------------------------------------------------------
>
>> Then we generate our recordings from that reference, either by encoding
>> directly to ambisonic signals, by simulating a microphone array recording,
>> or by putting a Soundfield or other microphone at the listening spot and
>> re-recording the playback. The choice has depended on the study.
>>
>> Finally the recordings are processed and decoded back to the loudspeakers,
>> usually to a subset of the full setup (e.g. horizontal, discrete surround,
>> small 3D setup), or even to the full setup. That allows us to switch
>> playback between the reference and the method under test.
>>
>> The tests have usually been MUSHRA-style, where the listeners are asked to
>> judge the perceived distance from the reference for various randomized
>> playback methods (including a hidden reference and a low-quality anchor,
>> used to normalize the perceptual scale for each subject). The criteria are
>> a combination of timbral distance/colouration, spatial distance, and
>> artifacts if any.
>>
>> I've left out various details from the above, but this is the general idea.
>> Some publications that have used this approach are:
>>
>> Vilkamo, J., Lokki, T., & Pulkki, V. (2009). Directional Audio Coding:
>> Virtual Microphone-Based Synthesis and Subjective Evaluation. Journal of
>> the Audio Engineering Society, 57(9), 709-724.
>>
>> Politis, A., Vilkamo, J., & Pulkki, V. (2015). Sector-Based Parametric
>> Sound Field Reproduction in the Spherical Harmonic Domain. IEEE Journal of
>> Selected Topics in Signal Processing, 9(5), 852-866.
>>
>> Politis, A., Laitinen, M.-V., Ahonen, J., & Pulkki, V. (2015). Parametric
>> Spatial Audio Processing of Spaced Microphone Array Recordings for
>> Multichannel Reproduction. Journal of the Audio Engineering Society, 63(4),
>> 216-227.
>>
>> Vilkamo, J., & Pulkki, V. (2014). Adaptive Optimization of Interchannel
>> Coherence. Journal of the Audio Engineering Society, 62(12), 861-869.
>>
>> Getting the listening test samples and generating recordings or virtual
>> recordings from the references would be a lot of work for the time being.
>>
>> What is easier, and what I can definitely do, is process one or some of the
>> recordings you mentioned for your speaker setup and send you the results
>> for listening. There is no reference in this case, but you can compare
>> against your preferred decoding method. And it would be interesting for me
>> to hear your feedback too.
>>
>> Best regards,
>> Archontis
>>
>> On 05 Jul 2016, at 09:32, Richard Lee <rica...@justnet.com.au> wrote:
>>
>> Can you give us more detail about these tests, and perhaps put some of
>> these natural recordings on ambisonia.com?
>>
>> The type of soundfield microphone used .. and particularly the accuracy of
>> its calibration ... makes a HUGE difference to the 'naturalness' of a
>> soundfield recording.
>>
>> Some good examples of 'natural' soundfield recordings with loadsa stuff
>> happening from all round are Paul Doornbusch's Hampi, JH Roy's schoolyard &
>> John Leonard's Aran music. Musical examples include John Leonard's Orfeo
>> Trio, Paul Hodges' "It was a lover and his lass" and Aaron Heller's (AJH)
>> "Pulcinella". The latter has individual soloists popping up in the
>> soundfield .. not pasted on, but in a very natural and delicious fashion
>> ... as Stravinsky intended.
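As an aside, the "encoding directly to ambisonic signals" step that Archontis describes could, at first order, look something like the sketch below: each reference loudspeaker feed is treated as a plane wave from that loudspeaker's direction and encoded to traditional B-format (W, X, Y, Z, with W scaled by 1/sqrt(2)). The function name and conventions are my assumptions; the actual studies may use higher orders and other normalizations.

```python
import numpy as np

def encode_bformat(signals, azimuths, elevations):
    """Encode signals with known directions of arrival into first-order
    B-format (W, X, Y, Z). Angles in radians; azimuth counter-clockwise
    from front, elevation up from the horizontal plane."""
    signals = np.atleast_2d(signals)             # (n_dirs, n_samples)
    az = np.asarray(azimuths, dtype=float)
    el = np.asarray(elevations, dtype=float)
    gains = np.stack([
        np.full_like(az, 1.0 / np.sqrt(2.0)),    # W: omnidirectional, -3 dB
        np.cos(az) * np.cos(el),                 # X: front/back figure-of-eight
        np.sin(az) * np.cos(el),                 # Y: left/right figure-of-eight
        np.sin(el),                              # Z: up/down figure-of-eight
    ])                                            # (4, n_dirs)
    return gains @ signals                        # (4, n_samples)
```

Applied to the reference loudspeaker feeds, the direction of each feed would be the loudspeaker's direction as seen from the listening spot.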
>> >> Also, in my experience - and that doesn't seem to be a very popular view
>> >> yet in the ambisonic community - these parametric methods do not only
>> >> upsample or sharpen the image compared to direct first-order decoding;
>> >> they actually reproduce the natural recording in a way that is
>> >> perceptually closer to how the original sounded, both spatially and in
>> >> timbre.
>> >>
>> >> Or at least that's what our listening tests have shown in a number of
>> >> cases and recordings. And the directional sharpening is one effect, but
>> >> the higher spatial decorrelation that they achieve (or lower inter-aural
>> >> coherence) in reverberant recordings is equally important.

_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit account or options, view archives and so on.
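The inter-channel (or, at the ears, inter-aural) coherence mentioned above can be measured directly; a minimal sketch using SciPy's Welch-based magnitude-squared coherence (the sampling rate and segment length are my own parameter choices):

```python
import numpy as np
from scipy.signal import coherence

def interchannel_coherence(x, y, fs=48000, nperseg=1024):
    """Welch-averaged magnitude-squared coherence between two channels.
    Values near 1 mean the channels carry essentially the same signal;
    values near 0 mean well-decorrelated channels, as desired for
    diffuse reverberation."""
    freqs, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    return freqs, cxy
```

Comparing the average coherence between the playback channels of a direct first-order decode and a parametric decode of the same reverberant recording would be one way to quantify the decorrelation effect described.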