Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Dear colleagues... To continue the proposal to use certain forms of .AMB as a real-world format for the transport/storage of 3D audio (including music recordings), I would like to hint to some further and important issues involved. A full .AMB decoder would have to be able to decode the nine different combinations of .AMB to different (standard?) loudspeaker configurations, and also to headphones. (The latter would be some important point in my requirement list.) This means there will be plenty of combinations, and some great opportunities to mess things up if anybody wants to implement the 9*9 or so combinations ... :-) It would be advantageous if we would be able to limit .AMB to some CE profile with far fewer combinations! (To cover just FOA won't be enough. We know that FOA has certain limitations and won't be good enough for all applications. Think just of the sweet spot issues.) My impression is that you would have to use at least 3rd order to overcome many/most of the typical FOA problems. Some advantages of TOA, compared to FOA: - much larger sweet spot (not only support for individual listeners; IMO this is very important, as I would like to be able to demonstrate some wonderful recordings to at least one friend, even better to some friends. If you don't have friends, don't bother... :-D ) - angular resolution significantly improved, compared to FOA (improvement of more than factor 2) - improved performance at higher frequencies - we know that FOA has certain problems to present sound from the sides, even if the playback rig would include loudspeakers at direct lateral positions. http://www.acoustics.hut.fi/~ville/papers/pulkkiicmc2001.pdf - Decoding TOA to (ITU) 5.1/7.1 will show much better results than FOA to 5.1/7.1. (Comments? I know that 5.1 is an underspecified irregular array from an Ambisonics TOA perspective, but you can decode this and the results will be better than in the FOA to 5.1 case...) 
- Improved behaviour at higher frequencies

Altogether, a practical CE format based on Ambisonics and .AMB could be introduced in the following, simplified form:

I) FOA/UHJ (3-4 UHJ channels, the proposed stereo backward-compatible form of FOA...), 3/4 channels. (Classical decoders and other decoders, supposed to improve on classical ones...)

II) 3rd order horizontal-only and 3h/1p, which you could combine to just 3h1p (1st order vertical), 7/8 channels, or 8 channels. (Might still be offered in some UHJ, stereo-compatible fashion; UHJ for 2nd/3rd order doesn't exist yet, but it can be done.)

III) 3h/3p, 16 channels. (Call this the .AMB master format? Anyway, this is the upper end of FuMa .AMB...)

2nd order Ambisonics possibly doesn't offer enough improvement over FOA, so it might be dropped from a CE format, for the sake of simplification.

Do you think that 3rd order Ambisonics would be strong enough for the distribution of real recordings? (This is the decisive point, because if the answer is yes you could convince some people. My personal impression is yes, as I know that TOA is successfully applied in some to many real-world installations, in live concerts etc. On the other hand, it would be nice to hear the feedback of people actually working with TOA.)

I am aware that there is no microphone for 2nd or 3rd order Ambisonics, maybe apart from some experimental designs. If you would like to use TOA as a master/distribution/storage format for music, there should be some TOA/AMB microphone available. But certainly somebody could design one? I believe there could already be some market for an AMB microphone. (The Eigenmike doesn't count here. I don't think this should be seen as a microphone designed for real-world music recordings. S/N problems, and many other issues... It has been designed by and for geeks, or should I apologize for this comment anyway? =-O )

Mixing of TOA is already possible, here and today.
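The channel counts quoted for tiers I to III can be checked with a few lines of Python. This is only a sketch: it assumes the usual mixed-order (#H#P) counting, where a full-sphere set up to the vertical order is extended by two extra horizontal components for each higher order, and the function and profile names are made up for illustration.

```python
def amb_channel_count(h_order: int, v_order: int) -> int:
    """Channel count of a mixed-order set (horizontal order h, vertical order v).

    Assumed counting rule: (v + 1)^2 full-sphere components, plus two
    horizontal-only components for every order between v + 1 and h.
    """
    if not 0 <= v_order <= h_order:
        raise ValueError("need 0 <= v_order <= h_order")
    full_sphere = (v_order + 1) ** 2
    extra_horizontal = 2 * (h_order - v_order)
    return full_sphere + extra_horizontal


# Hypothetical labels for the proposed CE tiers.
PROFILES = {
    "I: FOA (1h1p)": (1, 1),
    "II: 3rd order horizontal only (3h0p)": (3, 0),
    "II: 3h1p": (3, 1),
    "III: full 3rd order (3h3p)": (3, 3),
}

for name, (h, v) in PROFILES.items():
    print(f"{name}: {amb_channel_count(h, v)} channels")
```

Under that assumption the sketch gives 4, 7, 8 and 16 channels, which matches the figures in the list.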
Best, Stefan P.S.: Note also in this context that the Mpeg is on track to finalize its Mpeg-H 3D Audio framework until beginning of 2015. (The basic CO decoder is already chosen, how it seems.) Mpeg 3D audio is technically cinema surround with height (22.2 style), so there is some basic difference if compared to Ambisonics. (Which has less company support, but offers full-spherical 3D audio even in its classical 4-channel form. ;-) ) UHJ (surround/3D audio) as extension of stereo based files (distribution via Internet, on discs and streaming, including YouTube, Spotify etc.) I like the potential of this idea very much; but it can only move forward with the free availability of freely available encoders and decoders for 2, 3 and 4-channel UHJ, in both standalone and plugin formats. Seeing as how mere 2-channel versions have signally failed to become available at all, I wonder what chance there is. I had hoped that somebody else
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
On Sun, Aug 11, 2013 at 9:21 PM, Stefan Schreiber st...@mail.telepac.pt wrote:

Again, the real problem seems to be the lack of available B format decoders.

I may be able to help here, as I've recently written a full-featured (dual band, NFC, blah, blah...) Ambisonic decoder engine in Faust, as well as a toolkit to design the decoder configurations (written in MATLAB/Octave). Several sursounders have been beta testing the toolkit and Faust backend with some success. It's all open source, licensed under GNU Affero General Public License version 3, but I could be persuaded to change that in the interest of wider adoption.

Faust is a DSP specification language, which compiles to highly optimized C++, and then to VST, AU, LADSPA, Jack, MaxMSP, csound, SuperCollider, etc. I believe it can also target Android and iOS, but I haven't confirmed that personally.

Contact me directly (hel...@ai.sri.com) if you want to try it out.

Aaron Heller
Menlo Park, CA US

___ Sursound mailing list Sursound@music.vt.edu https://mail.music.vt.edu/mailman/listinfo/sursound
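To make concrete what the very simplest kind of horizontal B format decoder does, here is a minimal Python sketch. It is not the Faust engine or the toolkit described above: it has no dual-band processing and no NFC, and simply points a virtual cardioid microphone at each loudspeaker. It assumes FuMa-style weighting (W carries the signal at -3 dB) and azimuths measured counter-clockwise from the front; all function names are illustrative.

```python
import math


def encode_fuma_horizontal(sample, azimuth_deg):
    """Encode a mono sample into horizontal FuMa-style B format (W, X, Y)."""
    az = math.radians(azimuth_deg)
    return sample / math.sqrt(2), sample * math.cos(az), sample * math.sin(az)


def decode_virtual_cardioids(w, x, y, speaker_azimuths_deg):
    """One feed per speaker: a virtual cardioid aimed at that speaker."""
    feeds = []
    for deg in speaker_azimuths_deg:
        az = math.radians(deg)
        feeds.append(0.5 * (math.sqrt(2) * w + x * math.cos(az) + y * math.sin(az)))
    return feeds


SQUARE = [45, 135, 225, 315]  # speaker azimuths of a square layout, in degrees

w, x, y = encode_fuma_horizontal(1.0, 45)  # source in the direction of speaker 1
feeds = decode_virtual_cardioids(w, x, y, SQUARE)
print([round(f, 3) for f in feeds])
```

The speaker in the source direction gets the full signal, its two neighbours get half, and the opposite speaker gets nothing. A real decoder for irregular layouts such as ITU 5.1 needs a proper design step, which is what a decoder toolkit is for.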
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Paul Hodges wrote: --On 29 July 2013 03:57 +0100 Stefan Schreiber st...@mail.telepac.pt wrote: UHJ (surround/3D audio) as extension of stereo based files (distribution via Internet, on discs and streaming, including YouTube, Spotify etc.) I like the potential of this idea very much; but it can only move forward with the free availability of freely available encoders and decoders for 2, 3 and 4-channel UHJ, in both standalone and plugin formats. Seeing as how mere 2-channel versions have signally failed to become available at all, I wonder what chance there is. I had hoped that somebody else would state the obvious, in the end I have to do this myself... :-) While I would understand the above argument IF UHJ would be some area on its own, my proposal actually implied that you would use (in the end) a B format decoder. You would additionally need an UHJ channel extractor (works on the AAC file/ .M4P/Ogg etc. input ), and secondly the UHJ to B format translator. (The latter is just the application of some formulas which might not be trivial but are known and/or can be deduced. From an IT perspective, this is very little program code. You just have to apply known formulas. This step also doesn't depend a lot on the specific programming language which is used. Mathematics stays mathematics, and the language of mathematical formulas is older than programming languages - which explains why formulas look more or less the same in any programming language - well, if I/you exclude Forth and other exotics :-D ) I would call the two additional steps the UHJ front end for some B format decoder. I know that there would have to be done a lot more work to publish B format programs/plugins/mobile apps etc., and to describe B decoder design. Specifically, I believe that B decoders nowadays should be able to support output via headphones and binaural techniques. 
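To illustrate how little code the "UHJ to B format translator" step needs, here is a hedged Python sketch. It works on single-frequency complex phasors, where the UHJ j operator (a +90 degree phase shift) is just multiplication by 1j; a real implementation would need a wideband phase shifter on audio samples. The encoding coefficients are the values commonly published for UHJ and are an assumption here, to be checked against an authoritative source. The decode side is not taken from any published decoder: it is simply deduced by inverting those encoding equations.

```python
# Assumed UHJ encoding coefficients (commonly published values; verify before use).
A, B = 0.9396926, 0.1855740                 # sum signal S from W and X
C, D, E = 0.3420201, 0.5098604, 0.6554516   # difference signal from W, X and Y
TW, TX, TY = 0.1432, 0.6512, 0.7071068      # T channel
QZ = 0.9772                                 # Q channel
J = 1j  # +90 degree phase shift; only valid for single-frequency phasors


def b_to_uhj(w, x, y, z):
    """B format phasors -> 4-channel UHJ phasors (Left, Right, T, Q)."""
    s = A * w + B * x
    d = J * (-C * w + D * x) + E * y
    t = J * (-TW * w + TX * x) - TY * y
    q = QZ * z
    return (s + d) / 2, (s - d) / 2, t, q


def uhj_to_b(left, right, t, q):
    """Inverse of b_to_uhj, derived by solving the encoding equations."""
    s = left + right
    d = left - right
    # Eliminate Y:  TY*d + E*t = J * (p*w + r*x)
    p = -(TY * C + E * TW)
    r = TY * D + E * TX
    u = (TY * d + E * t) / J
    det = A * r - B * p
    w = (s * r - B * u) / det
    x = (A * u - p * s) / det
    y = (d - J * (-C * w + D * x)) / E
    z = q / QZ
    return w, x, y, z


original = (0.7 + 0.1j, -0.3 + 0.4j, 0.2 - 0.5j, 0.05j)
recovered = uhj_to_b(*b_to_uhj(*original))
print(max(abs(a - b) for a, b in zip(original, recovered)))
```

With three or four channels the round trip is exact up to rounding. With only Left and Right there are two equations for three horizontal unknowns, so 2-channel UHJ cannot be inverted this way, which is one reason to treat it as a reduced form of B format.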
Section III of my 1st posting suggests that head-tracking hardware is both available and cheap enough to be applied in real-world products, including future surround capable HT headphones . I mentioned the specific hardware used in the Oculus Rift VR headset, just to give some example for some existing HT chip. (There is plenty of other hardware around.) It might help to set up some open group, which would promote the use and design of B format (HOA? Section II...) decoders: describing the theory behind, offering (open sourced) program code, distributing free solutions etc. (To set up a working open group requires some organisational skills, but it can be done.) Again, the real problem seems to be the lack of available B format decoders. (My proposal is to transport B format over stereo, in some simple description. If so, it is again obvious that you should see the use of UHJ extension channels just as a front end for B format, because this is the format which has to be decoded.) I believe that you should promote the fact that B format is a real 3D audio format, using just 4 channels. This is obviously some intriguing fact. (Note that the spatial 3D resolution of full FOA is actually the same as the spatial 2D resolution of XYW, because Ambisonics is isotropic.) IMO, 2-channel UHJ is something from the past. Don't use this if you could distribute the real thing?! Which means B format, not a reduced form of B format. The use of 3/4 channel UHJ (maybe more channels for higher oders) was suggested to stay compatible with 2-channel audio/stereo files and streams. It has been shown that existing file/container formats would allow the transport of UHJB format over stereo, via at least two different extension techniques. (File extensions, extensions in current container formats) Best regards, Stefan P.S.: Mpeg Surround is also a decoder based design. (MPS encoder/decoder) The same is valid for the future (Mpeg) 3D audio codec, currently in development. 
I know that they take the topic binaural output via headphones very seriously, you just have to look into their CfP and similar documents... P.S. 2: Like everyone else I wish I had the time myself; but when factoring in the need to learn about DSP programming and modern programming languages, other commitments, and the slowing down of age... Paul Not any single person could do all the programming stuff, at least not anymore. There are just too many different platforms around Nevertheless, B format decoders/apps will be written if Ambisonics is seen as a format which is worth to be implemented. (Or if there is enough music in this format around.) In this sense, I would look to the applications/aspects which are beyond of what is offered by the 5.1 ITU layout. (IMO Ambisonics starts to shine if you factor in the inherent capability to record/encode full-sphere 3D audio. And because you could really not expect that available 3D audio loudspeaker layouts would look about the same everywhere,
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
--On 29 July 2013 03:57 +0100 Stefan Schreiber st...@mail.telepac.pt wrote:

UHJ (surround/3D audio) as extension of stereo based files (distribution via Internet, on discs and streaming, including YouTube, Spotify etc.)

I like the potential of this idea very much; but it can only move forward with the free availability of encoders and decoders for 2, 3 and 4-channel UHJ, in both standalone and plugin formats. Seeing as how mere 2-channel versions have signally failed to become available at all, I wonder what chance there is.

Like everyone else I wish I had the time myself; but when factoring in the need to learn about DSP programming and modern programming languages, other commitments, and the slowing down of age...

Paul

-- Paul Hodges
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Not sure I see the point of bandwidth-limiting T. It was designed for a world we no longer inhabit. We had issues with it at the time and I don't think the considerations that made it useful for FM apply here.

--R

On 02/08/2013 17:09, Martin Leese wrote:

... - The UHJ article already mentions that the T channel could be bandwidth-limited.

Geoffrey Barton said some time ago that a bandwidth-limited T-channel resulted in some unwelcome compromises in the design of the 3-channel UHJ decoder. This may not be such a problem with software decoders as you could just include two separate decoders, one for 2.5 channels and another for 3. However, this would mean a lot more work. I question whether the gain from band-limiting T is worth the pain.
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
On 03/08/2013 13:22, Richard G Elen wrote:

Not sure I see the point of bandwidth-limiting T. It was designed for a world we no longer inhabit. We had issues with it at the time

I had an experimental IBA 2.5 channel UHJ decoder and FM tuner on loan and set up in my home at the time of experimental broadcasts on London's Capital Radio. In another room, I had my own domestic 2 channel UHJ decoder fed by a Quad FM tuner. This facilitated a reasonable comparison although of course decoders, amplifiers and speakers were different for each set-up. It was generally agreed that the 2 channel UHJ decode sounded better. Listeners present at the time of a live concert broadcast from the Fairfield Hall in Croydon included technical staff from the IBA Crawley Court HQ who had loaned their equipment.

-- Peter Carbines
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Richard G Elen wrote:

Not sure I see the point of bandwidth-limiting T. It was designed for a world we no longer inhabit. We had issues with it at the time and I don't think the considerations that made it useful for FM apply here. --R

Nor do I see any point in this, nor did I. Note that I was just laying out a general framework, which is up for discussion.

Yes, we have enough bandwidth for 4 full-bandwidth channels (with respect to AAC at 320kbps). You don't need any bandwidth limitation for encoding the T/Q channels, agreed.

(The EBU tested 5.1 AAC at 320kbps, in 2007 or so. According to them, the results were still in the 4 area, which means very good or near-transparent. This just confirms once more that 3/4 channels won't be a problem. And there were earlier tests...

"The MPEG-2 audio tests showed that AAC meets the requirements referred to as transparent for the ITU at 128 kbit/s for stereo, and 320 kbit/s for 5.1 audio."

Certainly 90s? Source: http://en.wikipedia.org/wiki/Advanced_Audio_Coding

I personally think that the AAC stereo/5.1 rates (128/320 kbps respectively) given above are not quite enough to be considered really transparent, but at least close (near-transparent). I have recommended higher rates anyway, to be consistent with my own beliefs. :-) So, the recommendation was 80kbps/channel for symmetric 4-channel encoding, or maybe 96kbps for L/R and 64kbps for T/Q in the asymmetric 4-channel case.

With the help of existing and widely used container formats like .m4p or Ogg, you could get around the 320kbps limit of current stereo AAC decoders. The use of a container extension implies that you would have some .aac stereo file inside, and another container containing ;-) the UHJ extensions. Extract the .aac part from the .m4p container, or set the filepath to the correct location. This prison break solution doesn't work for streaming, though...

Just giving a bit more general background...)

Best, Stefan

P.S.: AAC itself could and should go much higher than 320kbps, in the case of multichannel applications. Unfortunately, even the EBU was not able to test AAC above 320kbps some years ago. I personally believe this is kind of embarrassing, because it shows the low status of surround sound. (A fictive log from the EBU test conversations: "Oh damned, our AAC 5.1 encoder didn't accept any higher rate than 320kbps! We can't compare to Dolby DD at 448kbps, 'cos look, there are different bitrates and this should not be compared! This is because we tested AAC encoding always for stereo input be4. Don't worry, we will fix this problem in maybe two years.")

On 02/08/2013 17:09, Martin Leese wrote:

... - The UHJ article already mentions that the T channel could be bandwidth-limited.

Geoffrey Barton said some time ago that a bandwidth-limited T-channel resulted in some unwelcome compromises in the design of the 3-channel UHJ decoder. This may not be such a problem with software decoders as you could just include two separate decoders, one for 2.5 channels and another for 3. However, this would mean a lot more work. I question whether the gain from band-limiting T is worth the pain.
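The bitrate budget discussed in this message is simple arithmetic, sketched here in Python. It assumes that the asymmetric figures (96 for L/R, 64 for T/Q) are meant per channel; the dictionary layout and function name are just for illustration.

```python
def total_kbps(per_channel_kbps):
    """Sum the per-channel bitrates of one allocation."""
    return sum(per_channel_kbps.values())


# Symmetric case: every channel gets the same rate.
symmetric = {"L": 80, "R": 80, "T": 80, "Q": 80}

# Asymmetric case (assumed per channel): more bits for L/R, fewer for T/Q.
asymmetric = {"L": 96, "R": 96, "T": 64, "Q": 64}

for name, alloc in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    print(f"{name}: {total_kbps(alloc)} kbps total")
```

Both allocations come out at 320 kbps for four channels, so the asymmetric split only moves bits from T/Q to L/R without changing the total.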
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Aaron Heller wrote: Hi Stephan, Please note: AAC/HE-AAC profile 1 uses Spectral Band Replication, which means that top octave information is generated from lower frequency content using hints. I'm unsure of the impact this would have on ambisonic decoding. I guess one could filter out the replicated contents and treat it as a band-limited channel. AAC/HE-AAC profile 2 uses parametric stereo, which is similar to Ogg Vorbis Square Polar Mapping (described here http://xiph.org/vorbis/doc/stereo.html). This destroys phase information and I think would be unstable for ambisonic content. Can it be turned off in the encoder? Aaron Heller (hel...@ai.sri.com) Menlo Park, CA US (HE-AAC, Vorbis, Opus, FLAC) To step things a bit up: http://people.xiph.org/~greg/opus/ha2011/ The comparison of HE-AAC and Opus happens here at 32kbps/channel. (I was talking about transparent or near-transparent bitrates, but if we just talk about streaming or mobile streaming, say at 128kbps, you still have some options... ) Opus is an official Internet (IETF) format, by now. I am not ignoring Opus, FLAC etc., but wrote my first posting (mainly) from an AAC perspective, because we talked about established ways of audio delivery . (The proposed format is basically codec and format agnostic. It is the stereo backward-compatible version of B format at 1st order, which could easily be extended to 3rd order, at least from a theory perspective. Important is that there is always some direct relationship between the B format and the stereo-extension UHJ version. XYWZ LRTQ is therefore the same, the latter a different presentation of XYWZ. You could extend this scheme to Fu-Malham B format up to 3rd order, introducing the corresponding UHJ versions. If you use 3rd order horiz. or 3h1p variants which are backward-compatible to stereo, you don't have more channels than for Dolby/DTS 7.1 variants, and the bitrates won't have to be higher. 
You could easily have all this at 640kbps, if we talk about 7-8 channels. Note that this is just some framework, not necessarily the 1st step. Don't kill the messenger - i.e. me! Philosophically and mathematically speaking these things have always existed somewhere... Ok, this was maybe just the Platonic view. :-) ) Gregory Maxwell and colleagues: 1. Does the Opus format (sic!) allow 2 audio channels and (in some form) data extensions? (Audio data, I might add.) 2. What will be the container format for Opus? (I heard: Ogg) 3. What is the situation for FLAC? File format? Container format? (It is always best to ask typical questions like these to the format developpers themselves. I didn't ask for the 2.5 channel case, to simplify the following discussions, if they should happen.) Thanks Stefan ___ Sursound mailing list Sursound@music.vt.edu https://mail.music.vt.edu/mailman/listinfo/sursound
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Stefan Schreiber wrote: ... To offer a backward-compatible extension of a UHJ extended AAC stereo file, you would have to include the T and Q audio channels as 3rd or 4th audio stream, somewhere. (Probably you could label such a file as stereo, the first 2 channels being L and R. Include some tags/flags in the header that there are one or two further extension audio channels, which would have to be decoded by a UHJ decoder. The decoder could be an app running on a smartphone, and the output could be a binaural version of the surround or actually LRTQ 3D audio recording.) If this audio channels approach doesn't work, use the data extensions of .mp4. (T and Q are not direct audio channels, so this might actually be the formally correct approach... Because T and Q go into some decoder, as extension data .) The sections quoted above are the key, to my mind. A problem with 3- or 4-channel UHJ is, what do decoders that are unaware of Ambisonics do with the extra channels? With other file formats, they would treat the file as multi-channel and mix all the channels down to stereo. With T and Q included in the mix, this would produce a mishmash. This problem of inadvertent mix down is why I have been pushing for so long (without any success) for a way to specify in multi-channel files the preferred mix down to stereo. See, for example: http://members.tripod.com/martin_leese/Audio/StereoMix_chunk.html Somebody would need to produce AAC test files containing T and T+Q, and see what existing stereo decoders actually do. If existing decoders cannot be made to ignore T and Q (by fiddling with the file format) then the idea of including T and Q is dead. ... - The UHJ article already mentions that the T channel could be bandwidth-limited. Geoffrey Barton said some time ago that a bandwidth-limited T-channel resulted in some unwelcome compromises in the design of the 3-channel UHJ decoder. 
This may not be such a problem with software decoders as you could just include two separate decoders, one for 2.5 channels and another for 3. However, this would mean a lot more work. I question whether the gain from band-limiting T is worth the pain. Regards, Martin -- Martin J Leese E-mail: martin.leese stanfordalumni.org Web: http://members.tripod.com/martin_leese/
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Martin Leese wrote: Stefan Schreiber wrote: ... To offer a backward-compatible extension of a UHJ extended AAC stereo file, you would have to include the T and Q audio channels as 3rd or 4th audio stream, somewhere. (Probably you could label such a file as stereo, the first 2 channels being L and R. Include some tags/flags in the header that there are one or two further extension audio channels, which would have to be decoded by a UHJ decoder. The decoder could be an app running on a smartphone, and the output could be a binaural version of the surround or actually LRTQ 3D audio recording.) If this audio channels approach doesn't work, use the data extensions of .mp4. (T and Q are not direct audio channels, so this might actually be the formally correct approach... Because T and Q go into some decoder, as extension data .) The sections quoted above are the key, to my mind. A problem with 3- or 4-channel UHJ is, what do decoders that are unaware of Ambisonics do with the extra channels? With other file formats, they would treat the file as multi-channel and mix all the channels down to stereo. With T and Q included in the mix, this would produce a mishmash. This problem of inadvertent mix down is why I have been pushing for so long (without any success) for a way to specify in multi-channel files the preferred mix down to stereo. See, for example: http://members.tripod.com/martin_leese/Audio/StereoMix_chunk.html This is meta data and channel denomination hell, but actually these problems don't matter so much in our case. Because the preferred downmix is already in LR, you have only one or two more channels which you have to embed without breaking existing decoders (hardware and software). Somebody would need to produce AAC test files containing T and T+Q, and see what existing stereo decoders actually do. If existing decoders cannot be made to ignore T and Q (by fiddling with the file format) then the idea of including T and Q is dead. 
http://en.wikipedia.org/wiki/Advanced_Audio_Coding

"AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream plus 16 low frequency effects (LFE, limited to 120 Hz) channels, up to 16 coupling or dialog channels, and up to 16 data streams."

Probably you would use the data channels, to be on the safe side of backward compatibility. This is if you stay in AAC format, which can be used as the audio part in some M4P. (M4P is the most general term. MP4 would be some AVC video file with some audio, and maybe additional data like subdata etc. You could have timecodes and so on.)

I would propose two solutions anyway, because .aac and .m4a/.m4p offer different possibilities and, in fact, different applications. (You could have more than 4 channels in .m4p, if needed.)

Best, Stefan

P.S.: .M4P is a better format than Wave-EX. It is a container format which is much easier and more flexible to extend; forget all this chunk stuff which might work or not. The ISO does some good standardization, once more...
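Before feeding test files to existing players, it helps to confirm what a container actually holds. The following Python sketch walks the box structure of an MP4-family file (.mp4/.m4a) and counts the `trak` boxes inside `moov`. It relies only on the generic box header layout (32-bit big-endian size, four-character type, optional 64-bit size) and does not distinguish audio tracks from other track types; the helper `_box` and the synthetic sample are made up for the demonstration.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box runs to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop instead of guessing
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def count_tracks(data):
    """Number of 'trak' boxes directly inside the first 'moov' box."""
    for box_type, lo, hi in iter_boxes(data):
        if box_type == "moov":
            return sum(1 for t, _, _ in iter_boxes(data, lo, hi) if t == "trak")
    return 0


def _box(box_type, payload=b""):
    """Build a minimal box for the synthetic demo below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-in for a file with two tracks (e.g. L/R plus an extension track).
sample = _box(b"ftyp", b"M4A \x00\x00\x00\x00") + _box(
    b"moov", _box(b"mvhd") + _box(b"trak") + _box(b"trak")
)
print(count_tracks(sample))
```

On a real file you would read the bytes from disk and pass them to `count_tracks`; whether a given stereo player ignores the second track still has to be tested with that player.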
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Hi Stephan, Please note: AAC/HE-AAC profile 1 uses Spectral Band Replication, which means that top octave information is generated from lower frequency content using hints. I'm unsure of the impact this would have on ambisonic decoding. I guess one could filter out the replicated contents and treat it as a band-limited channel. AAC/HE-AAC profile 2 uses parametric stereo, which is similar to Ogg Vorbis Square Polar Mapping (described here http://xiph.org/vorbis/doc/stereo.html). This destroys phase information and I think would be unstable for ambisonic content. Can it be turned off in the encoder? Aaron Heller (hel...@ai.sri.com) Menlo Park, CA US On Fri, Aug 2, 2013 at 9:39 AM, Stefan Schreiber st...@mail.telepac.ptwrote: Martin Leese wrote: Stefan Schreiber wrote: ... To offer a backward-compatible extension of a UHJ extended AAC stereo file, you would have to include the T and Q audio channels as 3rd or 4th audio stream, somewhere. (Probably you could label such a file as stereo, the first 2 channels being L and R. Include some tags/flags in the header that there are one or two further extension audio channels, which would have to be decoded by a UHJ decoder. The decoder could be an app running on a smartphone, and the output could be a binaural version of the surround or actually LRTQ 3D audio recording.) If this audio channels approach doesn't work, use the data extensions of .mp4. (T and Q are not direct audio channels, so this might actually be the formally correct approach... Because T and Q go into some decoder, as extension data .) Somebody would need to produce AAC test files containing T and T+Q, and see what existing stereo decoders actually do. If existing decoders cannot be made to ignore T and Q (by fiddling with the file format) then the idea of including T and Q is dead. Certainly, but I see many ways to achieve this. Note that .aac is one thing, and .m4a and .m4p as container formats are something different. 
(Because Apple seems to mix these things up a bit, a decoder will play an AAC stereo file in any of these variants, and it will be the same thing anyway. Speaking of extensions, it is not always the same thing.)

... - The UHJ article already mentions that the T channel could be bandwidth-limited.

Geoffrey Barton said some time ago that a bandwidth-limited T-channel resulted in some unwelcome compromises in the design of the 3-channel UHJ decoder. This may not be such a problem with software decoders as you could just include two separate decoders, one for 2.5 channels and another for 3. However, this would mean a lot more work. I question whether the gain from band-limiting T is worth the pain.

No, I already wrote it is not worth it. (Better to use a lower AAC/HE-AAC bitrate for the full T/Q channel/channels, IMO.)

Best, Stefan

P.S.: Of course you would have to prove such a concept. If you have at least three ways to fiddle and two ways don't use hidden audio channels at all, things should really work.

http://en.wikipedia.org/wiki/MPEG-4_Part_14

The existence of two different filename extensions, .MP4 and .M4A, for naming audio-only MP4 files has been a source of confusion among users and multimedia playback software. Some file managers, such as Windows Explorer, look up the media type and associated applications of a file based on its filename extension. But since MPEG-4 Part 14 is a container format, MPEG-4 files may contain any number of audio, video, and even subtitle streams, making it impossible to determine the type of streams in an MPEG-4 file based on its filename extension alone. In response, Apple Inc. started using and popularizing the .m4a filename extension, which is used for MP4 containers with audio data in the lossy Advanced Audio Coding (AAC) or its own lossless Apple Lossless (ALAC) formats.
Software capable of audio/video playback should recognize files with either .m4a or .mp4 filename extensions, as would be expected, since there are no file format differences between the two. Almost any kind of data can be embedded in MPEG-4 Part 14 files through private streams. A separate hint track is used to include streaming information in the file. Which is the option which leads to = 320kbps mode, as well. (I could figure this out. Not necessary as response to your posting.)
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Apple uses HE-AAC and doesn't use SBR, at least this is my impression. In my understanding, this is a more a tool you can use at low rates, to obtain a (perceptual) improved result. Speaking of 64kbps/channel and above (I showed cases with 80kbps/channel and actually 96kbps for L/R, 64kbps for T/Q), you won't need SBR. Parametric even less. Can it be turned off in the encoder? Of course! Who uses PS (parametric stereo) anyway? Digital AM radio maybe, and then? Don't see so many problems where we don't have any! (HE-AAC is an extension of the AAC toolbox, but would you use low-bitrate techniques for high bitrates? The answer is no... ) Best, Stefan Aaron Heller wrote: Hi Stephan, Please note: AAC/HE-AAC profile 1 uses Spectral Band Replication, which means that top octave information is generated from lower frequency content using hints. I'm unsure of the impact this would have on ambisonic decoding. I guess one could filter out the replicated contents and treat it as a band-limited channel. AAC/HE-AAC profile 2 uses parametric stereo, which is similar to Ogg Vorbis Square Polar Mapping (described here http://xiph.org/vorbis/doc/stereo.html). This destroys phase information and I think would be unstable for ambisonic content. Can it be turned off in the encoder? Aaron Heller (hel...@ai.sri.com) Menlo Park, CA US On Fri, Aug 2, 2013 at 9:39 AM, Stefan Schreiber st...@mail.telepac.ptwrote: Martin Leese wrote: Stefan Schreiber wrote: ... To offer a backward-compatible extension of a UHJ extended AAC stereo file, you would have to include the T and Q audio channels as 3rd or 4th audio stream, somewhere. (Probably you could label such a file as stereo, the first 2 channels being L and R. Include some tags/flags in the header that there are one or two further extension audio channels, which would have to be decoded by a UHJ decoder. 
The decoder could be an app running on a smartphone, and the output could be a binaural version of the surround or actually LRTQ 3D audio recording.) If this audio channels approach doesn't work, use the data extensions of .mp4. (T and Q are not direct audio channels, so this might actually be the formally correct approach... Because T and Q go into some decoder, as extension data.)

Somebody would need to produce AAC test files containing T and T+Q, and see what existing stereo decoders actually do. If existing decoders cannot be made to ignore T and Q (by fiddling with the file format) then the idea of including T and Q is dead.

Certainly, but I see many ways to achieve this. Note that .aac is one thing, and .m4a and .m4p as container formats are something different. (Because Apple seems to mix these things up a bit, a decoder will play an AAC stereo file in any of these variants, and it will be the same thing anyway. Speaking of extensions, it is not always the same thing.)

...

- The UHJ article already mentions that the T channel could be bandwidth-limited.

Geoffrey Barton said some time ago that a bandwidth-limited T-channel resulted in some unwelcome compromises in the design of the 3-channel UHJ decoder. This may not be such a problem with software decoders as you could just include two separate decoders, one for 2.5 channels and another for 3. However, this would mean a lot more work. I question whether the gain from band-limiting T is worth the pain.

No, I already wrote it is not worth it. (Better to use a lower AAC/HE-AAC bitrate for the full T/Q channel/channels, IMO.)

Best,

Stefan

P.S.: Of course you would have to prove such a concept. If you have at least three ways to fiddle and two of them don't use hidden audio channels at all, things should really work.
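For anybody who wants to check what such a test file actually contains, independent of its filename extension, the container can be inspected directly. The following is a minimal sketch, assuming the standard ISO base media box layout used by .mp4/.m4a (32-bit big-endian size, 4-character type, size 1 meaning a 64-bit size follows, size 0 meaning "to the end"). The names make_box, list_boxes and count_tracks are made up for illustration, and the sample data is synthetic, not a real AAC file.

```python
import struct

def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size header (for the synthetic demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def list_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:                      # 64-bit size follows the type field
            if offset + 16 > end:
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:                    # box runs to the end of the data
            size = end - offset
        if size < header or offset + size > end:
            break                          # malformed or truncated, stop here
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size

def count_tracks(data):
    """Count the 'trak' boxes directly inside the top-level 'moov' box."""
    for box_type, start, stop in list_boxes(data):
        if box_type == "moov":
            return sum(1 for t, _, _ in list_boxes(data, start, stop) if t == "trak")
    return 0

# Synthetic container: one 'ftyp' box and a 'moov' box holding three tracks,
# standing in for an L/R stream plus two extension streams (T and Q).
sample = make_box(b"ftyp", b"M4A \x00\x00\x00\x00") + make_box(
    b"moov", make_box(b"mvhd") + make_box(b"trak") * 3)

print([t for t, _, _ in list_boxes(sample)])  # ['ftyp', 'moov']
print(count_tracks(sample))                   # 3
```

Counting 'trak' boxes only shows how many tracks are present. Whether a track is audio, and which codec it carries, is recorded deeper inside each track and is not examined here.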
http://en.wikipedia.org/wiki/MPEG-4_Part_14

The existence of two different filename extensions, .MP4 and .M4A, for naming audio-only MP4 files has been a source of confusion among users and multimedia playback software. Some file managers, such as Windows Explorer, look up the media type and associated applications of a file based on its filename extension. But since MPEG-4 Part 14 is a container format, MPEG-4 files may contain any number of audio, video, and even subtitle streams, making it impossible to determine the type of streams in an MPEG-4 file based on its filename extension alone. In response, Apple Inc. started using and popularizing the .m4a filename extension, which is used for MP4 containers with audio data in the lossy Advanced Audio Coding (AAC) or its own lossless Apple Lossless (ALAC) formats. Software capable of audio/video playback should recognize files with either .m4a or .mp4 filename extensions, as would be expected, since there are no file format differences between the two. Almost any kind of data can be embedded in MPEG-4 Part 14 files through private streams. A separate hint track is used to include streaming information in the file.
Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio
Michael Chapman wrote:

(Continuation of: The commercial future of Ambisonics, 15/5/2013)

[ ... ]

If this is a proposed standard, then I would say:

-BHJ (2 channel) should not be used

I do agree. (Unless for legacy reasons, because sometimes the B format source might actually have been lost. Which means the BHJ version is all that remains of your Ambisonic recording or mix.)

-SHJ (2.5 channel) should not be used (is bandwidth really a problem, except for radio, these days?)

Just an amendment to my former posting:

The proposed UHJ extended stereo file standard (AAC etc.) would certainly work well for DAB radio, which actually would imply DAB+ in this case. (DAB+ is using AAC/HE-AAC, within DAB spectrum and modulation standards.)

However, DAB (and generally speaking, digital terrestrial radio) has not exactly been a roaring success, and this after so many years of existence... From a consumer perspective, it doesn't add anything really new to FM, because FM and DAB both (functionally) offer the same thing: stereo broadcast. (Ok, you have the possibility to transmit more radio channels. And many more if you lower the transmission quality, which tends to happen because the radio station will love to pay less for less used bandwidth. In the end, people might think that good old FM radio actually sounds better than most DAB stations, and...)

If you include the capability to offer surround/3D audio transmission, DAB+ would offer something more than FM. You could say this would present some real advantage, especially if you see how many people listen to radio via headphones.

Radio stations have experimented with UHJ, Stereo Surround, Dolby Digital and (parametric) MPEG Surround, the most recent attempt to transmit surround sound via terrestrial radio. I am modest enough to state that the UHJ/AAC proposal is actually better than anything else above... thanks to the combined power of modern audio codecs (DAB+ uses HE-AAC) and UHJ/LRTQ!
(This was just a bit of the necessary PR work, for the lurking radio broadcasters on this list who might have to convince some bean counters... Why should you invest some money and actually some time into anything at all? The best or most practical solution is always if you - as a bean counter or actually manager - decide that some idea doesn't have any merit, or is just not practical. Then you don't have to work, which is a huge advantage! Philosophically speaking, this is the principle of least effort. You have thought about something in a thorough way and for a long time, but... Don't go there! In fact, close the case, and archive the project and recordings somewhere...:-) This is of course how things should not be. End of the small philosophical investigation.)

I am aware that a lot of radio transmission/reception happens (nowadays) via Internet transmission. This form of radio broadcast also belongs to the area of Internet streaming, which I did mention.

-THJ (3 channel) should only be used if the original material is three channel

-PHJ (4 channel) is preferred

Yes, but only if you have some real 3D audio recording, which means the Z channel is not empty.

It would need some neat little standalone UHJ--B-format decoders writing ... (But that _could_ (unideally) progress whilst people benefitted from just L/R.)

Anyone able to comment on why the UHJ (2 channel) buttons on Ambisonia were (?)never made active?

If you can offer the real thing (3- or 4-channel Ambisonics) within the current transmission channels/frameworks, forget about 2-channel UHJ. (Which had been developed to fit Ambisonic/surround transmission into the then existing analogue distribution models, which were all 2-channel based.)

The 2.5/SHJ variant was an experiment by the broadcasters which didn't make it to many if any radio listeners. (The existing SHJ decoders were probably bought only by the radio stations/broadcasters themselves.
I actually have never heard of any UHJ decoder capable of decoding more than 2 channels. In our case we don't have the problem of installing a base of analogue decoders, because we are definitely talking about decoder programs/apps, i.e. software.

Any UHJ decoder software should be able to decode BHJ, but IMO mainly for legacy reasons. Add THJ and PHJ, as the new standard case.

As Michael writes, you would transcode UHJ to B format, so to WXYZ presentation. And some smartphone app would have to decode this to some binaural representation.

To state the obvious: We would use UHJ/LRTQ just to be backward-compatible with stereo. And we transmit B format over extended LR stereo, including one or two additional audio streams. Which from the perspective of existing file and/or container formats might be labeled as extension data streams.)

II

Yes, we are hitting our head on the ceiling with FuMa. Personally, though, if we are to move on I would
[Sursound] Two new approaches for the distribution of surround sound/3D audio
(Continuation of: The commercial future of Ambisonics, 15/5/2013)

Dear colleagues,

following the recent standardization of 3D audio by MPEG (ISO/IEC 23008-3) and related activities, I have come to the conclusion that the (older) B format up to 3rd order might need some updates.

However, I also came to the conclusion that FOA (first order Ambisonics) could be easily included into all current distribution models for audio on the Internet, which are (to 99.98%) stereo-based. We have nearly been there, in the above cited thread! (The commercial future of Ambisonics)

I will start with this part, because you can see this as a format of its own. Which might be the perfect bridge or transition format for future surround/3D audio (3DA) formats...

I. UHJ (surround/3D audio) as extension of stereo based files (distribution via Internet, on discs and streaming, including YouTube, Spotify etc.)

a) As Richard Elen (and I) have suggested, you could distribute surround sound and 3D audio as a (relatively simple) extension of (UHJ encoded) stereo files. You would have to add to a stereo file (.aac file, for example) a 3rd audio channel, OR two audio channels, as an extension audio stream.

The restriction was that these extensions would have to fit into the current distribution models, say downloadable AAC files via iTunes. Contrary to my/our first impressions, this is firstly possible (this has already been pretty clear), and secondly feasible without any serious drawbacks. Which will be shown...

b) Technically speaking, you would have to distribute the (downsampled) stereo file of FOA, which contains some surround information, and the one or two audio extension streams. This is of course UHJ, brought into some AAC extension scheme.

http://en.wikipedia.org/wiki/Ambisonic_UHJ_format

Although UHJ permits the use of up to four channels (carrying full-sphere with-height surround), only the 2-channel variant is in current use (as it is compatible with currently-available 2-channel media).
In Ambisonics, UHJ is also known as C-Format.

(Small potential problem: UHJ was developed by the Ambisonic team, incorporating work done by the BBC (on their quadraphonic system, Matrix H) and Duane Cooper (on Nippon Columbia's UD-4/UMX quadraphonic system) and others, and building on the then-current version of Ambisonics, System 45J. The initials indicate some of the sources incorporated into the system: U from Universal (UD-4); H from Matrix H; and J from System 45J.

This means you might think about an update of UHJ, to achieve more consistency between B format and the UHJ scheme. Or you might leave things how they are defined, for historical reasons. In any case, you have to be aware of this... )

Although a hierarchically extended version of UHJ stereo has been tested in the area of FM broadcasting, nobody has tried to distribute UHJ (hierarchically) extended stereo files via the Internet. Which is just a head-banging fact... Or maybe there are some deeper reasons?!

If a third channel (T) is available, this can be used to give improved localisation accuracy to the planar surround effect when decoded via a 3-channel UHJ decoder. The third channel does not have to have full audio bandwidth for this purpose, leading to the possibility of so-called 2½-channel systems, where the third channel is bandwidth-limited to 5 kHz. The third channel can be broadcast via FM radio, for example, by means of phase-quadrature modulation. This configuration was tested by the Independent Broadcasting Authority (IBA) in the United Kingdom as a method of broadcasting surround recordings. 2½ or 3-channel UHJ delivers the same accuracy as 3-channel (WXY) B-Format.

Adding a fourth channel (Q) to the UHJ system allows the encoding of full surround sound with height, known as Periphony, with a level of accuracy identical to 4-channel B-Format.

c) UHJ extended AAC files

AAC allows up to 16 audio channels, and can include 16 data channels.
(I believe that .aac as a file format is just .m4a, or .mp4.)

To offer a backward-compatible extension of a UHJ extended AAC stereo file, you would have to include the T and Q audio channels as 3rd or 4th audio stream, somewhere. (Probably you could label such a file as stereo, the first 2 channels being L and R. Include some tags/flags in the header indicating that there are one or two further extension audio channels, which would have to be decoded by a UHJ decoder. The decoder could be an app running on a smartphone, and the output could be a binaural version of the surround or actually LRTQ 3D audio recording.)

If this audio channels approach doesn't work, use the data extensions of .mp4. (T and Q are not direct audio channels, so this might actually be the formally correct approach... Because T and Q go into some decoder, as extension data.)

d) Bitrate limits

Whereas Apple uses 256 kbps (VBR) as current standard (they have used