Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-09-08 Thread Stefan Schreiber

Dear colleagues...

To continue the proposal to use certain forms of .AMB as a 
real-world format for the transport/storage of 3D audio (including music 
recordings), I would like to point out some further important issues 
involved.


A full .AMB decoder would have to be able to decode the nine different 
combinations of .AMB to different (standard?) loudspeaker 
configurations, and also to headphones. (The latter would be an 
important point on my requirement list.) This means there will be 
plenty of combinations, and some great opportunities to mess things up 
if anybody wants to implement the 9*9 or so combinations ...   :-)


It would be advantageous if we could limit .AMB to some "CE 
profile" with far fewer combinations!


(To cover just FOA won't be enough. We know that FOA has certain 
limitations and won't be good enough for all applications. Think just of 
the sweet spot issues.)


My impression is that you would have to use  at least  3rd order to 
overcome many/most of the typical FOA  problems.


Some advantages of TOA, compared to FOA:

- much larger sweet spot (not only support for individual listeners; IMO 
this is very important, as I would like to be able to demonstrate some 
wonderful recordings to at least one friend, or better to several 
friends. If you don't have friends, don't bother...   :-D )


- angular resolution significantly improved compared to FOA 
(an improvement of more than a factor of 2; see the note after this list)


- improved performance at higher frequencies

- we know that FOA has certain problems presenting sound from the 
sides, even if the playback rig includes loudspeakers at direct 
lateral positions:


http://www.acoustics.hut.fi/~ville/papers/pulkkiicmc2001.pdf

- Decoding TOA to (ITU) 5.1/7.1 will show much better results than FOA 
to 5.1/7.1.   (Comments? I know that 5.1 is an underspecified, irregular 
array from an Ambisonics/TOA perspective, but you can decode to it, and 
the results will be better than in the FOA-to-5.1 case...)




Altogether, a practical CE format based on Ambisonics and .AMB could 
be introduced in the following, simplified form:


I) FOA/UHJ (3-4 UHJ channels, the proposed stereo backward-compatible 
form of FOA...)


3/4 channels

(Classical decoders and other decoders, supposed to improve on classical 
ones...)


II) 3rd order horizontal-only  and 3h/1p, which you could combine to 
just 3h1p (1st order vertical)


7/8 channels,  or 8 channels

(Might still be offered in some UHJ, stereo-compatible fashion; UHJ 
for 2nd/3rd order doesn't exist yet, but it can be done.)


III) 3h/3p, 16 channels (call this the .AMB master format? Anyway, this 
is the upper end of Furse-Malham .AMB...)


--

2nd order Ambisonics possibly doesn't offer enough improvement over FOA, 
so it might be dropped from a CE format - for the sake of simplification. 
(A small channel-set sketch for the three profiles follows below.)



Do you think that 3rd order Ambisonics would be strong enough for the 
distribution of real recordings?
(This is the decisive point, because if the answer is yes, you could 
convince some people.
My personal impression is yes, as I know that TOA is successfully 
applied in quite a few real-world installations, at live concerts etc. 
On the other hand, it would be nice to hear feedback from people 
actually working with TOA.)


I am aware that there is no microphone for 2nd or 3rd order Ambisonics, 
apart perhaps from some experimental designs. If you would like to use TOA 
as a master/distribution/storage format for music, there should be 
some TOA/.AMB microphone available.
But certainly somebody could design one? I believe there could already 
be a market for an ".AMB microphone". (The Eigenmike doesn't count 
here; I don't think it should be seen as a microphone designed for 
real-world music recordings. S/N problems, and many other issues... It 
has been designed by and for geeks, or should I apologize for this 
comment anyway?   =-O )


Mixing of TOA is already possible, here and today.

Best,

Stefan

P.S.: Note also in this context that MPEG is on track to finalize 
its MPEG-H 3D Audio framework by the beginning of 2015. (The basic "CO 
decoder" has already been chosen, or so it seems.)


MPEG 3D Audio is technically cinema surround with height (22.2 style), 
so there is a basic difference compared to Ambisonics. (Which has 
less company support, but offers full-spherical 3D audio even in its 
classical 4-channel form.   ;-) )









Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-14 Thread Aaron Heller
On Sun, Aug 11, 2013 at 9:21 PM, Stefan Schreiber st...@mail.telepac.pt wrote:


 Again, the real problem seems to be the lack of available B format
 decoders.


I may be able to help here, as I've recently written a full-featured (dual
band, NFC, blah, blah...) Ambisonic decoder engine in Faust, as well as a
toolkit to design the decoder configurations (written in MATLAB/Octave).
 Several sursounders have been beta testing the toolkit and Faust backend
with some success.  It's all open source, licensed under GNU Affero General
Public License version 3, but I could be persuaded to change that in the
interest of wider adoption.

Faust is a DSP specification language, which compiles to highly optimized
C++, and then to VST, AU, LADSPA, Jack, MaxMSP, csound, SuperCollider, etc.
I believe it can also target Android and iOS, but I haven't confirmed that
personally.

Contact me directly (hel...@ai.sri.com) if you want to try it out.

Aaron  Heller
Menlo Park, CA  US


Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-11 Thread Stefan Schreiber

Paul Hodges wrote:


--On 29 July 2013 03:57 +0100 Stefan Schreiber st...@mail.telepac.pt
wrote:

 


UHJ (surround/3D audio) as extension of stereo based files
(distribution via Internet, on discs and streaming, including
YouTube, Spotify etc.)
   



I like the potential of this idea very much; but it can only move
forward with the free availability of freely available encoders and
decoders for 2, 3 and 4-channel UHJ, in both standalone and plugin
formats.  Seeing as how mere 2-channel versions have signally failed to
become available at all, I wonder what chance there is.  
 



I had hoped that somebody else would state the obvious; in the end I 
have to do it myself...   :-)


While I would understand the above argument IF UHJ were an area of 
its own, my proposal actually implied that you would (in the end) use a 
B-format decoder.


You would additionally need a UHJ channel extractor (which works on the 
AAC/.M4P/Ogg etc. input), and secondly a UHJ-to-B-format 
translator. (The latter is just the application of some formulas which 
might not be trivial but are known and/or can be deduced. From an IT 
perspective, this is very little program code; you just have to apply 
known formulas. This step also doesn't depend much on the specific 
programming language used. Mathematics stays mathematics, and 
the language of mathematical formulas is older than programming 
languages - which explains why formulas look more or less the same in 
any programming language - well, if I/you exclude Forth and other 
exotics :-D )


I would call these two additional steps the "UHJ front end" for a 
B-format decoder.


I know that a lot more work would have to be done to publish B-format 
programs/plugins/mobile apps etc., and to describe B-format decoder 
design. Specifically, I believe that B-format decoders nowadays should be 
able to support output via headphones and binaural techniques. Section 
III of my 1st posting suggests that head-tracking hardware is both 
available and cheap enough to be applied in real-world products, 
including future "surround-capable HT headphones". I mentioned the 
specific hardware used in the Oculus Rift VR headset just to give an 
example of an existing HT chip. (There is plenty of other hardware 
around.)


It might help to set up some open group which would promote the use and 
design of B-format (HOA? Section II...) decoders: describing the theory 
behind them, offering (open-sourced) program code, distributing free 
solutions etc. (Setting up a working open group requires some 
organisational skills, but it can be done.)


Again, the real problem seems to be the lack of available B format 
decoders. (My proposal is to transport B format over stereo, in some 
simple description. If so, it is again obvious that you should see the 
use of UHJ extension channels just as a front end for B format, because 
this is the format which has to be decoded.)


I believe that you should promote the fact that B format is a real 3D 
audio format, using just 4 channels.  This is obviously some intriguing 
fact. (Note that the spatial 3D resolution of full FOA is actually the 
same as the spatial 2D resolution of XYW, because Ambisonics is isotropic.)


IMO, 2-channel UHJ is something from the past. Don't use this if you 
could distribute the real thing, which means B format, not a reduced 
form of B format. The use of 3/4-channel UHJ (maybe more channels for 
higher orders) was suggested to stay compatible with 2-channel 
audio/stereo files and streams. It has been shown that existing 
file/container formats would allow the transport of UHJ/B format 
over stereo, via at least two different extension techniques (file 
extensions, extensions in current container formats).



Best regards,

Stefan

P.S.: MPEG Surround is also a decoder-based design (MPS encoder/decoder). 
The same is valid for the future (MPEG) 3D audio codec, currently in 
development. I know that they take the topic of binaural output via 
headphones very seriously; you just have to look into their CfP and 
similar documents...



P.S. 2:


Like everyone else I wish I had the time myself; but when factoring in
the need to learn about DSP programming and modern programming
languages, other commitments, and the slowing down of age...

Paul

 

No single person could do all the programming work, at least not 
anymore. There are just too many different platforms around.


Nevertheless, B-format decoders/apps will be written if Ambisonics is 
seen as a format which is worth implementing. (Or if there is 
enough music in this format around.)
In this sense, I would look to the applications/aspects which are 
beyond what is offered by the 5.1 ITU layout. (IMO Ambisonics 
starts to shine if you factor in the inherent capability to 
record/encode full-sphere 3D audio. And because you could really not 
expect that available 3D audio loudspeaker layouts would look about the 
same everywhere, 

Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-07 Thread Paul Hodges
--On 29 July 2013 03:57 +0100 Stefan Schreiber st...@mail.telepac.pt
wrote:

 UHJ (surround/3D audio) as extension of stereo based files
 (distribution via Internet, on discs and streaming, including
 YouTube, Spotify etc.)

I like the potential of this idea very much; but it can only move
forward with the free availability of freely available encoders and
decoders for 2, 3 and 4-channel UHJ, in both standalone and plugin
formats.  Seeing as how mere 2-channel versions have signally failed to
become available at all, I wonder what chance there is.  

Like everyone else I wish I had the time myself; but when factoring in
the need to learn about DSP programming and modern programming
languages, other commitments, and the slowing down of age...

Paul

-- 
Paul Hodges




Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-03 Thread Richard G Elen
Not sure I see the point of bandwidth-limiting T. It was designed for a 
world we no longer inhabit. We had issues with it at the time and I 
don't think the considerations that made it useful for FM apply here.


--R

On 02/08/2013 17:09, Martin Leese wrote:

...

- The UHJ article already mentions that the T channel could be
bandwidth-limited.

Geoffrey Barton said some time ago that a
bandwidth-limited T-channel resulted in some
unwelcome compromises in the design of the
3-channel UHJ decoder.  This may not be
such a problem with software decoders as you
could just include two separate decoders, one
for 2.5 channels and another for 3.  However,
this would mean a lot more work.

I question whether the gain from band-limiting
T is worth the pain.





Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-03 Thread Peter Carbines

On 03/08/2013 13:22, Richard G Elen wrote:

Not sure I see the point of bandwidth-limiting T. It was designed for a
world we no longer inhabit. We had issues with it at the time


I had an experimental IBA 2.5 channel UHJ decoder and FM tuner on loan 
and set up in my home at the time of experimental broadcasts on London's 
Capital Radio. In another room, I had my own domestic 2 channel UHJ 
decoder fed by a Quad FM tuner. This facilitated a reasonable comparison 
although of course decoders, amplifiers and speakers were different for 
each set-up.


It was generally agreed that the 2 channel UHJ decode sounded better.
Listeners present at the time of a live concert broadcast from the 
Fairfield Hall in Croydon included technical staff from the IBA Crawley 
Court HQ who had loaned their equipment.


--
Peter Carbines





Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-03 Thread Stefan Schreiber

Richard G Elen wrote:

Not sure I see the point of bandwidth-limiting T. It was designed for 
a world we no longer inhabit. We had issues with it at the time and I 
don't think the considerations that made it useful for FM apply here.


--R



Nor do I see any point in this, nor did I before. Note that I was just 
laying out some general framework, which is up for discussion. Yes, we 
have enough bandwidth for 4 full-bandwidth channels (with respect to 
AAC/320 kbps). You don't need any bandwidth limitation for encoding the 
T/Q channels, agreed.


(The EBU tested 5.1 AAC at 320 kbps, in 2007 or so. According to them, 
the results were still in the "4" area, which means "very good" or 
near-transparent. Which just confirms once more that 3/4 channels 
won't be a problem.


And there were former tests...

The MPEG-2 audio tests showed that AAC meets the requirements referred 
to as "transparent" for the ITU at 128 kbit/s for stereo, and 
320 kbit/s for 5.1 audio.



Certainly from the 90s?

Source:
http://en.wikipedia.org/wiki/Advanced_Audio_Coding

I personally think that the AAC stereo/5.1 rates (128/320 kbps 
respectively) given above are not quite enough to be considered 
really transparent, but at least close (near-transparent). I have 
recommended higher rates anyway, to be consistent with my own 
beliefs. :-)
So, the recommendation was 80 kbps/channel for symmetric 
4-channel encoding, or maybe 96 kbps for L/R and 64 kbps for T/Q in the 
asymmetric 4-channel case. With the help of existing and widely used 
container formats like .M4P or Ogg, you could get around the 320 kbps 
limit of current stereo AAC decoders. The use of a container extension 
implies that you would have some .aac stereo file inside, and another 
container containing ;-)  the UHJ extensions. Extract the .aac part from 
the .m4p container, or set the file path to the correct location. This 
"prison break" solution doesn't work for streaming, though...


Just giving a bit more general background...)

Best,

Stefan

P.S.: AAC itself could and should go much higher than 320 kbps in the 
case of multichannel applications. Unfortunately, even the EBU was not 
able to test AAC above 320 kbps some years ago. I personally believe 
this is kind of embarrassing, because it shows the low status of 
surround sound. (A fictive log from the EBU test conversations: "Oh 
damn, our AAC 5.1 encoder didn't accept any rate higher than 320 kbps! 
We can't compare to Dolby DD at 448 kbps, 'cos look, there are different 
bitrates and these should not be compared! This is because we always 
tested AAC encoding with stereo input before. Don't worry, we will fix 
this problem in maybe two years.")








Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-03 Thread Stefan Schreiber

Aaron Heller wrote:


Hi Stephan,

Please note:

AAC/HE-AAC profile 1 uses Spectral Band Replication, which means that top
octave information is generated from lower frequency content using hints.
I'm unsure of the impact this would have on ambisonic decoding.   I guess
one could filter out the replicated contents and treat it as a band-limited
channel.

AAC/HE-AAC profile 2 uses parametric stereo, which is similar to Ogg Vorbis
Square Polar Mapping (described here http://xiph.org/vorbis/doc/stereo.html).
This destroys phase information and I think would be unstable for
ambisonic content.  Can it be turned off in the encoder?

Aaron Heller (hel...@ai.sri.com)
Menlo Park, CA  US
 


(HE-AAC, Vorbis, Opus, FLAC)

To step things up a bit:

http://people.xiph.org/~greg/opus/ha2011/

The comparison of HE-AAC and Opus happens here at 32kbps/channel. (I was 
talking about transparent or near-transparent bitrates, but if we just 
talk about streaming or mobile streaming, say at 128kbps, you still have 
some options... )


Opus is an official Internet (IETF) format by now. I am not ignoring 
Opus, FLAC etc., but wrote my first posting (mainly) from an AAC 
perspective, because we talked about "established ways of audio 
delivery".


(The proposed format is basically codec- and format-agnostic. It is the 
stereo backward-compatible version of B format at 1st order, which could 
easily be extended to 3rd order, at least from a theory perspective. 
What is important is that there is always some direct relationship 
between the B format and the stereo-extension UHJ version. XYWZ <-> LRTQ 
is therefore the same information, the latter just a different 
presentation of XYWZ. You could extend this scheme to Furse-Malham B 
format up to 3rd order, introducing the corresponding UHJ versions. If 
you use 3rd-order horizontal or 3h1p variants which are 
backward-compatible to stereo, you don't have more channels than for 
Dolby/DTS 7.1 variants, and the bitrates won't have to be higher. You 
could easily have all this at 640 kbps, if we talk about 7-8 channels. 
Note that this is just some framework, not necessarily the 1st step. 
Don't kill the messenger - i.e. me! Philosophically and mathematically 
speaking these things have always existed somewhere... OK, this was 
maybe just the Platonic view.  :-) )


Gregory Maxwell and colleagues:
1. Does the Opus format (sic!) allow 2 audio channels and (in some 
form) data extensions? (Audio data, I might add.)


2. What will be the container format for Opus? (I heard: Ogg)

3. What is the situation for FLAC? File format? Container format?

(It is always best to ask typical questions like these of the format 
developers themselves. I didn't ask about the 2.5-channel case, to 
simplify the following discussions, if they should happen.)


Thanks

Stefan



Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-02 Thread Martin Leese
Stefan Schreiber wrote:
...
 To offer a backward-compatible extension of a  UHJ extended  AAC
 stereo file, you would have to include the T and Q audio channels as 3rd
 or 4th audio stream, somewhere. (Probably you could label such a file
 as stereo, the first 2 channels being L and R. Include some tags/flags
 in the header that there are one or two further  extension  audio
 channels, which would have to be decoded by a UHJ decoder. The decoder
 could be an app running on a smartphone, and the output could be a
 binaural version of the surround or actually LRTQ 3D audio recording.)

 If this audio channels approach doesn't work, use the data
 extensions of .mp4. (T and Q are not direct audio channels, so this
 might actually  be the formally correct approach... Because T and Q go
 into some decoder, as extension  data .)

The sections quoted above are the key, to my
mind.  A problem with 3- or 4-channel UHJ is,
what do decoders that are unaware of
Ambisonics do with the extra channels?  With
other file formats, they would treat the file as
multi-channel and mix all the channels down to
stereo.  With T and Q included in the mix, this
would produce a mishmash.

This problem of inadvertent mix down is why I
have been pushing for so long (without any
success) for a way to specify in multi-channel
files the preferred mix down to stereo.  See,
for example:
http://members.tripod.com/martin_leese/Audio/StereoMix_chunk.html

Somebody would need to produce AAC test
files containing T and T+Q, and see what
existing stereo decoders actually do.  If existing
decoders cannot be made to ignore T and Q
(by fiddling with the file format) then the idea of
including T and Q is dead.

...
 - The UHJ article already mentions that the T channel could be
 bandwidth-limited.

Geoffrey Barton said some time ago that a
bandwidth-limited T-channel resulted in some
unwelcome compromises in the design of the
3-channel UHJ decoder.  This may not be
such a problem with software decoders as you
could just include two separate decoders, one
for 2.5 channels and another for 3.  However,
this would mean a lot more work.

I question whether the gain from band-limiting
T is worth the pain.

Regards,
Martin
-- 
Martin J Leese
E-mail: martin.leese  stanfordalumni.org
Web: http://members.tripod.com/martin_leese/


Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-02 Thread Stefan Schreiber

Martin Leese wrote:


Stefan Schreiber wrote:
...
 


To offer a backward-compatible extension of a  UHJ extended  AAC
stereo file, you would have to include the T and Q audio channels as 3rd
or 4th audio stream, somewhere. (Probably you could label such a file
as stereo, the first 2 channels being L and R. Include some tags/flags
in the header that there are one or two further  extension  audio
channels, which would have to be decoded by a UHJ decoder. The decoder
could be an app running on a smartphone, and the output could be a
binaural version of the surround or actually LRTQ 3D audio recording.)

If this audio channels approach doesn't work, use the data
extensions of .mp4. (T and Q are not direct audio channels, so this
might actually  be the formally correct approach... Because T and Q go
into some decoder, as extension  data .)
   



The sections quoted above are the key, to my
mind.  A problem with 3- or 4-channel UHJ is,
what do decoders that are unaware of
Ambisonics do with the extra channels?  With
other file formats, they would treat the file as
multi-channel and mix all the channels down to
stereo.  With T and Q included in the mix, this
would produce a mishmash.

This problem of inadvertent mix down is why I
have been pushing for so long (without any
success) for a way to specify in multi-channel
files the preferred mix down to stereo.  See,
for example:
http://members.tripod.com/martin_leese/Audio/StereoMix_chunk.html
 



This is metadata and channel-denomination hell, but actually these 
problems don't matter so much in our case. Because the preferred 
downmix is already in L/R, you have only one or two more channels which 
you have to embed without breaking existing decoders (hardware and 
software).



Somebody would need to produce AAC test
files containing T and T+Q, and see what
existing stereo decoders actually do.





 If existing
decoders cannot be made to ignore T and Q
(by fiddling with the file format) then the idea of
including T and Q is dead.
 



http://en.wikipedia.org/wiki/Advanced_Audio_Coding

AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio 
channels in one stream plus 16 low frequency effects (LFE, limited to 
120 Hz) channels, up to 16 coupling or dialog channels, and up to 16 
data streams.


Probably you would use the data channels, to be on the safe side 
regarding backward compatibility. This is if you stay in AAC format, 
which can be used as the audio part in some M4P. (M4P is the most general 
term. MP4 would be some AVC video file with audio, and maybe additional 
data like subtitles etc. You could have timecodes and so on.)



I would propose two solutions anyway, because .aac and .m4a/.m4p offer 
different possibilities and, actually, applications. (You could have more 
than 4 channels in .m4p, if needed.)


Best,

Stefan

P.S.: .M4P is a better format than Wave-EX. It is a container format 
which allows you to extend things both more easily and more flexibly; 
forget all this chunk stuff, which might work or not.


The ISO does some good standardization, once more...


Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-02 Thread Aaron Heller
Hi Stephan,

Please note:

AAC/HE-AAC profile 1 uses Spectral Band Replication, which means that top
octave information is generated from lower frequency content using hints.
 I'm unsure of the impact this would have on ambisonic decoding.   I guess
one could filter out the replicated contents and treat it as a band-limited
channel.

AAC/HE-AAC profile 2 uses parametric stereo, which is similar to Ogg Vorbis
Square Polar Mapping (described here http://xiph.org/vorbis/doc/stereo.html).
 This destroys phase information and I think would be unstable for
ambisonic content.  Can it be turned off in the encoder?

Aaron Heller (hel...@ai.sri.com)
Menlo Park, CA  US


On Fri, Aug 2, 2013 at 9:39 AM, Stefan Schreiber st...@mail.telepac.pt wrote:

 Martin Leese wrote:

  Stefan Schreiber wrote:
 ...


 To offer a backward-compatible extension of a  UHJ extended  AAC
 stereo file, you would have to include the T and Q audio channels as 3rd
 or 4th audio stream, somewhere. (Probably you could label such a file
 as stereo, the first 2 channels being L and R. Include some tags/flags
 in the header that there are one or two further  extension  audio
 channels, which would have to be decoded by a UHJ decoder. The decoder
 could be an app running on a smartphone, and the output could be a
 binaural version of the surround or actually LRTQ 3D audio recording.)

 If this audio channels approach doesn't work, use the data
 extensions of .mp4. (T and Q are not direct audio channels, so this
 might actually  be the formally correct approach... Because T and Q go
 into some decoder, as extension  data .)







 Somebody would need to produce AAC test
 files containing T and T+Q, and see what
 existing stereo decoders actually do.  If existing
 decoders cannot be made to ignore T and Q
 (by fiddling with the file format) then the idea of
 including T and Q is dead.



 Certainly, but I see many ways to achieve this.

 Note that .aac is one thing, and .m4a and .m4p as container formats are
 something different. (Because Apple seems to mix these things up a bit, a
 decoder will play an AAC stereo file in any of these variants, and it will
 be the same thing anyway. Speaking of extensions, it is not always the same
 thing. )


  ...


 - The UHJ article already mentions that the T channel could be
 bandwidth-limited.



 Geoffrey Barton said some time ago that a
 bandwidth-limited T-channel resulted in some
 unwelcome compromises in the design of the
 3-channel UHJ decoder.  This may not be
 such a problem with software decoders as you
 could just include two separate decoders, one
 for 2.5 channels and another for 3.  However,
 this would mean a lot more work.

 I question whether the gain from band-limiting
 T is worth the pain.



 No, I already wrote it is not worth it. (Better to use a lower AAC/HE-AAC
 bitrate for the full T/Q channel/channels, IMO.)


 Best,

 Stefan

 P.S.: Of course you would have to prove such a concept. If you have at
 least three ways to fiddle and two ways don't use hidden audio channels
 at all, things should really work.

 http://en.wikipedia.org/wiki/MPEG-4_Part_14

 The existence of two different filename extensions, .MP4 and .M4A, for
 naming audio-only MP4 files has been a source of confusion among users and
 multimedia playback software. Some file managers, such as Windows Explorer,
 look up the media type and associated applications of a file based on its
 filename extension. But since MPEG-4 Part 14 is a container format, MPEG-4
 files may contain any number of audio, video, and even subtitle streams,
 making it impossible to determine the type of streams in an MPEG-4 file
 based on its filename extension alone. In response, Apple Inc. started
 using and popularizing the .m4a filename extension, which is used for MP4
 containers with audio data in the lossy Advanced Audio Coding (AAC) or its
 own lossless Apple Lossless (ALAC) formats. Software capable of audio/video
 playback should recognize files with either .m4a or .mp4 filename
 extensions, as would be expected, since there are no file format
 differences between the two.

  Almost any kind of data can be embedded in MPEG-4 Part 14 files through
 private streams. A separate hint track is used to include streaming
 information in the file.



 Which is the option which leads to the >320 kbps mode, as well. (I could
 figure this out. Not necessary as a response to your posting.)


Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-08-02 Thread Stefan Schreiber

Apple uses HE-AAC and  doesn't  use SBR, at least this is my impression.

In my understanding, this is more a tool you can use at low rates, to 
obtain a (perceptually) improved result. Speaking of 64 kbps/channel and 
above (I showed cases with 80 kbps/channel, and actually 96 kbps for L/R, 
64 kbps for T/Q), you won't need SBR.


Parametric stereo even less.


Can it be turned off in the encoder?



Of course!

Who uses PS (parametric stereo) anyway? Digital AM radio maybe, and then?

Let's not see problems where we don't have any!  (HE-AAC is an 
extension of the AAC toolbox, but would you use low-bitrate techniques 
for high bitrates? The answer is no... )


Best,

Stefan


Aaron Heller wrote:


Hi Stephan,

Please note:

AAC/HE-AAC profile 1 uses Spectral Band Replication, which means that top
octave information is generated from lower frequency content using hints.
I'm unsure of the impact this would have on ambisonic decoding.   I guess
one could filter out the replicated contents and treat it as a band-limited
channel.

AAC/HE-AAC profile 2 uses parametric stereo, which is similar to Ogg Vorbis
Square Polar Mapping (described here http://xiph.org/vorbis/doc/stereo.html).
This destroys phase information and I think would be unstable for
ambisonic content.  Can it be turned off in the encoder?

Aaron Heller (hel...@ai.sri.com)
Menlo Park, CA  US



Re: [Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-07-31 Thread Stefan Schreiber

Michael Chapman wrote:


(Continuation of: The commercial future of Ambisonics, 15/5/2013)

   


[ ... ]



If this is a proposed standard, then I would say:
-BHJ (2 channel) should not be used
 

I do agree. (Unless for legacy reasons, because sometimes the B-format 
source might actually have been lost. Which means the BHJ version is all 
that remains of your Ambisonic recording or mix.)



-SHJ (2.5 channel) should not be used (is bandwidth really a problem,
except for radio, these days?)
 



Just a small amendment to my former posting: The proposed "UHJ extended 
stereo file" standard (AAC etc.) would certainly work well for DAB 
radio, which would actually imply DAB+ in this case. (DAB+ uses 
AAC/HE-AAC, within DAB spectrum and modulation standards.)


However, DAB (and, generally speaking, digital terrestrial radio) has not 
exactly been a roaring success, and this after so many years of 
existence...  From a consumer perspective, it doesn't add anything 
really new to FM, because FM and DAB both (functionally) offer the 
same thing: stereo broadcast. (OK, you have the possibility to transmit 
more radio channels. And many more if you lower the transmission 
quality, which tends to happen because the radio station will love to 
pay less for less used bandwidth. In the end, people might think that 
good old FM radio actually sounds better than most DAB stations, and...)


If you include the capability to offer surround/3D audio transmission, 
DAB+ would offer something more than FM. You could say this would 
present a real advantage, especially if you see how many people 
listen to radio via headphones. Radio stations have experimented with 
UHJ, "Stereo Surround", Dolby Digital and (parametric) MPEG Surround, 
the most recent attempt to transmit surround sound via terrestrial radio.


I am modest enough to state that the UHJ/AAC proposal is actually better 
than anything else above... thanks to the combined power of modern audio 
codecs (DAB+ uses HE-AAC) and UHJ/LRTQ!


(This was just a bit of the necessary PR work, for the lurking radio 
broadcasters on this list who might have to convince some bean 
counters...  Why should you invest some money, and actually some time, 
into anything at all?  The "best" or "most practical" solution is 
always if you - as a bean counter or actual manager - decide that some 
idea doesn't have any merit, or is just not practical. Then you don't 
have to work, which is a huge advantage! Philosophically speaking, this 
is the principle of "least effort". You have thought about something 
in a thorough way and over a long time, but... Don't go there! In 
fact, close the case, and archive the project and recordings 
somewhere... :-)  This is of course how things should not be.
End of the small philosophical investigation.)



I am aware that a lot of radio transmission/reception happens 
(nowadays) via the Internet. This form of radio broadcast also 
belongs to the area of Internet streaming, which I did mention.



-THJ (3 channel) should only be used if the original material is three
channel
-PHJ (4 channel) is preferred
 



Yes, but only if you have a real 3D audio recording, which means the 
Z channel is not empty.



It would need some neat little standalone UHJ-to-B-format decoders writing ...
(But that _could_ (unideally) progress whilst people benefitted from just
L/R.)

Anyone able to comment on why the UHJ (2 channel) buttons on Ambisonia
were (?)never made active?
 

If you can offer the real thing (3- or 4-channel Ambisonics) within the 
current transmission channels/frameworks, forget about 2-channel UHJ. 
(Which had been developed to fit Ambisonic/surround transmission into 
the then-existing analogue distribution models, which were all 2-channel 
based.)
The 2.5/SHJ variant was an experiment by the broadcasters which didn't 
make it to many, if any, radio listeners. (The existing SHJ decoders were 
probably bought only by the radio stations/broadcasters themselves. I 
actually have never heard of any UHJ decoder capable of decoding 
more than 2 channels. In our case we don't have the problem of installing 
a base of analogue decoders, because we definitely talk about some 
decoder programs/apps, i.e. software. Any UHJ decoder software should be 
able to decode BHJ, but IMO mainly for legacy reasons. Add THJ and 
PHJ as the new standard case. As Michael writes, you would transcode UHJ 
to B format, so to a WXYZ presentation. And some smartphone app would 
have to decode this to some binaural representation; a rough sketch of 
that last step follows below. To state the obvious: We would use UHJ/LRTQ 
just to become backward-compatible to stereo. And we transmit B format 
over "extended LR stereo", including one or two additional audio streams. 
Which from the perspective of existing file and/or container formats 
might be labeled as "extension data streams".)



II
Yes, we are hitting our head on the ceiling with FuMa.
Personally, though, if we are to move on I would 

[Sursound] Two new approaches for the distribution of surround sound/3D audio

2013-07-28 Thread Stefan Schreiber

(Continuation of: The commercial future of Ambisonics, 15/5/2013)


Dear colleagues,

following the recent standardization of 3D audio by Mpeg (ISO/IEC 
23008-3) and related activities, I have come to the conclusion that the 
(older) B format up to 3rd order might need some updates.


However, I also came to the conclusion that FOA (first-order 
Ambisonics) could easily be included in all current distribution 
models for audio on the Internet, which are (to 99.98%) stereo-based. 
We nearly got there in the above-cited thread! ("The 
commercial future of Ambisonics")


I will start with this part, because you can see it as a format of its 
own, which might be the perfect bridge or transition format for future 
surround/3D audio (3DA) formats...



I. UHJ (surround/3D audio) as extension of stereo based files 
(distribution via Internet, on discs and streaming, including YouTube, 
Spotify etc.)


a) As Richard Elen (and I) have suggested, you could distribute 
surround sound and 3D audio as a (relatively simple) extension of (UHJ 
encoded) stereo files. You would have to add to a stereo file (an .aac 
file, for example) a 3rd audio channel, OR two audio channels, as an 
extension audio stream. The restriction was that these extensions would 
have to fit into the current distribution models, say downloadable AAC 
files via iTunes.


Contrary to my/our first impressions, this is firstly  possible (this 
has already been pretty clear), and secondly feasible  without any 
serious drawbacks . Which will be shown...


b) Technically speaking, you would have to distribute the 
(downsampled) stereo file of FOA, which contains some surround 
information, and the one or two audio extension streams.


This is of course UHJ, brought into some AAC extension scheme.

http://en.wikipedia.org/wiki/Ambisonic_UHJ_format

Although UHJ permits the use of up to four channels (carrying 
full-sphere with-height surround), only the 2-channel variant is in 
current use (as it is compatible with currently-available 2-channel 
media). In Ambisonics, UHJ is also known as C-Format.



(Small potential problem:

UHJ was developed by the Ambisonic team, incorporating work done by the 
BBC (on their quadraphonic system, Matrix H) and Duane Cooper (on Nippon 
Columbia's UD-4/UMX quadraphonic system) and others, and building on the 
then-current version of Ambisonics, System 45J. The initials indicate 
some of sources incorporated into the system: U from Universal (UD-4); H 
from Matrix H; and J from System 45J.


This means you might think about an update of UHJ, to achieve more 
consistency between the B format and the UHJ scheme. Or you might leave 
things as they are defined, for historical reasons. In any case, you 
have to be aware of this...

)


Although a hierarchically extended version of UHJ stereo has been 
tested in the area of FM broadcasting, nobody has tried to distribute 
UHJ (hierarchically) extended stereo files via the Internet. Which is 
just a head-banging fact... Or maybe there are some deeper reasons?!


If a third channel (T) is available, this can be used to give improved 
localisation accuracy to the planar surround effect when decoded via a 
3-channel UHJ decoder. The third channel does not have to have full 
audio bandwidth for this purpose, leading to the possibility of 
so-called 2½-channel systems, where the third channel is 
bandwidth-limited to 5 kHz. The third channel can be broadcast via FM 
radio, for example, by means of phase-quadrature modulation. This 
configuration was tested by the Independent Broadcasting Authority 
(IBA) in the United Kingdom as a method of broadcasting surround 
recordings. 2½ or 3-channel UHJ delivers the same accuracy as 
3-channel (WXY) B-Format



Adding a fourth channel (Q) to the UHJ system allows the encoding of 
full surround sound with height, known as Periphony, with a level of 
accuracy identical to 4-channel B-Format.




c) UHJ extended AAC files

AAC allows up to 16 audio channels, and can include 16 data channels. (I 
believe that .aac as a  file format  is just .m4a, or  .mp4.)


To offer a backward-compatible extension of a  UHJ extended  AAC 
stereo file, you would have to include the T and Q audio channels as 3rd 
or 4th audio stream, somewhere. (Probably you could label such a file 
as stereo, the first 2 channels being L and R. Include some tags/flags 
in the header that there are one or two further  extension  audio 
channels, which would have to be decoded by a UHJ decoder. The decoder 
could be an app running on a smartphone, and the output could be a 
binaural version of the surround or actually LRTQ 3D audio recording.)


If this audio channels approach doesn't work, use the data 
extensions of .mp4. (T and Q are not direct audio channels, so this 
might actually  be the formally correct approach... Because T and Q go 
into some decoder, as extension  data .)



d) Bitrate limits

Whereas Apple uses 256 kbps (VBR) as current standard (they have used