Hi all,
a quick and dirty text version of my investigation into the mdct. (the
equations are in TeX form)
check out the PS,html version for proper equations
on www.cryogen.com/mikecheng
my site also includes the source codes for various mdct's as well as
some general fft stuff.
later
mike
-
The Modified Discrete Cosine Transform (MDCT) and MPEG Audio encoding.
Mike Cheng ([EMAIL PROTECTED]) [version 1.0]
1 Overview
The MDCT is a lapped orthogonal transform used in signal processing.
More traditional block transforms for signal processing (such as the
DCT or FFT) suffer from blocking artefacts arising because of the
independent processing of each block [2]. Lapped transforms, however,
have a 50% overlap between successive blocks which results in much
reduced artefacing.
2 The MDCT in MPEG Audio Encoding
In MPEG audio encoding (layer III) the MDCT is used is of size N=36.
For 36 input samples (coming from the polyphase filterbank) it produces
18 output values.
If the input samples are in array x_{i} then the samples are first
windowed by a function dependent on the block type involved and the
results placed in z_{i}.
For block type = 0 (normal block)
z_{i}=x_{i}\sin (\frac{\pi }{36}(i+\frac{1}{2}))\qquad for\quad i=0\ldots 35
For block type = 1 (start block)
z_{i}=x_{i}\sin (\frac{\pi }{36}(i+\frac{1}{2}))\qquad for\quad i=0\ldots 17
z_{i}=x_{i}\qquad for\quad i=18\ldots 23
z_{i}=x_{i}\sin (\frac{\pi }{12}(i-18+\frac{1}{2}))\qquad for\quad i=24\ldots 29
z_{i}=0\qquad for\quad i=30\ldots 35
For block type = 3 (stop block)
z_{i}=0\qquad for\quad i=0\ldots 5
z_{i}=x_{i}\sin (\frac{\pi }{12}(i-6+\frac{1}{2}))\qquad for\quad i=6\ldots 11
z_{i}=x_{i}\qquad for\quad i=12\ldots 17
z_{i}=x_{i}\sin (\frac{\pi }{36}(i+\frac{1}{2}))\qquad for\quad i=18\ldots 35
For block type =2 (short block), the 36 samples are first sectioned
into 3 overlapping blocks.
y_{i}^{(k)}=x_{i+6(k+1)}\qquad \mbox \{for\}\quad i=0\ldots 11\quad k=0\ldots 2
z_{i}^{(k)}=y_{i}^{(k)}\sin (\frac{\pi }{12}(i+\frac{1}{2}))\qquad \mbox
{for}\quad i=0\ldots 11\quad k=0\ldots 2
Once the input samples are windowed, the MDCT is performed according
to the following equation.
x_{i}=\sum ^{n-1}_{k=0}z_{k}\cos (\frac{\pi
}{2n}(2k+1+\frac{n}{2})(2i+1))\qquad \mbox {for}\quad i=0\ldots \frac{n}{2}-1
3 Fast Implementations of the MDCT
3.1 ISO Dist10 Code
The original ISO dist10 source code implements the MDCT algorithm as-is.
This requires about 1300 additions and 650 multiplications.
3.2 LAME
For the LAME project [7] the MDCT was unrolled by hand (by
[EMAIL PROTECTED]).
By exploiting the symmetry of the trig factors involved in the calculation,
successive multiplication operations by the same factor were replaced
with additions. This bought the computation count down to 244 multiplies
and 324 additions and cut the MDCT() time by about 70%.
3.3 Sporer, Brandenburg and Edler
Sporer et al [3], present a fast implementation of the MDCT. Unfortunately
for MPEG Audio Layer III encoding, the algorithm presented is for
MDCTs of a size which is divisible by 8 (see Step 2 of method).
3.4 MDCT via an FFT
Bosse Lincoln's project [4] implements a MDCT after the work by Gluth
[5]. This seems to be an MDCT of low complexity. The algorithm involves
basically 3 steps
1. Re-organize the N inputs and do a bit of pre-twiddling and rotation
2. Do a complex FFT of size N/4
3. Do a bit of post-twiddling and re-ordering.
From the FFTW [6] source code, an optimized 9 (i.e. 36/4) element complex
FFT takes 40 multiplies and 80 additions.Taking into account the pre-
and post- calculations, this implementation was found to have about
the same computation demands as the LAME MDCT. (Bosse's implementation
can easily be optimized by building tables for trig values)
4 Conclusion
The LAME code is a messy but reasonably fast implementation of the
MDCT. An FFT-based MDCT is about the same level of computational complexity
as the LAME implementation.
Faster implementations may be possible using a technique similar to
Sporer's [3], but it may be difficult to find one for N\not =2^{n}.
5 References
(Excuse the style, I haven't done a bib file for it)
[1] ISO 11172-3 Information Technology - Coding of moving pictures
and associated audio for digital storage media at up to about 1.5Mbit/s.
Part 3: Audio.
[2] Malvar, H.S. Signal Processing with Lapped Transforms, 1991.
[3] Sporer, Th., et al, The use of multirate filter banks for coding
of high quality digital audio, 6th European Signal Processing Conference
v1 pp211-214, Amsterdam, 1992
(http://www.lte.e-technik.uni-erlangen.de/~spo/eusipco.corrected.ps)
[4] Lincoln, Bosse., MUS420 Project, 1997 http://www-ccrma.stanford.edu/~bosse/
[5] Gluth, R. Regular FFT-related transform kernels for DCT/DST-based
polyphase filter banks, in ICASSP 1991 v3 pp2205-8
[6] FFTW - Fastest Fourier Transform in the West. http://www.fftw.org/
[7]