Hi Eric

I'm sure your filterbank EQ sounds fine. Aliasing should be contained to a
very low level if appropriate windows/overlap are used and the filter
response isn't pushed to any extremes.

But, zero-phase (offline) processing is straightforward to achieve with
FIR. You just do a linear-phase design, and then compensate the output by
exactly that delay. Opinions differ as to whether linear phase is
preferable to minimum phase in audio terms, due to issues like pre-echo.
This is a moot point for mild EQ settings, which is why you most often see
linear/zero phase EQs used in mastering contexts.
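For the offline FIR case, the design-then-compensate recipe is just a few
lines of numpy. A sketch (the kernel length, cutoff, and test tone are
illustrative, not anything specific to Eric's EQ):

```python
import numpy as np

# Sketch: zero-phase offline EQ with a linear-phase FIR. A symmetric
# kernel of length 2M+1 has a group delay of exactly M samples, so
# shifting the filtered output back by M samples gives zero phase
# overall. The windowed-sinc lowpass here is purely illustrative.

M = 16
m = np.arange(-M, M + 1)
h = 0.25 * np.sinc(0.25 * m) * np.hamming(2 * M + 1)  # symmetric lowpass kernel
assert np.allclose(h, h[::-1])                        # symmetric => linear phase

x = np.cos(2 * np.pi * 0.05 * np.arange(1024))        # passband test tone
y = np.convolve(x, h)[M:M + len(x)]                   # compensate the M-sample delay

# Zero phase: in the interior, y is a scaled copy of x with no shift.
g = np.sum(h * np.cos(2 * np.pi * 0.05 * m))          # (real) response at f = 0.05
```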

Zero-phase (offline) IIR can also be achieved by filtering in two
passes: one forward in time, and the other backwards in time. However,
this requires the filter design to be a "square root" of the desired
response, so there is some extra work compared to the FIR flavor.
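A minimal sketch of the two-pass idea, using a hand-rolled one-pole
lowpass in place of a real filter design (the coefficient is
illustrative; scipy.signal.filtfilt is the production version of this):

```python
import numpy as np

# Sketch: zero-phase IIR by forward-backward filtering. Each pass applies
# a one-pole lowpass y[n] = a*x[n] + (1-a)*y[n-1]; running it forward and
# then on the time-reversed result cancels the phase and squares the
# magnitude, which is why the one-pole must be designed as the "square
# root" of the desired overall response.

def onepole(x, a):
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = a * v + (1.0 - a) * acc
        y[i] = acc
    return y

def zero_phase(x, a):
    fwd = onepole(x, a)                  # forward pass: introduces phase lag
    return onepole(fwd[::-1], a)[::-1]   # backward pass: cancels the lag

a = 0.2
x = np.cos(2 * np.pi * 0.01 * np.arange(4000))  # low-frequency test tone
y = zero_phase(x, a)
```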

But, note that linear phase FIR requires the roots to come in reciprocal
pairs, which is an equivalent "squared response" constraint. I.e., you can
design a linear phase FIR by starting with a min-phase FIR of half the
length (and square root of the desired response), and then convolving it
with its time-reversal, just as in the IIR case.
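That factorization is easy to check numerically; the short prototype
kernel here is arbitrary, just for illustration:

```python
import numpy as np

# Convolving a (minimum-phase or any) FIR kernel with its own
# time-reversal yields a symmetric, hence linear-phase, kernel whose
# magnitude response is the square of the original's.

g = np.array([1.0, 0.6, 0.2, -0.1])   # "square root" prototype, length L
h = np.convolve(g, g[::-1])           # linear-phase kernel, length 2L-1

# Frequency responses for the squared-magnitude relation |H| == |G|^2:
G = np.fft.fft(g, 64)
H = np.fft.fft(h, 64)
```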

Ethan

On Sun, Mar 15, 2020 at 6:06 PM Zhiguang Eric Zhang <zez...@nyu.edu> wrote:

>
> Hi Ethan,
>
>
> It's been a few years since I've run or heard this FFT filterbank EQ.  I
> do remember it being quite clean; indeed, I chose to work on it precisely
> because I realized that it could be designed to be zero-phase (meaning no
> phase distortion like you get from traditional FIR/IIR EQs).
>
> The 'perfect reconstruction' dogma is tricky because you have to remember
> that we are also discussing this in the context of audio coding and
> compression, where the coefficients are necessarily changed in order to get
> coding gain, which also introduces quantization noise (masked through
> application of the psychoacoustic model, etc.).
>
> I urge you to take a look at, or even run, the algorithm in my github with
> audio if you want to hear whether or not there is quantization noise from
> this FFT EQ (from changing the coefficients, etc.).
>
>
> cheers,
> Eric Z
> https://www.github.com/kardashevian
>
> On Fri, Mar 13, 2020 at 6:18 PM Ethan Duni <ethan.d...@gmail.com> wrote:
>
>> On Thu, Mar 12, 2020 at 9:35 PM robert bristow-johnson <
>> r...@audioimagination.com> wrote:
>>
>>>  i am not always persuaded that the analysis window is preserved in the
>>> frequency-domain modification operation.
>>
>>
>> It definitely is *not* preserved under modification, generally.
>>
>> The Perfect Reconstruction condition assumes that there is no
>> modification to the coefficients. It's just a basic guarantee that the
>> filterbank is actually able to reconstruct the signal to begin with. The
>> details of the windows/zero-padding determine exactly what happens to all
>> of the block processing artifacts when you modify things.
>>
>> if it's a phase vocoder and you do the Miller Puckette thing and apply
>>> the same phase to an entire spectral peak, then supposedly the window shape
>>> is preserved on each sinusoidal component.
>>
>>
>> Even that is only approximate IIRC, in that it assumes well-separated
>> sinusoids or similar?
>>
>> The larger point being that preserving window shape under modification is
>> an exceptional case that requires special handling.
>>
>> for analysis, there might be other properties of the window that are more
>>> important than being complementary.
>>>
>>
>> That's true enough: this isn't as crucial in analysis-only as it is for
>> synthesis. Although, I do consider Parseval to be pretty bedrock in terms
>> of DSP intuition, and would not want to introduce frame-rate modulations
>> into analysis without a clear reason (of which there are many good
>> examples, don't get me wrong).
>>
>>
>>> if, after analysis, i am modifying each Gaussian pulse and inverse DFT
>>> back to the time domain, i will have a Gaussian window effectively on the
>>> output frame.  by multiplying by a Hann window and dividing by the original
>>> Gaussian window, the result has a Hann window shape and that should be
>>> complementary in the overlap-add.
>>>
>>
>> So, a relevant distinction here is whether an STFT filterbank uses
>> matching analysis and synthesis windows. The PR condition is that their
>> product obeys COLA.
>>
>> In the vanilla case, the analysis and synthesis windows are constrained
>> to match (actually they're time-reversals of one another, but that only
>> matters for asymmetric windows). Then, the PR condition is COLA on the
>> square of the (common) window, and the appropriate window is of "square
>> root" type, such as cosine. This is a "balanced" design, in that the
>> analyzer and synthesizer play equal roles in the windowing.
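A numerical sketch of this balanced case, using the square root of a
periodic Hann at 50% overlap (all sizes are illustrative):

```python
import numpy as np

# Matching analysis/synthesis windows whose *square* satisfies COLA.
# With no coefficient modification, overlap-add reconstructs the
# interior of the signal exactly (edges only see a partial window sum).

N, hop, frames = 8, 4, 16
w = np.sqrt(0.5 * (1 - np.cos(2 * np.pi * np.arange(N) / N)))  # sqrt of periodic Hann

rng = np.random.default_rng(0)
x = rng.standard_normal(hop * frames + N)
y = np.zeros_like(x)
ola = np.zeros_like(x)
for k in range(frames):
    seg = x[k * hop:k * hop + N] * w                      # analysis window
    spec = np.fft.rfft(seg)                               # (no modification here)
    y[k * hop:k * hop + N] += np.fft.irfft(spec, N) * w   # synthesis window
    ola[k * hop:k * hop + N] += w * w                     # running COLA check
```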
>>
>> Note that this matching constraint removes many degrees of freedom from
>> the window design. In general, for mismatched analysis and synthesis
>> windows, the PR condition is very "loose." For example, you can use
>> literally anything you want for the analysis window, provided the values
>> are finite and non-zero (negative is okay!). Then you can pick any COLA
>> window, and solve for the synthesis window as their ratio. In this way you
>> can design PR filterbanks with arbitrarily bad performance :P
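That looseness is easy to demonstrate; here is a sketch with a random
(partly negative) analysis window and the synthesis window solved as the
ratio against a periodic-Hann COLA target (all sizes illustrative):

```python
import numpy as np

# Mismatched-window PR: take any finite, nonzero analysis window
# (negative values included), pick any COLA window, and solve for the
# synthesis window as their ratio. PR holds however poor the windows
# are as filters.

N, hop, frames = 8, 4, 16
cola = 0.5 * (1 - np.cos(2 * np.pi * np.arange(N) / N))  # periodic Hann (COLA at N/2)

rng = np.random.default_rng(1)
wa = 0.5 + rng.random(N)      # arbitrary analysis window (nonzero)...
wa[2] = -0.7                  # ...negative is okay, as noted
ws = cola / wa                # solve: wa * ws must obey COLA

x = rng.standard_normal(hop * frames + N)
y = np.zeros_like(x)
for k in range(frames):
    spec = np.fft.rfft(x[k * hop:k * hop + N] * wa)
    y[k * hop:k * hop + N] += np.fft.irfft(spec, N) * ws
```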
>>
>> So for the mismatched case, we need some additional design principle(s)
>> to drive the window designs. Offhand, there seem to be two notable
>> approaches to this. One is that rectangular "windows" are desired on the
>> synthesis side in order to accommodate zero-padding/fast convolution type
>> operation. Then, the analysis window is whatever COLA window you care to
>> use for analysis purposes. As discussed, this is only appropriate for when
>> the modification is constrained to be a length-K FIR kernel.
>>
>> The other is like your Gaussian example where you want to use a
>> particular window for analysis/modification reasons, and then need to
>> square that with the PR condition on the synthesis side. The downside here
>> is that the resulting synthesis windows are not as well behaved in terms of
>> suppressing block processing artifacts. They tend to become
>> heavy-shouldered, exhibit regions of amplification, etc. This can be worth
>> it, but only if you gain enough from the analysis/modification properties.
>>
>>
>>> > Rectangular windows are a perfectly valid choice here, albeit one with
>>> poor sidelobe suppression.
>>>
>>> but it doesn't matter with overlap-add fast convolution.  somehow, the
>>> sidelobe effects come out in the wash, because we can ensure (to finite
>>> precision) the correctness of the output with a time-domain analysis.
>>>
>>
>> Right, the rectangular windows are not being used for spectral estimation
>> in the fast convolution context, so their spectral properties are
>> irrelevant. They just represent a time-domain accounting of what the
>> circular convolution is doing.
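A sketch of that accounting: overlap-add fast convolution with
rectangular segmentation and zero-padding, checked against direct
time-domain convolution (block and kernel sizes are illustrative):

```python
import numpy as np

# Zero-padding each block to Nfft >= B + K - 1 makes the circular
# convolution of the FFT equal to linear convolution per block, and
# overlap-adding the tails recovers direct convolution exactly (to
# rounding) -- the "comes out in the wash" point.

rng = np.random.default_rng(4)
x = rng.standard_normal(1000)
h = rng.standard_normal(17)   # length-K FIR kernel
B = 64                        # block (hop) size
Nfft = 128                    # >= B + len(h) - 1 = 80
H = np.fft.rfft(h, Nfft)

y = np.zeros(len(x) + len(h) - 1)
for start in range(0, len(x), B):
    block = x[start:start + B]                              # rectangular segment
    conv = np.fft.irfft(np.fft.rfft(block, Nfft) * H, Nfft)
    stop = min(start + Nfft, len(y))
    y[start:stop] += conv[:stop - start]                    # overlap-add the tail
```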
>>
>>
>>> so you're oversampling in the frequency domain because you're
>>> zero-padding in the time domain.
>>>
>>
>> Correct, zero-padding in the time domain is equivalent to upsampling in
>> the frequency domain.
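A three-line check of that equivalence (sizes illustrative):

```python
import numpy as np

# Zero-padding a length-32 block by 2x before the DFT yields a spectrum
# whose even-indexed bins are exactly the original 32-point DFT; the
# padded transform just interpolates new bins in between.

x = np.random.default_rng(5).standard_normal(32)
X = np.fft.fft(x)           # 32-point DFT
Xpad = np.fft.fft(x, 64)    # zero-padded to 64 points
```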
>>
>>
>>> > Note that this equivalence happens because we are adding an additional
>>> time-variant stage (zero-padding/raw OLA), to explicitly correct for the
>>> time-variant effects of the underlying DFT operation. This is the block
>>> processing analog of upsampling a scalar signal by K so that we can apply
>>> an order-K polynomial nonlinearity without aliasing.
>>>
>>> where is this polynomial nonlinearity?  and i am still not groking the
>>> upsampling.
>>>
>>
>> There's no nonlinearity in STFT, I'm just making an analogy. We have some
>> process that we know will produce a finite amount of "out of band" output,
>> and so we "upsample" by exactly that amount to avoid aliasing. Just as a
>> polynomial nonlinearity requires frequency-domain headroom (upsampling), so
>> does a circular convolution require time-domain headroom (zero-padding).
>> It's the same basic engineering idea, just applied to different flavors of
>> aliasing (non-linearity vs time-variance).
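A toy version of the analogy, with an illustrative tone frequency:
squaring a tone at 0.3 cycles/sample produces 0.6, which aliases down to
0.4 at the original rate, but lands where it belongs on a 2x-dense grid:

```python
import numpy as np

# Squaring (an order-2 polynomial) doubles a signal's bandwidth, so it
# needs 2x frequency-domain headroom, just as circular convolution
# needs time-domain headroom. The "upsampling" here is simulated by
# generating the same tone on a 2x-dense sample grid.

N = 1024
f0 = 0.3                                       # tone frequency, cycles/sample
x = np.cos(2 * np.pi * f0 * np.arange(N))
sq = x * x - np.mean(x * x)                    # remove the DC term of cos^2
peak = np.argmax(np.abs(np.fft.rfft(sq))) / N  # 2*f0 = 0.6 aliases to 0.4

x2 = np.cos(2 * np.pi * (f0 / 2) * np.arange(2 * N))  # same tone, 2x rate
sq2 = x2 * x2 - np.mean(x2 * x2)
peak2 = np.argmax(np.abs(np.fft.rfft(sq2))) / (2 * N)  # 0.3 here = 0.6 in original units
```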
>>
>>
>>> i think so, but i wonder if i am understanding your "upsampling" and
>>> zero-padding.  we're zero-padding in the time domain (making the length
>>> longer to the DFT length N).  the factor of length increase in the time
>>> domain is the factor of oversampling in the frequency domain.  a lot of
>>> people make the sub-optimal decision to simply pad zeros equal to length of
>>> the frame of audio.  then doubling the length would be like inserting
>>> oversampled frequency DFT bins between each of the "original" spectral
>>> points.
>>>
>>
>> Right, so as I mentioned before there are two sources of oversampling
>> available. In a critically sampled design, the DFT size would equal the hop
>> size. This is the minimum required to represent the signal, and anything on
>> top of that represents redundancy in the filterbank representation (and so,
>> additional overhead). Of course, that implies there is no overlap, the
>> "windows" are rectangular, spectral analysis properties are poor, aliasing
>> is not suppressed in synthesis, etc.
>>
>> So the first layer of oversampling is overlap/windowing. This is powerful
>> because more data goes into each transform, and the windows explicitly
>> attenuate edge effects and control the spectral properties. Still, this
>> represents redundant overhead relative to the minimum required to represent
>> the signal. The other downside of overlap is that it corresponds to
>> latency, and reduces time resolution of the transform representation.
>>
>> (Generally, it is possible to critically sample in the overlapping case,
>> and this is where MDCT comes in. In this case time-domain aliasing occurs
>> even if you *don't* modify the coefficients, and you must design the system
>> to satisfy TDAC in addition to PR. But this is more for coding and less for
>> analysis/pvoc/filtering).
>>
>> The less powerful form of oversampling is zero-padding. Since there is no
>> additional data going into the transforms, there is no latency cost or real
>> increase in frequency resolution (it just interpolates the DFT of the
>> non-zero-padded case). What this buys is processing margin for applying
>> modifications in the frequency domain. And it can be helpful for analysis
>> if you want increased frequency resolution for some estimation task.
>>
>> Ethan
>> _______________________________________________
>> dupswapdrop: music-dsp mailing list
>> music-dsp@music.columbia.edu
>>
>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>
