Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Evan Balster
Ross' explanation is solid.

Robert:  R refers to range ("delay line" size, one could say) and N refers
to signal length.


Clarifying about the weird linear/binary search:  It's necessary to
maintain O(N) complexity for processing N samples.  This took me a while to
grasp:

If we do a backward linear search (as in the original Lemire algorithm) we
get that O(N) complexity but get an O(R) worst case per sample.  Lemire's
paper explains why the former is the case.

If we do a binary search, we get O(log2(R)) worst case per sample, but the
total complexity is O(N*log2(R)) in the worst case where the signal is
monotonically decreasing---because our wedge will be size R at all times.

If we do a linear search over the first log2(R) samples, though, we get the
best of both worlds:  The Lemire-Fenn algorithm.  The reasoning is that
once we've done log2(R) steps of the linear search, performing a log2(R)
binary search only multiplies the work by a constant factor.  Think of this
as "clamping" the worst-case performance of the search to log2(R) without
increasing the complexity of cheaper search-and-prune operations.

(If anyone can explain that better, I invite them to!)
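To make the hybrid search concrete, here is a rough sketch in C++ (my own illustration, not the code from Evan's GitHub repository; `trim_back` and `linearLimit` are invented names). It trims a descending wedge before a new sample is pushed, switching from linear scan to binary search after about log2(R) steps:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the Lemire-Fenn hybrid trim.  `wedge` holds a decreasing
// sequence; before pushing a new sample x we drop everything x shadows.
// A linear scan handles the common cheap case; after `linearLimit`
// (~log2(R)) steps we fall back to a binary search, which clamps the
// per-sample worst case to O(log2(R)).
template <typename T>
void trim_back(std::vector<T>& wedge, T x, std::size_t linearLimit)
{
    std::size_t steps = 0;
    while (!wedge.empty() && wedge.back() <= x) {
        if (++steps > linearLimit) {
            // Remaining wedge is sorted descending: find the first
            // element not greater than x and cut from there to the end.
            auto it = std::lower_bound(wedge.begin(), wedge.end(), x,
                                       [](T a, T b) { return a > b; });
            wedge.erase(it, wedge.end());
            return;
        }
        wedge.pop_back();
    }
}
```

Either path removes exactly the shadowed suffix; the limit only caps how much of the work is done linearly.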


Re:  Using STL for DSP, I largely agree (though I'll often do it when I can
effectively control allocations).  The GitHub code is suitable for
high-level use and as a reference implementation.

– Evan Balster
creator of imitone 

On Sat, Sep 3, 2016 at 12:42 AM, Ross Bencina 
wrote:

> On 3/09/2016 2:14 PM, robert bristow-johnson wrote:
>
>> and in discussing iterators, says nothing about push_back()
>> and the like.
>>
>
> push_back(), push_front(), pop_back(), pop_front() are methods generally
> available on queue-like containers.
>
>
>> from online i can get an idea, but it seems to me to be a
>> LIFO stack, not a FIFO buffer.  so a sample value gets pushed onto this
>> thing and pop_back() pops it off, but how does the oldest sample get
>> pushed off the front?  what makes this vector a circular FIFO thingie?
>>
>
> What you're missing is that the code to drop the oldest samples is omitted
> from my example entirely, and is in a separate file in Evan's code.
>
> You're kinda right though, there's a LIFO process that deals with the
> incoming data, and old samples drop off the far end of the queue (FIFO)
> when they age beyond the window size. The running maximum is always at the
> far end of the queue (since the sequence in the queue is decreasing).
>
> In my code, /front/ is the oldest value and /back/ is the newest value.
> The queue only contains the decreasing segments, so it's a discontiguous
> history -- the oldest value in the queue is not usually windowSize samples
> old, it's probably newer than that.
>
> Each queue entry is a pair (value, index). The index is some integer that
> counts input samples.
>
> During decreasing segments, (value, index) are pushed on the back of the
> queue. During increasing segments, samples are trimmed off the back. (This
> is the LIFO part)
>
> Samples are dropped off the front when index is older than the current
> input index (This is the FIFO part).
>
>
> that said, the "10 lines of code" is deceptive.  it's 10 lines of code
>> with function calls.  you gotta count the cost of the function calls.
>>
>
> I agree that it's opaque. But a modern C++ compiler will inline everything
> and optimize away most of the overhead. I know this sounds like fanboy
> talk, but the C++ optimizers are surprisingly good these days.
>
> Personally I steer clear of STL for dsp code.
>
>
>> now, Ross, this is Evan's diagram (from earlier today), but maybe you
>> can help me:
>>
>>
>> http://interactopia.com/archive/images/lemire_algorithm.png
>>
>
> I read that with time running from left to right.
>
> The newest samples are added on the right, the oldest samples are dropped
> from the left.
>
> The red segments are the portions stored in the running max history buffer.
>
>
>> The algorithm can safely forget anything in grey because it has been
>> "shadowed" by newer maximum or minimum values.
>
>> what is not shown on the diagram is what happens when the current
>> running max value expires (or falls off the edge of the delay buffer)?
>>  when the current max value falls offa the edge, what must you do to
>> find the new maximum over the past N samples?
>>
>
> Let's call the edge "front". That's the leftmost sample in Evan's picture.
> So we expire that one, it falls off the edge, it's no longer there. Notice
> how the stored (red) samples are all in a decreasing sequence? So the new
> maximum over N samples is just the new "front".
>
>
> you would have to be riding the slope down on the left side of the peak
>> that just fell offa the edge and then how do you compare that value to
>> any peaks (that are lower for the moment) that have not yet expired?
>>
>
> If they're all guaranteed lower, you don't need to compare them.
>
> It would be impossible for them to be higher.

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 3/09/2016 5:00 PM, Evan Balster wrote:

Robert:  R refers to range ("delay line" size, one could say) and N
refers to signal length.


In that case, R is what I've been calling windowSize, and when I say
O(1) I mean Evan's O(N).


Ross.
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp



Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread robert bristow-johnson



sorry to have to get to the basics, but there are no *two* length parameters
to this alg.  there is only one.

   define the streaming real-time input sequence as x[n].  the length of this
signal is infinite.

   output of running max alg is y[n].  the length of this signal is infinite.

which is it?:

   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] }

or

   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }

i've been under the impression it's the first one. using "N".  earlier
i had been under the impression that you're processing R samples at a time,
like processing samples in blocks of R samples. now i have no idea.

r b-j



 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Ross Bencina" 
Date: Sat, September 3, 2016 3:36 am
To: music-dsp@music.columbia.edu
--

> On 3/09/2016 5:00 PM, Evan Balster wrote:
>> Robert: R refers to range ("delay line" size, one could say) and N
>> refers to signal length.
>
> In that case, R is what I've been calling windowSize, and when I say
> O(1) I mean Evan's O(N).
>
> Ross.

--
r b-j                      r...@audioimagination.com
"Imagination is more important than knowledge."

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Evan Balster
Samples do not need to be processed in blocks.  You can push and pop them
one by one with no change in cost.  R would be the parameter you're
interested in---the length of the window---while N is the total number of
samples processed.

I describe N separate from R for various reasons.  One of these is that
you *don't need* to drop samples from the front of the wedge to use the
algorithm.
Provided sufficient storage, you can compute min/max information for a
signal of arbitrary length without taking more than O(N) total time.  (The
other reason is that it isn't conventional to think in terms of infinity
when describing algorithm complexity.)

A Lemire wedge describing the range (t-R, t] has all the information you
need to find the min and/or max in any range (t-R+n, t) where 0 <= n < R:
just search for the first sample in the wedge that is not older than t+n.
For this reason, one Lemire wedge can do the work of several.  Dropping
samples from the front allows us to limit R and mitigate the worst-case for
"spikes" in the algorithm.

If you process R samples, the total computational complexity will be
proportional to R in the worst case.  If you process N samples, where N >= R,
the complexity will be proportional to N in the worst case.  If your range
encompasses the whole signal, R equals N.  Regardless, the average
computation per sample is bounded by O(1) even in the worst case.  The
worst-case for an individual sample is O(log2(R)).
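As a sketch of that sub-range query (my own code and names, assuming the wedge stores (value, index) pairs in ascending index order and descending value order, with the newest index always present):

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <utility>

// Max over the most recent n samples (1 <= n <= R), given a wedge for
// (t-R, t].  Entries are (value, index), ascending by index and
// descending by value.
float maxOverLast(const std::deque<std::pair<float, std::uint64_t>>& wedge,
                  std::uint64_t now, std::uint64_t n)
{
    const std::uint64_t cutoff = now - n + 1; // oldest index in the sub-range
    // First entry not older than the cutoff; because values decrease
    // toward the back, that entry is the maximum of the sub-range.
    auto it = std::lower_bound(
        wedge.begin(), wedge.end(), cutoff,
        [](const std::pair<float, std::uint64_t>& e, std::uint64_t idx) {
            return e.second < idx;
        });
    return it->first;
}
```

The binary search over indices is what makes one wedge answer any narrower window without extra state.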

– Evan Balster
creator of imitone 

On Sat, Sep 3, 2016 at 11:49 AM, robert bristow-johnson <
r...@audioimagination.com> wrote:

>
>
> sorry to have to get to the basics, but there are no *two* length
> parameters to this alg.  there is only one.
>
>define the streaming real-time input sequence as x[n].  the length of
> this signal is infinite.
>
>output of running max alg is y[n].  the length of this signal is
> infinite.
>
> which is it?:
>
>y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] }
>
> or
>
>y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }
>
> i've been under the impression it's the first one. using "N".  earlier i
> had been under the impression that you're processing R samples at a time,
> like processing samples in blocks of R samples. now i have no idea.
>
>
> r b-j
>
>
>  Original Message 
> Subject: Re: [music-dsp] efficient running max algorithm
> From: "Ross Bencina" 
> Date: Sat, September 3, 2016 3:36 am
> To: music-dsp@music.columbia.edu
> --
>
>
> > On 3/09/2016 5:00 PM, Evan Balster wrote:
> >> Robert: R refers to range ("delay line" size, one could say) and N
> >> refers to signal length.
> >
> > In that case R, is what I've been calling windowSize. and when I say
> > O(1) I mean Evan's O(N).
> >
> > Ross.
> >
>
>
> --
>
>
>
>
> r b-j  r...@audioimagination.com
>
>
>
>
> "Imagination is more important than knowledge."
>

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread robert bristow-johnson



okay, so it appears, all of this time, that i have mixed up the roles of N
and R.




 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Evan Balster" 
Date: Sat, September 3, 2016 1:40 pm
To: "robert bristow-johnson" 
music-dsp@music.columbia.edu
--

> Samples do not need to be processed in blocks. You can push and pop them
> one by one with no change in cost. R would be the parameter you're
> interested in---the length of the window---while N is the total number of
> samples processed.

so for a streaming alg, like a limiter/compressor that runs at 96 kHz and has
been running for hours, what would N be?

like N = infinity?

> I describe N separate from R for various reasons. One of these is
> that you *don't need*
> to drop samples from the front of the wedge to use the algorithm.

if N = infinity, that will require a lotta memory.

and in a running max, the maximum is only a function of x[n] to x[n-R+1].
how is it that you don't forget about x[n-R] and x[n-R-1] and x[n-R-2]?

> Provided sufficient storage, you can compute min/max information for a
> signal of arbitrary length without taking more than O(N) total time. (The
> other reason is that it isn't conventional to think in terms of infinity
> when describing algorithm complexity.)

i can't accept that at all.

especially for this mailing list.  we are discussing algs that run on music
and audio signals.  sample rates might be as low as 22.05 kHz and might be as
high as 192 kHz.  if it's not streaming real-time, then the sound file might
likely be 10 minutes long.  or a half hour.  it's gonna be a pretty big N.
much, much bigger than R.

a **running** max is not the same idea as the max of an entire file (like for
normalization purposes).  a **running** max is about what's been happening for
precisely the most current R samples, x[n] to x[n-R+1].  x[n-R] has fallen
offa the edge of the delay line and we don't care about it.  in fact, it must
not contribute to the output y[n].  it *must* be forgotten about.  now, we get
to have the history of what has been happening in the past, then for each and
every new sample, x[n], that comes in, what is the most efficient way to find
the max value x[n-n0] for the most current R samples.

"N" doesn't come into consideration.  it has nothing to do with the
algorithm, *unless* perhaps, there is some efficiency that can be realized if
processing samples in blocks of samples (where i think the blocksize is much
less than R).

i would consider, for a good limiter or compressor running at 96 kHz, that R
might be as high as 4K or 8K or 16K.  maybe longer.  but that is why we need
it to be O(log2(R)) complexity (per sample). O(R) would be nasty for an alg
running real-time.

> A Lemire wedge describing the range (t-R, t] has all the information you
> need to find the min and/or max in any range (t-R+n, t) where 0 <= n < R:
> just search for the first sample in the wedge that is not older than t+n.

is t+n in the future?  do you mean t-R+n?  what is "t"?

> For this reason, one Lemire wedge can do the work of several. Dropping
> samples from the front allows us to limit R and mitigate the worst-case for
> "spikes" in the algorithm.

it can't possibly do better than O(log2(R)) in these spikes.  if that is the
case, it is no better than the binary tree.

> If you process R samples, the total computational complexity will be
> proportional to R in the worst case.

R is the length of the window.  what do you mean by "processing R samples"?
this is why i have been so confused by the semantics all of this time.  i am
processing samples, either one at a time, or in blocks of, say, B samples.
other than an overall delay of 2B samples (due to double buffering those
blocks), the output of the algorithm must be completely agnostic about the
value of B.  but if B>1 somehow makes it more efficient, either in the worst
case, or in the average case, that would be of interest.

> If you process N samples, where N >= R,
> the complexity will be proportional to N in the worst case. If your range
> encompasses the whole signal, R equals N. Regardless, the average
> computation per sample is bounded by O(1) even in the worst case. The
> worst-case for an individual sample is O(log2(R)).

for a real-time "running" alg doing a "running max" how can it be any
different than O(log2(R))?

let's say that R is exactly 10,000 and it's a real-time alg running for an
indefinite length of time (hours) at 44.1 kHz.  our interest is strictly

   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }

for all integer n.  what does this Lemire alg do for us that the binary tree
alg (as was posted in July with about 11 lines of C code and no function
calls) does not do?

how is it more efficient in the worst case?

how is it more efficient in something other than the worst case?
Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Evan Balster
> i can't accept that at all.
>
> especially for this mailing list.  we are discussing algs that run on
> music and audio signals.  sample rates might be as low as 22.05 kHz and
> might be as high as 192 kHz.  if it's not streaming real-time, then the
> sound file might likely be 10 minutes long.  or a half hour.  it's gonna be
> a pretty big N.  much, much bigger than R.

I didn't say it was practical to run the algorithm without dropping
samples.  Just that it was possible --- and what that implies provides
important insights into the algorithm's capabilities.  For instance, the
window size can be changed in real-time by popping and pushing at different
rates.

While it's true that in the world of DSP almost nobody uses algorithms
whose complexity is not linear with respect to signal length, it's still
important for purposes of demonstrating computational complexity.  This
algorithm does its job; I mention signal length simply for the purpose of
demonstrating that for sufficiently long sections of signal R ceases to
affect its computational complexity.


> it can't possibly do better than O(log2(R)) in these spikes.  if that is
> the case, it is no better than the binary tree.

The worst-case complexity for a single sample is logarithmic, like the
binary tree.  But the worst case over a sequence of samples becomes linear,
provided the sequence is longer than the wedge.

If we perform a binary tree search at every sample over R
monotonically-decreasing samples, our worst-case performance over R samples
is O(R log2(R)), because the wedge's size is always R and we're performing
a binary search every sample.  This is sub-optimal and even the original
Lemire algorithm with an O(N) search can provide higher efficiency.  Please,
read his paper. 

If we use Ethan Fenn's search, the worst-case performance over any R
samples is O(R).  *Beyond a size of R samples, worst-case complexity for
processing a block of size N is O(N)*, or constant per-sample (which is why
I keep mentioning N).  I've explained the reasoning here to the best of my
ability; if you doubt it, instrument the code to count comparator calls and
see for yourself.
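One way to do that instrumentation (a toy sketch with invented names, not Evan's code): count comparator calls in the linear-trim variant. Each sample is pushed once and popped at most once, so the total over N samples can never exceed 2N, which is the amortized O(1)-per-sample claim in miniature.

```cpp
#include <cstddef>
#include <deque>

// Toy instrumentation of the original (linear-trim) Lemire wedge.
// FIFO aging is omitted for brevity; only the LIFO trim is counted.
struct CountingMax {
    std::deque<float> q;  // decreasing wedge; q.front() is the running max
    std::size_t cmp = 0;  // comparator call counter

    void push(float x)
    {
        // comma operator: count each comparison as it is evaluated
        while (!q.empty() && (++cmp, q.back() <= x))
            q.pop_back();
        q.push_back(x);
    }
};
```

Feeding any N samples, `cmp` stays at or below 2N: at most one failed comparison per push, plus one comparison per pop, and total pops are bounded by total pushes.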

– Evan Balster
creator of imitone 

On Sat, Sep 3, 2016 at 1:45 PM, robert bristow-johnson <
r...@audioimagination.com> wrote:

>
>
> okay, so it appears, all of this time, that i have mixed up the roles of N
> and R.
>
>
>
>  Original Message 
> Subject: Re: [music-dsp] efficient running max algorithm
> From: "Evan Balster" 
> Date: Sat, September 3, 2016 1:40 pm
> To: "robert bristow-johnson" 
> music-dsp@music.columbia.edu
> --
>
> > Samples do not need to be processed in blocks. You can push and pop them
> > one by one with no change in cost. R would be the parameter you're
> > interested in---the length of the window---while N is the total number of
> > samples processed.
>
> so for a streaming alg, like a limiter/compressor that runs at 96 kHz and
> has been running for hours, what would N be?
>
> like N = infinity?
>
>
> > I describe N separate from R for various reasons. One of these is
> > that you *don't need*
> > to drop samples from the front of the wedge to use the algorithm.
>
> if N = infinity, that will require a lotta memory.
>
> and in a running max, the maximum is only a function of x[n] to x[n-R+1].
>  how is it that you don't forget about x[n-R] and x[n-R-1] and x[n-R-2]?
>
>
> > Provided sufficient storage, you can compute min/max information for a
> > signal of arbitrary length without taking more than O(N) total time. (The
> > other reason is that it isn't conventional to think in terms of infinity
> > when describing algorithm complexity.)
>
> i can't accept that at all.
>
> especially for this mailing list.  we are discussing algs that run on
> music and audio signals.  sample rates might be as low as 22.05 kHz and
> might be as high as 192 kHz.  if it's not streaming real-time, then the
> sound file might likely be 10 minutes long.  or a half hour.  it's gonna be
> a pretty big N.  much, much bigger than R.
>
> a **running** max is not the same idea as the max of an entire file (like
> for normalization purposes).  a **running** max is about what's been
> happening for precisely the most current R samples, x[n] to x[n-R+1].
>  x[n-R] has fallen offa the edge of the delay line and we don't care about
> it.  in fact, it must not contribute to the output y[n].  it *must* be
> forgotten about.  now, we get to have the history of what has been
> happening in the past, then for each and every new sample, x[n], that comes
> in, what is the most efficient way to find the max value x[n-n0] for the
> most current R samples.
>
> "N" doesn't come into consideration.  it has nothing to do with the
> algorithm, *unless* perhaps, there is some efficiency that can be realized
> if processing samples in blocks of samples (where i think the blocksize is
> much less than R).

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread robert bristow-johnson







 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Evan Balster" 
Date: Sat, September 3, 2016 3:24 pm
To: "robert bristow-johnson" 
music-dsp@music.columbia.edu
--



>> i can't accept that at all.
>>
>> especially for this mailing list. we are discussing algs that run on
>> music and audio signals. sample rates might be as low as 22.05 kHz and
>> might be as high as 192 kHz. if it's not streaming real-time, then the
>> sound file might likely be 10 minutes long. or a half hour. it's gonna be
>> a pretty big N. much, much bigger than R.
>
> I didn't say it was practical to run the algorithm without dropping
> samples. Just that it was possible --- and what that implies provides
> important insights into the algorithm's capabilities. For instance, the
> window size can be changed in real-time by popping and pushing at different
> rates.
>
> While it's true that in the world of DSP almost nobody uses algorithms
> whose complexity is not linear with respect to signal length, it's still
> important for purposes of demonstrating computational complexity. This
> algorithm does its job; I mention signal length simply for the purpose of
> demonstrating that for sufficiently long sections of signal R ceases to
> affect its computational complexity.
>
>> it can't possibly do better than O(log2(R)) in these spikes. if that is
>> the case, it is no better than the binary tree.
>
> The worst-case complexity for a single sample is logarithmic, like the
> binary tree. But the worst case over a sequence of samples becomes linear,
> provided the sequence is longer than the wedge.

again, can you bring this home to some music-dsp coder who wants a running
max in order to do a compressor/limiter or some kinda envelope follower?

what does it mean for a "sequence of samples"?  do you mean a block of
samples that is much smaller than R samples and that exist in the context of
a stream of samples or in the context of a large sound file?  this running
max must extend over R samples, even if the block length (i will call "B") is
much smaller.  are you saying that the worst case complexity is better than
O(B log2(R))?  if it is not better, i fail to see what the point is.

> If we perform a binary tree search at every sample over R
> monotonically-decreasing samples,

this implies to me a *very* low-frequency content.  i remember someone here
telling us it didn't matter if it were low or high-frequency content.

> our worst-case performance over R samples
> is O(R log2(R)), because the wedge's size is always R and we're performing
> a binary search every sample. This is sub-optimal and even the original
> Lemire algorithm with an O(N) search can provide higher efficiency. Please,
> read his paper.

is "Algorithm 1" on page 5 a sufficient description?  again, for scholarly
purposes, he really needs to portray the algorithm in terms other than a
specific language (and what is worse, is in terms of specific library utility
of a specific language).  the pseudo-code must make references to what
pseudo-code has always done, otherwise his audience will be particularly
parochial.

> If we use Ethan Fenn's search, the worst-case performance over any R
> samples is O(R). *Beyond a size of R samples, worst-case complexity for
> processing a block of size N is O(N)*, or constant per-sample (which is why
> I keep mentioning N).

not a function of R?  so let's say that N=10^9 and R=10^4?  is the complexity
no different than if N=10^9 and R=10^3?  is not the cost of processing those
N samples dependent on R also?

if R >> N, and you process a block of N samples and then process another
(adjacent) block of N samples and then process another (adjacent) block of N
samples, what is the complexity *per* *sample*??  is it not proportional to
log2(R)?  and does it get reduced (on a *per* *sample* basis) as N increases?
if the answer is "yes" and the running max transcends the blocks of N samples
(that is the output depends only on R), *then* this is useful.  but if not,
it is not useful for audio processing.

> I've explained the reasoning here to the best of my
> ability; if you doubt it, instrument the code to count comparator calls and
> see for yourself.

if i were to test this out myself, i need to understand it enough to write C
code (or asm code) to do it.

so far, i am still unimpressed with the Lemire thing.  i don't, so far, see
the improvement over the binary tree, which is O(log2(R)) per sample for
worst case and a stream or sound file of infinite size.



--
r b-j                      r...@audioimagination.com
"Imagination is more important than knowledge."

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 4/09/2016 2:49 AM, robert bristow-johnson wrote:

sorry to have to get to the basics, but there are no *two* length
parameters to this alg.  there is only one.

   define the streaming real-time input sequence as x[n].  the length of
this signal is infinite.

   output of running max alg is y[n].  the length of this signal is
infinite.

which is it?:

   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] }

or

   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }



i've been under the impression it's the first one. using "N".  earlier i
had been under the impression that you're processing R samples at a
time, like processing samples in blocks of R samples. now i have no idea.


I agree that Evan's setup is unusual and not what you'd use for 
streaming real-time processing.


For what it's worth, in my descriptions of my code:

y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-windowSize+1] }

The history may contain a maximum of windowSize elements (or 
windowSize-1 depending on the implementation details).


I don't think I mentioned processing samples in blocks, but I don't 
think we can usefully analyse the practical complexity of this algorithm 
without discussing block-wise processing.


A worst-case windowSize trimming event (LIFO end of queue) can only 
possibly happen every windowSize samples(*). This reduces the worst-case 
complexity for most samples in a block. Hence block-based processing 
will always (not just on average) yield amortization benefits. If the 
block size is larger than the windowSize, the original algorithm will 
run in O(1) worst-case time per sample, otherwise it will be O(k) where 
k is some function of windowSize and blockSize that I am not prepared to 
calculate before coffee.


(*) Because once windowSize samples have been trimmed, it will take 
another windowSize input samples before the queue is at maximum capacity 
again (and this, only in the decreasing ramp case).


Ross.










Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 4/09/2016 6:27 AM, robert bristow-johnson wrote:

if i were to test this out myself, i need to understand it enough to
write C code (or asm code) to do it.


The paper provides all of the information that you need for the basic 
implementation (which I recommend to start with). If there is something 
in the paper that is unclear, we can try to explain it.


Use a queue of (sample, index) pairs. The input (LIFO) end of the queue 
is where you do trimming based on new input samples. The output (FIFO) 
end of the queue is where you read off the max value and trim values 
that are older than the window size.
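That queue can be sketched like so for sample-by-sample processing (my own illustration with invented names, not the code from the paper or from Evan's repository):

```cpp
#include <cstdint>
#include <deque>
#include <utility>

// Running maximum over the last `windowSize` samples, in the original
// Lemire form: a deque of (value, index) pairs kept in decreasing
// value order.
class RunningMax {
public:
    explicit RunningMax(std::uint64_t windowSize) : windowSize_(windowSize) {}

    float process(float x)
    {
        // LIFO end: anything <= x is shadowed and can never be the max again.
        while (!q_.empty() && q_.back().first <= x)
            q_.pop_back();
        q_.emplace_back(x, n_);

        // FIFO end: drop the front once it ages beyond the window.
        if (q_.front().second + windowSize_ <= n_)
            q_.pop_front();

        ++n_;
        return q_.front().first; // sequence is decreasing, so front is the max
    }

private:
    std::deque<std::pair<float, std::uint64_t>> q_;
    std::uint64_t windowSize_;
    std::uint64_t n_ = 0;
};
```

For example, feeding 1, 3, 2, 1, 0, 5 with windowSize 3 yields the running maxima 1, 3, 3, 3, 2, 5: the 3 expires after three samples and the front hands the maximum to the 2 behind it.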




so far, i am still unimpressed with the Lemire thing.  i don't, so far,
see the improvement over the binary tree, which is O(log2(R)) per sample
for worst case and a stream or sound file of infinite size.


If your processing block size is 64, and your max window size is 64, the 
original Lemire algorithm is worst-case O(1) per sample. There is a 
constant time component associated with the FIFO process, and a maximum 
of two worst-case O(windowSize) trimming events per processing buffer. 
Since blockSize == windowSize, these amortize to give _guaranteed_ 
worst-case O(1) per block. If the block size is larger, the constant 
term goes down slightly. If the block size is smaller, then the constant 
term goes up.


For sample-by sample processing, the naive algorithm is worst-case 
O(windowSize) per sample. For Ethan's version, the worst case is 
O(log2(windowSize)). I think the benefits for sample-by-sample 
processing are unclear.


Ross.




Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread robert bristow-johnson







 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Ross Bencina" 
Date: Sat, September 3, 2016 10:16 pm
To: r...@audioimagination.com
music-dsp@music.columbia.edu
--



> On 4/09/2016 2:49 AM, robert bristow-johnson wrote:
>> sorry to have to get to the basics, but there are no *two* length
>> parameters to this alg. there is only one.
>>
>> define the streaming real-time input sequence as x[n]. the length of
>> this signal is infinite.
>>
>> output of running max alg is y[n]. the length of this signal is
>> infinite.
>>
>> which is it?:
>>
>> y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] }
>>
>> or
>>
>> y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }
>
>> i've been under the impression it's the first one. using "N". earlier i
>> had been under the impression that you're processing R samples at a
>> time, like processing samples in blocks of R samples. now i have no idea.

okay, from Evan's earlier response it's R.  "R" is the "windowSize" below and
is "L" or "window_width" in this stack exchange (where i also posted the
binary tree code i am using as a reference)
http://dsp.stackexchange.com/a/32252/4346 .  "findMaxSample()" processes one
sample (accepts one sample of input and returns one sample of the running max
output).

> I agree that Evan's setup is unusual and not what you'd use for
> streaming real-time processing.
>
> For what it's worth, in my descriptions of my code:
>
> y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-windowSize+1] }
>
> The history may contain a maximum of windowSize elements (or
> windowSize-1 depending on the implementation details).
>
> I don't think I mentioned processing samples in blocks, but I don't
> think we can usefully analyse the practical complexity of this algorithm
> without discussing block-wise processing.

well, from a sample-by-sample POV, i couldn't understand how it could get
better than O(log2(windowSize)) per sample.  i am still open-minded about
processing samples in blocks.  but i still don't see the trick that gets you
better than O(log2(windowSize)) per sample.

unless there are significantly fewer peaks or maxima than windowSize.  then i
can sorta see how you can keep track of the peaks and ride the slopes on both
the left and right, and when the maximum peak falls off the edge, you will
have to do a binary tree search (which is O(log2(numberPeaks))) on the record
of peaks.  it's still a pain in the ass how to deal with a varying number of
peaks, and of course, it's fatal if the number of peaks exceeds the space you
have allocated for it.  (i know that is not a problem for the STL coder, just
keep pushing them peaks back.)  and numberPeaks is proportional to windowSize
(and the highest expected frequency) so the computational burden is data
dependent.

> A worst-case windowSize trimming event (LIFO end of queue) can only
> possibly happen every windowSize samples(*). This reduces the worst-case
> complexity for most samples in a block. Hence block-based processing
> will always (not just on average) yield amortization benefits.

again, while this is what i wanna be open-minded about, i am not so sure.
couldn't there be a binary tree search more often than once per block?  that
will change how much gets amortized.

> If the block size is larger than the windowSize,

let's say it's not.  i think of block sizes as, say, 32 or 64 samples tops.
windowSize can be, say, 4K or 8K or 16K.

> the original algorithm will
> run in O(1) worst-case time per sample, otherwise it will be O(k) where
> k is some function of windowSize and blockSize that I am not prepared to
> calculate before coffee.

well, this is where i think the rubber meets the road.  i think the worst
case ain't gonna be too much better than O(log2(windowSize)) per sample even
with amortization over the block.

r b-j
 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Ross Bencina" 
Date: Sat, September 3, 2016 10:30 pm
To: r...@audioimagination.com
music-dsp@music.columbia.edu
--

> On 4/09/2016 6:27 AM, robert bristow-johnson wrote:
>> if i were to test this out myself, i need to understand it enough to
>> write C code (or asm code) to do it.
>
> The paper provides all of the information that you need for the basic
> implementation (which I recommend to start with). If there is something
> in the paper that is unclear, we can try to explain it.
>
> Use a queue of (sample, index) pairs. The input (LIFO) end of the queue
> is where you do trimming based on new input samples. The output (FIFO)
> end of the queue is where you read off the max value and trim values
> that are older than the window size.

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 4/09/2016 1:42 PM, robert bristow-johnson wrote:

i think the worst case ain't gonna be too much better than
O(log2(windowSize)) per sample even with amortization over the block.


You can think that if you like, but I don't think the worst case is that 
bad. I have given examples. If you would like to provide a 
counter-example, feel free. But first you probably need to understand 
the algorithm well enough to implement it.


I'm exiting this discussion now. Talking in vague terms isn't getting us 
anywhere. Much as I would like to, I don't have time to write a new 
implementation and a paper to convince you.


Sorry,

Ross.