Re: [music-dsp] efficient running max algorithm

2016-09-04 Thread Ethan Fenn
Wow, my very own hyphenated algorithm. :o)

Thanks for sharing this, Evan, it's impressive how concise the code ends up
being.

I'd echo the general feeling (which you also express in your Future Work
section) that for DSP work it really wants to be implemented using a ring
buffer. Neither std::vector nor std::deque is very good for the purpose...
vector because popping an element off the front is going to be slow, and
deque because it's probably going to be allocating and deallocating memory
all the time for no good reason.

Sort of an aside, but I think an STL-style ring buffer, possibly using
positive indices to look up things from the front and negative to look from
the back, would be awesome. Maybe I'll try putting one together that's
compatible with the code you've written, if I manage to find some
time. That would be a useful thing to have in general.
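
For concreteness, a minimal sketch of such a container (illustrative only: the
name and layout here are made up, there is no bounds checking, and the capacity
is fixed at construction):

#include <cstddef>
#include <vector>

// Fixed-capacity double-ended ring buffer: O(1) push/pop at both ends and no
// allocation after construction.  operator[](i) indexes from the front for
// i >= 0 and from the back for i < 0 (so [-1] is the newest element).
template <typename T>
class RingDeque {
public:
    explicit RingDeque(std::size_t capacity) : buf_(capacity) {}

    void push_back(const T& v) { buf_[wrap(head_ + size_)] = v; ++size_; }
    void pop_back()            { --size_; }
    void pop_front()           { head_ = wrap(head_ + 1); --size_; }

    T& front() { return buf_[head_]; }
    T& back()  { return buf_[wrap(head_ + size_ - 1)]; }

    T& operator[](long i)
    {
        std::size_t off = (i >= 0) ? std::size_t(i) : size_ - std::size_t(-i);
        return buf_[wrap(head_ + off)];
    }

    std::size_t size() const  { return size_; }
    bool        empty() const { return size_ == 0; }

private:
    std::size_t wrap(std::size_t i) const { return i % buf_.size(); }

    std::vector<T> buf_;
    std::size_t    head_ = 0, size_ = 0;
};

The iterator plumbing needed to make something like this genuinely
STL-compatible is omitted; the point is just the O(1) ends and the two-sided
indexing.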

Robert:

well, this is where i think the rubber meets the road.  i think the worst
> case ain't gonna be too much better than O(log2(windowSize)) per sample
> even with amortization over the block.


It's a good question, but I'm not sure what kind of answer you're hoping to
get. I think the Lemire algorithm is not just theoretically interesting but
practical, for two reasons: 1) algorithms that store data in sorted arrays
rather than trees tend to be really fast, because they're cache coherent
and the CPU doesn't have to spend cycles following pointers around; and 2)
one of the basic insights behind the Lemire algorithm is that with a wedge
structure, half of the time all you have to do is push a sample on the end
of your array and carry on; this is obviously preferable to having to
update some kind of tree structure for every sample coming in.

That said, it's really not possible to answer the question of which
algorithm will have the better worst case per block on given hardware, with a
given window size and block size, without trying it out. You can try it and
see what happens, but you don't have to if you don't want to.

-Ethan




On Sun, Sep 4, 2016 at 2:42 AM, Evan Balster  wrote:

> I think the fact that binary searches play into both the Lemire-Fenn
> algorithm and the binary-search algorithm is causing some confusion.
> Remember that the original Lemire algorithm, with its simple linear search
> and worst-case O(windowSize) per sample time, *still* manages O(1) per
> sample over any block of size larger than windowSize, and will thus
> outperform the binary search algorithm in terms of total complexity.
>
> Understanding why that is the case --- i.e., why the O(windowSize) wedge
> search can outperform the O(log2(windowSize)) heap search --- is strictly
> prerequisite to understanding why the Lemire-Fenn algorithm is more
> scalable than the binary search.
>
> It boils down to the fact that the actual number of comparisons performed
> in the wedge search is proportional to how many items are being removed
> from the wedge --- and each of those is less work we'll need to do next
> time.
>
> I believe the worst case for processing N samples with windowSize R is
> roughly O(N+R) for the original Lemire algorithm---because in the worst
> case we'll have to prune that many samples.  As N approaches infinity,
> (N+R)/N approaches our amortized complexity: 1.
>
>
> The trick of the Lemire-Fenn algorithm is that it behaves as the Lemire
> algorithm, but caps the per-sample complexity by switching over to binary
> search just as the linear search surpasses O(log2(windowSize)) steps.  This
> change reduces the complexity class for sample-by-sample processing without
> increasing the complexity in any case.
>
> – Evan Balster
> creator of imitone 
>
> On Sat, Sep 3, 2016 at 11:05 PM, Ross Bencina 
> wrote:
>
>> On 4/09/2016 1:42 PM, robert bristow-johnson wrote:
>>
>>> i think the worst case ain't gonna be too much better than
>>> O(log2(windowSize)) per sample even with amortization over the block.
>>>
>>
>> You can think that if you like, but I don't think the worst case is that
>> bad. I have given examples. If you would like to provide a counter-example,
>> feel free. But first you probably need to understand the algorithm well
>> enough to implement it.
>>
>> I'm exiting this discussion now. Talking in vague terms isn't getting us
>> anywhere. Much as I would like to, I don't have time to write a new
>> implementation and a paper to convince you.
>>
>> Sorry,
>>
>>
>> Ross.

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 4/09/2016 1:42 PM, robert bristow-johnson wrote:

i think the worst case ain't gonna be too much better than
O(log2(windowSize)) per sample even with amortization over the block.


You can think that if you like, but I don't think the worst case is that 
bad. I have given examples. If you would like to provide a 
counter-example, feel free. But first you probably need to understand 
the algorithm well enough to implement it.


I'm exiting this discussion now. Talking in vague terms isn't getting us 
anywhere. Much as I would like to, I don't have time to write a new 
implementation and a paper to convince you.


Sorry,

Ross.



Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread robert bristow-johnson







 Original Message 

Subject: Re: [music-dsp] efficient running max algorithm

From: "Ross Bencina" <rossb-li...@audiomulch.com>

Date: Sat, September 3, 2016 10:16 pm

To: r...@audioimagination.com

music-dsp@music.columbia.edu

--



> On 4/09/2016 2:49 AM, robert bristow-johnson wrote:

>> sorry to have to get to the basics, but there are no *two* length

>> parameters to this alg. there is only one.

>>

>> define the streaming real-time input sequence as x[n]. the length of

>> this signal is infinite.

>>

>> output of running max alg is y[n]. the length of this signal is

>> infinite.

>>

>> which is it?:

>>

>> y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] }

>>

>> or

>>

>> y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }

>

>> i've been under the impression it's the first one. using "N". earlier i

>> had been under the impression that you're processing R samples at a

>> time, like processing samples in blocks of R samples. now i have no idea.

okay, from Evan's earlier response it's R.  "R" is the "windowSize" below and
is "L" or "window_width" in this stack exchange (where i also posted the binary
tree code i am using as a reference) http://dsp.stackexchange.com/a/32252/4346 .
"findMaxSample()" processes one sample (accepts one sample of input and returns
one sample of the running max output).
>

> I agree that Evan's setup is unusual and not what you'd use for

> streaming real-time processing.

>

> For what it's worth, in my descriptions of my code:

>

> y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-windowSize+1] }

>

> The history may contain a maximum of windowSize elements (or

> windowSize-1 depending on the implementation details).

>

> I don't think I mentioned processing samples in blocks, but I don't

> think we can usefully analyse the practical complexity of this algorithm

> without discussing block-wise processing.

>
well, from a sample-by-sample POV, i couldn't understand how it could get
better than O(log2(windowSize)) per sample.  i am still open-minded about
processing samples in blocks.  but i still don't see the trick that gets you
better than O(log2(windowSize)) per sample.

unless there are significantly fewer peaks or maxima than windowSize.  then i
can sorta see how you can keep track of the peaks and ride the slopes on both
the left and right, and when the maximum peak falls off the edge, you will have
to do a binary tree search (which is O(log2(numberPeaks))) on the record of
peaks.  it's still a pain in the ass how to deal with a varying number of
peaks, and of course, it's fatal if the number of peaks exceeds the space you
have allocated for it.  (i know that is not a problem for the STL coder, just
keep pushing them peaks back.)  and numberPeaks is proportional to windowSize
(and the highest expected frequency) so the computational burden is data
dependent.

> A worst-case windowSize trimming event (LIFO end of queue) can only

> possibly happen every windowSize samples(*). This reduces the worst-case

> complexity for most samples in a block. Hence block-based processing

> will always (not just on average) yield amortization benefits.

again, while this is what i wanna be open-minded about, i am not so sure.
couldn't there be a binary tree search more often than once per block?  that
will change how much gets amortized.

> If the block size is larger than the windowSize,

let's say it's not.  i think of block sizes as, say, 32 or 64 samples tops.
windowSize can be, say, 4K or 8K or 16K.

> the original algorithm will run in O(1) worst-case time per sample, otherwise
> it will be O(k) where k is some function of windowSize and blockSize that I
> am not prepared to calculate before coffee.

well, this is where i think the rubber meets the road.  i think the worst case
ain't gonna be too much better than O(log2(windowSize)) per sample even with
amortization over the block.

r b-j

 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Ross Bencina" <rossb-li...@audiomulch.com>

Date: Sat, September 3, 2016 10:30 pm

To: r...@audioimagination.com

music-dsp@music.columbia.edu

--



> On 4/09/2016 6:27 AM, robert bristow-johnson wrote:

>> if i were to test this out myself, i need to understand it enough to

>> write C code (or asm code) to do it.

>

> The paper provides all of the information that you need for the basic

> implementation (

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 4/09/2016 6:27 AM, robert bristow-johnson wrote:

if i were to test this out myself, i need to understand it enough to
write C code (or asm code) to do it.


The paper provides all of the information that you need for the basic 
implementation (which I recommend to start with). If there is something 
in the paper that is unclear, we can try to explain it.


Use a queue of (sample, index) pairs. The input (LIFO) end of the queue 
is where you do trimming based on new input samples. The output (FIFO) 
end of the queue is where you read off the max value and trim values 
that are older than the window size.
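
As a concrete picture of that queue (a sketch only -- the names and the choice
of std::deque are assumptions here, and a preallocated ring buffer would serve
the same role):

#include <deque>

struct Entry {
    float value;   // the sample
    long  index;   // arrival index, used to expire entries older than the window
};

// Values run non-increasing from front (oldest, current max) to back (newest).
//   back()  -- the LIFO end: new samples trim smaller entries off here, then push.
//   front() -- the FIFO end: read the running max here; pop it once its index
//              ages beyond the window.
std::deque<Entry> history;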




so far, i am still unimpressed with the Lemire thing.  i don't, so far,
see the improvement over the binary tree, which is O(log2(R)) per sample
for worst case and a stream or sound file of infinite size.


If your processing block size is 64, and your max window size is 64, the 
original Lemire algorithm is worst-case O(1) per sample. There is a 
constant time component associated with the FIFO process, and a maximum 
of two worst-case O(windowSize) trimming events per processing buffer. 
Since blockSize == windowSize, these amortize to give _guaranteed_ 
worst-case O(1) per block. If the block size is larger, the constant 
term goes down slightly. If the block size is smaller, then the constant 
term goes up.
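
To put rough numbers on the 64/64 case (an illustration only, taking the
two-trimming-events bound above at face value): a 64-sample block costs 64
constant-time FIFO steps plus at most two trimming events of at most 64
comparisons each, so on the order of 192 operations for the whole block -- a
small constant per sample in the worst case, independent of the signal content.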


For sample-by-sample processing, the naive algorithm is worst-case 
O(windowSize) per sample. For Ethan's version, the worst case is 
O(log2(windowSize)). I think the benefits for sample-by-sample 
processing are unclear.


Ross.




Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 4/09/2016 2:49 AM, robert bristow-johnson wrote:

sorry to have to get to the basics, but there are no *two* length
parameters to this alg.  there is only one.

   define the streaming real-time input sequence as x[n].  the length of
this signal is infinite.

   output of running max alg is y[n].  the length of this signal is
infinite.

which is it?:

   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] }

or

   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }



i've been under the impression it's the first one. using "N".  earlier i
had been under the impression that you're processing R samples at a
time, like processing samples in blocks of R samples. now i have no idea.


I agree that Evan's setup is unusual and not what you'd use for 
streaming real-time processing.


For what it's worth, in my descriptions of my code:

y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-windowSize+1] }

The history may contain a maximum of windowSize elements (or 
windowSize-1 depending on the implementation details).


I don't think I mentioned processing samples in blocks, but I don't 
think we can usefully analyse the practical complexity of this algorithm 
without discussing block-wise processing.


A worst-case windowSize trimming event (LIFO end of queue) can only 
possibly happen every windowSize samples(*). This reduces the worst-case 
complexity for most samples in a block. Hence block-based processing 
will always (not just on average) yield amortization benefits. If the 
block size is larger than the windowSize, the original algorithm will 
run in O(1) worst-case time per sample, otherwise it will be O(k) where 
k is some function of windowSize and blockSize that I am not prepared to 
calculate before coffee.


(*) Because once windowSize samples have been trimmed, it will take 
another windowSize input samples before the queue is at maximum capacity 
again (and this, only in the decreasing ramp case).


Ross.










Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread robert bristow-johnson







 Original Message 

Subject: Re: [music-dsp] efficient running max algorithm

From: "Evan Balster" <e...@imitone.com>

Date: Sat, September 3, 2016 3:24 pm

To: "robert bristow-johnson" <r...@audioimagination.com>

music-dsp@music.columbia.edu

--



>> i can't accept that at all.

>

>> especially for this mailing list. we are discussing algs that run on

>> music and audio signals. sample rates might be as low as 22.05 kHz and

>> might be as high as 192 kHz. if it's not streaming real-time, then the

>> sound file might likely be 10 minutes long. or a half hour. it's gonna be

>> a pretty big N. much, much bigger than R.


> I didn't say it was practical to run the algorithm without dropping

> samples. Just that it was possible --- and what that implies provides

> important insights into the algorithm's capabilities. For instance, the

> window size can be changed in real-time by popping and pushing at different

> rates.

>

> While it's true that in the world of DSP almost nobody uses algorithms

> whose complexity is not linear with respect to signal length, it's still

> important for purposes of demonstrating computational complexity. This

> algorithm does its job; I mention signal length simply for the purpose of

> demonstrating that for sufficiently long sections of signal R ceases to

> affect its computational complexity.

>

>

>> it can't possibly do better than O(log2(R)) in these spikes. if that is

> the case, it is no better than the binary tree.

>

> The worst-case complexity for a single sample is logarithmic, like the

> binary tree. But the worst case over a sequence of samples becomes linear,

> provided the sequence is longer than the wedge.

>
again, can you bring this home to some music-dsp coder who wants a running max
in order to do a compressor/limiter or some kinda envelope follower?

what does it mean for a "sequence of samples"?  do you mean a block of samples
that is much smaller than R samples and that exists in the context of a stream
of samples or in the context of a large sound file?  this running max must
extend over R samples, even if the block length (i will call "B") is much
smaller.  are you saying that the worst case complexity is better than
O(B log2(R))?  if it is not better, i fail to see what the point is.

> If we perform a binary tree search at every sample over R

> monotonically-decreasing samples,

this implies to me a *very* low-frequency content.  i remember someone here
telling us it didn't matter if it were low or high-frequency content.

> our worst-case performance over R samples

> is O(R log2(R)), because the wedge's size is always R and we're performing

> a binary search every sample. This is sub-optimal and even the original

> Lemire algorithm with an O(N) search can provide higher efficiency. Please,

> read his paper. <https://arxiv.org/abs/cs/0610046>
is "Algorithm 1" on page 5 a sufficient description? �again, for scholarly 
purposes, he really needs to portray the algorithm in terms other than a 
specific language (and what is worse, is in terms of specific library
utility of a specific language). �the pseudeo-code must make references to what 
pseudo-code has always done, otherwise his audience will be particularly 
parochial.

> If we use Ethan Fenn's search, the worst-case performance over any R

> samples is O(R). *Beyond a size of R samples, worst-case complexity for

> processing a block of size N is O(N)*, or constant per-sample (which is why

> I keep mentioning N).

not a function of R?  so let's say that N=10^9 and R=10^4?  is the complexity
no different than if N=10^9 and R=10^3?  is not the cost of processing those N
samples dependent on R also?

if R >> N, and you process a block of N samples and then process another
(adjacent) block of N samples and then process another (adjacent) block of N
samples, what is the complexity *per* *sample*??  is it not proportional to
log2(R)?  and does it get reduced (on a *per* *sample* basis) as N increases?
if the answer is "yes" and the running max transcends the blocks of N samples
(that is, the output depends only on R), *then* this is useful.  but if not, it
is not useful for audio processing.

> I've explained the reasoning here to the best of my
> ability; if you doubt it, instrument the code to count comparator calls and

> see for yourself.

if i were to test this out myself, i need to understand it enough to write C
code (or asm code) to do it.

so far, i am still unimpressed with the Lemire thing.  i don't, so far, see the
improvement over the binary tree, which is O(log2(R)) per sample for worst case

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Evan Balster
> i can't accept that at all.

> especially for this mailing list.  we are discussing algs that run on
music and audio signals.  sample rates might be as low as 22.05 kHz and
might be as high as 192 kHz.  if it's not streaming real-time, then the
sound file might likely be 10 minutes long.  or a half hour.  it's gonna be
a pretty big N.  much, much bigger than R.
I didn't say it was practical to run the algorithm without dropping
samples.  Just that it was possible --- and what that implies provides
important insights into the algorithm's capabilities.  For instance, the
window size can be changed in real-time by popping and pushing at different
rates.

While it's true that in the world of DSP almost nobody uses algorithms
whose complexity is not linear with respect to signal length, it's still
important for purposes of demonstrating computational complexity.  This
algorithm does its job; I mention signal length simply for the purpose of
demonstrating that for sufficiently long sections of signal R ceases to
affect its computational complexity.


> it can't possibly do better than O(log2(R)) in these spikes.  if that is
the case, it is no better than the binary tree.

The worst-case complexity for a single sample is logarithmic, like the
binary tree.  But the worst case over a sequence of samples becomes linear,
provided the sequence is longer than the wedge.

If we perform a binary tree search at every sample over R
monotonically-decreasing samples, our worst-case performance over R samples
is O(R log2(R)), because the wedge's size is always R and we're performing
a binary search every sample.  This is sub-optimal and even the original
Lemire algorithm with an O(N) search can provide higher efficiency.  Please,
read his paper. <https://arxiv.org/abs/cs/0610046>

If we use Ethan Fenn's search, the worst-case performance over any R
samples is O(R).  *Beyond a size of R samples, worst-case complexity for
processing a block of size N is O(N)*, or constant per-sample (which is why
I keep mentioning N).  I've explained the reasoning here to the best of my
ability; if you doubt it, instrument the code to count comparator calls and
see for yourself.
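
For anyone who wants to take up that suggestion, a throwaway harness along
these lines (hypothetical names; original linear back-search; strictly
decreasing input, which keeps the wedge saturated at R entries) will report
roughly one comparator call per sample:

#include <cstdio>
#include <deque>

struct Entry { float value; long index; };

int main()
{
    const long N = 1 << 20;   // total samples to process (arbitrary)
    const long R = 4096;      // window size (arbitrary)
    long comparisons = 0;

    std::deque<Entry> wedge;
    for (long n = 0; n < N; ++n) {
        const float x = float(N - n);              // strictly decreasing input
        while (!wedge.empty()) {                   // LIFO trim, linear back-search
            ++comparisons;
            if (wedge.back().value <= x) wedge.pop_back();
            else break;
        }
        wedge.push_back({x, n});
        if (wedge.front().index <= n - R)          // FIFO expiry
            wedge.pop_front();
    }
    std::printf("comparisons per sample: %.3f\n", double(comparisons) / double(N));
    return 0;
}

Swapping the trim loop for a binary search on every sample turns that figure
into roughly log2(R) for the same input, which is the contrast being drawn
above.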

– Evan Balster
creator of imitone <http://imitone.com>

On Sat, Sep 3, 2016 at 1:45 PM, robert bristow-johnson <
r...@audioimagination.com> wrote:

>
>
> okay, so it appears, all of this time, that i have mixed up the roles of N
> and R.
>
>
>
>  Original Message ----------------
> Subject: Re: [music-dsp] efficient running max algorithm
> From: "Evan Balster" <e...@imitone.com>
> Date: Sat, September 3, 2016 1:40 pm
> To: "robert bristow-johnson" <r...@audioimagination.com>
> music-dsp@music.columbia.edu
> --
>
> > Samples do not need to be processed in blocks. You can push and pop them
> > one by one with no change in cost. R would be the parameter you're
> > interested in---the length of the window---while N is the total number of
> > samples processed.
>
> so for a streaming alg, like a limiter/compressor that runs at 96 kHz and
> has been running for hours, what would N be?
>
> like N = infinity?
>
>
> > I describe N separate from R for various reasons. One of these is
> > that you *don't need*
> > to drop samples from the front of the wedge to use the algorithm.
>
> if N = infinity, that will require a lotta memory.
>
> and in a running max, the maximum is only a function of x[n] to x[n-R+1].
>  how is it that you don't forget about x[n-R] and x[n-R-1] and x[n-R-2]?
>
>
> > Provided sufficient storage, you can compute min/max information for a
> > signal of arbitrary length without taking more than O(N) total time. (The
> > other reason is that it isn't conventional to think in terms of infinity
> > when describing algorithm complexity.)
>
> i can't accept that at all.
>
> especially for this mailing list.  we are discussing algs that run on
> music and audio signals.  sample rates might be as low as 22.05 kHz and
> might be as high as 192 kHz.  if it's not streaming real-time, then the
> sound file might likely be 10 minutes long.  or a half hour.  it's gonna be
> a pretty big N.  much, much bigger than R.
>
> a **running** max is not the same idea as the max of an entire file (like
> for normalization purposes).  a **running** max is about what's been
> happening for precisely the most current R samples, x[n] to x[n-R+1].
>  x[n-R] has fallen offa the edge of the delay line and we don't care about
> it.  in fact, it must not contribute to the output y[n].  it *must* be
> forgotten about.  now, we get to have the history of what has been
> happening in the past, then for each and every new sa

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Evan Balster
Samples do not need to be processed in blocks.  You can push and pop them
one by one with no change in cost.  R would be the parameter you're
interested in---the length of the window---while N is the total number of
samples processed.

I describe N separate from R for various reasons.  One of these is
that you *don't
need* to drop samples from the front of the wedge to use the algorithm.
Provided sufficient storage, you can compute min/max information for a
signal of arbitrary length without taking more than O(N) total time.  (The
other reason is that it isn't conventional to think in terms of infinity
when describing algorithm complexity.)

A Lemire wedge describing the range (t-R, t] has all the information you
need to find the min and/or max in any range (t-R+n, t) where 0 < n <= R.
If N >= R, the complexity will be proportional to N in the worst case.  If your range
encompasses the whole signal, R equals N.  Regardless, the average
computation per sample is bounded by O(1) even in the worst case.  The
worst-case for an individual sample is O(log2(R)).
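
One concrete reading of "has all the information you need": with the wedge held
as (value, index) pairs whose indices increase toward the back, the max over
just the last m samples (m <= R) can be read off with a single binary search by
index.  A sketch (names are mine, not anyone's posted code):

#include <algorithm>
#include <deque>

struct Entry { float value; long index; };

// Max over the trailing window of m samples ending at index t, read straight
// out of an existing max wedge.  The first entry whose index falls inside
// (t - m, t] is the answer, because values are non-increasing toward the back.
float max_over_last(const std::deque<Entry>& wedge, long t, long m)
{
    auto it = std::lower_bound(wedge.begin(), wedge.end(), t - m + 1,
        [](const Entry& e, long lo) { return e.index < lo; });
    return it->value;   // the newest sample is always in range, so it != end()
}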

– Evan Balster
creator of imitone <http://imitone.com>

On Sat, Sep 3, 2016 at 11:49 AM, robert bristow-johnson <
r...@audioimagination.com> wrote:

>
>
> sorry to have to get to the basics, but there are no *two* length
> parameters to this alg.  there is only one.
>
>define the streaming real-time input sequence as x[n].  the length of
> this signal is infinite.
>
>output of running max alg is y[n].  the length of this signal is
> infinite.
>
> which is it?:
>
>y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] }
>
> or
>
>y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-R+1] }
>
> i've been under the impression it's the first one. using "N".  earlier i
> had been under the impression that you're processing R samples at a time,
> like processing samples in blocks of R samples. now i have no idea.
>
>
> r b-j
>
>
> ------------ Original Message 
> Subject: Re: [music-dsp] efficient running max algorithm
> From: "Ross Bencina" <rossb-li...@audiomulch.com>
> Date: Sat, September 3, 2016 3:36 am
> To: music-dsp@music.columbia.edu
> --
>
>
> > On 3/09/2016 5:00 PM, Evan Balster wrote:
> >> Robert: R refers to range ("delay line" size, one could say) and N
> >> refers to signal length.
> >
> > In that case R, is what I've been calling windowSize. and when I say
> > O(1) I mean Evan's O(N).
> >
> > Ross.
> >
> >
>
>
> --
>
>
>
>
> r b-j  r...@audioimagination.com
>
>
>
>
> "Imagination is more important than knowledge."
>

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Evan Balster
Ross' explanation is solid.

Robert:  R refers to range ("delay line" size, one could say) and N refers
to signal length.


Clarifying about the weird linear/binary search:  It's necessary to
maintain O(N) complexity for processing N samples.  This took me a while to
grasp:

If we do a backward linear search (as in the original Lemire algorithm) we
get that O(N) complexity but get an O(R) worst case per sample.  Lemire's
paper explains why the former is the case.

If we do a binary search, we get O(log2(R)) worst case per sample, but the
total complexity is O(N*log2(R)) in the worst case where the signal is
monotonically decreasing---because our wedge will be size R at all times.

If we do a linear search over the first log2(R) samples, though, we get the
best of both worlds:  The Lemire-Fenn algorithm.  The reasoning is that
once we've done log2(R) steps of the linear search, performing a log2(R)
binary search only multiplies the work by a constant factor.  Think of this
as "clamping" the worst-case performance of the search to log2(R) without
increasing the complexity of cheaper search-and-prune operations.

(If anyone can explain that better, I invite them to!)
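
One more attempt, this time as code -- a sketch of what the clamped trim step
could look like (the names are mine and the details differ from the GitHub
implementation; the caller pushes the new sample afterwards as usual):

#include <algorithm>
#include <cmath>
#include <deque>

struct Entry { float value; long index; };

// Trim everything the new sample x shadows off the back of a non-increasing
// max wedge, spending at most ~log2(windowSize) linear steps before falling
// back to a binary search over whatever remains.
void wedge_trim(std::deque<Entry>& wedge, float x, long windowSize)
{
    const long cap = (long)std::ceil(std::log2((double)std::max(windowSize, 2L)));

    long steps = 0;
    while (!wedge.empty() && wedge.back().value <= x) {
        if (++steps > cap) {
            // The linear search has already cost ~log2(windowSize) comparisons;
            // finish by locating the first entry the new sample cannot shadow.
            auto it = std::lower_bound(wedge.begin(), wedge.end(), x,
                [](const Entry& e, float v) { return e.value > v; });
            wedge.erase(it, wedge.end());
            return;
        }
        wedge.pop_back();
    }
}

The linear phase costs at most about log2(windowSize) comparisons; if it hasn't
finished by then, the binary search adds at most another log2(windowSize), so
the per-sample worst case stays logarithmic while short trims stay exactly as
cheap as in the original algorithm.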


Re:  Using STL for DSP, I largely agree (though I'll often do it when I can
effectively control allocations).  The GitHub code is suitable for
high-level use and as a reference implementation.

– Evan Balster
creator of imitone 

On Sat, Sep 3, 2016 at 12:42 AM, Ross Bencina 
wrote:

> On 3/09/2016 2:14 PM, robert bristow-johnson wrote:
>
>> and in discussing iterators, says nothing about push_back()
>> and the like.
>>
>
> push_back(), push_front(), pop_back(), pop_front() are methods generally
> available on queue-like containers.
>
>
> from online i can get an idea, but it seems to me to be a
>> LIFO stack, not a FIFO buffer.  so a sample value gets pushed onto this
>> thing and pop_back() pops it off, but how does the oldest sample get
>> pushed off the front?  what makes this vector a circular FIFO thingie?
>>
>
> What you're missing is that the code to drop the oldest samples is omitted
> from my example entirely, and is in a separate file in Evan's code.
>
> You're kinda right though, there's a LIFO process that deals with the
> incoming data, and old samples drop off the far end of the queue (FIFO)
> when they age beyond the window size. The running maximum is always at the
> far end of the queue (since the sequence in the queue is decreasing)
>
> In my code, /front/ is the oldest value and /back/ is the newest value.
> The queue only contains the decreasing segments, so it's a discontiguous
> history -- the oldest value in the queue is not usually windowSize samples
> old, it's probably newer than that.
>
> Each queue entry is a pair (value, index). The index is some integer that
> counts input samples.
>
> During decreasing segments, (value, index) are pushed on the back of the
> queue. During increasing segments, samples are trimmed off the back. (This
> is the LIFO part)
>
> Samples are dropped off the front when index is older than the current
> input index (This is the FIFO part).
>
>
> that said, the "10 lines of code" is deceptive.  it's 10 lines of code
>> with function calls.  you gotta count the cost of the function calls.
>>
>
> I agree that it's opaque. But a modern C++ compiler will inline everything
> and optimize away most of the overhead. I know this sounds like fanboy
> talk, but the C++ optimizers are surprisingly good these days.
>
> Personally I steer clear of STL for dsp code.
>
>
> now, Ross, this is Evan's diagram (from earlier today), but maybe you
>> can help me:
>>
>>
>> [image: Inline image 3]
>> http://interactopia.com/archive/images/lemire_algorithm.png
>>
>
> I read that with time running from left to right.
>
> The newest samples are added on the right, the oldest samples are dropped
> from the left.
>
> The red segments are the portions stored in the running max history buffer.
>
>
> The algorithm can safely forget anything in grey because it has been
>>> "shadowed" by newer maximum or minimum values.
>>>
>> >
>
>> what is not shown on the diagram is what happens when the current
>> running max value expires (or falls off the edge of the delay buffer)?
>>  when the current max value falls offa the edge, what must you do to
>> find the new maximum over the past N samples?
>>
>
> Let's call the edge "front". That's the leftmost sample in Evan's picture.
> So we expire that one, it falls off the edge, it's no longer there. Notice
> how the stored (red) samples are all in a decreasing sequence? So the new
> maximum over N samples is just the new "front".
>
>
> you would have to be riding the slope down on the left side of the peak
>> that just fell offa the edge and then how do you compare that value to
>> any peaks (that are lower for the moment) that have not yet expired?
>>
>
> If they're all guaranteed lower, you don't need to compare them.
>
> It would be 

Re: [music-dsp] efficient running max algorithm

2016-09-03 Thread Ross Bencina

On 3/09/2016 2:14 PM, robert bristow-johnson wrote:

and in discussing iterators, says nothing about push_back()
and the like.


push_back(), push_front(), pop_back(), pop_front() are methods generally 
available on queue-like containers.




from online i can get an idea, but it seems to me to be a
LIFO stack, not a FIFO buffer.  so a sample value gets pushed onto this
thing and pop_back() pops it off, but how does the oldest sample get
pushed off the front?  what makes this vector a circular FIFO thingie?


What you're missing is that the code to drop the oldest samples is 
omitted from my example entirely, and is in a separate file in Evan's code.


You're kinda right though, there's a LIFO process that deals with the 
incoming data, and old samples drop off the far end of the queue (FIFO) 
when they age beyond the window size. The running maximum is always at 
the far end of the queue (since the sequence in the queue is decreasing)


In my code, /front/ is the oldest value and /back/ is the newest value. 
The queue only contains the decreasing segments, so it's a discontiguous 
history -- the oldest value in the queue is not usually windowSize 
samples old, it's probably newer than that.


Each queue entry is a pair (value, index). The index is some integer 
that counts input samples.


During decreasing segments, (value, index) are pushed on the back of the 
queue. During increasing segments, samples are trimmed off the back. 
(This is the LIFO part)


Samples are dropped off the front when index is older than the current 
input index (This is the FIFO part).
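
Putting the two ends together, a minimal sketch of the per-sample update (the
naming, the float sample type and std::deque are my own choices here; a ring
buffer would work just as well):

#include <deque>

struct Entry { float value; long index; };

class RunningMax {
public:
    explicit RunningMax(long windowSize) : windowSize_(windowSize) {}

    // Returns y[n] = max{ x[n], x[n-1], ... x[n-windowSize+1] }.
    float push(float x)
    {
        // LIFO end: trim every entry the new sample shadows, then append it.
        while (!history_.empty() && history_.back().value <= x)
            history_.pop_back();
        history_.push_back({x, index_});

        // FIFO end: drop the front entry once it has aged out of the window.
        while (history_.front().index <= index_ - windowSize_)
            history_.pop_front();

        ++index_;
        return history_.front().value;   // oldest retained entry is the max
    }

private:
    std::deque<Entry> history_;
    long windowSize_;
    long index_ = 0;
};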




that said, the "10 lines of code" is deceptive.  it's 10 lines of code
with function calls.  you gotta count the cost of the function calls.


I agree that it's opaque. But a modern C++ compiler will inline 
everything and optimize away most of the overhead. I know this sounds 
like fanboy talk, but the C++ optimizers are surprisingly good these days.


Personally I steer clear of STL for dsp code.



now, Ross, this is Evan's diagram (from earlier today), but maybe you
can help me:


[image: Inline image 3]
http://interactopia.com/archive/images/lemire_algorithm.png


I read that with time running from left to right.

The newest samples are added on the right, the oldest samples are 
dropped from the left.


The red segments are the portions stored in the running max history buffer.



The algorithm can safely forget anything in grey because it has been
"shadowed" by newer maximum or minimum values.

>

what is not shown on the diagram is what happens when the current
running max value expires (or falls off the edge of the delay buffer)?
 when the current max value falls offa the edge, what must you do to
find the new maximum over the past N samples?


Let's call the edge "front". That's the leftmost sample in Evan's 
picture. So we expire that one, it falls off the edge, it's no longer 
there. Notice how the stored (red) samples are all in a decreasing 
sequence? So the new maximum over N samples is just the new "front".




you would have to be riding the slope down on the left side of the peak
that just fell offa the edge and then how do you compare that value to
any peaks (that are lower for the moment) that have not yet expired?


If they're all guaranteed lower, you don't need to compare them.

It would be impossible for them to be higher, because the LIFO process 
at the input end of the queue ensures that all samples in the history 
form a decreasing sequence.



 they have to be other wedges, right?  with their own record of height
and location, right?  the only thing shown on that diagram is what
happens when new maxima come into the delay line on the left and the
running max value increases.  it does not show how it decreases and the
running max must decrease eventually.


If the history is non-empty, the running max decreases every time you 
pop a sample off the front. As I said above, that wouldn't necessarily 
happen at every step. For example, if a new global max arrives, the 
queue would be completely flushed, and that global max would remain the 
"front" value in the queue for windowSize samples.




i cannot see how this strategy works without keeping a record of *every*
local maxima.  both its value and its location.


It keeps a record of a decreasing monotonic sequence of inputs, 
including their locations.




that record would also
be FIFO and the size of that record would be large if there were a lotta
peaks (like a high frequency signal), but each entry would be
chronologically placed.


Correct.



any local maximum between the current running max and the "runner up"
(which is more current) can be ignored.  perhaps this means that you
only need to note where the current running max is (both location and
value) and the runner-up that is more current.  when the current max
falls offa the edge on the right, you have to ride the left slope of
that peak (which is at the maximum delay location) down until it's the
value 

Re: [music-dsp] efficient running max algorithm

2016-09-02 Thread robert bristow-johnson



 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Ross Bencina" <rossb-li...@audiomulch.com>
Date: Fri, September 2, 2016 10:19 pm
To: music-dsp@music.columbia.edu
--

> On 3/09/2016 3:12 AM, Evan Balster wrote:
>> Just a few clarifications:
>>
>> - Local maxima and first difference don't really matter. The maximum
>> wedge describes global maxima for all intervals [t-x, t], where 
>> x=[R-1..0].

can you clarify N vs. R?  is N the length of the delay line and for each new
input sample we want the output to be the max value for the most current N
input samples?   y[n] = max{ x[n], x[n-1], x[n-2], ... x[n-N+1] } ?

and you are processing R samples at a time?  what difference does R make?  it
is not a specification of the running max delay buffer.

> Your STL version makes the algorithm clearer.

this, i think, spells out the difference in perspectives between C and C++
coders.  my (c) 1986 Stroustrup book doesn't really say a thing about STL and
Vectors.  and in discussing iterators, says nothing about push_back() and the
like.  from online i can get an idea, but it seems to me to be a LIFO stack,
not a FIFO buffer.  so a sample value gets pushed onto this thing and
pop_back() pops it off, but how does the oldest sample get pushed off the
front?  what makes this vector a circular FIFO thingie?

C++ sucks.  and for someone not competent in C++ STL, the code is opaque.

that said, the "10 lines of code" is deceptive.  it's 10 lines of code with
function calls.  you gotta count the cost of the function calls.  again, the
binary tree code in C (i posted July 18) has *no* function calls (except for a
single malloc() in the initialization) and has about 11 lines of code, not
counting blank lines or lines with just a curly brace.  no calls to any method
that has who knows what for an overhead.

now, Ross, this is Evan's diagram (from earlier today), but maybe you can help me:

[image: Inline image 3]

http://interactopia.com/archive/images/lemire_algorithm.png



> The algorithm can safely forget anything in grey because it has been

> "shadowed" by newer maximum or minimum values.

what is not shown on the diagram is what happens when the current running max
value expires (or falls off the edge of the delay buffer)?  when the current
max value falls offa the edge, what must you do to find the new maximum over
the past N samples?

you would have to be riding the slope down on the left side of the peak that
just fell offa the edge and then how do you compare that value to any peaks
(that are lower for the moment) that have not yet expired?  they have to be
other wedges, right?  with their own record of height and location, right?
the only thing shown on that diagram is what happens when new maxima come into
the delay line on the left and the running max value increases.  it does not
show how it decreases and the running max must decrease eventually.

i cannot see how this strategy works without keeping a record of *every* local
maxima.  both its value and its location.  that record would also be FIFO and
the size of that record would be large if there were a lotta peaks (like a
high frequency signal), but each entry would be chronologically placed.

any local maximum between the current running max and the "runner up" (which
is more current) can be ignored.  perhaps this means that you only need to
note where the current running max is (both location and value) and the
runner-up that is more current.  when the current max falls offa the edge on
the right, you have to ride the left slope of that peak (which is at the
maximum delay location) down until it's the value of the runner up and then
all samples between the new running max and the end of the delay buffer can be
ignored.  but here is the question, when the runner up is promoted to the
running max, how is the *new* runner up found?  that could be anywhere between
the buffer input (on the left side) and the new running max.  if you are not
already performing a binary tree on that, i cannot see how finding the new
runner up is cheap, unless you have a record of every local max between the
buffer input (on the left side) and the new running max.  and each time a new
local max is detected, you have to do the O(log2(something)) binary tree thing.
--
r b-j                  r...@audioimagination.com
"Imagination is more important than knowledge."

Re: [music-dsp] efficient running max algorithm

2016-09-02 Thread Ross Bencina

On 3/09/2016 3:12 AM, Evan Balster wrote:

Just a few clarifications:

- Local maxima and first difference don't really matter.  The maximum
wedge describes global maxima for all intervals [t-x, t], where x=[R-1..0].


I think it's interesting to look at the algorithm from different 
perspectives. It's clear from your diagram that the backwards trimming 
happens when the signal is at a new local maximum (value is increasing)


As for first difference: it's a shorthand for "increasing" or 
"decreasing". My implementation compares to the previous sample to 
switch between two different behaviors on the input:


if (current > previous_) { // increasing
    // (trim loop goes here)
    if (U_.empty())
        return current;
    else
        return U_.front().value;
} else { // decreasing
    U_.push_back({current, index});
    // (*)
    return U_.front().value; // front being oldest sample
}

previous_ = current;

I also add the test to maintain the window size into this loop, which saves a
couple of comparisons since e.g. at (*) we know that the history buffer is
non-empty.

Anyhow, just another perspective. I prefer my way for implementation 
because it makes the branches explicit. Your STL version makes the 
algorithm clearer.




- If two equal samples are encountered, the algorithm can forget about
the older of them.


I guess so. So the first comparison above should be >=.

Ross.



Re: [music-dsp] efficient running max algorithm

2016-09-02 Thread Evan Balster
For one wedge, the worst case is a monotonic signal, which will saturate
the wedge with R samples.

For two wedges, the worst case is the situation where the absolute distance
from previous samples to the latest sample is monotonically decreasing.  In
this case the wedges will contain R+1 samples between them (because the
latest sample always appears in both).

– Evan Balster
creator of imitone 

On Fri, Sep 2, 2016 at 12:36 PM, Ethan Duni  wrote:

> Right aren't monotonic signals the worst case here? Or maybe not, since
> they're worst for one wedge, but best for the other?
>
> Ethan D
>
> On Fri, Sep 2, 2016 at 10:12 AM, Evan Balster  wrote:
>
>> Just a few clarifications:
>>
>> - Local maxima and first difference don't really matter.  The maximum
>> wedge describes global maxima for all intervals [t-x, t], where x=[R-1..0].
>>
>> - If two equal samples are encountered, the algorithm can forget about
>> the older of them.
>>
>> It has been very helpful for my purposes to visualize the algorithm thus:
>>  Imagine drawing lines rightward from all points on the signal.  Those
>> which extend to the present time without intersecting other parts of the
>> signal form the minimum and maximum wedge.  The newest sample, and only the
>> newest sample, exists in both wedges.
>>
>> [image: Inline image 3]
>> http://interactopia.com/archive/images/lemire_algorithm.png
>>
>> The algorithm can safely forget anything in grey because it has been
>> "shadowed" by newer maximum or minimum values.
>>
>> – Evan Balster
>> creator of imitone 
>>
>> On Fri, Sep 2, 2016 at 1:50 AM, Ross Bencina 
>> wrote:
>>
>>> On 2/09/2016 4:37 PM, Ross Bencina wrote:
>>>
 When the first difference is positive, the history is trimmed. This is
 the only time any kind of O(N) or O(log2(N)) operation is performed.
 First difference positive implies that a new local maximum is achieved:
 in this case, all of the most recent history samples that are less than
 the new maximum are trimmed.

>>>
>>> Correction:
>>>
>>> [The new local maximum dominates all
 history up to the most recent *history sample* that exceeds it, which is
 retained.]

>>>
>>> I had said "most recent local maximum" but the history contains a
>>> monotonic non-increasing sequence.
>>>
>>> Ross.
>>>

Re: [music-dsp] efficient running max algorithm

2016-09-02 Thread Ethan Duni
Right aren't monotonic signals the worst case here? Or maybe not, since
they're worst for one wedge, but best for the other?

Ethan D

On Fri, Sep 2, 2016 at 10:12 AM, Evan Balster  wrote:

> Just a few clarifications:
>
> - Local maxima and first difference don't really matter.  The maximum
> wedge describes global maxima for all intervals [t-x, t], where x=[R-1..0].
>
> - If two equal samples are encountered, the algorithm can forget about the
> older of them.
>
> It has been very helpful for my purposes to visualize the algorithm thus:
>  Imagine drawing lines rightward from all points on the signal.  Those
> which extend to the present time without intersecting other parts of the
> signal form the minimum and maximum wedge.  The newest sample, and only the
> newest sample, exists in both wedges.
>
> [image: Inline image 3]
> http://interactopia.com/archive/images/lemire_algorithm.png
>
> The algorithm can safely forget anything in grey because it has been
> "shadowed" by newer maximum or minimum values.
>
> – Evan Balster
> creator of imitone 
>
> On Fri, Sep 2, 2016 at 1:50 AM, Ross Bencina 
> wrote:
>
>> On 2/09/2016 4:37 PM, Ross Bencina wrote:
>>
>>> When the first difference is positive, the history is trimmed. This is
>>> the only time any kind of O(N) or O(log2(N)) operation is performed.
>>> First difference positive implies that a new local maximum is achieved:
>>> in this case, all of the most recent history samples that are less than
>>> the new maximum are trimmed.
>>>
>>
>> Correction:
>>
>> [The new local maximum dominates all
>>> history up to the most recent *history sample* that exceeds it, which is
>>> retained.]
>>>
>>
>> I had said "most recent local maximum" but the history contains a
>> monotonic non-increasing sequence.
>>
>> Ross.
>>

Re: [music-dsp] efficient running max algorithm

2016-09-02 Thread Evan Balster
Just a few clarifications:

- Local maxima and first difference don't really matter.  The maximum wedge
describes global maxima for all intervals [t-x, t], where x=[R-1..0].

- If two equal samples are encountered, the algorithm can forget about the
older of them.

It has been very helpful for my purposes to visualize the algorithm thus:
 Imagine drawing lines rightward from all points on the signal.  Those
which extend to the present time without intersecting other parts of the
signal form the minimum and maximum wedge.  The newest sample, and only the
newest sample, exists in both wedges.

[image: Inline image 3]
http://interactopia.com/archive/images/lemire_algorithm.png

The algorithm can safely forget anything in grey because it has been
"shadowed" by newer maximum or minimum values.

– Evan Balster
creator of imitone 

On Fri, Sep 2, 2016 at 1:50 AM, Ross Bencina 
wrote:

> On 2/09/2016 4:37 PM, Ross Bencina wrote:
>
>> When the first difference is positive, the history is trimmed. This is
>> the only time any kind of O(N) or O(log2(N)) operation is performed.
>> First difference positive implies that a new local maximum is achieved:
>> in this case, all of the most recent history samples that are less than
>> the new maximum are trimmed.
>>
>
> Correction:
>
> [The new local maximum dominates all
>> history up to the most recent *history sample* that exceeds it, which is
>> retained.]
>>
>
> I had said "most recent local maximum" but the history contains a
> monotonic non-increasing sequence.
>
> Ross.
>

Re: [music-dsp] efficient running max algorithm

2016-09-02 Thread Ross Bencina

On 2/09/2016 4:37 PM, Ross Bencina wrote:

When the first difference is positive, the history is trimmed. This is
the only time any kind of O(N) or O(log2(N)) operation is performed.
First difference positive implies that a new local maximum is achieved:
in this case, all of the most recent history samples that are less than
the new maximum are trimmed.


Correction:


[The new local maximum dominates all
history up to the most recent *history sample* that exceeds it, which is
retained.]


I had said "most recent local maximum" but the history contains a 
monotonic non-increasing sequence.


Ross.



Re: [music-dsp] efficient running max algorithm

2016-09-02 Thread Ross Bencina

Hello Robert,

> i think i understand the algorithm.

Your description seems quite distant from the algorithm as I understand it.

Considering only running max:

In effect, the running max keeps a history of the decreasing 
sub-sequences of the input.


When the first difference of the input is non-positive, every input 
sample is added to the history. e.g. for a strictly decreasing input, 
the algorithm behaves as a FIFO queue.


When the first difference is positive, the history is trimmed. This is 
the only time any kind of O(N) or O(log2(N)) operation is performed.
First difference positive implies that a new local maximum is achieved: 
in this case, all of the most recent history samples that are less than 
the new maximum are trimmed. [The new local maximum dominates all 
history up to the most recent local maximum that exceeds it, which is 
retained.]


The trimming can only occur once for a given history sample, so in the 
case of a strictly increasing input, only the most recent input would be 
present in the history queue. A similar situation would arise if the 
input was a high-frequency increasing amplitude signal. In this case the 
algorithm is O(1) because there is only one history sample to compare 
to/trim.


If a high frequency decreasing amplitude signal is input, then every 
second input would be retained in the history. For this type of input 
signal, a new local maximum would never cause old history to be trimmed 
[new local max would be less than all previous local maxes] -- therefore 
the trimming is a single comparison, i.e. O(1). In this case, you 
correctly observe that the memory requirements are similar to a history 
keeping tree.


The current maximum is always the oldest value that has been retained in 
the history. It does not require any search in the queue. For example, 
in a strictly increasing sequence there is only one value in the history 
(the most recent). In a strictly decreasing sequence, the whole input 
history (up to the window size) would be in the queue.
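
A small worked example (not from the original message): with windowSize 4 and
input 5, 3, 4, 2, 6, the history evolves as {5}, {5,3}, {5,4} (the 4 trims the
3), {5,4,2}, and finally {6} (the 6 trims everything); the running max read off
the front at each step is 5, 5, 5, 5, 6.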


I believe that the worst case input would be a descending sawtooth with 
period windowSize-1. During the ramp segment, the algorithm would run in 
constant time, but at the jump, the whole history would have to be 
trimmed: that's where the worst-case O(N) or O(log2(N)) comes in: each 
time that a whole sequence of descending history has to be purged.
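
For testing, that worst-case input is easy to synthesize; a small sketch (the
function name is mine, and windowSize is assumed to be at least 2):

#include <vector>

// Descending sawtooth with period windowSize-1: each ramp decreases, and the
// upward jump at each period boundary forces (nearly) the whole stored history
// to be trimmed in one go.
std::vector<float> descending_sawtooth(int numSamples, int windowSize)
{
    std::vector<float> x(numSamples);
    const int period = windowSize - 1;
    for (int n = 0; n < numSamples; ++n)
        x[n] = float(period - (n % period));
    return x;
}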


There can only ever be N trim operations per N input samples, so the 
amortized performance depends on how big your processing block size is. 
The O(N) or O(log2(N)) worst case spike gets spread over the processing 
block size -- there can never be more than one such spike per N input 
samples. This is presumably better than O(log2(N)) per sample for a tree 
based algorithm.


Ross.


On 2/09/2016 2:52 PM, robert bristow-johnson wrote:



i think i understand the algorithm.  let's assume running max.  you
will, for each wedge, keep a record of the height and location of the
maximum.

and for only two wedges, the wedge that is falling off the edge of the
delay buffer and the new wedge being formed from the new incoming
samples, only on those two wedges you need to monitor and update on a
sample-by-sample basis. as a wedge is created on the front end and as it
falls offa the back end, you will have to update the value and location
of the maxima of each of those two wedges.

each time the first derivative (the difference between the most current
two samples) crosses zero from a positive first difference to a negative
first difference (which is when you have a maximum), you have to record
a new wedge, right?

and then you do a binary max tree for the entire set of wedge maxima?
 is that how it works?

the latter part is O(log2(N)) worst case.  are you expecting a
computation savings over the straight binary search tree because the
number of wedges are expected to be a lot less than the buffer length?

if it's a really high frequency signal, you'll have lotsa wedges.  then
i think there won't be that much difference between the O(log2(N)) worst
case of this Lemire thing and the O(log2(N)) worst case of a straight
binary tree.  and the latter has **very** simple self-contained
code (which was previously posted).

i just wanna know, most specifically, what is the expectation of gain of
speed (or reduction of execution time) of this Lemire alg over the
simpler binary tree alg.  it doesn't seem to me to be a lot better for
memory requirements.


--

r b-j  r...@audioimagination.com

"Imagination is more important than knowledge."



 Original Message 
Subject: Re: [music-dsp] efficient running max algorithm
From: "Evan Balster" <e...@imitone.com>
Date: Fri, September 2, 2016 12:12 am
To: music-dsp@music.columbia.edu
--


Hello, all ---

Reviving this topic to mention I've created an STL-compat

Re: [music-dsp] efficient running max algorithm

2016-09-01 Thread Evan Balster
Hey, Robert ---

Check out Lemire's paper, or my summary in the third message in this
thread.  When computing both minimum and maximum over a range of size R and
over N samples, the total computational complexity is bounded by O(N) ---
that is, amortized constant time per sample --- while the worst case for
any sample is O(log2(R)) --- or just O(R) in Lemire's original algorithm.

If you do a binary search every sample, you get O(log2(R)) worst case per
sample but you lose the amortized constant time characteristic.  To keep
that, you need to use a hybrid search as proposed by Ethan here and
implemented in my code.

High frequencies won't really make a difference.  In any case the combined
sizes of the two wedges will never exceed R+1, and will typically be much
smaller.  Feel free to customize the unit test in the GitHub to your
satisfaction if you're doubtful.

– Evan Balster
creator of imitone <http://imitone.com>

On Thu, Sep 1, 2016 at 11:52 PM, robert bristow-johnson <
r...@audioimagination.com> wrote:

>
>
> i think i understand the algorithm.  let's assume running max.  you will,
> for each wedge, keep a record of the height and location of the maximum.
>
> and for only two wedges, the wedge that is falling off the edge of the
> delay buffer and the new wedge being formed from the new incoming samples,
> only on those two wedges you need to monitor and update on a
> sample-by-sample basis. as a wedge is created on the front end and as it
> falls offa the back end, you will have to update the value and location of
> the maxima of each of those two wedges.
>
> each time the first derivative (the difference between the most current
> two samples) crosses zero from a positive first difference to a negative
> first difference (which is when you have a maximum), you have to record a
> new wedge, right?
>
> and then you do a binary max tree for the entire set of wedge maxima?  is
> that how it works?
>
> the latter part is O(log2(N)) worst case.  are you expecting a computation
> savings over the straight binary search tree because the number of wedges
> are expected to be a lot less than the buffer length?
>
> if it's a really high frequency signal, you'll have lotsa wedges.  then i
> think there won't be that much difference between the O(log2(N)) worst case
> of this Lemire thing and the O(log2(N)) worst case of a straight binary
> tree.  and the latter has **very** simple self-contained code (which was
> previously posted).
>
> i just wanna know, most specifically, what is the expectation of gain of
> speed (or reduction of execution time) of this Lemire alg over the simpler
> binary tree alg.  it doesn't seem to me to be a lot better for memory
> requirements.
>
>
> --
>
> r b-j  r...@audioimagination.com
>
> "Imagination is more important than knowledge."
>
>
>
>  Original Message 
> Subject: Re: [music-dsp] efficient running max algorithm
> From: "Evan Balster" <e...@imitone.com>
> Date: Fri, September 2, 2016 12:12 am
> To: music-dsp@music.columbia.edu
> --
>
> > Hello, all ---
> >
> > Reviving this topic to mention I've created an STL-compatible header
> > implementing what I'm calling the "Lemire-Fenn" algorithm for rolling
> > minimum and maximum:
> >
> > https://github.com/EvanBalster/STL_mono_wedge
> >
> > This was a little pet project I undertook today to familiarize myself
> with
> > Github; the algorithm itself is about ten lines of code, but I've
> > documented it obsessively and run it through a series of unit tests. It's
> > compatible with std::vector, std::deque, and (I imagine) any
> STL-compliant
> > ringbuffer class exposing random-access iterators.
> >
> > Feel free to share any feedback or thoughts about this.
> >
> > – Evan Balster
> > creator of imitone <http://imitone.com>
> >
> > On Thu, Jul 21, 2016 at 10:16 AM, Evan Balster <e...@imitone.com> wrote:
> >
> >> Ahh, smart! Thanks for the insight; I understand now.
> >>
> >> It occurs to me that you could slightly tighten the bound on per-sample
> >> work, because it's not necessary to include the elements of the linear
> >> search in the binary one. The worst-case would then be O(J) per sample,
> >> where J=log2(w-J). But this is a very marginal improvement and it's
> >> difficult to write out the bound in a clearer way.
> >>
> >> – Evan Balster
> >> creator of imitone <http://imitone.com>
> >>
> >> On Thu, Jul 21, 2016 at 7:40 AM,

Re: [music-dsp] efficient running max algorithm

2016-09-01 Thread robert bristow-johnson



i think i understand the algorithm.  let's assume running max.  you will, for
each wedge, keep a record of the height and location of the maximum.

and for only two wedges, the wedge that is falling off the edge of the delay
buffer and the new wedge being formed from the new incoming samples, only
those two wedges do you need to monitor and update on a sample-by-sample
basis. as a wedge is created on the front end and as it falls offa the back
end, you will have to update the value and location of the maxima of each of
those two wedges.

each time the first derivative (the difference between the most current two
samples) crosses zero from a positive first difference to a negative first
difference (which is when you have a maximum), you have to record a new
wedge, right?

and then you do a binary max tree for the entire set of wedge maxima?
is that how it works?

the latter part is O(log2(N)) worst case.  are you expecting a computation
savings over the straight binary search tree because the number of wedges is
expected to be a lot less than the buffer length?

if it's a really high frequency signal, you'll have lotsa wedges.  then i
think there won't be that much difference between the O(log2(N)) worst case
of this Lemire thing and the O(log2(N)) worst case of a straight binary
tree.  and the latter has **very** simple self-contained code (which was
previously posted).

i just wanna know, most specifically, what is the expectation of gain of
speed (or reduction of execution time) of this Lemire alg over the simpler
binary tree alg.  it doesn't seem to me to be a lot better for memory
requirements.

--
r b-j  r...@audioimagination.com
"Imagination is more important than knowledge."




 Original Message 

Subject: Re: [music-dsp] efficient running max algorithm

From: "Evan Balster" <e...@imitone.com>

Date: Fri, September 2, 2016 12:12 am

To: music-dsp@music.columbia.edu

--



> Hello, all ---

>

> Reviving this topic to mention I've created an STL-compatible header

> implementing what I'm calling the "Lemire-Fenn" algorithm for rolling

> minimum and maximum:

>

> https://github.com/EvanBalster/STL_mono_wedge

>

> This was a little pet project I undertook today to familiarize myself with

> Github; the algorithm itself is about ten lines of code, but I've

> documented it obsessively and run it through a series of unit tests. It's

> compatible with std::vector, std::deque, and (I imagine) any STL-compliant

> ringbuffer class exposing random-access iterators.

>

> Feel free to share any feedback or thoughts about this.

>

>  Evan Balster

> creator of imitone <http://imitone.com>

>

> On Thu, Jul 21, 2016 at 10:16 AM, Evan Balster <e...@imitone.com> wrote:

>

>> Ahh, smart! Thanks for the insight; I understand now.

>>

>> It occurs to me that you could slightly tighten the bound on per-sample

>> work, because it's not necessary to include the elements of the linear

>> search in the binary one. The worst-case would then be O(J) per sample,

>> where J=log2(w-J). But this is a very marginal improvement and it's

>> difficult to write out the bound in a clearer way.

>>

>>  Evan Balster

>> creator of imitone <http://imitone.com>

>>

>> On Thu, Jul 21, 2016 at 7:40 AM, Ethan Fenn <et...@polyspectral.com>

>> wrote:

>>

>>> Yeah, with a binary search, you're doing O(log(w)) work, but you might

>>> not end up discarding any samples. With the paper's linear search, you

>>> might do O(w) work, but only if you end up discarding O(w) samples. So, it

>>> turns out to be worth it, as long as you can tolerate the spike.

>>>

>>> The thinking with the partial linear search is sure, let's spend

>>> O(log(w)) time doing a binary search, but only if we first confirm we'll be

>>> discarding at least O(log(w)) samples.

>>>

>>> -Ethan

>>>

>>>

>>>

>>>

>>> On Wed, Jul 20, 2016 at 6:37 PM, Evan Balster <e...@imitone.com> wrote:

>>>

>>>> Ethan ---

>>>>

>>>> If we use only a binary search in the discard step, isn't the amortized

>>>> complexity still O(N)? I suppose not... We'd be doing a log2(w) search

>>>> every sample in the worst case where a monotonic decrease occurs.

>>>>

>>>> I'll have to look over the paper to get a better understanding, but

>>>> would be very interested to hear your thought process with the partial

>>>> linear search.

>>>>

Re: [music-dsp] efficient running max algorithm

2016-09-01 Thread Evan Balster
Hello, all ---

Reviving this topic to mention I've created an STL-compatible header
implementing what I'm calling the "Lemire-Fenn" algorithm for rolling
minimum and maximum:

https://github.com/EvanBalster/STL_mono_wedge

This was a little pet project I undertook today to familiarize myself with
Github; the algorithm itself is about ten lines of code, but I've
documented it obsessively and run it through a series of unit tests.  It's
compatible with std::vector, std::deque, and (I imagine) any STL-compliant
ringbuffer class exposing random-access iterators.
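
For readers who just want to see the shape of the algorithm end to end, here
is a rough self-contained sketch of the same idea.  It is my own illustration
with made-up names (RollingMax is not the interface of the repository); it
stores (index, value) pairs in a std::deque so the front expiry is explicit:

#include <cstddef>
#include <deque>
#include <utility>

// Rolling maximum over the last `window` samples.  Candidates are stored as
// (sample index, value) pairs; values are non-increasing from front to back,
// so the front pair always holds the current maximum.
class RollingMax
{
public:
    explicit RollingMax(std::size_t window) : window_(window), index_(0) {}

    float process(float x)
    {
        // Expire the front candidate once it falls outside the window.
        if (!wedge_.empty() && wedge_.front().first + window_ <= index_)
            wedge_.pop_front();

        // Discard candidates the new sample supersedes (plain linear search,
        // as in Lemire's original algorithm; swap in the hybrid search for a
        // bounded per-sample worst case).
        while (!wedge_.empty() && wedge_.back().second <= x)
            wedge_.pop_back();

        wedge_.emplace_back(index_++, x);
        return wedge_.front().second;   // maximum of the current window
    }

private:
    std::size_t window_, index_;
    std::deque<std::pair<std::size_t, float>> wedge_;
};

In a real-time context the deque would normally be replaced by a fixed-size
ring buffer, as discussed elsewhere in the thread.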

Feel free to share any feedback or thoughts about this.

– Evan Balster
creator of imitone 

On Thu, Jul 21, 2016 at 10:16 AM, Evan Balster  wrote:

> Ahh, smart!  Thanks for the insight; I understand now.
>
> It occurs to me that you could slightly tighten the bound on per-sample
> work, because it's not necessary to include the elements of the linear
> search in the binary one.  The worst-case would then be O(J) per sample,
> where J=log2(w-J).  But this is a very marginal improvement and it's
> difficult to write out the bound in a clearer way.
>
> – Evan Balster
> creator of imitone 
>
> On Thu, Jul 21, 2016 at 7:40 AM, Ethan Fenn 
> wrote:
>
>> Yeah, with a binary search, you're doing O(log(w)) work, but you might
>> not end up discarding any samples. With the paper's linear search, you
>> might do O(w) work, but only if you end up discarding O(w) samples. So, it
>> turns out to be worth it, as long as you can tolerate the spike.
>>
>> The thinking with the partial linear search is sure, let's spend
>> O(log(w)) time doing a binary search, but only if we first confirm we'll be
>> discarding at least O(log(w)) samples.
>>
>> -Ethan
>>
>>
>>
>>
>> On Wed, Jul 20, 2016 at 6:37 PM, Evan Balster  wrote:
>>
>>> Ethan ---
>>>
>>> If we use only a binary search in the discard step, isn't the amortized
>>> complexity still O(N)?  I suppose not...  We'd be doing a log2(w) search
>>> every sample in the worst case where a monotonic decrease occurs.
>>>
>>> I'll have to look over the paper to get a better understanding, but
>>> would be very interested to hear your thought process with the partial
>>> linear search.
>>>
>>> – Evan Balster
>>> creator of imitone 
>>>
>>> On Wed, Jul 20, 2016 at 1:23 PM, Dan Gillespie <
>>> dgilles...@cosineaudio.com> wrote:
>>>
 Regarding the Lemire paper his code is provided here:
 https://github.com/lemire/runningmaxmin

 It's been a while since I've read the paper, but iirc it reports fast
 average O(.), not worst case.  Specifically it's good if the signal has few
 changes in direction, but the worst case is not better than other
 algorithms.

 Dan Gillespie

 ___
 dupswapdrop: music-dsp mailing list
 music-dsp@music.columbia.edu
 https://lists.columbia.edu/mailman/listinfo/music-dsp


>>>
>>> ___
>>> dupswapdrop: music-dsp mailing list
>>> music-dsp@music.columbia.edu
>>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>>
>>
>>
>> ___
>> dupswapdrop: music-dsp mailing list
>> music-dsp@music.columbia.edu
>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>
>
>
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp

Re: [music-dsp] efficient running max algorithm (Evan Balster)

2016-07-25 Thread Ethan Fenn
> - IIUC, none of the other algos mentioned can cheaply vary the window size
>  at run-time, right?

I don't think what I've been calling "the Tito method" can cheaply vary the
window size, unless I've missed something clever. Or unless you're willing
to accept a "crossover period" during which the results aren't strictly
correct.

The Lemire method (assuming your operator is something like min or max) can
be made to do it cheaply if you store the indices of the potential maxima
along with their values. If you do this, changing the window size is an
O(log(w)) operation. This means you'll lose the O(1) average if the window
size is changing every sample, but I'd guess it would be faster than your
method in the end.
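
A hedged sketch of what that O(log(w)) window change could look like, assuming
the wedge stores (sample index, value) pairs with indices increasing from
front to back (the name shrink_window and the std::deque container are mine,
purely illustrative):

#include <algorithm>
#include <cstddef>
#include <deque>
#include <utility>

// Shrink the analysis window of a rolling-max wedge that stores
// (sample index, value) pairs, indices increasing from front to back.
// One binary search finds the first candidate still inside the new window;
// everything in front of it is expired in one trim.
void shrink_window(std::deque<std::pair<std::size_t, float>>& wedge,
                   std::size_t current_index,   // index of the newest sample
                   std::size_t new_window)
{
    if (wedge.empty() || current_index + 1 < new_window)
        return;   // nothing can have expired yet
    std::size_t oldest_allowed = current_index + 1 - new_window;

    auto first_valid = std::lower_bound(
        wedge.begin(), wedge.end(), oldest_allowed,
        [](const std::pair<std::size_t, float>& c, std::size_t idx)
        { return c.first < idx; });

    wedge.erase(wedge.begin(), first_valid);   // O(log w) search + front trim
}

With a ring buffer the trim is just advancing the head pointer, so only the
binary search costs O(log(w)).  Growing the window needs no trim at all,
though the reported maximum only reflects the larger window once enough new
samples have arrived.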

> - None of them generalize for other binary operators instead of max,
>  right?  (mine works for any operator that is associative and
>  commutative)

I think that's correct. The Tito method will work if you add one more
condition on the operator -- that every value has an inverse. The Lemire
method, and the heap- or tree-based methods, are pretty specific to max
and/or min.

> - Can I further optimise this algo by doing a linear search for the
>  first log2(w) samples?  Why? How?

No, this idea only makes sense if you're storing a sorted array of
potential maxima, like in the Lemire method. In your method there's not
really an array that you would want to search through!

-Ethan



On Mon, Jul 25, 2016 at 1:06 PM, Tito Latini  wrote:

> > Hi, OP here.
> >
> >
> > Unfortunately I don't understand all you are saying, due to my lack of C
> > and Z-transform skills.
> >
> > This is what I understood, please correct me if I'm wrong:
> > - My algo is called a binary max-tree.  (When used with max as an
> >  operator)
> > - The other algo discussed is faster on average, but has spikes that are
> >  higher.
> >
> >
> > Some questions:
> >
> > - IIUC, none of the other algos mentioned can cheaply vary the window size
> >  at run-time, right?
>
> Your algo is not cheaper: the performance-time loop of your compiled
> slidingsum.dsp contains two for-loops (from compute() in slidingsum.cpp):
>
> for (int i=14; i>0; i--) fVec3[i] = fVec3[i-1];
> for (int i=6; i>0; i--) fVec2[i] = fVec2[i-1];
>
> and maxsize is 8 (maybe the loops are unrolled in this simple case but
> it's not always Sunday).
>
> A simple implementation of my suggested double buffering (the alternative
> and better modulation for the initial posted code) uses two loops, one
> "from i to oldsize-1" (max 7 in this case) and one "from 0 to i-1" to
> update max 8 slots instead of 15+7=22 (the numbers are just emphasis
> and the loops are possibly unrolled in this case). Besides, the two loops
> are used only if the winsize changes and it is smaller than the old size.
> It is cheaper than your generated code, also by moving the two loops within
> the performance-time loop.
>
> > - None of them generalize for other binary operators instead of max,
> >  right?  (mine works for any operator that is associative and
> >  commutative)
>
> Nope. I'll show "max" for simplicity but the next example should work in
> similar contexts. We can use a circular buffer, a binary max heap and
> new functions to work with the heap instead of add/subtract to/from
> the accumulators:
>
> [just a simple idea, not the state of the art]
>
> a) remove the oldest from the heap
> b) insert the new to the heap
>
> ---[alternative 1]---
>
> a) on the heap: replace the oldest with the new (binary search and
> swap)
> b) on the heap: update the position of the new (binary comp)
>
> coda:
>
> c) update the circular buffer and increment the index (modulo winsize)
> d) max is heap[0], the root
>
> Again, if winsize changes and it is smaller than the old size, it is
> necessary to remove the oldest "old_size - new_size" values from the heap.
>
> ---[alternative 2]---
>
> We can avoid the binary search (a) by using a heap where the elements are
> the indexes of the buffer values and an array (or a different structure
> for the elements of the heap, i.e. index and position in uint32 if
> maxsize is two bytes) to store the positions of the indexes on the heap.
>
> For example, with two arrays of type int called "heap" and "pos":
>
> heap[k] = buffer_index;
> pos[buffer_index] = k;
>
> binary-search-and-swap is unnecessary because we know the index of the
> oldest value on the heap:
>
> heap[pos[curr_buffer_index]]
>
> and we use that heap position to start the binary comp with the new
> value because the buffer index is the same (it is incremented after
> the insertion).
>
> The max value is:
>
> buffer[heap[0]]
>
> > - Can I further optimise this algo by doing a linear search for the
> >  first log2(w) samples?  Why? How?
>
> Where? :)
> ___
> dupswapdrop: music-dsp mailing list
> music-dsp@music.columbia.edu
> https://lists.columbia.edu/mailman/listinfo/music-dsp
>
>

Re: [music-dsp] efficient running max algorithm (Evan Balster)

2016-07-25 Thread Tito Latini
> Hi, OP here.
>
>
> Unfortunately I don't understand all you are saying, due to my lack of C
> and Z-transform skills.
>
> This is what I understood, please correct me if I'm wrong:
> - My algo is called a binary max-tree.  (When used with max as an
>  operator)
> - The other algo discussed is faster on average, but has spikes that are
>  higher.
>
>
> Some questions:
>
> - IIUC, none of the other algos mentioned can cheaply vary the window size
>  at run-time, right?

Your algo is not cheaper: the performance-time loop of your compiled
slidingsum.dsp contains two for-loops (from compute() in slidingsum.cpp):

for (int i=14; i>0; i--) fVec3[i] = fVec3[i-1];
for (int i=6; i>0; i--) fVec2[i] = fVec2[i-1];

and maxsize is 8 (maybe the loops are unrolled in this simple case but
it's not always Sunday).

A simple implementation of my suggested double buffering (the alternative
and better modulation for the initial posted code) uses two loops, one
"from i to oldsize-1" (max 7 in this case) and one "from 0 to i-1" to
update max 8 slots instead of 15+7=22 (the numbers are just emphasis
and the loops are possibly unrolled in this case). Besides, the two loops
are used only if the winsize changes and it is smaller than the old size.
It is cheaper than your generated code, also by moving the two loops within
the performance-time loop.

> - None of them generalize for other binary operators instead of max,
>  right?  (mine works for any operator that is associative and
>  commutative)

Nope. I'll show "max" for simplicity but the next example should work in
similar contexts. We can use a circular buffer, a binary max heap and
new functions to work with the heap instead of add/subtract to/from
the accumulators:

[just a simple idea, not the state of the art]

a) remove the oldest from the heap
b) insert the new to the heap

---[alternative 1]---

a) on the heap: replace the oldest with the new (binary search and swap)
b) on the heap: update the position of the new (binary comp)

coda:

c) update the circular buffer and increment the index (modulo winsize)
d) max is heap[0], the root

Again, if winsize changes and it is smaller than the old size, it is
necessary to remove the oldest "old_size - new_size" values from the heap.

---[alternative 2]---

We can avoid the binary search (a) by using a heap where the elements are
the indexes of the buffer values and an array (or a different structure
for the elements of the heap, i.e. index and position in uint32 if
maxsize is two bytes) to store the positions of the indexes on the heap.

For example, with two arrays of type int called "heap" and "pos":

heap[k] = buffer_index;
pos[buffer_index] = k;

binary-search-and-swap is unnecessary because we know the index of the
oldest value on the heap:

heap[pos[curr_buffer_index]]

and we use that heap position to start the binary comp with the new
value because the buffer index is the same (it is incremented after
the insertion).

The max value is:

buffer[heap[0]]
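
Here is a minimal sketch of that "alternative 2", purely illustrative (the
class and member names are mine, not from any existing library): a circular
buffer of samples, a binary max-heap of buffer indices, and a position table
so the oldest sample can be replaced without any search:

#include <cstddef>
#include <utility>
#include <vector>

// Circular buffer of the last w samples, a max-heap of buffer indices, and a
// position table pos[] so the heap slot holding the oldest index is known
// without searching.  process() replaces the oldest sample and re-heapifies.
class IndexedHeapMax
{
public:
    // init: fill value for the not-yet-written history (for a max filter,
    // something very small is the usual choice).
    explicit IndexedHeapMax(std::size_t w, float init = 0.0f)
        : buf(w, init), heap(w), pos(w), cur(0)
    {
        for (std::size_t i = 0; i < w; ++i) { heap[i] = i; pos[i] = i; }
    }

    // Replace the oldest sample with x; return the maximum of the window.
    float process(float x)
    {
        buf[cur] = x;                    // overwrite the oldest sample in place
        siftUp(pos[cur]);                // restore heap order if x grew...
        siftDown(pos[cur]);              // ...or shrank relative to the old value
        cur = (cur + 1) % buf.size();    // advance to the next-oldest slot
        return buf[heap[0]];             // the root index always points at the max
    }

private:
    void swapSlots(std::size_t a, std::size_t b)
    {
        std::swap(heap[a], heap[b]);
        pos[heap[a]] = a;
        pos[heap[b]] = b;
    }
    void siftUp(std::size_t k)
    {
        while (k > 0)
        {
            std::size_t p = (k - 1) / 2;
            if (buf[heap[k]] <= buf[heap[p]]) break;
            swapSlots(k, p);
            k = p;
        }
    }
    void siftDown(std::size_t k)
    {
        for (;;)
        {
            std::size_t l = 2 * k + 1, r = l + 1, m = k;
            if (l < heap.size() && buf[heap[l]] > buf[heap[m]]) m = l;
            if (r < heap.size() && buf[heap[r]] > buf[heap[m]]) m = r;
            if (m == k) break;
            swapSlots(k, m);
            k = m;
        }
    }

    std::vector<float>       buf;    // circular buffer of the last w samples
    std::vector<std::size_t> heap;   // heap of buffer indices, max at heap[0]
    std::vector<std::size_t> pos;    // pos[i] = heap slot currently holding index i
    std::size_t              cur;    // index of the oldest sample (next overwritten)
};

Every sample costs O(log2(w)) for the sift: a flat worst case, but without the
amortized O(1) per sample of the wedge approach.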

> - Can I further optimise this algo by doing a linear search for the
>  first log2(w) samples?  Why? How?

Where? :)
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp



Re: [music-dsp] efficient running max algorithm

2016-07-21 Thread Evan Balster
Ahh, smart!  Thanks for the insight; I understand now.

It occurs to me that you could slightly tighten the bound on per-sample
work, because it's not necessary to include the elements of the linear
search in the binary one.  The worst-case would then be O(J) per sample,
where J=log2(w-J).  But this is a very marginal improvement and it's
difficult to write out the bound in a clearer way.

– Evan Balster
creator of imitone 

On Thu, Jul 21, 2016 at 7:40 AM, Ethan Fenn  wrote:

> Yeah, with a binary search, you're doing O(log(w)) work, but you might not
> end up discarding any samples. With the paper's linear search, you might do
> O(w) work, but only if you end up discarding O(w) samples. So, it turns out
> to be worth it, as long as you can tolerate the spike.
>
> The thinking with the partial linear search is sure, let's spend O(log(w))
> time doing a binary search, but only if we first confirm we'll be
> discarding at least O(log(w)) samples.
>
> -Ethan
>
>
>
>
> On Wed, Jul 20, 2016 at 6:37 PM, Evan Balster  wrote:
>
>> Ethan ---
>>
>> If we use only a binary search in the discard step, isn't the amortized
>> complexity still O(N)?  I suppose not...  We'd be doing a log2(w) search
>> every sample in the worst case where a monotonic decrease occurs.
>>
>> I'll have to look over the paper to get a better understanding, but would
>> be very interested to hear your thought process with the partial linear
>> search.
>>
>> – Evan Balster
>> creator of imitone 
>>
>> On Wed, Jul 20, 2016 at 1:23 PM, Dan Gillespie <
>> dgilles...@cosineaudio.com> wrote:
>>
>>> Regarding the Lemire paper his code is provided here:
>>> https://github.com/lemire/runningmaxmin
>>>
>>> It's been a while since I've read the paper, but iirc it reports fast
>>> average O(.), not worst case.  Specifically it's good if the signal has few
>>> changes in direction, but the worst case is not better than other
>>> algorithms.
>>>
>>> Dan Gillespie
>>>
>>> ___
>>> dupswapdrop: music-dsp mailing list
>>> music-dsp@music.columbia.edu
>>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>>
>>>
>>
>> ___
>> dupswapdrop: music-dsp mailing list
>> music-dsp@music.columbia.edu
>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>
>
>
> ___
> dupswapdrop: music-dsp mailing list
> music-dsp@music.columbia.edu
> https://lists.columbia.edu/mailman/listinfo/music-dsp
>
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp

Re: [music-dsp] efficient running max algorithm

2016-07-21 Thread Wen Xue

The trick is to count the total number of operations, not for each incoming 
sample as the window moves. 

The algorithm maintains a buffer that pushes at the back and pops at both front 
and back. Each sample is pushed onto the buffer and popped out of it exactly 
once. If R samples are popped from the front, then N-R are popped from the 
back. All N pushes are at the back.

Each comparison with an incoming sample leads to either one push or one pop at 
the back. It naturally follows that the total of N-R pops and N pushes at the 
back costs 2N-R comparisons. There are also N yes/no comparisons to determine 
whether to pop a sample from the front as the window moves on.  So yes, the 
total is O(N) regardless of w.
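
Spelling that tally out, for N input samples of which R are eventually popped
from the front:

  pushes at the back:      N       comparisons
  pops at the back:        N - R   comparisons
  front expiry checks:     N       comparisons
  ---------------------------------------------
  total:                   3N - R  <=  3N

so the average never exceeds three comparisons per element, which is exactly
the bound in the title of Lemire's paper.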

-Xue


From: Ethan Fenn
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp

Re: [music-dsp] efficient running max algorithm

2016-07-21 Thread Ethan Fenn
Yeah, with a binary search, you're doing O(log(w)) work, but you might not
end up discarding any samples. With the paper's linear search, you might do
O(w) work, but only if you end up discarding O(w) samples. So, it turns out
to be worth it, as long as you can tolerate the spike.

The thinking with the partial linear search is sure, let's spend O(log(w))
time doing a binary search, but only if we first confirm we'll be
discarding at least O(log(w)) samples.

-Ethan




On Wed, Jul 20, 2016 at 6:37 PM, Evan Balster  wrote:

> Ethan ---
>
> If we use only a binary search in the discard step, isn't the amortized
> complexity still O(N)?  I suppose not...  We'd be doing a log2(w) search
> every sample in the worst case where a monotonic decrease occurs.
>
> I'll have to look over the paper to get a better understanding, but would
> be very interested to hear your thought process with the partial linear
> search.
>
> – Evan Balster
> creator of imitone 
>
> On Wed, Jul 20, 2016 at 1:23 PM, Dan Gillespie  > wrote:
>
>> Regarding the Lemire paper his code is provided here:
>> https://github.com/lemire/runningmaxmin
>>
>> It's been a while since I've read the paper, but iirc it reports fast
>> average O(.), not worst case.  Specifically it's good if the signal has few
>> changes in direction, but the worst case is not better than other
>> algorithms.
>>
>> Dan Gillespie
>>
>> ___
>> dupswapdrop: music-dsp mailing list
>> music-dsp@music.columbia.edu
>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>
>>
>
> ___
> dupswapdrop: music-dsp mailing list
> music-dsp@music.columbia.edu
> https://lists.columbia.edu/mailman/listinfo/music-dsp
>
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp

Re: [music-dsp] efficient running max algorithm

2016-07-20 Thread Evan Balster
Ethan ---

If we use only a binary search in the discard step, isn't the amortized
complexity still O(N)?  I suppose not...  We'd be doing a log2(w) search
every sample in the worst case where a monotonic decrease occurs.

I'll have to look over the paper to get a better understanding, but would
be very interested to hear your thought process with the partial linear
search.

– Evan Balster
creator of imitone 

On Wed, Jul 20, 2016 at 1:23 PM, Dan Gillespie 
wrote:

> Regarding the Lemire paper his code is provided here:
> https://github.com/lemire/runningmaxmin
>
> It's been a while since I've read the paper, but iirc it reports fast
> average O(.), not worst case.  Specifically it's good if the signal has few
> changes in direction, but the worst case is not better than other
> algorithms.
>
> Dan Gillespie
>
> ___
> dupswapdrop: music-dsp mailing list
> music-dsp@music.columbia.edu
> https://lists.columbia.edu/mailman/listinfo/music-dsp
>
>
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp

Re: [music-dsp] efficient running max algorithm

2016-07-20 Thread Ethan Fenn
Good idea and very clear summary.

One correction: in the discard step the paper prescribes a linear search
through the queue starting from the end, not a binary search. And the
modification I propose is to stick with this linear search for the first
log2(w) samples, then switch to a binary search. I believe this makes the
overall performance worse, but that it will still be O(n), and now the
worst case for an individual sample is O(log(w)) rather than O(w), a big
improvement.

As to how the O(n) bound works, I'd suggest just reading that part of the
paper until it clicks. Informally, the gist is this: Robert rightly worries
about the case where we have to search through this whole list of w
elements to figure out how many to discard. This will indeed take some time
for large w. However, the list only grows in size when we're lucky --
specifically when we encounter a sample which is less than everything in
the list. In this case we just add it to the end and move on, and the
discard step is nearly free. So, in order for the list to have gotten this
long in the first place we must have gotten lucky w times in the past. Even
though we have to pay the piper now, the time is amortized over all of
these samples. The average time is thus bounded by a constant, and really
quite a small constant.

(And sorry, Robert, I don't have any code to share).

-Ethan




On Wed, Jul 20, 2016 at 12:39 PM, Evan Balster <e...@imitone.com> wrote:

> OK, here's what I gather...  I'm also going to summarize for the benefit
> of people who haven't read the other thread.
>
> There exists an algorithm for computing the minimum and/or maximum sample
> value in a rolling range.  It runs in amortized linear time in proportion
> to the number of samples processed---that is, we can process N samples in
> O(N).  Individual samples have a worst-case time of O(w) or O(log2(w)) if
> we use a binary search.
>
> Daniel Lemire, Streaming Maximum-Minimum Filter Using No More than Three
>> Comparisons per Element, Nordic Journal of Computing, Volume 13, Number 4,
>> pages 328-339, 2006
>
>
>> http://arxiv.org/abs/cs/0610046
>
>
> From what I've read here, the maximum algorithm works thus:
>
>- Maintain a delay line containing the last w elements.
>
>- Maintain a "monotonic wedge" queue describing all potential
>maximums...
>   - The queue contains each sample from the last *w* samples which is
>   larger than all the samples after it.
>   - By definition, the front element in the queue is the maximum in
>   the range.
>   - The queue's maximum size is *w*; this happens when the samples
>   are monotonically descending.
>
>   - For each new input sample:
>- If the front element in the queue is now *w* or more samples old,
>   discard it.
>   - *Discard* all samples in the queue which are not larger than the
>   latest sample.
>   - Add the latest sample to the back of the queue.
>
>   - To implement the *Discard* step:
>   - Perform a binary search on the queue (which is already sorted) to
>   find the smallest sample larger than the new input sample
>   (see <http://www.cplusplus.com/reference/algorithm/upper_bound/>).
>   - Discard all samples following the found sample.
>
>   - To query the maximum:
>   - Return the front-most element of the queue.
>
> We can derive the min or min-and-max algorithm trivially from these
> steps.  The queue can be efficiently implemented with a ring-buffer of size
> w, which could optionally store delay-line indices instead of samples.  The
> discard step would be as simple as revising the end pointer.
>
> Because the only sample which can be both a minimum and  maximum candidate
> is the latest sample, we might be able to implement the algorithm with
> queue storage of size w.
>
> As Robert observes, this could get very spiky for w larger than processing
> block size.  I suspect that it would be possible to do the same kind of
> magic used in realtime hash-table migration and make that discard step run
> in constant time per sample---but that's a little more tango than I feel
> like dancing today!
>
> – Evan Balster
> creator of imitone <http://imitone.com>
>
> On Wed, Jul 20, 2016 at 11:04 AM, robert bristow-johnson <
> r...@audioimagination.com> wrote:
>
>>
>>
>>  Original Message 
>> Subject: Re: [music-dsp] efficient running max algorithm
>> From: "Ethan Fenn" <et...@polyspectral.com>
>> Date: Wed, July 20, 2016 10:27 am
>> To: music-dsp@music.columbia.edu
>> --
>>
>>

Re: [music-dsp] efficient running max algorithm

2016-07-20 Thread robert bristow-johnson







 Original Message 

Subject: Re: [music-dsp] efficient running max algorithm

From: "Ethan Fenn" <et...@polyspectral.com>

Date: Wed, July 20, 2016 10:27 am

To: music-dsp@music.columbia.edu

--



> Of course, for processing n samples, something that is O(n) is going to

> eventually beat something that's O(n*log(w)), for big enough w.
i am skeptical of the claim.  at least for worst case.  if w = 2^30, i would
think the computational cost O(.) will have an increasing function of w in the
expression.

> FWIW if it's important to have O(log(w)) worst case per sample, I think you
> can adapt the method of the paper to achieve this while keeping the O(1)
> average.

for real-time processing (that's how i think of a "running" max) the worst case
timing is what one has to worry about.  at least to prevent an unexpected
hiccup.

now, if the worst case *average* can be contained, then processing samples in
blocks can take advantage of that averaging.  so if we process samples in, say,
32 or 64 sample blocks, and the *worst case* timing of this can be guaranteed
to beat the O(log2(w)), then the alg has value.  but if it cannot be guaranteed
to do that, it does not for a real-time context.

> Instead of a linear search back through the maxima, do a linear search for
> log2(w) samples and, failing that, switch to binary search. This is
> O(log(w)) worst case, but I think it would keep the property of O(1)
> comparisons per sample. I believe the upper bound would now be 4
> comparisons per sample rather than 3. I'd have to think more about it to be
> sure.

because i am having trouble understanding *exactly* the alg, i would still
appreciate a snippet of code.

i will bet, that given a snippet of code, i can think of a signal that will
push the computational cost into the worst case which will be worse than
O(log2(w)).

--
r b-j  r...@audioimagination.com
"Imagination is more important than knowledge."
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp

Re: [music-dsp] efficient running max algorithm

2016-07-20 Thread Ethan Fenn
Of course, for processing n samples, something that is O(n) is going to
eventually beat something that's O(n*log(w)), for big enough w.

FWIW if it's important to have O(log(w)) worst case per sample, I think you
can adapt the method of the paper to achieve this while keeping the O(1)
average.

Instead of a linear search back through the maxima, do a linear search for
log2(w) samples and, failing that, switch to binary search. This is
O(log(w)) worst case, but I think it would keep the property of O(1)
comparisons per sample. I believe the upper bound would now be 4
comparisons per sample rather than 3. I'd have to think more about it to be
sure.

-Ethan



On Tue, Jul 19, 2016 at 1:49 PM, robert bristow-johnson <
r...@audioimagination.com> wrote:

>
>
> changing the title as per Evan's request
>
>
>  Original Message 
> Subject: Re: [music-dsp] highly optimised variable rms and more
> From: "Ethan Fenn" 
> Date: Tue, July 19, 2016 9:45 am
> To: music-dsp@music.columbia.edu
> Cc: "robert bristow-johnson" 
> --
>
> > So you keep a list of all samples in your window (length w) which are
> > potential maxima -- that is, those which are >= all of the following
> > samples. By definition this is a monotonically decreasing list (not
> > strictly, there can be duplicates). For each input sample, you compare it
> > against the end of the list. If it's <= the last sample, just add it to
> the
> > list. Otherwise scan backwards until you find a sample >= it,
>
> how much does *that* cost?
>
> > chop off the
> > list after that point and add your sample. See if the first sample in
> your
> > list is expiring (if it is == the earliest sample in the window), and
> > discard it if so. Then output the first sample in the list and move on.
> >
> > It's O(1) in an amortized sense -- it's O(n) to process n samples,
> > regardless of w. The worst case time for any individual sample is O(w).
>
> that is as i expected.  O(w).  not O(1).
>
> the code i posted is, worst case, O(log2(w)).
>
>
>
> > Still, it seems very practical, the operations are simple and few.
>
> i'd like to see the code.  and then i'll cook up a signal that pushes it
> into the worst case.
>
> i thought the operations of the posted code are also few and simple and it
> guarantees worst case O(log2(w)).
>
>
>
> --
>
> r b-j  r...@audioimagination.com
>
> "Imagination is more important than knowledge."
>
>
>
>
>
> restated for convenience:
>
>
>
>
>
> #define A_REALLY_LARGE_NUMBER 3.40e38
>
> typedef struct
>{
>unsigned long filter_length;  // array_size/2 < filter_length
> <= array_size
>unsigned long array_size; // must be power of 2 for this
> simple implementation
>unsigned long input_index;// the actual sample placement is
> at (array_size + input_index);
>float* big_array_base;// the big array is malloc()
> separately and is actually twice array_size;
>} search_tree_array_data;
>
>
> void initSearchArray(unsigned long filter_length, search_tree_array_data*
> array_data)
>{
>array_data->filter_length = filter_length;
>
>array_data->array_size = 1;
>filter_length--;
>while (filter_length > 0)
> {
> array_data->array_size <<= 1;
> filter_length >>= 1;
> }
>// array_size is a power of 2 such that
>// filter_length <= array_size < 2*filter_length
>// array_size = 2^ceil(log2(filter_length)) =
>// 2^(1+floor(log2(filter_length-1)))
>
>array_data->input_index = 0;
>
>array_data->big_array_base =
> (float*)malloc(sizeof(float)*2*array_data->array_size);// dunno
> what to do if malloc() fails.
>
>for (unsigned long n=0; n<2*array_data->array_size; n++)
> {
> array_data->big_array_base[n] = -A_REALLY_LARGE_NUMBER;//
> init array.
> } //
> array_base[0] is never used.
> }
>
>
>
> /*
>  *   findMaxSample(new_sample, _data) will place new_sample into the
> circular
>  *   buffer in the latter half of the array pointed to by
> array_data.big_array_base .
>  *   it will then compare the value in new_sample to its "sibling" value,
> takes the
>  *   greater of the two and then pops up one generation to the parent node
> where
>  *   this parent also has a sibling and repeats the process.  since the
> other parent
>  *   nodes already have the max value of the two child nodes, when getting
> to the
>  *   top-level parent node, this node will have the maximum value of all
> the samples
>  *   in the big_array.  the number of iterations of this loop is
> ceil(log2(filter_length)).
>  */
>
> float findMaxSample(float value, search_tree_array_data* array_data)
>{
>register float* big_array = array_data->big_array_base;
>
>