This has arisen in solving Euler Project problem 593.
https://projecteuler.net/problem=593
It is basically quite easy, especially for the examples with given
answers, ie
for F(100,10) and even for F(100000, 10000) but as usual with these
problems,
there's a sting in the tail, in the computing requirement for F(10^7, 10^5)
The data consist of what one can regard as a random set of integers,
which I called
S2keep, drawn from a finite set with ~ 20000 elements.
F(n,k) is the sum of medians on each window of k consecutive elements of
sample of n elements of S2keep.
So F(10,100) is easily constructed, hard-wired here:
10j1":-:+/10 (+/@:(4 5{/:~))\100{.S2keep
463628.5
This is of course inefficient for larger n and k.
However, the window size for the stated problem is ~5 times larger than
the universe
of elements, and we know that each successive window differs from the
previous one
in at most two elements, so it's somewhat more efficient, though more
tedious, to
maintain a count array (cts) of size ~20000. For any window, this array
consists of
effectively random zero or positive entries, summing to k. The indices
of these entries
are the indices of the elements of S2keep in a sorted list of nub
elements of S2keep. So
locating the indices of the (k/2)th members of the cumulative sum of cts
will reveal the
median elements of the window.
But this is still quite slow. The answer which I submitted had taken
about 14 minutes
with this method. (in J805 pre avx!)
I thought it should be possible to do better, and googled in stack
overflow and elsewhere.
The preferred method appears to require "indexable skip lists", said to
be better than binary
trees. I had a look - they're not too hard to construct, but it's
difficult to insert and delete
elements in a J-like way.
I gave up on this tack, and instead found some improvement in run-time
with using
heaps. So, instead of maintaining a count array with one slot per nub
element, the heaps
count several contiguous nub elements. The default of this approach is
to have ~ %: 20000
heaps, ie ~ 140.
Best time so far is still quite long - about 8 minutes. (And ~5 1/2 in
J806Beta using avx)
Question is - has anyone achieved efficient solutions in J? Other
languages seem to have
efficient implementations of heaps, binary trees, skip lists and so
on, if only one can
understand how to use them!
Not much use if it's just for solving Euler Project problems, but
people do need moving
statistics on occasion.
This is long enough without quoting my source code. Available on request!
Thanks,
Mike
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm