[Jprogramming] Moving median - any good methods?

'Mike Day' via Programming Tue, 14 Mar 2017 12:51:18 -0700

This has arisen in solving Euler Project problem 593.
https://projecteuler.net/problem=593

It is basically quite easy, especially for the examples with givenanswers, iefor F(100,10) and even for F(100000, 10000) but as usual with theseproblems,

there's a sting in the tail, in the computing requirement for F(10^7, 10^5)

The data consist of what one can regard as a random set of integers,which I called

S2keep,  drawn from a finite set with ~ 20000 elements.

F(n,k) is the sum of medians on each window of k consecutive elements of
sample of n elements of S2keep.

So F(10,100) is easily constructed,  hard-wired here:
    10j1":-:+/10 (+/@:(4 5{/:~))\100{.S2keep

463628.5


This is of course inefficient for larger n and k.

However, the window size for the stated problem is ~5 times larger thanthe universe

of elements, and we know that each successive window differs from theprevious one

in at most two elements, so it's somewhat more efficient, though moretedious, to

maintain a count array (cts) of size ~20000. For any window, this arrayconsists of

effectively random zero or positive entries, summing to k. The indicesof these entries

are the indices of the elements of S2keep in a sorted list of nubelements of S2keep. So

locating the indices of the (k/2)th members of the cumulative sum of ctswill reveal the


median elements of the window.

But this is still quite slow. The answer which I submitted had takenabout 14 minutes


with this method. (in J805 pre avx!)

I thought it should be possible to do better, and googled in stackoverflow and elsewhere.

The preferred method appears to require "indexable skip lists", said tobe better than binary

trees. I had a look - they're not too hard to construct, but it'sdifficult to insert and delete


elements in a J-like way.

I gave up on this tack, and instead found some improvement in run-timewith using

heaps. So, instead of maintaining a count array with one slot per nubelement, the heaps

count several contiguous nub elements. The default of this approach isto have ~ %: 20000


heaps, ie ~ 140.

Best time so far is still quite long - about 8 minutes. (And ~5 1/2 inJ806Beta using avx)

Question is - has anyone achieved efficient solutions in J? Otherlanguages seem to have

efficient implementations of heaps, binary trees, skip lists and soon, if only one can


understand how to use them!

Not much use if it's just for solving Euler Project problems, butpeople do need moving


statistics on occasion.


This is long enough without quoting my source code.  Available on request!


Thanks,


Mike




---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

[Jprogramming] Moving median - any good methods?

Reply via email to