This has arisen in solving Euler Project problem 593.
https://projecteuler.net/problem=593
It is basically quite easy, especially for the examples with given answers, ie for F(100,10) and even for F(100000, 10000) but as usual with these problems,
there's a sting in the tail, in the computing requirement for F(10^7, 10^5)

The data consist of what one can regard as a random set of integers, which I called
S2keep,  drawn from a finite set with ~ 20000 elements.

F(n,k) is the sum of medians on each window of k consecutive elements of
sample of n elements of S2keep.

So F(10,100) is easily constructed,  hard-wired here:
    10j1":-:+/10 (+/@:(4 5{/:~))\100{.S2keep

463628.5


This is of course inefficient for larger n and k.


However, the window size for the stated problem is ~5 times larger than the universe

of elements, and we know that each successive window differs from the previous one

in at most two elements, so it's somewhat more efficient, though more tedious, to

maintain a count array (cts) of size ~20000. For any window, this array consists of

effectively random zero or positive entries, summing to k. The indices of these entries

are the indices of the elements of S2keep in a sorted list of nub elements of S2keep. So

locating the indices of the (k/2)th members of the cumulative sum of cts will reveal the

median elements of the window.


But this is still quite slow. The answer which I submitted had taken about 14 minutes

with this method. (in J805 pre avx!)


I thought it should be possible to do better, and googled in stack overflow and elsewhere.

The preferred method appears to require "indexable skip lists", said to be better than binary

trees. I had a look - they're not too hard to construct, but it's difficult to insert and delete

elements in a J-like way.


I gave up on this tack, and instead found some improvement in run-time with using

heaps. So, instead of maintaining a count array with one slot per nub element, the heaps

count several contiguous nub elements. The default of this approach is to have ~ %: 20000

heaps, ie ~ 140.


Best time so far is still quite long - about 8 minutes. (And ~5 1/2 in J806Beta using avx)


Question is - has anyone achieved efficient solutions in J? Other languages seem to have

efficient implementations of heaps, binary trees, skip lists and so on, if only one can

understand how to use them!


Not much use if it's just for solving Euler Project problems, but people do need moving

statistics on occasion.


This is long enough without quoting my source code.  Available on request!


Thanks,


Mike




---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to