I'm not sure if your data is guaranteed sorted. If it is, then this seems ok:
ecdfsorted =: (# (% {:)@:(+/\))@:(#/.~)
It's pretty fast for i.1e5.
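In case it helps to see why that works, here is a rough step-by-step sketch on your example vector. The names v, counts, cum and cf are only labels for illustration; the last line should give the same result as ecdfsorted, just written out as separate steps:

```j
v      =: 2 2 2 4 4 6 6 8
counts =: #/.~ v            NB. occurrences of each distinct value: 3 2 2 1
cum    =: +/\ counts        NB. running total of the counts: 3 5 7 8
cf     =: cum % {: cum      NB. divide by the grand total: 0.375 0.625 0.875 1
counts # cf                 NB. repeat each fraction once per observation
```

The final copy (#) is what produces one value per observation rather than one per distinct value.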
Otherwise, you need to sort first:
ecdfunsorted =: (# (% {:)@:(+/\))@:(#/.~)@:/:~
Slightly slower.
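Note that both versions return the values in ascending order of the data. If you need them lined up with the original, unsorted observations (as your ecdf does), one possible sketch is to index the sorted result by the rank of each item. The name ecdforder is made up here, and the definition of ecdfsorted is repeated only so the snippet stands alone:

```j
ecdfsorted =: (# (% {:)@:(+/\))@:(#/.~)
NB. /: /: y is the rank of each item; use it to index the sorted result
ecdforder  =: 3 : '(/: /: y) { ecdfsorted /:~ y'
ecdforder 4 2 8 2 6 2 6 4
NB. 0.625 0.375 1 0.375 0.875 0.375 0.875 0.625
```

This costs an extra grade, so I'd expect it to be a little slower again, though I haven't timed it.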
I'm a bit puzzled, though, in that your example yields 3 copies of 0.375, 2 of 0.625, etc.
The frequencies emerge repeated as often as the observations.
Isn't it more useful to produce a keyed table of cumulative frequencies? eg:
(~.,.(((%{:)@:(+/\)))@:(#/.~))2 2 2 4 4 6 6 8
2 0.375
4 0.625
6 0.875
8 1
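If the table is the way to go, a sketch of a reusable verb might look like the following. The names ecdftable, t and y0 are hypothetical, and it sorts first on the assumption that the input may be unordered, since the cumulative sums only make sense on ordered keys:

```j
ecdftable =: 3 : '(~. ,. (% {:)@:(+/\)@:(#/.~)) /:~ y'
y0 =: 6 2 4 2 8 2 4 6
t  =: ecdftable y0            NB. rows are 2 0.375, 4 0.625, 6 0.875, 8 1
NB. look each observation up in the key column to recover its cumulative frequency
({:"1 t) {~ ({."1 t) i. y0
```

The last line is only needed if you still want one value per observation; the table itself is much smaller when the data has many repeats.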
Good luck!
Mike
On 13/08/2014 20:06, Joe Bogner wrote:
> I'm looking for any ideas to speed up this function.
>
> I patched together this ecdf function from a few different ideas:
>
> NB. > v<- c(2,2,2,4,4,6,6,8)
> NB. > ecdf(v)(v)
> NB. [1] 0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000
>
> ecdf=: 3 : 0
> valsct=. # y
> tbl=:y,.(valsct %~ #) \ y
> max=:(0{"1 tbl) (>./)/. tbl
> , 1{"1 (({."1 max) i. y) { max
> )
>
> (0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000) -: ecdf (2,2,2,4,4,6,6,8)
> 1
>
>
> timespacex 'ecdf i. 1e5'
> 8.84599 1.15392e7
>
> The R function is nearly instantaneous
>
> I need to run this on a 1m+ array
>
> Thank you for any suggestions
>
> http://en.wikipedia.org/wiki/Empirical_distribution_function
> https://github.com/jstac/edtc-code/blob/master/python_code/ecdf.py
> https://github.com/dmbates/ecdfExample
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm