I don't understand what the inputs and outputs of ecdf are supposed to be.
But if x is an empirical dataset and y is an array of points at which
you want the ecdf evaluated, you could use:
   ecdf =: ((I.~ % #@]) (/:~@:(-&0.00001)))~
   dataset =: 2 2 2 4 4 6 6 8   NB. assumed: the sample vector from the quoted post
   dataset ecdf 2 2 2 4 4 6 6 8
0.375 0.375 0.375 0.625 0.625 0.875 0.875 1
   dataset ecdf i. 9
0 0 0.375 0.375 0.625 0.625 0.875 0.875 1
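In case the tacit form is hard to read, here is a rough explicit equivalent. The name ecdfx is made up for illustration, and dataset is assumed to be the sample vector from the quoted post.

```j
dataset =: 2 2 2 4 4 6 6 8   NB. assumed sample data

ecdfx =: 4 : 0
s =. /:~ x - 0.00001   NB. shift the data down slightly, then sort ascending
(s I. y) % # s         NB. shifted values strictly below each y, divided by the count
)
```

With that definition I would expect the same result as the tacit verb:

   dataset ecdfx i. 9
0 0 0.375 0.375 0.625 0.625 0.875 0.875 1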
The 0.00001 offset is needed only if you really insist on P(X<=x) rather than P(X<x).
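If the offset bothers you (it miscounts any data value that lies above a query point by less than 0.00001), one possible alternative is to count exactly. This is only a sketch: ecdflt and ecdfle are hypothetical names, and dataset is again assumed to be the sample vector from the quoted post. The <= version negates both the data and the query points, so that I. counts the data values strictly greater than each y.

```j
dataset =: 2 2 2 4 4 6 6 8   NB. assumed sample data

NB. P(X<x): fraction of data values strictly below each y
ecdflt =: 4 : '((/:~ x) I. y) % # x'

NB. P(X<=x): 1 minus the fraction of data values strictly above each y
ecdfle =: 4 : 0
n =. # x
s =. /:~ - x           NB. negated data, sorted ascending
(n - s I. - y) % n     NB. s I. - y counts the data values greater than y
)
```

On the sample data the two conventions should differ only at the data values themselves:

   dataset ecdfle i. 9
0 0 0.375 0.375 0.625 0.625 0.875 0.875 1
   dataset ecdflt i. 9
0 0 0 0.375 0.375 0.625 0.625 0.875 0.875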
Henry Rich
On 8/13/2014 3:06 PM, Joe Bogner wrote:
I'm looking for any ideas to speed up this function.
I patched together this ecdf function from a few different ideas:
NB. > v<- c(2,2,2,4,4,6,6,8)
NB. > ecdf(v)(v)
NB. [1] 0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000
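ecdf=: 3 : 0
valsct=. # y                      NB. number of observations
tbl=. y,.(valsct %~ #) \ y        NB. pair each value with (prefix length) % count
max=. (0{"1 tbl) (>./)/. tbl      NB. per distinct value, the largest fraction
, 1{"1 (({."1 max) i. y) { max    NB. look up each y, return its fraction
)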
   (0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000) -: ecdf (2,2,2,4,4,6,6,8)
1
   timespacex 'ecdf i. 1e5'
8.84599 1.15392e7
The R function is nearly instantaneous.
I need to run this on an array of 1M+ elements.
Thank you for any suggestions.
http://en.wikipedia.org/wiki/Empirical_distribution_function
https://github.com/jstac/edtc-code/blob/master/python_code/ecdf.py
https://github.com/dmbates/ecdfExample
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm