Re: [Jprogramming] faster empirical cumulative distribution function

Henry Rich Wed, 13 Aug 2014 13:27:52 -0700

I don't understand what the input and outputs of ecdf are supposed to be.

But if x is an empirical dataset and y is an array of values at which you want to evaluate the ecdf, you could use


   ecdf =: ((I.~ % #@]) (/:~@:(-&0.00001)))~
   dataset ecdf 2 2 2 4 4 6 6 8
0.375 0.375 0.375 0.625 0.625 0.875 0.875 1
   dataset ecdf i. 9
0 0 0.375 0.375 0.625 0.625 0.875 0.875 1

The 0.00001 bit is only if you really insist on P(X<=x) rather than P(X<x).
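
As a rough sketch of that distinction, the two conventions can also be written without the offset. The names ecdfLT and ecdfLE are made up for illustration, and dataset is assumed to hold the sample from the original post. With ascending sorted data on the left, I. counts the items strictly less than each y; the second verb negates both arguments so that I. counts the items strictly greater than y, and then takes the complement.

```j
dataset =: 2 2 2 4 4 6 6 8     NB. assumed: the sample from the original post

NB. P(X<y): I. on ascending data counts items strictly less than y
ecdfLT =: 4 : '((/:~ x) I. y) % # x'

NB. P(X<=y): negate data and query, count items strictly greater than y,
NB. then subtract that fraction from 1
ecdfLE =: 4 : '1 - ((/:~ - x) I. - y) % # x'

   dataset ecdfLT 2 2 2 4 4 6 6 8
0 0 0 0.375 0.375 0.625 0.625 0.875
   dataset ecdfLE i. 9
0 0 0.375 0.375 0.625 0.625 0.875 0.875 1
```

Written this way, the result should not depend on how closely spaced the data values are, which the fixed offset implicitly assumes.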



Henry Rich

On 8/13/2014 3:06 PM, Joe Bogner wrote:

I'm looking for any ideas to speed up this function.

I patched together this ecdf function from a few different ideas:

NB. > v<- c(2,2,2,4,4,6,6,8)
NB. > ecdf(v)(v)
NB. [1] 0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000

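ecdf=: 3 : 0
   valsct=. # y                      NB. sample size
   tbl=. y,.(valsct %~ #) \ y        NB. value ,. running fraction (assumes y is sorted)
   max=. (0{"1 tbl) (>./)/. tbl      NB. per distinct value, keep the largest fraction
   , 1{"1 (({."1 max) i. y) { max    NB. look up each y in that table
)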

(0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000) -: ecdf (2,2,2,4,4,6,6,8)
1


timespacex 'ecdf i. 1e5'
8.84599 1.15392e7
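
One plausible reason for the timing above, offered as a guess rather than a measurement: the phrase (valsct %~ #) \ y applies a verb to every prefix of y, so the work grows roughly with the square of the length. The column it builds is just the position of each item divided by the sample size, which can be sketched directly. The name frac is hypothetical, and y is assumed to be sorted, as the original definition already requires.

```j
y    =: 2 2 2 4 4 6 6 8
frac =: (>: i. # y) % # y          NB. 1 2 3 ... n divided by n, no prefix scan

   frac
0.125 0.25 0.375 0.5 0.625 0.75 0.875 1
   frac -: ((# y) %~ #) \ y        NB. same values as the prefix-based column
1
```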

The R function is nearly instantaneous.

I need to run this on an array of 1M+ elements.

Thank you for any suggestions

http://en.wikipedia.org/wiki/Empirical_distribution_function
https://github.com/jstac/edtc-code/blob/master/python_code/ecdf.py
https://github.com/dmbates/ecdfExample
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
