[Jprogramming] faster empirical cumulative distribution function

Joe Bogner Wed, 13 Aug 2014 12:06:50 -0700

I'm looking for any ideas to speed up this function.

I patched together this ecdf function from a few different ideas:


NB. > v<- c(2,2,2,4,4,6,6,8)
NB. > ecdf(v)(v)
NB. [1] 0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000

ecdf=: 3 : 0
  valsct=. # y
  tbl=:y,.(valsct %~ #) \ y
  max=:(0{"1 tbl) (>./)/. tbl
  , 1{"1 (({."1 max) i. y) { max
)

(0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000) -: ecdf (2,2,2,4,4,6,6,8)
1
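
To make the steps inside ecdf easier to follow, here is a hand trace of the intermediate values for the sample vector. It is only a sketch: tbl1 and max1 are names introduced here for illustration, and the values in the comments were worked out by hand from the definitions above.

```j
y    =: 2 2 2 4 4 6 6 8
n    =: # y

NB. Pair each value with (1-based position) % n, using the same prefix scan as ecdf.
NB. The second column is 0.125 0.25 0.375 0.5 0.625 0.75 0.875 1
tbl1 =: y ,. (n %~ #) \ y

NB. Group the rows by value and keep the column-wise maximum of each group.
NB. Rows of max1:  2 0.375 / 4 0.625 / 6 0.875 / 8 1
max1 =: (0{"1 tbl1) (>./)/. tbl1

NB. Look each original value up in the first column and return the fraction.
, 1{"1 (({."1 max1) i. y) { max1
```

Note that the position-based fraction only equals the ECDF value when y is already sorted ascending, as it is in this example.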


timespacex 'ecdf i. 1e5'
8.84599 1.15392e7
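
A likely cost here is the prefix scan (valsct %~ #) \ y, which applies the verb to every prefix of y in turn. As a rough sketch of one alternative (not benchmarked, and ecdf2 is a hypothetical name used only here), the count of items less than or equal to each value can be taken from the index of its last occurrence in the sorted list:

```j
NB. ecdf2: fraction of the items of y that are less than or equal to each item of y.
NB. /:~ y sorts ascending; dyadic i: gives the index of the last occurrence,
NB. so incrementing it gives the count of items <: each value.
ecdf2=: 3 : '(# y) %~ >: (/:~ y) i: y'

NB. Same check as above; this should return 1.
(0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000) -: ecdf2 (2,2,2,4,4,6,6,8)
```

Because it sorts internally, this version does not assume that y arrives sorted.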

The R function is nearly instantaneous.

I need to run this on an array of 1 million+ elements.

Thank you for any suggestions.

http://en.wikipedia.org/wiki/Empirical_distribution_function
https://github.com/jstac/edtc-code/blob/master/python_code/ecdf.py
https://github.com/dmbates/ecdfExample
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
