I'm not sure if your data is guaranteed sorted.
If it is, then this seems OK:
   ecdfsorted =: (# (% {:)@:(+/\))@:(#/.~)
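
For anyone reading the tacit definition cold, here is a rough step-by-step breakdown of what it computes on the sample from the original post. The names v, counts, totals and cum are only illustrative and are not part of the definition above.

```j
v =: 2 2 2 4 4 6 6 8          NB. sorted sample from the original post
counts =: #/.~ v              NB. 3 2 2 1 : how often each distinct value occurs
totals =: +/\ counts          NB. 3 5 7 8 : running total of the counts
cum =: (% {:) totals          NB. 0.375 0.625 0.875 1 : divide by the last total
counts # cum                  NB. one copy of each cumulative frequency per observation
```

The last line gives 0.375 0.375 0.375 0.625 0.625 0.875 0.875 1, which matches the R output quoted below. The running total is only meaningful if equal values are adjacent and in ascending order, which is why the data has to be sorted.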

It's pretty fast for i.1e5

Otherwise, you need to sort first:
   ecdfunsorted =: (# (% {:)@:(+/\))@:(#/.~)@:/:~

Slightly slower.
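
One thing to watch: sorting first means the result comes back in sorted order, whereas the ecdf verb in the original post returns values in the order of its argument. If the original order matters, a possible sketch is to index the sorted result by each item's rank. The name ecdforig is hypothetical, and I have not timed this on large arrays.

```j
ecdfsorted =: (# (% {:)@:(+/\))@:(#/.~)      NB. as defined earlier in this message
NB. ecdforig is a hypothetical name: /: /: y is the rank of each item of y,
NB. so indexing the sorted ECDF by rank puts the values back in y's order
ecdforig =: 3 : '(/: /: y) { ecdfsorted /:~ y'
ecdforig 4 2 8 2 6 2 6 4     NB. 0.625 0.375 1 0.375 0.875 0.375 0.875 0.625
```

Tied items get different ranks, but they all map to the same cumulative frequency, so the ties do no harm.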

I'm a bit puzzled though, in that your example yields 3 copies of 0.375,
2 of 0.625, etc.
The frequencies emerge repeated as often as the observations.


Isn't it more useful to produce a keyed table of cumulative frequencies?
E.g.:
   (~.,.(((%{:)@:(+/\)))@:(#/.~))2 2 2 4 4 6 6 8
2 0.375
4 0.625
6 0.875
8     1
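
The table above loses nothing compared with the repeated form: the per-observation values can be recovered by looking each observation up in the key column. A minimal sketch, assuming unsorted input; the names cumfreq, y, s, keys and cf are illustrative only.

```j
cumfreq =: (% {:)@:(+/\)@:(#/.~)     NB. cumulative relative frequency per distinct value
y =: 6 2 8 2 4 2 6 4                 NB. illustrative unsorted data
s =: /:~ y                           NB. sort so the running totals are in value order
keys =: ~. s                         NB. 2 4 6 8
cf =: cumfreq s                      NB. 0.375 0.625 0.875 1
(keys i. y) { cf                     NB. ECDF at each observation, in original order
```

The last line gives 0.875 0.375 1 0.375 0.625 0.375 0.875 0.625.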

Good luck!

Mike

On 13/08/2014 20:06, Joe Bogner wrote:
> I'm looking for any ideas to speed up this function.
>
> I patched together this ecdf function from a few different ideas:
>
> NB. > v<- c(2,2,2,4,4,6,6,8)
> NB. > ecdf(v)(v)
> NB. [1] 0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000
>
> ecdf=: 3 : 0
>   valsct=. # y
>   tbl=:y,.(valsct %~ #) \ y
>   max=:(0{"1 tbl) (>./)/. tbl
>   , 1{"1 (({."1 max) i. y) { max
> )
>
> (0.375 0.375 0.375 0.625 0.625 0.875 0.875 1.000) -: ecdf (2,2,2,4,4,6,6,8)
> 1
>
>
> timespacex 'ecdf i. 1e5'
> 8.84599 1.15392e7
>
> The R function is nearly instantaneous
>
> I need to run this on a 1m+ array
>
> Thank you for any suggestions
>
> http://en.wikipedia.org/wiki/Empirical_distribution_function
> https://github.com/jstac/edtc-code/blob/master/python_code/ecdf.py
> https://github.com/dmbates/ecdfExample
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm