[OctDev] Resending: An improvement to ranks.m with no attachments:

Dave Goel Tue, 01 May 2012 21:58:18 -0700

Hi.

I believe that ranks.m (from
/usr/share/octave/3.2.4/m/statistics/base/ranks.m in Debian) is very
inefficient except when the elements of the input matrix are distinct.


For example,  for this input, a=round(10*rand(100123,100));

my alternative finished computation in 1.4s while the current function
hasn't yet returned an answer in the last 15-20 minutes.

For smaller inputs, I checked carefully, and the outputs do agree.


---
Additionally, the alternative also implements multiple ranking schemes:
fractional (default), standard competition ranking, modified competition
ranking, and finally, ordinal ranking; examples of all of which are seen
here: http://en.wikipedia.org/wiki/Ranking


----

Though I completely changed the algorithm, I managed to pen the
alternative as a derivative of the current function. 

Both the diff and the complete file are also included right here within
the body of this email.
----

--- /usr/share/octave/3.2.4/m/statistics/base/ranks.m   2010-10-25 
16:57:01.000000000 -0400
+++ ./ranksoctavemy.m   2012-05-02 00:26:34.000000000 -0400
@@ -1,45 +1,58 @@
 ## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006,
-##               2007, 2008, 2009 Kurt Hornik
+## 2007, 2008, 2009 Kurt Hornik
+## Copyright (C) 2012 Dave Goel 
 ##
 ## This file is part of Octave.
 ##
 ## Octave is free software; you can redistribute it and/or modify it
-## under the terms of the GNU General Public License as published by
-## the Free Software Foundation; either version 3 of the License, or (at
+## under the terms of the GNU General Public License as published by the
+## Free Software Foundation; either version 3 of the License, or (at
 ## your option) any later version.
 ##
-## Octave is distributed in the hope that it will be useful, but
-## WITHOUT ANY WARRANTY; without even the implied warranty of
-## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-## General Public License for more details.
+## Octave is distributed in the hope that it will be useful, but WITHOUT
+## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+## FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+## for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with Octave; see the file COPYING.  If not, see
 ## <http://www.gnu.org/licenses/>.
-
+##
 ## -*- texinfo -*-
-## @deftypefn {Function File} {} ranks (@var{x}, @var{dim})
-## Return the ranks of @var{x} along the first non-singleton dimension
-## adjust for ties.  If the optional argument @var{dim} is
-## given, operate along this dimension.
-## @end deftypefn
+##
+## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
+## ranks of @var{x} along the first non-singleton dimension adjust for
+## ties. If the optional argument @var{dim} is given, operate along this
+## dimension. Third argument @var{rtype} donates the type of ranking: 0
+## or "fractional" for fractional, which is the default, 1 or
+## "competition" for competiton ranking, 2 or "modified" for modified
+## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
+## "dense" for dense ranking. When using the third argument, you can
+## supply us '' for @var{dim} to have it auto-computed. @end deftypefn
+
+## This function should return a result identical to GNU OCTAVE's
+## built-in rank function but is much faster (2.2s vs. 50s when the
+## input was, for example, 10*rand(100123,100)).
+## Additionally, we also handle several ranking schemes.
 
-## Author: KH <[email protected]>
+## Authors: KH <[email protected]>, Dave Goel <[email protected]
 ## Description: Compute ranks
 
-## This code was rather ugly, since it didn't use sort due to the
-## fact of how to deal with ties. Now it does use sort and its
-## even uglier!!! At least it handles NDArrays..
 
-function y = ranks (x, dim)
+function y = ranksoctavemy (x, dim, rtype)
 
-  if (nargin != 1 && nargin != 2)
+  if (nargin < 1);
     print_usage ();
   endif
 
+  if nargin<3||isempty(rtype);
+    rtype=0;
+  endif
+  
   nd = ndims (x);
   sz = size (x);
-  if (nargin != 2)
+  autodimp=(nargin<2)||isempty(dim);
+  if autodimp;
     ## Find the first non-singleton dimension.
     dim  = 1;
     while (dim < nd + 1 && sz(dim) == 1)
@@ -66,24 +79,69 @@
       perm(dim) = 1;
       x = permute (x, perm);
     endif
-    sz = size (x);
-    infvec = -Inf * ones ([1, sz(2 : end)]);
-    [xs, xi] = sort (x);
-    eq_el = find (diff ([xs; infvec]) == 0);
-    if (isempty (eq_el))
-      [eq_el, y] = sort (xi);
-    else
-      runs = complement (eq_el+1, eq_el);
-      len = diff (find (diff ([Inf; eq_el; -Inf]) != 1)) + 1;
-      [eq_el, y] = sort (xi);
-      for i = 1 : length(runs)
-       y (xi (runs (i) + [0:(len(i)-1)]) + floor (runs (i) ./ sz(1)) 
-          * sz(1)) = eq_el(runs(i)) + (len(i) - 1) / 2;
-      endfor
-    endif  
+    sz=size(x);
+    [sx ids]=sort(x); ## sx is sorted x.
+    
+    lin=cumsum(ones(size(x)),1);  ## A linearly increasing array.
+
+    switch rtype;
+      case {0,"fractional"};
+        lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
+      case {1,"competition"};
+        lin=_competition(lin,sx,sz);
+      case {2,"modified"};
+        lin=_modified(lin,sx,sz);
+      case {3,"ordinal"};
+        ## lin=lin;
+      otherwise 
+        rtype
+        error("Illegal value of rtype specified.");
+    endswitch
+    y=NaN(size(lin));
+    ## If input was a vector, we could have simply done this: 
+    ## y(ids)=lin;
+    y(_ids(ids,sz))=lin;
+
     if (dim != 1)
       y = permute (y, perm);
     endif
   endif
+endfunction
 
+function idf=_ids(ids,sz);
+  oo=ones(sz);
+  allids=[{ids-1}];
+  nn=numel(sz);
+  for ii=2:nn;
+    allids=[allids,{cumsum(oo,ii)-1}];
+  endfor
+  idf=allids{end};
+  for jj=(nn-1):-1:1;
+    idf=idf*sz(jj)+allids{jj};
+  endfor
+  idf+=1;
 endfunction
+
+
+function linnew=_competition (lin,sx,sz)
+  ## Stop increasing lin when sx does not increase. Same as before
+  ## otherwise.
+  infvec = -Inf * ones ([1, sz(2 : end)]);
+  fnewp= find(diff([infvec;sx]));
+  linnew=zeros(size(lin));
+  linnew(fnewp)=lin(fnewp);
+  linnew=cummax(linnew);
+endfunction
+
+
+function linnew=_modified (lin,sx,sz)
+  ## Traverse lin backwards. Stop decreasing it when sx doesn't
+  ## decrease.
+  infvec = Inf * ones ([1, sz(2 : end)]);
+  fnewp= find(diff([sx;infvec]));
+  linnew=Inf(size(lin));
+  linnew(fnewp)=lin(fnewp);
+  linnew=flipdim(cummin(flipdim(linnew,1)),1);
+endfunction 
+
+


 ====================================================

## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006,
## 2007, 2008, 2009 Kurt Hornik
## Copyright (C) 2012 Dave Goel 
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by the
## Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
## for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.
##
## -*- texinfo -*-
##
## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
## ranks of @var{x} along the first non-singleton dimension adjust for
## ties. If the optional argument @var{dim} is given, operate along this
## dimension. Third argument @var{rtype} donates the type of ranking: 0
## or "fractional" for fractional, which is the default, 1 or
## "competition" for competiton ranking, 2 or "modified" for modified
## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
## "dense" for dense ranking. When using the third argument, you can
## supply us '' for @var{dim} to have it auto-computed. @end deftypefn

## This function should return a result identical to GNU OCTAVE's
## built-in rank function but is much faster (2.2s vs. 50s when the
## input was, for example, 10*rand(100123,100)).
## Additionally, we also handle several ranking schemes.

## Authors: KH <[email protected]>, Dave Goel <[email protected]
## Description: Compute ranks


function y = ranksoctavemy (x, dim, rtype)

  if (nargin < 1);
    print_usage ();
  endif

  if nargin<3||isempty(rtype);
    rtype=0;
  endif
  
  nd = ndims (x);
  sz = size (x);
  autodimp=(nargin<2)||isempty(dim);
  if autodimp;
    ## Find the first non-singleton dimension.
    dim  = 1;
    while (dim < nd + 1 && sz(dim) == 1)
      dim = dim + 1;
    endwhile
    if (dim > nd)
      dim = 1;
    endif
  else
    if (! (isscalar (dim) && dim == round (dim))
        && dim > 0
        && dim < (nd + 1))
      error ("ranks: dim must be an integer and valid dimension");
    endif
  endif

  if (sz(dim) == 1)
    y = ones(sz);
  else
    ## The algorithm works only on dim = 1, so permute if necesary.
    if (dim != 1)
      perm = [1 : nd];
      perm(1) = dim;
      perm(dim) = 1;
      x = permute (x, perm);
    endif
    sz=size(x);
    [sx ids]=sort(x); ## sx is sorted x.
    
    lin=cumsum(ones(size(x)),1);  ## A linearly increasing array.

    switch rtype;
      case {0,"fractional"};
        lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
      case {1,"competition"};
        lin=_competition(lin,sx,sz);
      case {2,"modified"};
        lin=_modified(lin,sx,sz);
      case {3,"ordinal"};
        ## lin=lin;
      otherwise 
        rtype
        error("Illegal value of rtype specified.");
    endswitch
    y=NaN(size(lin));
    ## If input was a vector, we could have simply done this: 
    ## y(ids)=lin;
    y(_ids(ids,sz))=lin;

    if (dim != 1)
      y = permute (y, perm);
    endif
  endif
endfunction

function idf=_ids(ids,sz);
  oo=ones(sz);
  allids=[{ids-1}];
  nn=numel(sz);
  for ii=2:nn;
    allids=[allids,{cumsum(oo,ii)-1}];
  endfor
  idf=allids{end};
  for jj=(nn-1):-1:1;
    idf=idf*sz(jj)+allids{jj};
  endfor
  idf+=1;
endfunction


function linnew=_competition (lin,sx,sz)
  ## Stop increasing lin when sx does not increase. Same as before
  ## otherwise.
  infvec = -Inf * ones ([1, sz(2 : end)]);
  fnewp= find(diff([infvec;sx]));
  linnew=zeros(size(lin));
  linnew(fnewp)=lin(fnewp);
  linnew=cummax(linnew);
endfunction


function linnew=_modified (lin,sx,sz)
  ## Traverse lin backwards. Stop decreasing it when sx doesn't
  ## decrease.
  infvec = Inf * ones ([1, sz(2 : end)]);
  fnewp= find(diff([sx;infvec]));
  linnew=Inf(size(lin));
  linnew(fnewp)=lin(fnewp);
  linnew=flipdim(cummin(flipdim(linnew,1)),1);
endfunction 
---

Sincerely
Dave Goel.
-- 








------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev

[OctDev] Resending: An improvement to ranks.m with no attachments:

Reply via email to