Hi. I believe that ranks.m (from /usr/share/octave/3.2.4/m/statistics/base/ranks.m in Debian) is very inefficient except when the elements of the input matrix are distinct.
For example, for this input, a=round(10*rand(100123,100)); my alternative finished computation in 1.4s while the current function hasn't yet returned an answer in the last 15-20 minutes. For smaller inputs, I checked carefully, and the outputs do agree. --- Additionally, the alternative also implements multiple ranking schemes: fractional (default), standard competition ranking, modified competition ranking, and finally, ordinal ranking; examples of all of which are seen here: http://en.wikipedia.org/wiki/Ranking ---- Though I completely changed the algorithm, I managed to pen the alternative as a derivative of the current function. Both the diff and the complete file are also included right here within the body of this email. ---- --- /usr/share/octave/3.2.4/m/statistics/base/ranks.m 2010-10-25 16:57:01.000000000 -0400 +++ ./ranksoctavemy.m 2012-05-02 00:26:34.000000000 -0400 @@ -1,45 +1,58 @@ ## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006, -## 2007, 2008, 2009 Kurt Hornik +## 2007, 2008, 2009 Kurt Hornik +## Copyright (C) 2012 Dave Goel ## ## This file is part of Octave. ## ## Octave is free software; you can redistribute it and/or modify it -## under the terms of the GNU General Public License as published by -## the Free Software Foundation; either version 3 of the License, or (at +## under the terms of the GNU General Public License as published by the +## Free Software Foundation; either version 3 of the License, or (at ## your option) any later version. ## -## Octave is distributed in the hope that it will be useful, but -## WITHOUT ANY WARRANTY; without even the implied warranty of -## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -## General Public License for more details. +## Octave is distributed in the hope that it will be useful, but WITHOUT +## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +## for more details. ## ## You should have received a copy of the GNU General Public License ## along with Octave; see the file COPYING. If not, see ## <http://www.gnu.org/licenses/>. - +## ## -*- texinfo -*- -## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) -## Return the ranks of @var{x} along the first non-singleton dimension -## adjust for ties. If the optional argument @var{dim} is -## given, operate along this dimension. -## @end deftypefn +## +## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the +## ranks of @var{x} along the first non-singleton dimension adjust for +## ties. If the optional argument @var{dim} is given, operate along this +## dimension. Third argument @var{rtype} donates the type of ranking: 0 +## or "fractional" for fractional, which is the default, 1 or +## "competition" for competiton ranking, 2 or "modified" for modified +## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or +## "dense" for dense ranking. When using the third argument, you can +## supply us '' for @var{dim} to have it auto-computed. @end deftypefn + +## This function should return a result identical to GNU OCTAVE's +## built-in rank function but is much faster (2.2s vs. 50s when the +## input was, for example, 10*rand(100123,100)). +## Additionally, we also handle several ranking schemes. -## Author: KH <kurt.hor...@wu-wien.ac.at> +## Authors: KH <kurt.hor...@wu-wien.ac.at>, Dave Goel <dee...@gmail.comp ## Description: Compute ranks -## This code was rather ugly, since it didn't use sort due to the -## fact of how to deal with ties. Now it does use sort and its -## even uglier!!! At least it handles NDArrays.. -function y = ranks (x, dim) +function y = ranksoctavemy (x, dim, rtype) - if (nargin != 1 && nargin != 2) + if (nargin < 1); print_usage (); endif + if nargin<3||isempty(rtype); + rtype=0; + endif + nd = ndims (x); sz = size (x); - if (nargin != 2) + autodimp=(nargin<2)||isempty(dim); + if autodimp; ## Find the first non-singleton dimension. dim = 1; while (dim < nd + 1 && sz(dim) == 1) @@ -66,24 +79,69 @@ perm(dim) = 1; x = permute (x, perm); endif - sz = size (x); - infvec = -Inf * ones ([1, sz(2 : end)]); - [xs, xi] = sort (x); - eq_el = find (diff ([xs; infvec]) == 0); - if (isempty (eq_el)) - [eq_el, y] = sort (xi); - else - runs = complement (eq_el+1, eq_el); - len = diff (find (diff ([Inf; eq_el; -Inf]) != 1)) + 1; - [eq_el, y] = sort (xi); - for i = 1 : length(runs) - y (xi (runs (i) + [0:(len(i)-1)]) + floor (runs (i) ./ sz(1)) - * sz(1)) = eq_el(runs(i)) + (len(i) - 1) / 2; - endfor - endif + sz=size(x); + [sx ids]=sort(x); ## sx is sorted x. + + lin=cumsum(ones(size(x)),1); ## A linearly increasing array. + + switch rtype; + case {0,"fractional"}; + lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2; + case {1,"competition"}; + lin=_competition(lin,sx,sz); + case {2,"modified"}; + lin=_modified(lin,sx,sz); + case {3,"ordinal"}; + ## lin=lin; + otherwise + rtype + error("Illegal value of rtype specified."); + endswitch + y=NaN(size(lin)); + ## If input was a vector, we could have simply done this: + ## y(ids)=lin; + y(_ids(ids,sz))=lin; + if (dim != 1) y = permute (y, perm); endif endif +endfunction +function idf=_ids(ids,sz); + oo=ones(sz); + allids=[{ids-1}]; + nn=numel(sz); + for ii=2:nn; + allids=[allids,{cumsum(oo,ii)-1}]; + endfor + idf=allids{end}; + for jj=(nn-1):-1:1; + idf=idf*sz(jj)+allids{jj}; + endfor + idf+=1; endfunction + + +function linnew=_competition (lin,sx,sz) + ## Stop increasing lin when sx does not increase. Same as before + ## otherwise. + infvec = -Inf * ones ([1, sz(2 : end)]); + fnewp= find(diff([infvec;sx])); + linnew=zeros(size(lin)); + linnew(fnewp)=lin(fnewp); + linnew=cummax(linnew); +endfunction + + +function linnew=_modified (lin,sx,sz) + ## Traverse lin backwards. Stop decreasing it when sx doesn't + ## decrease. + infvec = Inf * ones ([1, sz(2 : end)]); + fnewp= find(diff([sx;infvec])); + linnew=Inf(size(lin)); + linnew(fnewp)=lin(fnewp); + linnew=flipdim(cummin(flipdim(linnew,1)),1); +endfunction + + ==================================================== ## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006, ## 2007, 2008, 2009 Kurt Hornik ## Copyright (C) 2012 Dave Goel ## ## This file is part of Octave. ## ## Octave is free software; you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by the ## Free Software Foundation; either version 3 of the License, or (at ## your option) any later version. ## ## Octave is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License ## for more details. ## ## You should have received a copy of the GNU General Public License ## along with Octave; see the file COPYING. If not, see ## <http://www.gnu.org/licenses/>. ## ## -*- texinfo -*- ## ## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the ## ranks of @var{x} along the first non-singleton dimension adjust for ## ties. If the optional argument @var{dim} is given, operate along this ## dimension. Third argument @var{rtype} donates the type of ranking: 0 ## or "fractional" for fractional, which is the default, 1 or ## "competition" for competiton ranking, 2 or "modified" for modified ## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or ## "dense" for dense ranking. When using the third argument, you can ## supply us '' for @var{dim} to have it auto-computed. @end deftypefn ## This function should return a result identical to GNU OCTAVE's ## built-in rank function but is much faster (2.2s vs. 50s when the ## input was, for example, 10*rand(100123,100)). ## Additionally, we also handle several ranking schemes. ## Authors: KH <kurt.hor...@wu-wien.ac.at>, Dave Goel <dee...@gmail.comp ## Description: Compute ranks function y = ranksoctavemy (x, dim, rtype) if (nargin < 1); print_usage (); endif if nargin<3||isempty(rtype); rtype=0; endif nd = ndims (x); sz = size (x); autodimp=(nargin<2)||isempty(dim); if autodimp; ## Find the first non-singleton dimension. dim = 1; while (dim < nd + 1 && sz(dim) == 1) dim = dim + 1; endwhile if (dim > nd) dim = 1; endif else if (! (isscalar (dim) && dim == round (dim)) && dim > 0 && dim < (nd + 1)) error ("ranks: dim must be an integer and valid dimension"); endif endif if (sz(dim) == 1) y = ones(sz); else ## The algorithm works only on dim = 1, so permute if necesary. if (dim != 1) perm = [1 : nd]; perm(1) = dim; perm(dim) = 1; x = permute (x, perm); endif sz=size(x); [sx ids]=sort(x); ## sx is sorted x. lin=cumsum(ones(size(x)),1); ## A linearly increasing array. switch rtype; case {0,"fractional"}; lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2; case {1,"competition"}; lin=_competition(lin,sx,sz); case {2,"modified"}; lin=_modified(lin,sx,sz); case {3,"ordinal"}; ## lin=lin; otherwise rtype error("Illegal value of rtype specified."); endswitch y=NaN(size(lin)); ## If input was a vector, we could have simply done this: ## y(ids)=lin; y(_ids(ids,sz))=lin; if (dim != 1) y = permute (y, perm); endif endif endfunction function idf=_ids(ids,sz); oo=ones(sz); allids=[{ids-1}]; nn=numel(sz); for ii=2:nn; allids=[allids,{cumsum(oo,ii)-1}]; endfor idf=allids{end}; for jj=(nn-1):-1:1; idf=idf*sz(jj)+allids{jj}; endfor idf+=1; endfunction function linnew=_competition (lin,sx,sz) ## Stop increasing lin when sx does not increase. Same as before ## otherwise. infvec = -Inf * ones ([1, sz(2 : end)]); fnewp= find(diff([infvec;sx])); linnew=zeros(size(lin)); linnew(fnewp)=lin(fnewp); linnew=cummax(linnew); endfunction function linnew=_modified (lin,sx,sz) ## Traverse lin backwards. Stop decreasing it when sx doesn't ## decrease. infvec = Inf * ones ([1, sz(2 : end)]); fnewp= find(diff([sx;infvec])); linnew=Inf(size(lin)); linnew(fnewp)=lin(fnewp); linnew=flipdim(cummin(flipdim(linnew,1)),1); endfunction --- Sincerely Dave Goel. -- ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ Octave-dev mailing list Octave-dev@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/octave-dev