On 2 May 2012 05:47, Dave Goel <dee...@gmail.com> wrote: > Hi. > > I believe that ranks.m (from > /usr/share/octave/3.2.4/m/statistics/base/ranks.m in Debian) is very > inefficient except when the elements of the input matrix are distinct. > > For example, for this input, a=round(10*rand(100123,100)); > > my alternative finished computation in 1.4s while the current function > hasn't yet returned an answer in the last 15-20 minutes. > > For smaller inputs, I checked carefully, and the outputs do agree. > > > --- > Additionally, the alternative also implements multiple ranking schemes: > fractional (default), standard competition ranking, modified competition > ranking, and finally, ordinal ranking; examples of all of which are seen > here: http://en.wikipedia.org/wiki/Ranking > > > ---- > > Though I completely changed the algorithm, I managed to pen the > alternative as a derivative of the current function. Thus, attaching (a) > diff; (b) as the complete file. > > Both the diff and the complete file are also included right here within > the body of this email. > ---- > > --- /usr/share/octave/3.2.4/m/statistics/base/ranks.m 2010-10-25 > 16:57:01.000000000 -0400 > +++ ./ranksoctavemy.m 2012-05-02 00:26:34.000000000 -0400 > @@ -1,45 +1,58 @@ > ## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006, > -## 2007, 2008, 2009 Kurt Hornik > +## 2007, 2008, 2009 Kurt Hornik > +## Copyright (C) 2012 Dave Goel > ## > ## This file is part of Octave. > ## > ## Octave is free software; you can redistribute it and/or modify it > -## under the terms of the GNU General Public License as published by > -## the Free Software Foundation; either version 3 of the License, or (at > +## under the terms of the GNU General Public License as published by the > +## Free Software Foundation; either version 3 of the License, or (at > ## your option) any later version. > ## > -## Octave is distributed in the hope that it will be useful, but > -## WITHOUT ANY WARRANTY; without even the implied warranty of > -## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > -## General Public License for more details. > +## Octave is distributed in the hope that it will be useful, but WITHOUT > +## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > +## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +## for more details. > ## > ## You should have received a copy of the GNU General Public License > ## along with Octave; see the file COPYING. If not, see > ## <http://www.gnu.org/licenses/>. > - > +## > ## -*- texinfo -*- > -## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) > -## Return the ranks of @var{x} along the first non-singleton dimension > -## adjust for ties. If the optional argument @var{dim} is > -## given, operate along this dimension. > -## @end deftypefn > +## > +## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the > +## ranks of @var{x} along the first non-singleton dimension adjust for > +## ties. If the optional argument @var{dim} is given, operate along this > +## dimension. Third argument @var{rtype} donates the type of ranking: 0 > +## or "fractional" for fractional, which is the default, 1 or > +## "competition" for competiton ranking, 2 or "modified" for modified > +## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or > +## "dense" for dense ranking. When using the third argument, you can > +## supply us '' for @var{dim} to have it auto-computed. @end deftypefn > + > +## This function should return a result identical to GNU OCTAVE's > +## built-in rank function but is much faster (2.2s vs. 50s when the > +## input was, for example, 10*rand(100123,100)). > +## Additionally, we also handle several ranking schemes. > > -## Author: KH <kurt.hor...@wu-wien.ac.at> > +## Authors: KH <kurt.hor...@wu-wien.ac.at>, Dave Goel <dee...@gmail.comp > ## Description: Compute ranks > > -## This code was rather ugly, since it didn't use sort due to the > -## fact of how to deal with ties. Now it does use sort and its > -## even uglier!!! At least it handles NDArrays.. > > -function y = ranks (x, dim) > +function y = ranksoctavemy (x, dim, rtype) > > - if (nargin != 1 && nargin != 2) > + if (nargin < 1); > print_usage (); > endif > > + if nargin<3||isempty(rtype); > + rtype=0; > + endif > + > nd = ndims (x); > sz = size (x); > - if (nargin != 2) > + autodimp=(nargin<2)||isempty(dim); > + if autodimp; > ## Find the first non-singleton dimension. > dim = 1; > while (dim < nd + 1 && sz(dim) == 1) > @@ -66,24 +79,69 @@ > perm(dim) = 1; > x = permute (x, perm); > endif > - sz = size (x); > - infvec = -Inf * ones ([1, sz(2 : end)]); > - [xs, xi] = sort (x); > - eq_el = find (diff ([xs; infvec]) == 0); > - if (isempty (eq_el)) > - [eq_el, y] = sort (xi); > - else > - runs = complement (eq_el+1, eq_el); > - len = diff (find (diff ([Inf; eq_el; -Inf]) != 1)) + 1; > - [eq_el, y] = sort (xi); > - for i = 1 : length(runs) > - y (xi (runs (i) + [0:(len(i)-1)]) + floor (runs (i) ./ sz(1)) > - * sz(1)) = eq_el(runs(i)) + (len(i) - 1) / 2; > - endfor > - endif > + sz=size(x); > + [sx ids]=sort(x); ## sx is sorted x. > + > + lin=cumsum(ones(size(x)),1); ## A linearly increasing array. > + > + switch rtype; > + case {0,"fractional"}; > + lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2; > + case {1,"competition"}; > + lin=_competition(lin,sx,sz); > + case {2,"modified"}; > + lin=_modified(lin,sx,sz); > + case {3,"ordinal"}; > + ## lin=lin; > + otherwise > + rtype > + error("Illegal value of rtype specified."); > + endswitch > + y=NaN(size(lin)); > + ## If input was a vector, we could have simply done this: > + ## y(ids)=lin; > + y(_ids(ids,sz))=lin; > + > if (dim != 1) > y = permute (y, perm); > endif > endif > +endfunction > > +function idf=_ids(ids,sz); > + oo=ones(sz); > + allids=[{ids-1}]; > + nn=numel(sz); > + for ii=2:nn; > + allids=[allids,{cumsum(oo,ii)-1}]; > + endfor > + idf=allids{end}; > + for jj=(nn-1):-1:1; > + idf=idf*sz(jj)+allids{jj}; > + endfor > + idf+=1; > endfunction > + > + > +function linnew=_competition (lin,sx,sz) > + ## Stop increasing lin when sx does not increase. Same as before > + ## otherwise. > + infvec = -Inf * ones ([1, sz(2 : end)]); > + fnewp= find(diff([infvec;sx])); > + linnew=zeros(size(lin)); > + linnew(fnewp)=lin(fnewp); > + linnew=cummax(linnew); > +endfunction > + > + > +function linnew=_modified (lin,sx,sz) > + ## Traverse lin backwards. Stop decreasing it when sx doesn't > + ## decrease. > + infvec = Inf * ones ([1, sz(2 : end)]); > + fnewp= find(diff([sx;infvec])); > + linnew=Inf(size(lin)); > + linnew(fnewp)=lin(fnewp); > + linnew=flipdim(cummin(flipdim(linnew,1)),1); > +endfunction > + > + > > > ==================================================== > > ## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006, > ## 2007, 2008, 2009 Kurt Hornik > ## Copyright (C) 2012 Dave Goel > ## > ## This file is part of Octave. > ## > ## Octave is free software; you can redistribute it and/or modify it > ## under the terms of the GNU General Public License as published by the > ## Free Software Foundation; either version 3 of the License, or (at > ## your option) any later version. > ## > ## Octave is distributed in the hope that it will be useful, but WITHOUT > ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > ## for more details. > ## > ## You should have received a copy of the GNU General Public License > ## along with Octave; see the file COPYING. If not, see > ## <http://www.gnu.org/licenses/>. > ## > ## -*- texinfo -*- > ## > ## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the > ## ranks of @var{x} along the first non-singleton dimension adjust for > ## ties. If the optional argument @var{dim} is given, operate along this > ## dimension. Third argument @var{rtype} donates the type of ranking: 0 > ## or "fractional" for fractional, which is the default, 1 or > ## "competition" for competiton ranking, 2 or "modified" for modified > ## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or > ## "dense" for dense ranking. When using the third argument, you can > ## supply us '' for @var{dim} to have it auto-computed. @end deftypefn > > ## This function should return a result identical to GNU OCTAVE's > ## built-in rank function but is much faster (2.2s vs. 50s when the > ## input was, for example, 10*rand(100123,100)). > ## Additionally, we also handle several ranking schemes. > > ## Authors: KH <kurt.hor...@wu-wien.ac.at>, Dave Goel <dee...@gmail.comp > ## Description: Compute ranks > > > function y = ranksoctavemy (x, dim, rtype) > > if (nargin < 1); > print_usage (); > endif > > if nargin<3||isempty(rtype); > rtype=0; > endif > > nd = ndims (x); > sz = size (x); > autodimp=(nargin<2)||isempty(dim); > if autodimp; > ## Find the first non-singleton dimension. > dim = 1; > while (dim < nd + 1 && sz(dim) == 1) > dim = dim + 1; > endwhile > if (dim > nd) > dim = 1; > endif > else > if (! (isscalar (dim) && dim == round (dim)) > && dim > 0 > && dim < (nd + 1)) > error ("ranks: dim must be an integer and valid dimension"); > endif > endif > > if (sz(dim) == 1) > y = ones(sz); > else > ## The algorithm works only on dim = 1, so permute if necesary. > if (dim != 1) > perm = [1 : nd]; > perm(1) = dim; > perm(dim) = 1; > x = permute (x, perm); > endif > sz=size(x); > [sx ids]=sort(x); ## sx is sorted x. > > lin=cumsum(ones(size(x)),1); ## A linearly increasing array. > > switch rtype; > case {0,"fractional"}; > lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2; > case {1,"competition"}; > lin=_competition(lin,sx,sz); > case {2,"modified"}; > lin=_modified(lin,sx,sz); > case {3,"ordinal"}; > ## lin=lin; > otherwise > rtype > error("Illegal value of rtype specified."); > endswitch > y=NaN(size(lin)); > ## If input was a vector, we could have simply done this: > ## y(ids)=lin; > y(_ids(ids,sz))=lin; > > if (dim != 1) > y = permute (y, perm); > endif > endif > endfunction > > function idf=_ids(ids,sz); > oo=ones(sz); > allids=[{ids-1}]; > nn=numel(sz); > for ii=2:nn; > allids=[allids,{cumsum(oo,ii)-1}]; > endfor > idf=allids{end}; > for jj=(nn-1):-1:1; > idf=idf*sz(jj)+allids{jj}; > endfor > idf+=1; > endfunction > > > function linnew=_competition (lin,sx,sz) > ## Stop increasing lin when sx does not increase. Same as before > ## otherwise. > infvec = -Inf * ones ([1, sz(2 : end)]); > fnewp= find(diff([infvec;sx])); > linnew=zeros(size(lin)); > linnew(fnewp)=lin(fnewp); > linnew=cummax(linnew); > endfunction > > > function linnew=_modified (lin,sx,sz) > ## Traverse lin backwards. Stop decreasing it when sx doesn't > ## decrease. > infvec = Inf * ones ([1, sz(2 : end)]); > fnewp= find(diff([sx;infvec])); > linnew=Inf(size(lin)); > linnew(fnewp)=lin(fnewp); > linnew=flipdim(cummin(flipdim(linnew,1)),1); > endfunction > --- > > Sincerely > Dave Goel.
Hi Dave I'm forwarding your e-mail to the octave mailing list (this is the mailing list for octave forge packages, but ranks is part of octave core). Also, the best place to submit this would be the patch tracker on savannah http://savannah.gnu.org/patch/?group=octave One last note, the diff you have is against the code on 3.2.4 which is a quite old version. Your patch won't apply since the file has gone through some modifications already (see the log of the file here http://hg.savannah.gnu.org/hgweb/octave/log/c38a253723d3/scripts/statistics/base/ranks.m ). Carnë ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ Octave-dev mailing list Octave-dev@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/octave-dev