Re: [OctDev] pdist function

Francesco Potortì Wed, 12 Nov 2008 09:34:21 -0800

>a nice job (I wanted this function several times).

If you use it, and especially if you can check it further against
Matlab's behaviour, please let me know, or just correct it.  I have not
uploaded it yet, as I am going home now.  I hope to do so tomorrow.


>I have two comments:
>1. I think that instead of
>if (...)
>  switch ...
>   case ...
>     return
>   case ...
>     return
>endif
>
>...code...
>
>you should use
>if (...)
>  switch ...
>   case ...
>   case ...
>else
> ...code...
>endif
>
>I.e., no return in each case. This is more consistent with our
>recommended coding style, where we try to minimize the number of exit
>points from a function.

I tried several alternatives, and in the end this one seemed the least
ugly.  In fact, there are three cases for distfun: either one of the
eleven known strings, or a function name string, or a function handle.
The latter two are treated the same way using feval.  This way, the
program flow is straightforward, there is no code duplication, and all
the return statements are at the end of a case label, i.e., they are not
scattered through the code, which is what makes maintenance difficult.
Unfortunately, I cannot fold the function handle case into the otherwise
label of the switch, because switch barfs when comparing a function
handle to a string.

Mmmh.  I could avoid the return statements and check at the end of the
switch whether the return value has been assigned...
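
A minimal sketch of that last idea, for reference.  The function name
dispatch_sketch and the two-row input are made up for illustration, and
only two of the eleven names are shown.  An empty y is assumed to mean
"not computed yet", which would misfire for a distance function that
legitimately returns an empty result.

```octave
1;  # script-file marker, so the function below can be defined inline

## Hypothetical single-exit variant: no return inside the switch.
function y = dispatch_sketch (x, distfun)
  y = [];                       # sentinel: nothing assigned yet
  if (ischar (distfun))
    switch (distfun)
      case "cityblock"
        y = sum (abs (x(1,:) - x(2,:)));
      case "chebychev"
        y = max (abs (x(1,:) - x(2,:)));
    endswitch
  endif
  if (isempty (y))
    ## No known name matched: treat distfun as a function handle
    ## or as the name of an external function.
    y = feval (distfun, x(1,:), x(2,:));
  endif
endfunction

a = dispatch_sketch ([0 0; 3 4], "cityblock");
b = dispatch_sketch ([0 0; 3 4], @(u, v) norm (u - v));
```

The switch is only entered for strings, so a function handle is never
compared against a string, and the function has a single exit point.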

>2. The changeset http://hg.savannah.gnu.org/hgweb/octave/rev/b11c31849b44
>equipped `norm' with the ability to compute column or row norms of a matrix.
>This can be used to replace your explicit norm expressions like
>`sqrt (sumsq (diff))' or `(sum ((abs (diff)).^p)).^(1/p)'
>by
>`norm (diff, 'cols')' or `norm (diff, p, 'cols')', respectively.

Thank you, I will look into those.
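
A small sketch of the suggested replacement, assuming an Octave version
whose norm accepts the "cols" option (3.2.0 or later according to the
quoted comment).  The matrix D and the exponent p are made-up test
values, and the three-argument form of norm is used throughout.

```octave
## Each column of D plays the role of one X(:,Xi) - X(:,Yi) difference.
D = [3 1; 4 2];
p = 3;

## Euclidean case: explicit expression versus column norm.
e_explicit = sqrt (sumsq (D));
e_norm     = norm (D, 2, "cols");

## Minkowski case with exponent p.
m_explicit = (sum ((abs (D)).^p)).^(1/p);
m_norm     = norm (D, p, "cols");
```

Both pairs should agree up to rounding.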

>Similarly for the 1- and Inf- norm.
>Using the latter will probably be faster (avoids temporary matrices)
>and will also be robust w.r.t. overflow (i.e. the 20-norm of numbers
>of order 1e20 won't be Inf).

Sorry, I do not follow you here.  What are the 1- and Inf- norms, and
what is their relationship with pdist?
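
For reference, the 1-norm of a vector is the sum of the absolute values
of its entries and the Inf-norm is the largest absolute value.  These
appear to be the quantities computed in the "cityblock" and "chebychev"
cases of the code appended below.  The sketch assumes a norm that
accepts the "cols" option; the matrices are made-up test values.

```octave
D = [3 -1; -4 2];

c_explicit = sum (abs (D));          # as in the "cityblock" case
c_norm     = norm (D, 1, "cols");    # column-wise 1-norm

t_explicit = max (abs (D));          # as in the "chebychev" case
t_norm     = norm (D, Inf, "cols");  # column-wise Inf-norm

## Overflow: (1e20)^20 exceeds the double range, so the explicit
## expression evaluates to Inf.
big = [1e20; 1e20];
o_explicit = (sum ((abs (big)).^20)).^(1/20);
o_norm     = norm (big, 20, "cols");
```

The last pair illustrates the robustness claim in the quoted comment:
the explicit expression overflows, while norm is expected to return a
finite value.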

>This is going to be a feature of 3.2.0, so if you're fine with pdist
>depending on 3.2.0, I think you may exploit it.

Maybe I can write the code commented out, to be uncommented later on,
when 3.2 becomes widespread.
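
An alternative to commenting the code out, sketched under the
assumption that a run-time version check is acceptable: choose the
expression with compare_versions.  This is not what the posted code
does; D is a made-up test matrix.

```octave
D = [3 1; 4 2];

if (compare_versions (version (), "3.2.0", ">="))
  ## norm supports column norms from 3.2.0 on (per the quoted comment)
  y = norm (D, 2, "cols");
else
  ## fallback for older versions: the explicit expression
  y = sqrt (sumsq (D));
endif
```

The fallback branch keeps the function usable on older installations
without any later editing.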

I am appending the relevant code again, for reference:

>> function y = pdist (x, distfun, varargin)
>>
>>  if (nargin < 1)
>>    print_usage ();
>>  elseif ((nargin > 1)
>>          && ! ischar (distfun)
>>          && ! isa (distfun, "function_handle"))
>>    error (["pdist: the distance function must be either a string or a "
>>            "function handle."]);
>>  endif
>>
>>  if (nargin < 2)
>>    distfun = "euclidean";
>>  endif
>>
>>  if (! ismatrix (x) || isempty (x))
>>    error ("pdist: x must be a nonempty matrix");
>>  elseif (length (size (x)) > 2)
>>    error ("pdist: x must be 1 or 2 dimensional");
>>  endif
>>
>>  if (ischar (distfun))
>>    order = nchoosek(1:rows(x),2);
>>    Xi = order(:,1);
>>    Yi = order(:,2);
>>    X = x';
>>    ## Each case assigns the output y and returns; an unknown name
>>    ## falls through to the feval code below.
>>    switch (distfun)
>>      case "euclidean"
>>        diff = X(:,Xi) - X(:,Yi);
>>        y = sqrt (sumsq (diff));
>>        return
>>      case "seuclidean"
>>        diff = X(:,Xi) - X(:,Yi);
>>        weights = inv (diag (var (X')));
>>        y = sqrt (sum ((weights * diff) .* diff));
>>        return
>>      case "mahalanobis"
>>        diff = X(:,Xi) - X(:,Yi);
>>        weights = inv (cov (X'));
>>        y = sqrt (sum ((weights * diff) .* diff));
>>        return
>>      case "cityblock"
>>        diff = X(:,Xi) - X(:,Yi);
>>        y = sum (abs (diff));
>>        return
>>      case "minkowski"
>>        diff = X(:,Xi) - X(:,Yi);
>>        if (nargin > 2)
>>          p = varargin{1};
>>          y = (sum ((abs (diff)).^p)).^(1/p);
>>        else
>>          y = sqrt (sumsq (diff)); # default p=2
>>        endif
>>        return
>>      case "cosine"
>>        prod = X(:,Xi) .* X(:,Yi);
>>        weights = sumsq (X(:,Xi)) .* sumsq (X(:,Yi));
>>        y = 1 - sum (prod) ./ sqrt (weights);
>>        return
>>      case "correlation"
>>        corr = cor (X);
>>        y = 1 - corr (sub2ind (size (corr), Xi, Yi))';
>>        return
>>      case "spearman"
>>        corr = spearman (X);
>>        y = 1 - corr (sub2ind (size (corr), Xi, Yi))';
>>        return
>>      case "hamming"
>>        diff = logical (X(:,Xi) - X(:,Yi));
>>        y = sum (diff) / rows (X);
>>        return
>>      case "jaccard"
>>        diff = logical (X(:,Xi) - X(:,Yi));
>>        weights = X(:,Xi) | X(:,Yi);
>>        y = sum (diff & weights) ./ sum (weights);
>>        return
>>      case "chebychev"
>>        diff = X(:,Xi) - X(:,Yi);
>>        y = max (abs (diff));
>>        return
>>    endswitch
>>  endif
>>
>>  ## Distfun is a function handle or the name of an external function
>>  l = rows (x);
>>    y = zeros (1, nchoosek (l, 2));
>>  idx = 1;
>>  for ii = 1:l-1
>>    for jj = ii+1:l
>>      y(idx++) = feval (distfun, x(ii,:), x, varargin{:})(jj);
>>    endfor
>>  endfor
>>
>> endfunction
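
A short sketch of the calling convention the final loop appears to
assume: distfun receives one row and the whole matrix, and returns one
distance per row of the matrix.  The handle below is a made-up
Euclidean example, and unlike the posted loop it calls distfun once per
row ii rather than once per pair.

```octave
x = [0 0; 3 4; 6 8];

## Hypothetical user-supplied distance: Euclidean distance from the
## row vector xi to every row of X, returned as a column vector.
distfun = @(xi, X) sqrt (sumsq (bsxfun (@minus, X, xi), 2));

l = rows (x);
y = zeros (1, nchoosek (l, 2));
idx = 1;
for ii = 1:l-1
  d = feval (distfun, x(ii,:), x);   # distances from row ii to all rows
  for jj = ii+1:l
    y(idx++) = d(jj);                # keep only the pairs with jj > ii
  endfor
endfor
```

The entries of y follow the row pairs (1,2), (1,3), (2,3), the same
order that nchoosek (1:rows (x), 2) produces for the string cases.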

-- 
Francesco Potortì (ricercatore)        Voice: +39 050 315 3058 (op.2111)
ISTI - Area della ricerca CNR          Fax:   +39 050 315 2040
via G. Moruzzi 1, I-56124 Pisa         Email: [EMAIL PROTECTED]
(entrance 20, 1st floor, room C71)     Web:   http://fly.isti.cnr.it/

