Hi Arno

See the forwarded message from the octave-help mailing list. Søren has
the following function that he modified to be Matlab compatible. Do
you think it could be added to the statistics package?

Carnë

---------- Forwarded message ----------
From: Søren Hauberg <so...@hauberg.org>
Date: 9 December 2011 08:26
Subject: Re: K means.
To: Carnë Draug <carandraug+...@gmail.com>
Cc: Jordi Gutiérrez Hermoso <jord...@octave.org>, h...@octave.org,
Prachi Jain <prachijain...@gmail.com>


Fri, 09 12 2011 at 08:00 +0100, Søren Hauberg wrote:
> My current version works like
>
>   clusters = kmeans (data, initial_clusters);
>
> whereas Matlab's works like
>
>   clusters = kmeans (data, number_of_clusters);
>
> I think we should at least match this before putting it into the
> statistics package. But otherwise, I see no problems with not having all
> the options available that Matlab has.

The attached version has a more compatible API. It's not perfect, but it
seems to work fairly well.

Søren
## Copyright (C) 2011 Soren Hauberg
## 
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

function [classes, centers] = kmeans (data, k, varargin)
  ## Input checking
  if (!ismatrix (data) || !isreal (data))
    error ("kmeans: first input argument must be an NxD real data matrix");
  endif
  if (!isscalar (k))
    error ("kmeans: second input argument must be a scalar");
  endif
  
  [N, D] = size (data);
  
  ## (so far) Hardcoded options
  maxiter = Inf;
  start = "sample";
  
  ## Find initial clusters
  switch (lower (start))
    case "sample"
      idx = randperm (N) (1:k);
      centers = data (idx, :);
    otherwise
      error ("kmeans: unsupported initial clustering parameter");
  endswitch
  
  ## Run the algorithm
  D = zeros (N, k);
  iterations = 0;
  prevcenters = centers;
  while (true)
    ## Compute distances
    for i = 1:k
      D (:, i) = sum (( data - repmat (centers (i, :), N, 1)).^2, 2);
    endfor
    
    ## Classify
    [~, classes] = min (D, [], 2);
    
    ## Recompute centers (average over rows, even for a single-point cluster)
    for i = 1:k
      centers (i, :) = mean (data (classes == i, :), 1);
    endfor
    
    ## Check for convergence
    iterations++;
    if (all (centers (:) == prevcenters (:)) || iterations >= maxiter)
      break;
    endif
    prevcenters = centers;
  endwhile
endfunction

%!demo
%! ## Generate a two-cluster problem
%! C1 = randn (100, 2) + 1;
%! C2 = randn (100, 2) - 1;
%! data = [C1; C2];
%!
%! ## Perform clustering
%! [idx, centers] = kmeans (data, 2);
%!
%! ## Plot the result
%! figure
%! plot (data (idx==1, 1), data (idx==1, 2), 'ro');
%! hold on
%! plot (data (idx==2, 1), data (idx==2, 2), 'bs');
%! plot (centers (:, 1), centers (:, 2), 'kv', 'markersize', 10);
%! hold off
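
## A possible sanity check (a sketch, not part of Søren's attachment).
## The eight points are hypothetical toy data.  Because the start is random
## and k-means can stop in a local optimum, this only checks properties that
## should hold for any converged run: output sizes, label range, and that
## every point is assigned to its nearest returned center.
%!test
%! data = [0 0; 0 1; 1 0; 1 1; 10 10; 10 11; 11 10; 11 11];
%! k = 2;
%! [idx, centers] = kmeans (data, k);
%! N = rows (data);
%! assert (size (idx), [N, 1]);
%! assert (size (centers), [k, columns(data)]);
%! assert (all (idx >= 1 & idx <= k));
%! ## Recompute the squared distances the same way the function does
%! dist = zeros (N, k);
%! for i = 1:k
%!   dist (:, i) = sum ((data - repmat (centers (i, :), N, 1)).^2, 2);
%! endfor
%! [~, nearest] = min (dist, [], 2);
%! assert (idx, nearest);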
