Hi,

To evaluate the same function in parallel processes, there is
parcellfun, which forks and runs the function in the child processes.
But I have some problems with parcellfun. For example, when I want to
do many short computations, it spends more time in I/O than in
computation. See below:

> octave:1> A=reshape(1:1024^2,1024,1024);
> octave:2> B=reshape(1024^2:-1:1,1024,1024);
> octave:3> cA=num2cell(A);
> octave:4> cB=num2cell(B);
> octave:5> testfun=@(x,y) exp(-x)*exp(-y);

> octave:6> tic; a=cellfun(testfun,cA,cB); toc
> Elapsed time is 13.2478 seconds.

> octave:7> tic; b=parcellfun(5,testfun,cA,cB); toc
> parcellfun: 1048576/1048576 jobs done
> Elapsed time is 159.641 seconds.


So we can see that parcellfun is inefficient for this kind of task.

My idea is to create meta-jobs, each containing many jobs, and to
execute the meta-jobs with parcellfun. A meta-job is the execution of
many jobs by an ordinary cellfun.

So I wrote a function pcellfun, which uses both parcellfun and cellfun
and is efficient with many very small jobs. See below with the same
example:
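
The meta-job idea can be sketched without the parallel package. This is only an illustration, not the attached pcellfun: the block length `len_bloc` and the names `meta_job` and `parts` are made up for the example, and the outer cellfun stands in for the parcellfun call that would run the blocks in separate processes.

```octave
fun = @(x) x^2;            # a very small job
c = num2cell (1:10);       # ten jobs
N = numel (c);
len_bloc = 4;              # jobs per meta-job (assumed value)

## index blocks 1:4, 5:8 and 9:10
starts = 1:len_bloc:N;
blocs = arrayfun (@(s) s:min (s + len_bloc - 1, N), starts, ...
                  'UniformOutput', false);

## a meta-job runs a plain cellfun over one block of jobs
meta_job = @(bloc) cellfun (fun, c(bloc), 'UniformOutput', false);

## stand-in for parcellfun: one call per block instead of one per job
parts = cellfun (meta_job, blocs, 'UniformOutput', false);

## put the per-block results back in the original order
r = cell2mat ([parts{:}]);
```

Only three results travel back from the outer call instead of ten, which is where the I/O saving would come from once the outer call is parallel.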

> octave:8> tic; c=pcellfun(5,testfun,cA,cB); toc
> parcellfun: 500/500 jobs done
> Elapsed time is 5.3813 seconds.

When parcellfun is efficient (with big jobs), pcellfun has the same
performance as parcellfun:

> octave:1> Mats=rand(1299,300);
> octave:2> cI=num2cell(1:1000);
> octave:3> testfun=@(k) max(eig(Mats(k:k+299,:)));

> octave:4> tic; a=cellfun(testfun,cI); toc
> Elapsed time is 162.455 seconds.

> octave:5> tic; b=parcellfun(5,testfun,cI); toc
> parcellfun: 1000/1000 jobs done
> Elapsed time is 53.2233 seconds.

> octave:6> tic; c=pcellfun(5,testfun,cI); toc
> parcellfun: 500/500 jobs done
> Elapsed time is 52.9729 seconds.

pcellfun also supports multiple inputs, multiple outputs, 'UniformOutput'
and 'ErrorHandler', like cellfun and parcellfun.
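
As a reminder of what these options mean, here is a small sketch using plain cellfun only. It assumes that pcellfun takes the same arguments after its first one, as described above. The example data and the handler name `on_error` are made up.

```octave
## Two outputs per job: each output is collected into its own array.
[m, i] = cellfun (@max, {[1 5 2], [7 3]});    # m = [5 7], i = [2 1]

## Results of different sizes must come back as a cell array.
s = cellfun (@(v) v(v > 2), {[1 5 2], [7 3]}, 'UniformOutput', false);

## A failing job (v(2) on a one-element input) is passed to the handler,
## which receives an error struct followed by the inputs of that job.
on_error = @(err, v) NaN;
r = cellfun (@(v) v(2), {[1 2], 3, [4 5]}, 'ErrorHandler', on_error);
```

With pcellfun the same calls would take the number of processes as an extra first argument, for instance `pcellfun (5, @max, ...)`.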

In my pcellfun function I copied some code from parcellfun, so I kept
its copyright notice.

To test the functions I used Octave 3.2.4 on an amd64 X4.

Do you think this function would be useful in octave-forge?

-- 
Jean-Benoist Leger
## Copyright (C) 2009 VZLU Prague, a.s., Czech Republic, Jaroslav Hajek
## Copyright (C) 2010 Jean-Benoist Leger <[email protected]>
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
## 
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
## 
## You should have received a copy of the GNU General Public License
## along with this program; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn{Function File} [@var{o1}, @var{o2}, @dots{}] = pcellfun (@var{nproc}, @var{fun}, @var{a1}, @var{a2}, @dots{})
## @deftypefnx{Function File} pcellfun ([@var{nproc}, @var{njobbyproc}], fun, @dots{})
## @deftypefnx{Function File} pcellfun (nproc, fun, @dots{}, "UniformOutput", @var{val})
## @deftypefnx{Function File} pcellfun (nproc, fun, @dots{}, "ErrorHandler", @var{errfunc})
## Evaluates a function for multiple argument sets using multiple processes.
## @var{nproc} should specify the number of processes. A maximum recommended value is
## equal to the number of CPUs on your machine or one less.
## @var{njobbyproc} should specify the number of jobs created per process. Defaults to 100.
## @var{fun} is a function handle pointing to the requested evaluating function.
## @var{a1}, @var{a2} etc. should be cell arrays of equal size.
## @var{o1}, @var{o2} etc. will be set to corresponding output arguments.
##
## The UniformOutput and ErrorHandler options are supported with meaning identical
## to @dfn{cellfun}.
## @end deftypefn

function varargout=pcellfun (params_parallel, fun, varargin)

  ## check the argument count before touching params_parallel
  if (nargin < 3)
    print_usage ();
  endif

  if (numel (params_parallel) == 1)
    nproc = params_parallel;
    nblocs = nproc*100;
  elseif (numel (params_parallel) == 2)
    nproc = params_parallel(1);
    nblocs = nproc*params_parallel(2);
  else
    print_usage ();
  endif

  if (nproc <= 0 || nblocs <= 0)
    print_usage ();
  endif

  if (ischar (fun))
    fun = str2func (fun);
  elseif (! isa (fun, "function_handle"))
    error ("pcellfun: fun must be either a function handle or name")
  endif

  uniform_output = true;
  error_handler = [];

  args = varargin;
  nargs = length (varargin);

  ## parse options
  if (nargs > 1)
    do
      if (strcmp (args{nargs-1}, "UniformOutput"))
        uniform_output = args{nargs};
        nargs -= 2;
        continue;
      endif
      if (strcmp (args{nargs-1}, "ErrorHandler"))
        error_handler = args{nargs};
        nargs -= 2;
        continue;
      endif
      break;
    until (nargs < 2);
  endif

  args = args(1:nargs);

  if (length (args) == 0)
    print_usage ();
  elseif (length (args) > 1 && ! size_equal (args{:}))
    error ("pcellfun: arguments size must match");
  endif

  ## Build the mask: the number of jobs in each bloc
  N = numel (args{1});
  len_bloc = ceil (N/nblocs);
  mask = [len_bloc*ones(1,nblocs-1)  N-len_bloc*(nblocs-1)];

  ## For low N values the last bloc can be empty or negative:
  ## fold it into the previous bloc until it is positive
  while (numel (mask) > 1 && mask(end) <= 0)
    mask(end-1) += mask(end);
    mask = mask(1:end-1);
  endwhile
  nblocs = numel (mask);

  ## Make blocs of indices
  blocs = mat2cell ((1:N)', mask);

  ## arguments of a bloc
  part_arg = @(bloc) cellfun(@(arg) arg(bloc),args,'UniformOutput', false);

  ## function executed for a bloc
  if (isempty (error_handler))
    group_fun = @(bloc) cellfun (fun, part_arg(bloc){:}, 'UniformOutput', false);
  else
    group_fun = @(bloc) cellfun (fun, part_arg(bloc){:}, 'UniformOutput', false, 'ErrorHandler', error_handler);
  endif

  ## preparing output
  out_brut = cell (1, nargout);

  ## main
  [out_brut{:}] = parcellfun (nproc, group_fun, blocs, 'UniformOutput', false);

  varargout = cell (1, nargout);
  for iargout = 1:nargout
    
    out_cat = cell(N,1);
    for iblocs = 1:nblocs
      out_cat(sum(mask(1:iblocs-1))+(1:mask(iblocs))) = out_brut{iargout}{iblocs}(:);
    endfor

    varargout{iargout} = reshape(out_cat,size(args{1}));
    
    if (uniform_output)
      varargout{iargout} = cell2mat (varargout{iargout});
    endif

  endfor
      


endfunction

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
