Hi,

To evaluate the same function in parallel processes there is parcellfun, which forks worker processes and runs the function in them. But I have a problem with parcellfun: when I want to do many short computations, it spends more time on I/O than on computation. For example:
> octave:1> A=reshape(1:1024^2,1024,1024);
> octave:2> B=reshape(1024^2:-1:1,1024,1024);
> octave:3> cA=num2cell(A);
> octave:4> cB=num2cell(B);
> octave:5> testfun=@(x,y) exp(-x)*exp(-y);
> octave:6> tic; a=cellfun(testfun,cA,cB); toc
> Elapsed time is 13.2478 seconds.
> octave:7> tic; b=parcellfun(5,testfun,cA,cB); toc
> parcellfun: 1048576/1048576 jobs done
> Elapsed time is 159.641 seconds.

So parcellfun is inefficient for this kind of task: here it takes about 12 times longer than plain cellfun.

My idea is to create meta-jobs, each containing many jobs, and to run the meta-jobs with parcellfun. A meta-job runs its jobs with an ordinary cellfun. So I wrote a function, pcellfun, which combines parcellfun and cellfun and is efficient with many very small jobs. With the same example:

> octave:8> tic; c=pcellfun(5,testfun,cA,cB); toc
> parcellfun: 500/500 jobs done
> Elapsed time is 5.3813 seconds.

When parcellfun is already efficient (with big jobs), pcellfun performs the same as parcellfun:

> octave:1> Mats=rand(1299,300);
> octave:2> cI=num2cell(1:1000);
> octave:3> testfun=@(k) max(eig(Mats(k:k+299,:)));
> octave:4> tic; a=cellfun(testfun,cI); toc
> Elapsed time is 162.455 seconds.
> octave:5> tic; b=parcellfun(5,testfun,cI); toc
> parcellfun: 1000/1000 jobs done
> Elapsed time is 53.2233 seconds.
> octave:6> tic; c=pcellfun(5,testfun,cI); toc
> parcellfun: 500/500 jobs done
> Elapsed time is 52.9729 seconds.

pcellfun also supports multiple inputs, multiple outputs, 'UniformOutput' and 'ErrorHandler', like cellfun and parcellfun.

pcellfun reuses some code from parcellfun, so I kept its copyright notice.

I tested these functions with Octave 3.2.4 on an amd64 X4.

Do you think this function would be useful in octave-forge?

-- 
Jean-Benoist Leger
## Copyright (C) 2009 VZLU Prague, a.s., Czech Republic, Jaroslav Hajek
## Copyright (C) 2010 Jean-Benoist Leger <[email protected]>
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn{Function File} [...@var{o1}, @var{o2}, @dots{}] = pcellfun (@var{nproc}, @var{fun}, @var{a1}, @var{a2}, @dots{})
## @deftypefnx{Function File} pcellfun ([...@var{nproc}, @var{njobbyproc}], fun, @dots{})
## @deftypefnx{Function File} pcellfun (nproc, fun, @dots{}, "UniformOutput", @var{val})
## @deftypefnx{Function File} pcellfun (nproc, fun, @dots{}, "ErrorHandler", @var{errfunc})
## Evaluates a function for multiple argument sets using multiple processes.
## @var{nproc} should specify the number of processes. The maximum recommended value
## is the number of CPUs on your machine, or one less.
## @var{njobbyproc} should specify the number of jobs created per process. Defaults to 100.
## @var{fun} is a function handle pointing to the function to evaluate.
## @var{a1}, @var{a2} etc. should be cell arrays of equal size.
## @var{o1}, @var{o2} etc. will be set to the corresponding output arguments.
##
## The UniformOutput and ErrorHandler options are supported with meaning identical
## to @dfn{cellfun}.
## @end deftypefn
function varargout = pcellfun (params_parallel, fun, varargin)

  ## params_parallel is either nproc or [nproc, njobbyproc]
  if (numel (params_parallel) == 1)
    nproc = params_parallel;
    nblocs = nproc*100;
  elseif (numel (params_parallel) == 2)
    nproc = params_parallel(1);
    nblocs = nproc*params_parallel(2);
  else
    print_usage ();
  endif

  if (nargin < 3 || nproc <= 0 || ! isscalar (nproc))
    print_usage ();
  endif

  if (ischar (fun))
    fun = str2func (fun);
  elseif (! isa (fun, "function_handle"))
    error ("pcellfun: fun must be either a function handle or name");
  endif

  uniform_output = true;
  error_handler = [];

  args = varargin;
  nargs = length (varargin);

  ## parse trailing options
  if (nargs > 1)
    do
      if (strcmp (args{nargs-1}, "UniformOutput"))
        uniform_output = args{nargs};
        nargs -= 2;
        continue;
      endif
      if (strcmp (args{nargs-1}, "ErrorHandler"))
        error_handler = args{nargs};
        nargs -= 2;
        continue;
      endif
      break;
    until (nargs < 2);
  endif

  args = args(1:nargs);
  if (length (args) == 0)
    print_usage ();
  elseif (length (args) > 1 && ! size_equal (args{:}))
    error ("arguments size must match");
  endif

  ## Build the mask: the number of elements in each bloc
  N = numel (args{1});
  len_bloc = ceil (N/nblocs);
  mask = [len_bloc*ones(1,nblocs-1) N-len_bloc*(nblocs-1)];
  ## For low N the last bloc can be empty or negative:
  ## merge it into the previous bloc until it is positive
  while (mask(end) <= 0)
    mask(end-1) += mask(end);
    mask = mask(1:end-1);
  endwhile
  nblocs = numel (mask);

  ## Split the linear indices 1:N into blocs
  blocs = mat2cell ((1:N)', mask);

  ## arguments restricted to one bloc
  part_arg = @(bloc) cellfun (@(arg) arg(bloc), args, 'UniformOutput', false);

  ## function executed for one bloc (one meta-job)
  if (isempty (error_handler))
    group_fun = @(bloc) cellfun (fun, part_arg(bloc){:}, 'UniformOutput', false);
  else
    group_fun = @(bloc) cellfun (fun, part_arg(bloc){:}, 'UniformOutput', false, 'ErrorHandler', error_handler);
  endif

  ## prepare raw output
  out_brut = cell (1, nargout);

  ## main call: one parcellfun job per bloc
  [out_brut{:}] = parcellfun (nproc, group_fun, blocs, 'UniformOutput', false);

  ## concatenate the bloc results back into the shape of the inputs
  varargout = cell (1, nargout);
  for iargout = 1:nargout
    out_cat = cell (N, 1);
    for iblocs = 1:nblocs
      out_cat(sum(mask(1:iblocs-1))+(1:mask(iblocs))) = out_brut{iargout}{iblocs}(:);
    endfor
    varargout{iargout} = reshape (out_cat, size (args{1}));
    if (uniform_output)
      varargout{iargout} = cell2mat (varargout{iargout});
    endif
  endfor

endfunction
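
The grouping step in pcellfun can be traced by hand on a small case. The sketch below is only an illustration, not part of the posted function: the values N = 10 and nblocs = 4, the input x, and the function f are made up, and a plain cellfun stands in for parcellfun so that it runs without the parallel code.

```octave
## Hypothetical small case: 10 jobs split into at most 4 blocs.
N = 10;
nblocs = 4;

## Same partition rule as in pcellfun.
len_bloc = ceil (N / nblocs);
mask = [len_bloc*ones(1, nblocs-1), N - len_bloc*(nblocs-1)];
while (mask(end) <= 0)
  mask(end-1) += mask(end);
  mask = mask(1:end-1);
endwhile

## Each bloc is a column vector of linear indices.
blocs = mat2cell ((1:N)', mask);

## Made-up input and function, for illustration only.
x = num2cell (1:N);
f = @(v) v^2;

## One meta-job: run f over every element of one bloc.
group_fun = @(bloc) cellfun (f, x(bloc), "UniformOutput", false);

## Serial stand-in for parcellfun (assumption: same call shape).
out = cellfun (group_fun, blocs, "UniformOutput", false);

## Put the bloc results back in the original order and shape.
flat = cell (N, 1);
for i = 1:numel (mask)
  idx = sum (mask(1:i-1)) + (1:mask(i));
  flat(idx) = out{i}(:);
endfor
result = cell2mat (reshape (flat, size (x)));

disp (mask);
disp (result);
```

With these values the mask is [3 3 3 1], so only 4 jobs are dispatched instead of 10. The per-job overhead is paid once per bloc rather than once per element, which would explain the gain seen with many very small jobs.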
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
