>
> The general rule is that vectorization helps if your loops have
> relatively cheap bodies. If there's a big matrix multiplication or
> factorization somewhere, vectorization will probably win you
> little-to-nothing.
>
> On the other hand, sometimes people really have unnecessarily loopy
> code that can be vectorized for some 20x speed-up, i.e. more than
> parallelization can typically offer.
I am one of those unfortunate people. Thanks for the advice, I will try
to stick to these guidelines.
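For the archive, a minimal sketch of the kind of loop I mean (my own
toy example, timings will of course vary by machine):

    n = 1e6;
    x = rand (1, n);

    ## loopy version: cheap body evaluated n times
    tic;
    y1 = zeros (1, n);
    for i = 1:n
      y1(i) = sin (x(i)) + x(i)^2;
    endfor
    toc

    ## vectorized version: same result, and since the loop body is
    ## cheap, this is where vectorization wins big
    tic;
    y2 = sin (x) + x.^2;
    toc

    assert (max (abs (y1 - y2)) < 10*eps)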
> > Could you please show me an example with anonymous function, I definitely
> > unaware of this trick.
> >
>
> Suppose you have N matrices A{1} ... A{N} and you want to calculate
> A{i} \ B for a given B and all i.
>
> You can either build up a cell array of copies of B
> cellfun (@mldivide, A, {B}(ones (1, N)))
>
> or encapsulate B in an anonymous function
>
> cellfun (@(X) X \ B, A)
Thanks again, I was unaware of the above tricks.
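To make sure I understood, here is a self-contained sketch of both
variants on some made-up well-conditioned matrices (my own example):

    N = 4;
    A = arrayfun (@(k) rand (3) + 3*eye (3), 1:N, "UniformOutput", false);
    B = rand (3, 1);

    ## variant 1: cell array of (shallow) copies of B
    R1 = cellfun (@mldivide, A, {B}(ones (1, N)), "UniformOutput", false);

    ## variant 2: capture B in an anonymous function
    R2 = cellfun (@(X) X \ B, A, "UniformOutput", false);

    assert (isequal (R1, R2))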
>
> and equivalently for parcellfun. Note that the expression {B}(ones (1,
> N)), although it creates N copies of B, is not at all inefficient;
> Octave uses shallow copying where possible, so that {B}(ones (1, N))
> will only occupy the memory for B plus about 8*N bytes or so.
And yet again you proved me wrong. I should keep my mouth shut.
I was not aware of shallow copying (is it even in the docs?), hence
my nonsense without a fact check against actual code.
sizeof fooled me: it clearly shows the size growing, but the
output of top shows that memory use stays about the same, except for
the small overhead which you mentioned.
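For the archive, the check I should have run (sizes are approximate
and from my own experiment):

    B = rand (1000);            ## ~8 MB of data
    C = {B}(ones (1, 100));     ## 100 "copies" of B in a cell array
    ## watching the process in top, resident memory grows by far less
    ## than 800 MB -- the cells share B's data, and a real copy is
    ## only made when one of them is modified (copy-on-write):
    C{1}(1) = 0;                ## now C{1} holds its own ~8 MB copy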
>
> Cellfun also allows you to do
>
> cellfun (@mldivide, A, {B})
>
> for performance reasons (handles to built-in functions are
> significantly more efficient than anonymous functions).
> parcellfun currently doesn't have this feature, so I think I'll add it.
That would probably be great for consistency with cellfun, but you
showed me the workaround for now.
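I.e., until parcellfun grows that feature, the built-in can be wrapped
in an anonymous function; a sketch, assuming parcellfun takes the same
options as cellfun:

    pkg load general   ## parcellfun lives in the general package
    ncpus = 4;
    R = parcellfun (ncpus, @(X) X \ B, A, "UniformOutput", false);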
> > At our cluster which I yet to learn how to use, they have quite a mix of
> > hardware, so I do not know in advance which machine will execute the code.
> >
> > Matlabs parfor still has some appealing side, it seems to know about local
> > cpu/cores and can also execute a code in a cluster enviroment on remote
> > cpus (for the extra money though). But in worst case scenario it fall back
> > and behave just like a normal 'for'. But it seems like 'parcellfun' could
> > spread on mosix cluster without extra work as well.
> >
>
> Note that parcellfun uses fork()ing, so normally it will be only able
> to utilize the CPUs (cores) of a single node, unless your cluster is
> equipped with a special software that allows migration of processes
> amongst nodes (I've heard some clusters can do this, but I've never
> seen it). This is ideal for our cluster where we have 4- and 8-CPU
> nodes and typically a person reserves CPUs on just a single node. But
> for clusters with many single-CPU nodes (and a fast network),
> parcellfun is just useless.
A while ago I played with a homemade MOSIX cluster; applying their
kernel patch was straightforward, or maybe I used a MOSIX-enabled
kernel from Debian. It handled fork() transparently over the network
(as long as the machines are the same architecture). Maybe I should
resurrect this activity.
>
> For more general parallelism, there's either the parallel package, or
> a very recent (and under development) openMPI package. But then
> parallelization is no longer a drop-in replacement of functions.
This I would call 'hard parallelism', as opposed to the parcellfun use.
>
> >
> > May be it make sense for 'cellfun' to call 'parcellfun' if some global
> > switch is toggled by user. Of cause once it is argument compatible (i.e.
> > capable of reiterating scalars as cellfun).
> >
>
> No, this is out of the question in any near future. The main reason is
> that cellfun can not make assumptions about the complexity of the
> function being evaluated; for "cheap" functions, cellfun will
> significantly outperform parcellfun, because of the overhead of the
> parallel setup and communication. It's only expensive functions where
> parcellfun pays off, but the function itself simply cannot tell.
> Surely there can be an option for that, but then you can as well have
> two functions (esp. given that their implementation is very
> different).
I see. You are right again.
>
> But in your own code, you can easily achieve the trick yourself by
> adding something like the following to the front of the script:
>
> ## uncomment the following line to run in parallel, set the number of CPUs
> ## ncpus = 8; cellfun = @(varargin) parcellfun (ncpus, varargin{:});
I guess I should learn how to use anonymous functions. Thanks for the
recipe.
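If I understand the trick, a script could then start like this (my
own untested sketch; expensive_computation and inputs are placeholder
names):

    ## uncomment the next line to run this script in parallel
    # ncpus = 8; cellfun = @(varargin) parcellfun (ncpus, varargin{:});

    ## the rest of the script calls cellfun as usual; with the line
    ## above uncommented, every such call is silently routed through
    ## parcellfun instead
    results = cellfun (@(x) expensive_computation (x), inputs, ...
                       "UniformOutput", false);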
--
Eugeniy E. Mikhailov
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev