>
> The general rule is that vectorization helps if your loops have
> relatively cheap bodies. If there's a big matrix multiplication or
> factorization somewhere, vectorization will probably win you
> little-to-nothing.
>
> On the other hand, sometimes people really have unnecessarily loopy
> code that can be vectorized for some 20x speed-up, i.e. more than
> parallelization can typically offer.
I am one of those unfortunate people. Thanks for the advice, I will try
to stick to these guidelines.
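For the archive, a minimal sketch of the kind of loop I mean (my own
toy example, timings will of course vary by machine):

    n = 1e6;
    x = rand (1, n);

    ## loopy version: cheap body evaluated n times
    tic;
    y1 = zeros (1, n);
    for i = 1:n
      y1(i) = sin (x(i)) + x(i)^2;
    endfor
    toc

    ## vectorized version: same result, and since the loop body is
    ## cheap, this is where vectorization wins big
    tic;
    y2 = sin (x) + x.^2;
    toc

    assert (max (abs (y1 - y2)) < 10*eps)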
> > Could you please show me an example with anonymous function, I definitely
> > unaware of this trick.
> >
>
> Suppose you have N matrices A{1} ... A{N} and you want to calculate
> A{i} \ B for a given B and all i.
>
> You can either build up a cell array of copies of B
> cellfun (@mldivide, A, {B}(ones (1, N)))
>
> or encapsulate B in an anonymous function
>
> cellfun (@(X) X \ B, A)
Thanks again, I was unaware of the above tricks.
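To make sure I understood, here is a self-contained sketch of both
variants on some made-up well-conditioned matrices (my own example):

    N = 4;
    A = arrayfun (@(k) rand (3) + 3*eye (3), 1:N, "UniformOutput", false);
    B = rand (3, 1);

    ## variant 1: cell array of (shallow) copies of B
    R1 = cellfun (@mldivide, A, {B}(ones (1, N)), "UniformOutput", false);

    ## variant 2: capture B in an anonymous function
    R2 = cellfun (@(X) X \ B, A, "UniformOutput", false);

    assert (isequal (R1, R2))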
>
> and equivalently for parcellfun. Note that the expression {B}(ones (1,
> N)), although it creates N copies of B, is not at all inefficient;
> Octave uses shallow copying where possible, so that {B}(ones (1, N))
> will only occupy the memory for B plus about 8*N bytes or so.
And yet again you proved me wrong. I should keep my mouth shut.
I was not aware of shallow copying (is it even in the docs?), hence
my nonsense without a fact check against actual code.
sizeof fooled me: it clearly shows the size growing, but the
output of top shows that memory use stays about the same, except for
the small overhead which you mentioned.
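For the archive, the check I should have run (sizes are approximate
and from my own experiment):

    B = rand (1000);            ## ~8 MB of data
    C = {B}(ones (1, 100));     ## 100 "copies" of B in a cell array
    ## watching the process in top, resident memory grows by far less
    ## than 800 MB -- the cells share B's data, and a real copy is
    ## only made when one of them is modified (copy-on-write):
    C{1}(1) = 0;                ## now C{1} holds its own ~8 MB copy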
>
> Cellfun also allows you to do
>
> cellfun (@mldivide, A, {B})
>
> for performance reasons (handles to built-in functions are
> significantly more efficient than anonymous functions).
> parcellfun currently doesn't have this feature, so I think I'll add it.
That would probably be great for consistency with cellfun, but you
showed me the workaround for now.
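I.e., until parcellfun grows that feature, the built-in can be wrapped
in an anonymous function; a sketch, assuming parcellfun takes the same
options as cellfun:

    pkg load general   ## parcellfun lives in the general package
    ncpus = 4;
    R = parcellfun (ncpus, @(X) X \ B, A, "UniformOutput", false);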
> > At our cluster which I yet to learn how to use, they have quite a mix of
> > hardware, so I do not know in advance which machine will execute the code.
> >
> > Matlabs parfor still has some appealing side, it seems to know about local
> > cpu/cores and can also execute a code in a cluster enviroment on remote
> > cpus (for the extra money though). But in worst case scenario it fall back
> > and behave just like a normal 'for'. But it seems like 'parcellfun' could
> > spread on mosix cluster without extra work as well.
> >
>
> Note that parcellfun uses fork()ing, so normally it will be only able
> to utilize the CPUs (cores) of a single node, unless your cluster is
> equipped with a special software that allows migration of processes
> amongst nodes (I've heard some clusters can do this, but I've never
> seen it). This is ideal for our cluster where we have 4- and 8-CPU
> nodes and typically a person reserves CPUs on just a single node. But
> for clusters with many single-CPU nodes (and a fast network),
> parcellfun is just useless.
A while ago I played with a homemade MOSIX cluster; applying their
kernel patch was straightforward, or maybe I used a MOSIX-enabled
kernel from Debian. It handled fork() transparently over the network
(as long as the machines are the same architecture). Maybe I should
resurrect this activity.
>
> For more general parallelism, there's either the parallel package, or
> a very recent (and under development) openMPI package. But then
> parallelization is no longer a drop-in replacement of functions.
This I would call 'hard parallelism', as opposed to the parcellfun use.
>
> >
> > May be it make sense for 'cellfun' to call 'parcellfun' if some global
> > switch is toggled by user. Of cause once it is argument compatible (i.e.
> > capable of reiterating scalars as cellfun).
> >
>
> No, this is out of the question in any near future. The main reason is
> that cellfun can not make assumptions about the complexity of the
> function being evaluated; for "cheap" functions, cellfun will
> significantly outperform parcellfun, because of the overhead of the
> parallel setup and communication. It's only expensive functions where
> parcellfun pays off, but the function itself simply cannot tell.
> Surely there can be an option for that, but then you can as well have
> two functions (esp. given that their implementation is very
> different).
I see. You are right again.
>
> But in your own code, you can easily achieve the trick yourself by
> adding something like the following to the front of the script:
>
> ## uncomment the following line to run in parallel, set the number of CPUs
> ## ncpus = 8; cellfun = @(varargin) parcellfun (ncpus, varargin{:});
I guess I should learn how to use anonymous functions. Thanks for the
recipe.
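If I understand the trick, a script could then start like this (my
own untested sketch; expensive_computation and inputs are placeholder
names):

    ## uncomment the next line to run this script in parallel
    # ncpus = 8; cellfun = @(varargin) parcellfun (ncpus, varargin{:});

    ## the rest of the script calls cellfun as usual; with the line
    ## above uncommented, every such call is silently routed through
    ## parcellfun instead
    results = cellfun (@(x) expensive_computation (x), inputs, ...
                       "UniformOutput", false);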
--
Eugeniy E. Mikhailov
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev