2009/12/16 Eugeniy Mikhailov <[email protected]>:
>>
>> The general rule is that vectorization helps if your loops have
>> relatively cheap bodies. If there's a big matrix multiplication or
>> factorization somewhere, vectorization will probably win you
>> little-to-nothing.
>>
>> On the contrary, sometimes people really have an unnecessarily loopy
>> code that can be vectorized for some 20x speed-up, i.e. more than a
>> parallelization can typically offer.
> I am one of those unfortunate people. Thanks for the advice, I will try
> to stick to these guidelines.
>
>> > Could you please show me an example with an anonymous function; I am
>> > definitely unaware of this trick.
>> >
>>
>> Suppose you have N matrices A{1} ... A{N} and you want to calculate
>> A{i} \ B for a given B and all i.
>>
>> You can either build up a cell array of copies of B
>> cellfun (@mldivide, A, {B}(ones (1, N)))
>>
>> or encapsulate B in an anonymous function
>>
>> cellfun (@(X) X \ B, A)
> Thanks again, I was unaware of the above tricks.
>>
>> and equivalently for parcellfun. Note that the expression {B}(ones (1,
>> N)), although it creates N copies of B, is not at all inefficient;
>> Octave uses shallow copying where possible, so that {B}(ones (1, N))
>> will only occupy the memory for B plus about 8*N bytes or so.
> And yet again you proved me wrong. I should keep my mouth shut.
> I was not aware of shallow copying (is it even in the docs?), hence
> my nonsense, posted without fact-checking against actual code.
>
> sizeof fooled me: it clearly shows the size growing, but the
> output of top shows that the memory use remains about the same, except
> for the small overhead which you mentioned.
>

Yes, that is a bit unfortunate. sizeof effectively shows the maximum
size that a variable would have if there were no sharing of data, and
ignores some internal overhead. It would be nice if the actual size were
shown, but that would probably be a fairly complex task, because there
are basically two levels of data sharing (the octave_value class and
the Array class), and also some internal caching mechanism (index
cache, cellstr cache, diagonal & permutation -> full matrix cache).
Too much work for too little benefit, I'd say.
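For the curious, the effect is easy to observe yourself (a sketch; the
exact per-copy overhead depends on the platform and Octave version):

B = rand (1000);         # roughly 8 MB of data
C = {B}(ones (1, 100));  # 100 shallow copies of B in a cell array
sizeof (C)               # reports about 100 * sizeof (B)

top, however, will show the process growing by only a few kilobytes,
because all 100 cells share the data of B.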

>>
>> Cellfun also allows you to do
>>
>> cellfun (@mldivide, A, {B})
>>
>> for performance reasons (handles to built-in functions are
>> significantly more efficient than anonymous functions).
>> parcellfun currently doesn't have this feature, so I think I'll add it.
> That would probably be great for consistency of use with cellfun, but you
> have shown me the workaround for now.
>

OK.

>> > On our cluster, which I have yet to learn how to use, they have quite a mix
>> > of hardware, so I do not know in advance which machine will execute the code.
>> >
>> > Matlab's parfor still has an appealing side: it seems to know about local
>> > CPUs/cores and can also execute code in a cluster environment on remote
>> > CPUs (for extra money, though). In the worst-case scenario it falls back
>> > and behaves just like a normal 'for'. But it seems 'parcellfun' could
>> > spread over a MOSIX cluster without extra work as well.
>> >
>>
>> Note that parcellfun uses fork()ing, so normally it will only be able
>> to utilize the CPUs (cores) of a single node, unless your cluster is
>> equipped with special software that allows migration of processes
>> amongst nodes (I've heard some clusters can do this, but I've never
>> seen it). This is ideal for our cluster where we have 4- and 8-CPU
>> nodes and typically a person reserves CPUs on just a single node. But
>> for clusters with many single-CPU nodes (and a fast network),
>> parcellfun is just useless.
> A while ago I played with a homemade MOSIX cluster, when applying their
> kernel was straightforward, or maybe I used a MOSIX-enabled kernel from
> Debian. It handled fork transparently over the network (as long as your
> machines are the same architecture). Maybe I should resurrect this
> activity.
>

That sounds cool. I need to read something about Mosix.

>>
>> For more general parallelism, there's either the parallel package, or
>> a very recent (and under development) openMPI package. But then
>> parallelization is no longer a drop-in replacement of functions.
> This I would call 'hard parallelism', as opposed to the parcellfun use.
>
>>
>> >
>> > Maybe it makes sense for 'cellfun' to call 'parcellfun' if some global
>> > switch is toggled by the user. Of course, only once it is argument-compatible
>> > (i.e. capable of reiterating scalars as cellfun does).
>> >
>>
>> No, this is out of the question in any near future. The main reason is
>> that cellfun can not make assumptions about the complexity of the
>> function being evaluated; for "cheap" functions, cellfun will
>> significantly outperform parcellfun, because of the overhead of the
>> parallel setup and communication. It's only expensive functions where
>> parcellfun pays off, but the function itself simply cannot tell.
>> Surely there can be an option for that, but then you can as well have
>> two functions (esp. given that their implementation is very
>> different).
> I see. You are right again.
>>
>> But in your own code, you can easily achieve the trick yourself by
>> adding something like the following to the front of the script:
>>
>> ## uncomment the following line to run in parallel, set the number of CPUs
>> ## ncpus = 8; cellfun = @(varargin) parcellfun (ncpus, varargin{:});
> I guess I should learn how to use anonymous functions. Thanks for the
> recipe.
>

If you use Octave for serious work, then you definitely should. It's a
very powerful mechanism. Matlab also seems to encourage it where
possible.
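For instance, one thing worth knowing is that an anonymous function
captures the value of a variable at the time of its definition:

B = magic (3);
f = @(X) X \ B;   # B is captured here, by value
B = 0;            # later changes to B do not affect f
f (eye (3))       # still solves against the original magic (3)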

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev