2009/12/14 Eugeniy Mikhailov <[email protected]>:
>> > But here is a problem: suppose you have a quite large input set (let's
>> > say of size N) with which you have to evaluate a function in a
>> > loop (let's say M times). Now you need to reserve N*M amount of
>> > memory, while with a parfor loop you would need no more than M*(number
>> > of cpus/cores). In other words, 'parcellfun' seems to be memory
>> > hungry.
>>
>> Can you give a more specific example? Big inputs are usually generated
>> from smaller ones; so you can simply extend the parallelized part to work
>> with the smaller inputs. I see
>
> Well, unfortunately, generating that input takes quite long, mainly because
> I did not spend much time trying to vectorize the code. But globals can
> indeed be passed to parcellfun. Apparently I just had a weird typo
> somewhere.
>
The general rule is that vectorization helps if your loops have
relatively cheap bodies. If there's a big matrix multiplication or
factorization somewhere, vectorization will probably win you
little to nothing.
Conversely, sometimes people have unnecessarily loopy
code that can be vectorized for some 20x speed-up, i.e. more than
parallelization can typically offer.
There are other advantages to vectorization that are not often mentioned. As
soon as you master it somewhat, you'll find that fully
vectorized code is extremely easy to debug, even without a debugger,
because it becomes just a linear forward sequence of transformations,
each of which can be checked individually.
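For instance, a loopy computation like the following (a hypothetical sketch,
not from the original discussion) can often be replaced by a single
vectorized expression:

```octave
% Loopy version: square each element of x
y = zeros (size (x));
for i = 1:numel (x)
  y(i) = x(i)^2;
end

% Vectorized equivalent; with cheap loop bodies like this,
% the vectorized form is typically an order of magnitude faster:
y = x .^ 2;
```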
>> that unlike cellfun, parcellfun does not allow auto-expanding scalar
>> cells as arguments. I'll add that feature. But it can be worked around
>> using an anonymous function; in general there is no need to duplicate
>> inputs.
>
> Could you please show me an example with an anonymous function? I am
> definitely unaware of this trick.
>
Suppose you have N matrices A{1} ... A{N} and you want to calculate
A{i} \ B for a given B and all i.
You can either build up a cell array of copies of B
cellfun (@mldivide, A, {B}(ones (1, N)))
or encapsulate B in an anonymous function
cellfun (@(X) X \ B, A)
and equivalently for parcellfun. Note that the expression {B}(ones (1,
N)), although it creates N copies of B, is not at all inefficient;
Octave uses shallow copying where possible, so that {B}(ones (1, N))
will only occupy the memory for B plus about 8*N bytes or so.
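A quick way to see the shallow copying at work (a sketch; the sizes and
counts are illustrative):

```octave
B = rand (1000);              % a matrix holding ~8 MB of data
C = {B}(ones (1, 1000));      % 1000 cells referring to the same data
% No deep copy happens here: each cell shares B's data, and a real
% copy is made only when some C{i} is later modified (copy-on-write).
```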
cellfun also allows you to do
cellfun (@mldivide, A, {B}),
which is useful for performance reasons (handles to built-in functions are
significantly more efficient than anonymous functions).
parcellfun currently doesn't have this feature, so I think I'll add it.
>>
>> > Unless I miss something, I do not see a way to pass a global
>> > variable to 'parcellfun'; at least it seems to fail at this stage. Also,
>> > 'evalin' does not work either, probably for the same reason.
>> >
>>
>> Surely you can use a global variable in the function being evaluated.
>> At least it should work; if you found a bug, please submit an example.
> As I said above, it is a bug in my test code. My apologies for the
> unchecked results.
>
>> Octave as a whole is memory hungry, so if you're short of memory, Octave
>> may be problematic in general. Just for the record, for intensive
>> computations I use Octave on a machine with 8 CPUs and 16GB RAM, and I
>> don't think I ever exceeded 2GB.
>
> My coding machine is much more humble :) I have just 512 MB of RAM, so we
> have slightly different definitions of memory hungry. Before I found out
> how to pass globals to parcellfun, I had to copy a quite big matrix for
> every cell, which eats both CPU cycles and memory. Now everything
> seems to be fine.
>
Maybe you did something wrong? As I said, copying a 100MB matrix to
1000 cells should eat only about 8kb of memory. The copies share the
data until a physical copy is needed.
If you show the code (or relevant parts), maybe we'll be able to help.
>
> At our cluster, which I have yet to learn how to use, they have quite a mix
> of hardware, so I do not know in advance which machine will execute the code.
>
> Matlab's parfor still has an appealing side: it seems to know about local
> cpus/cores and can also execute code in a cluster environment on remote
> cpus (for extra money, though). And in the worst-case scenario it falls
> back and behaves just like a normal 'for'. But it seems like 'parcellfun'
> could spread over a MOSIX cluster without extra work as well.
>
Note that parcellfun uses fork()ing, so normally it will only be able
to utilize the CPUs (cores) of a single node, unless your cluster is
equipped with special software that allows migration of processes
amongst nodes (I've heard some clusters can do this, but I've never
seen it). This is ideal for our cluster, where we have 4- and 8-CPU
nodes and typically a person reserves CPUs on just a single node. But
for clusters with many single-CPU nodes (and a fast network),
parcellfun is just useless.
For more general parallelism, there's either the parallel package or
a very recent (and under development) openMPI package. But then
parallelization is no longer a drop-in replacement for functions.
>
> Maybe it makes sense for 'cellfun' to call 'parcellfun' if some global
> switch is toggled by the user. Of course, only once it is argument
> compatible (i.e. capable of expanding scalars like cellfun).
>
No, this is out of the question in any near future. The main reason is
that cellfun cannot make assumptions about the complexity of the
function being evaluated; for "cheap" functions, cellfun will
significantly outperform parcellfun because of the overhead of the
parallel setup and communication. parcellfun only pays off for
expensive functions, but the function itself simply cannot tell.
Surely there could be an option for that, but then you may as well have
two functions (especially given that their implementations are very
different).
But in your own code, you can easily achieve the trick yourself by
adding something like the following to the front of the script:
## uncomment the following line to run in parallel, set the number of CPUs
## ncpus = 8; cellfun = @(varargin) parcellfun (ncpus, varargin{:});
Note that this will alter *all* cellfun calls; in more complicated
code, you may instead want to be picky and parallelize just some of them.
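For instance, instead of shadowing cellfun itself, a sketch of the picky
approach might look like this (pcellfun, expensive_fun and big_inputs are
hypothetical names, not part of any package):

```octave
ncpus = 8;
pcellfun = @(varargin) parcellfun (ncpus, varargin{:});

heavy = pcellfun (@expensive_fun, big_inputs);  % run in parallel
sizes = cellfun (@numel, big_inputs);           % cheap, stays serial
```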
best regards
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev