Re: [Pdl-porters] [Perldl] Auto Multi-Core Support Added to Next Development Release

Chris Marshall Wed, 18 May 2011 04:06:07 -0700

Hi John-

I am unable to compile pdlmagic.c due to some declaration
inconsistencies:

gcc-4 -c   -DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -g3 -fno-strict-aliasing -pipe -fstack-protector 
-I/usr/local/include -DUSEIMPORTLIB -O3   -DVERSION=\"2.4.9_002\" 
-DXS_VERSION=\"2.4.9_002\"  "-I/usr/lib/perl5/5.10/i686-cygwin/CORE"   pdlmagic.c
pdlmagic.c:571: error: conflicting types for ‘pdl_magic_thread_cast’
pdlmagic.h:127: error: previous declaration of ‘pdl_magic_thread_cast’ was here
pdlmagic.c:574: error: conflicting types for ‘pdl_pthread_barf’
pdlmagic.h:116: error: previous declaration of ‘pdl_pthread_barf’ was here
make[2]: *** [pdlmagic.o] Error 1
make[2]: Leaving directory `/cygdrive/c/chm/pdl/git/pdl/Basic/Core'
make[1]: *** [subdirs] Error 2
make[1]: Leaving directory `/cygdrive/c/chm/pdl/git/pdl/Basic'
make: *** [subdirs] Error 2


And here are the declarations indicated:

pdlmagic.c:571:void pdl_magic_thread_cast(pdl *it,void (*func)(pdl_trans *),pdl_trans *t) {}
pdlmagic.h:127:void pdl_magic_thread_cast(pdl *,void (*func)(pdl_trans *),pdl_trans *t, pdl_thread *thread);

pdlmagic.c:574:int pdl_pthread_barf(const char* pat, va_list *args){ return 0;};
pdlmagic.h:116:void pdl_pthread_barf(const char* pat, va_list *args);
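
One way to resolve the conflict, assuming the prototypes in pdlmagic.h are the intended interface (nothing above says which side is correct), would be to give the no-op stubs in pdlmagic.c the same parameter lists and return types. The sketch below uses placeholder opaque types in place of the real pdl, pdl_trans, and pdl_thread definitions so that it compiles on its own:

```c
#include <stdarg.h>

/* Placeholder opaque types standing in for the real PDL structs
 * (assumption: the stubs only ever handle pointers to them). */
typedef struct pdl pdl;
typedef struct pdl_trans pdl_trans;
typedef struct pdl_thread pdl_thread;

/* Prototypes as quoted from pdlmagic.h lines 127 and 116. */
void pdl_magic_thread_cast(pdl *, void (*func)(pdl_trans *), pdl_trans *t,
                           pdl_thread *thread);
void pdl_pthread_barf(const char *pat, va_list *args);

/* No-op stubs with signatures changed to match the header:
 * the extra pdl_thread * argument is added, and pdl_pthread_barf
 * returns void instead of int. */
void pdl_magic_thread_cast(pdl *it, void (*func)(pdl_trans *), pdl_trans *t,
                           pdl_thread *thread)
{
    (void)it; (void)func; (void)t; (void)thread;  /* nothing to do */
}

void pdl_pthread_barf(const char *pat, va_list *args)
{
    (void)pat; (void)args;  /* nothing to do */
}
```

If the int return value of pdl_pthread_barf is actually used by callers, the header would need to change instead.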


Regards,
Chris


On 5/12/2011 9:41 AM, John Cerney wrote:

The Auto Multi-Core Support patch has been applied to the main trunk in
git.

This change adds support (currently experimental) for splitting up
numerical processing between multiple parallel processor threads (or
pthreads) using new functions "set_autopthread_targ" and
"set_autopthread_size". This can improve processing performance (by
2-4X or more in most cases) by taking advantage of multi-core
and/or multi-processor machines.

Currently, this feature is turned off by default. You have to explicitly
turn it on by calling the set_autopthread_targ and set_autopthread_size
functions as described below.

Below is more information on the change (taken from the new
ParallelCPU.pod document):

---------------------------------------------------------------------

NAME
PDL::ParallelCPU - Parallel Processor MultiThreading Support in
PDL (Experimental)

DESCRIPTION
PDL has support (currently experimental) for splitting up
numerical processing between multiple parallel processor threads
(or pthreads) using the *set_autopthread_targ* and
*set_autopthread_size* functions. This can improve processing
performance (by 2-4X or more in most cases) by taking
advantage of multi-core and/or multi-processor machines.

SYNOPSIS
use PDL;

# Set target of 4 parallel pthreads to create, with a lower limit of
# 5Meg elements for splitting processing into parallel pthreads.
set_autopthread_targ(4);
set_autopthread_size(5);

$a = zeroes(5000,5000); # Create 25Meg element array

$b = $a + 5; # Processing will be split up into multiple pthreads

# Get the actual number of pthreads for the last
# processing operation.
$actualPthreads = get_autopthread_actual();

Terminology
The use of the term *threading* can be confusing with PDL, because
it can refer to *PDL threading*, as defined in the PDL::Threading
docs, or to *processor multi-threading*.

To reduce confusion with the existing PDL threading terminology,
this document uses pthreading to refer to *processor
multi-threading*, which is the use of multiple processor threads
to split up numerical processing into parallel operations.

Functions that control PDL PThreads
This is a brief listing and description of the PDL pthreading
functions; see the PDL::Core docs for detailed information.

set_autopthread_targ
Set the target number of processor-threads (pthreads) for
multi-threaded processing. Setting the target to 0
means that no pthreading will occur.

See PDL::Core for details.

set_autopthread_size
Set the minimum size (in Meg-elements, i.e. units of 2**20
elements) of the largest PDL involved in a function for
auto-pthreading to be performed. For small PDLs, it probably
isn't worth starting multiple pthreads, so this function
defines a threshold below which auto-pthreading won't be
attempted.

See PDL::Core for details.

get_autopthread_actual
Get the actual number of pthreads executed for the last pdl
processing function.

See PDL::get_autopthread_actual for details.
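
How the target and size settings combine can be sketched as a single check. The helper name autopthread_eligible is hypothetical and not part of PDL, and treating a PDL that sits exactly on the threshold as eligible is an assumption; the text only says that 0 disables pthreading and that the size is measured in units of 2**20 elements:

```c
#include <assert.h>

/* One Meg-element is 2**20 elements, as stated for set_autopthread_size. */
#define ELEMS_PER_MEG (1LL << 20)

/* Hypothetical helper (not a PDL function): returns 1 if an operation
 * whose largest PDL has largest_nelem elements would be considered for
 * auto-pthreading, given the current target and size settings. */
int autopthread_eligible(long long largest_nelem, int targ, int size_meg)
{
    assert(largest_nelem >= 0 && targ >= 0 && size_meg >= 0);
    if (targ == 0)
        return 0;  /* a target of 0 means no pthreading */
    /* Assumption: a PDL exactly at the threshold counts as eligible. */
    return largest_nelem >= (long long)size_meg * ELEMS_PER_MEG;
}
```

With the SYNOPSIS settings (target 4, size 5), a 5000 x 5000 array has 25,000,000 elements, which is above 5 * 2**20 = 5,242,880, so the check passes.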

Global Control of PDL PThreading using Environment Variables
PDL PThreading can be globally turned on, without modifying
existing code, by setting the environment variables
PDL_AUTOPTHREAD_TARG and PDL_AUTOPTHREAD_SIZE before running a PDL
script. These environment variables are checked when PDL starts up,
and the *set_autopthread_targ* and *set_autopthread_size*
functions are called with the environment variables' values.

For example, if the environment var PDL_AUTOPTHREAD_TARG is set to
3, and PDL_AUTOPTHREAD_SIZE is set to 10, then any pdl script will
run as if the following lines were at the top of the file:

set_autopthread_targ(3);
set_autopthread_size(10);
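
The startup check described above might look roughly like the following in C. This is only an illustration of the idea: the struct and function names are hypothetical, and the choice to leave a setting untouched when its variable is unset is an assumption rather than documented behaviour.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the two settings; not a PDL type. */
struct autopthread_cfg {
    int targ;  /* target number of pthreads, 0 = pthreading off */
    int size;  /* minimum size in Meg-elements */
};

/* Apply string values (as they would come from the environment).
 * A NULL string leaves the corresponding setting unchanged. */
void autopthread_cfg_apply(struct autopthread_cfg *cfg,
                           const char *targ_str, const char *size_str)
{
    assert(cfg != NULL);
    if (targ_str != NULL)
        cfg->targ = atoi(targ_str);
    if (size_str != NULL)
        cfg->size = atoi(size_str);
}

/* Read PDL_AUTOPTHREAD_TARG and PDL_AUTOPTHREAD_SIZE at startup. */
void autopthread_cfg_from_env(struct autopthread_cfg *cfg)
{
    autopthread_cfg_apply(cfg,
                          getenv("PDL_AUTOPTHREAD_TARG"),
                          getenv("PDL_AUTOPTHREAD_SIZE"));
}
```

Keeping the parsing in autopthread_cfg_apply separate from the getenv calls makes the logic easy to exercise without touching the process environment.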

How It Works
The auto-pthreading process works by analyzing threaded array
dimensions in PDL operations and splitting up processing based on
the thread dimension sizes and desired number of pthreads (i.e.
the pthread target or pthread_targ). The offsets and increments
that PDL uses to step through the data in memory are modified for
each pthread so each one sees a different set of data when
performing processing.

Example

$a = sequence(20,4,3); # Small 3-D Array, size 20,4,3

# Setup auto-pthreading:
set_autopthread_targ(2); # Target of 2 pthreads
set_autopthread_size(0); # Zero so that the small PDLs in this
# example will be pthreaded

# This will be split up into 2 pthreads
$c = maximum($a);

For the above example, the *maximum* function has a signature of
"(a(n); [o]c())", which means that the first dimension of $a (size
20) is a *Core* dimension of the *maximum* function. The other
dimensions of $a (size 4,3) are *threaded* dimensions (i.e. they will
be threaded over in the *maximum* function).

The auto-pthreading algorithm examines the threaded dims of size
(4,3) and picks the 4 dimension, since it is evenly divisible by
the autopthread_targ of 2. The processing of the maximum function
is then split into two pthreads on the size-4 dimension, with dim
indexes 0,2 processed by one pthread and dim indexes 1,3 processed
by the other pthread.
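
The index assignment in this example (0,2 for one pthread, 1,3 for the other) suggests an interleaved split, where pthread k starts at index k and steps by the number of pthreads. A small sketch of that reading follows; the struct and function names are hypothetical, and the interleaving is inferred from the example rather than taken from the PDL source:

```c
#include <assert.h>

/* Hypothetical description of the part of the split dimension that
 * one pthread walks over. */
struct pthread_slice {
    int first;  /* first index handled by this pthread */
    int step;   /* distance between consecutive indexes */
    int count;  /* number of indexes handled */
};

/* Slice for pthread k when a dimension of size dim_size is shared
 * evenly by npthreads pthreads in interleaved fashion. */
struct pthread_slice slice_for_pthread(int dim_size, int npthreads, int k)
{
    struct pthread_slice s;
    assert(npthreads > 0 && dim_size % npthreads == 0);
    assert(k >= 0 && k < npthreads);
    s.first = k;                     /* modified offset */
    s.step = npthreads;              /* modified increment */
    s.count = dim_size / npthreads;  /* equal share for every pthread */
    return s;
}
```

For the size-4 dimension and 2 pthreads, pthread 0 gets first=0, step=2, count=2 (indexes 0,2) and pthread 1 gets first=1, step=2, count=2 (indexes 1,3).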

Limitations
Must have POSIX Threads Enabled
Auto-PThreading only works if your PDL installation was compiled
with POSIX threads enabled. This is normally the case if you are
running on Linux or other Unix variants.

Non-Threadsafe Code
Not all the libraries that PDL interfaces to are thread-safe, i.e.
they aren't written to operate in a multi-threaded environment
without crashing or causing side effects. Examples in the PDL
core are the *fft* and *pnmout* functions.

To operate properly with these types of functions, the PPCode flag
NoPthread has been introduced to mark a function as *not*
pthread-safe. See the PDL::PP docs for details.

Size of PDL Dimensions and PThread Target
Due to the way a PDL is split up for operation using multiple
pthreads, the size of a dimension must be evenly divisible by the
pthread target. For example, if a PDL has threaded dimension sizes
of (4,3,3) and the *autopthread_targ* has been set to 2, then the
first threaded dimension (size 4) will be picked to be split up
into two pthreads of size 2 and 2. However, if the threaded
dimension sizes are (3,3,3) and the *autopthread_targ* is still
2, then pthreading won't occur, because no threaded dimensions are
divisible by 2.

The algorithm that picks the actual number of pthreads has some
smarts (but could probably be improved) to adjust down from the
*autopthread_targ* to get a number of pthreads that can evenly
divide one of the threaded dimensions. For example, if a PDL has
threaded dimension sizes of (9,2,2) and the *autopthread_targ* is
4, the algorithm will see that no dimension is divisible by 4,
then adjust down the target to 3, resulting in splitting up the
first threaded dimension (size 9) into 3 pthreads.
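
The adjust-down behaviour described above can be sketched as a simple search from the target downwards. This is not the actual PDL code, just a minimal version that reproduces the three examples given; the function name pick_pthreads and the rule of taking the first matching dimension are assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: choose the number of pthreads for a set of
 * threaded dimension sizes. Tries the target first, then smaller
 * counts, and accepts the first dimension that divides evenly.
 * Returns the pthread count (1 means no pthreading) and stores the
 * index of the chosen dimension in *dim_out (-1 if none). */
int pick_pthreads(const int *dims, size_t ndims, int target, int *dim_out)
{
    int n;
    size_t i;

    assert(dims != NULL && dim_out != NULL && target >= 0);
    for (n = target; n > 1; n--) {
        for (i = 0; i < ndims; i++) {
            if (dims[i] >= n && dims[i] % n == 0) {
                *dim_out = (int)i;
                return n;
            }
        }
    }
    *dim_out = -1;  /* nothing divisible: run in a single thread */
    return 1;
}
```

With sizes (4,3,3) and a target of 2 this picks 2 pthreads on the first dimension; with (3,3,3) and a target of 2 it returns 1 (no pthreading); with (9,2,2) and a target of 4 it steps down to 3 pthreads on the size-9 dimension.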

Speed improvement might be less than you expect
If you have an 8-core machine and call *set_autopthread_targ* with 8
to generate 8 parallel pthreads, you probably won't get an 8X
improvement in speed, due to memory bandwidth issues. Even though
you have 8 separate CPUs crunching away on data, you will have
(for most common machine architectures) common RAM that now
becomes your bottleneck. For simple calculations (e.g. simple
additions) you can run into a performance limit at about 4
pthreads. For more complex calculations the limit will be higher.




_______________________________________________
Perldl mailing list
per...@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl


_______________________________________________
PDL-porters mailing list
PDL-porters@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/pdl-porters
