Dear Jed,

For multithreaded support in PETSc, my question is whether KSP and/or PC work when Vec and Mat run in multithreaded mode.
Thanks,
Yujie

On Sun, Feb 26, 2012 at 6:39 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Sun, Feb 26, 2012 at 04:07, Gerard Gorman <g.gorman at imperial.ac.uk> wrote:
>
>>> Did you post a repository yet? I'd like to have a look at the code.
>>
>> It's on Barry's favourite collaborative software development site, of
>> course ;-)
>>
>> https://bitbucket.org/wence/petsc-dev-omp/overview
>
> I looked through the code and I'm concerned that all the OpenMP code is
> inlined into vec/impls/seq and mat/impls/aij/seq with, as far as I can
> tell, no way to use OpenMP for some objects in a simulation but not
> others. I think all the pragmas should have num_threads(x->nthreads)
> clauses. We can compute the correct number of threads based on sizes when
> memory is allocated (or have it specified through command-line options,
> inherited from related objects, etc.).
>
> I don't think we can avoid the OpenMP scheduling overhead (several hundred
> cycles) even for those objects we choose not to thread, but (at least in
> my gcc tests) that overhead is only about a third of the cost of actually
> starting a parallel region.
>
> It's really not acceptable to insert an unguarded
>
>   #pragma omp ...
>
> into the code, because it will generate tons of warnings or errors with
> compilers that don't know about OpenMP. It would be better to test for
> _Pragma support and use
>
>   #define PetscPragmatize(x) _Pragma(#x)
>   #if defined(PETSC_USE_OPENMP)
>   #  define PetscPragmaOMP(x) PetscPragmatize(omp x)
>   #else
>   #  define PetscPragmaOMP(x)
>   #endif
>
> and then write
>
>   PetscPragmaOMP(parallel for ...)
>
> We should probably use a variant for object-based threading:
>
>   #define PetscPragmaOMPObject(obj,x) PetscPragmaOMP(x num_threads((obj)->nthreads))
>
> When multiple objects are involved, I think you usually want the object
> being modified to control the number of threads.
>
> In many cases, I would prefer more control over the partition of the loop.
> For example, I'd often be willing to tolerate a slight computational
> imbalance between threads in exchange for working exclusively within my
> own page. Note that the arithmetic needed to compute such a partition is
> orders of magnitude cheaper than the schedule/distribution to threads. I
> don't know how to do that except with
>
>   PetscPragmaOMP(parallel)
>   {
>     int nthreads = omp_get_num_threads();
>     int tnum     = omp_get_thread_num();
>     int start, end;
>     /* compute start and end */
>     for (int i = start; i < end; i++) {
>       /* the work */
>     }
>   }
>
> We could perhaps capture some of this common logic in a macro:
>
>   #define VecOMPParallelBegin(X,args) do { \
>     PetscPragmaOMPObject(X,parallel args) { \
>     PetscInt _start,_end; \
>     VecOMPGetThreadLocalPart(X,&_start,&_end); \
>     { do {} while (0)
>
>   #define VecOMPParallelEnd() }}} while (0)
>
> used as
>
>   VecOMPParallelBegin(X, shared/private ...);
>   {
>     PetscInt i;
>     for (i = _start; i < _end; i++) {
>       /* the work */
>     }
>   }
>   VecOMPParallelEnd();
>
> That should give us complete run-time control of the number of parallel
> threads per object and of their distribution, within the constraint of a
> contiguous thread partition. It also leaves open the possibility of using
> libnuma to query and migrate pages. (For example, a short vector that
> needs to be accessed from multiple NUMA nodes might intentionally be
> faulted with its pages spread apart, even though other vectors of similar
> size might be accessed from within one NUMA node and thus not use threads
> at all. One 4 KiB page is only 512 doubles, but if the memory is local to
> a single NUMA node, we wouldn't use threads until the vector length was 4
> to 8 times larger.)
