Hi Shri,
It seems the problem does not come from the thread affinity settings. I have tried several settings that bind the threads to distinct cores, but there is no improvement.
Here is the package, core, and thread map:
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11}
OMP: Info #156: KMP_AFFINITY: 12 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 6 cores/pkg x 2 threads/core (6 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 4 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 4 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 5 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 5 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
And here are the internal thread bindings with different KMP_AFFINITY settings:
1. KMP_AFFINITY=verbose,granularity=thread,compact
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}
2. KMP_AFFINITY=verbose,granularity=fine,compact
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}
3. KMP_AFFINITY=verbose,granularity=fine,compact,1,0
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6}
4. KMP_AFFINITY=verbose,scatter
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2,3}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6,7}
5. KMP_AFFINITY=verbose,compact (for this setting, two threads are assigned to the same core)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,3}
6. KMP_AFFINITY=verbose,granularity=core,compact (for this setting, two threads are assigned to the same core)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,3}
The first four settings bind each thread to a distinct core, but the problem is not solved.
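For reference, each setting above was applied like this before the run (a minimal sketch, assuming bash; executable and options as in my runs below):

    export KMP_AFFINITY=verbose,granularity=fine,compact,1,0
    mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp -threadcomm_nthreads 4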
Thanks,
Danyang
On 22/09/2013 8:00 PM, Shri wrote:
I think this is definitely an issue with setting the affinities for threads, i.e., the assignment of threads to cores. Ideally each thread should be assigned to a distinct core, but in your case all 4 threads are getting pinned to the same core, resulting in such a massive slowdown. Unfortunately, the thread affinities for OpenMP are set through environment variables; for Intel's OpenMP they are defined through the environment variable KMP_AFFINITY. See this document:
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm
Try setting the affinities via KMP_AFFINITY and let us know if it works.
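For example (a sketch, assuming bash; pick the granularity and placement that fit your machine):

    export KMP_AFFINITY=verbose,granularity=fine,compact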
Shri
On Sep 21, 2013, at 11:06 PM, Danyang Su wrote:
Hi Shri,
Thanks for your info. It works with the option -threadcomm_type openmp, but another problem arises, described below.
The sparse matrix is 53760 x 53760 with 1067392 non-zero entries. If the code is compiled with PETSc-3.4.2, it works fine: the equations are solved quickly and I can see the speedup. But if the code is compiled with PETSc-dev with the OpenMP option, it takes a long time to solve the equations and I cannot see any speedup when more processors are used.
For PETSc-3.4.2, run with "mpiexec -n 4 ksp_inhm_d -log_summary log_mpi4_petsc3.4.2.log", the iteration count and runtimes are:
Iterations 6   time_assembly 0.4137E-01   time_ksp 0.9296E-01
For PETSc-dev, run with "mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp -threadcomm_nthreads 4 -log_summary log_openmp_petsc_dev.log", the iteration count and runtimes are:
Iterations 6   time_assembly 0.3595E+03   time_ksp 0.2907E+00
Most of this time (time_assembly 0.3595E+03) is spent in the following code:

    do i = istart, iend - 1
       ! row i (0-based); its entries occupy positions ii..jj-1 of the
       ! 1-based CSR arrays ja_in/a_in
       ii = ia_in(i+1)
       jj = ia_in(i+2)
       ! insert one full row; subtract 1 to convert column indices to 0-based
       call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, &
                         a_in(ii:jj-1), Insert_Values, ierr)
    end do
The log files for both PETSc-3.4.2 and PETSc-dev are attached.
Is there anything wrong with my code or with the run options? The above code works fine when using MPICH.
Thanks and regards,
Danyang
On 21/09/2013 2:09 PM, Shri wrote:
There are three thread communicator types in PETSc. The default is "no thread", which is basically the non-threaded version. The other two types are "openmp" and "pthread". If you want to use OpenMP, use the option -threadcomm_type openmp.
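For example (using the ksp_ex2f executable mentioned below in this thread):

    mpiexec -n 1 ksp_ex2f -threadcomm_type openmp -threadcomm_nthreads 4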
Shri
On Sep 21, 2013, at 3:46 PM, Danyang Su <[email protected]> wrote:
Hi Barry,
Thanks for the quick reply.
After changing

#if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)

to

#if defined(PETSC_HAVE_PTHREADCLASSES)

and commenting out

#elif defined(PETSC_HAVE_OPENMP)
PETSC_EXTERN PetscStack *petscstack;

it can be compiled and validated with "make test".
But I still have questions about running the examples. After rebuilding the code (e.g., ksp_ex2f.f), I can run it with "mpiexec -n 1 ksp_ex2f", "mpiexec -n 4 ksp_ex2f", or "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 1", but if I run it with "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 4", I get a lot of error output (attached).
The code is not modified and there are no OpenMP routines in it. For the current development in my project, I want to keep the OpenMP code that calculates the matrix values, but solve the system with PETSc (OpenMP). Is that possible?
Thanks and regards,
Danyang
On 21/09/2013 7:26 AM, Barry Smith wrote:
Danyang,
I don't think the || defined (PETSC_HAVE_OPENMP) belongs in the code
below.
/* Linux functions CPU_SET and others don't work if sched.h is not included
   before including pthread.h. Also, these functions are active only if either
   _GNU_SOURCE or __USE_GNU is not set (see /usr/include/sched.h and
   /usr/include/features.h), hence set these first.
*/
#if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
Edit include/petscerror.h, locate these lines, remove that part, and then rerun make all. Let us know if it works or not.
Barry
i.e. replace
#if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
with
#if defined(PETSC_HAVE_PTHREADCLASSES)
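One way to make that edit in place (a sketch, assuming GNU sed, run from the PETSc root; check the diff before rebuilding):

    sed -i 's/ || defined (PETSC_HAVE_OPENMP)//' include/petscerror.h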
On Sep 21, 2013, at 6:53 AM, Matthew Knepley <[email protected]> wrote:
On Sat, Sep 21, 2013 at 12:18 AM, Danyang Su <[email protected]> wrote:
Hi All,
I got errors when compiling petsc-dev with OpenMP in Cygwin. Previously I successfully compiled petsc-3.4.2 and it works fine.
The log files have been attached.
The OpenMP configure test is wrong. It clearly fails to find pthread.h, but the
test passes. Then in petscerror.h
we guard pthread.h using PETSC_HAVE_OPENMP. Can someone who knows OpenMP fix
this?
Matt
Thanks,
Danyang
--
What most experimenters take for granted before they begin their experiments is
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<error.txt>
<log_mpi4_petsc3.4.2.log><log_openmp_petsc_dev.log>