I think this is an issue with thread affinity, i.e., the assignment of threads to cores. Ideally each thread should be assigned to a distinct core, but in your case all 4 threads are getting pinned to the same core, which results in such a massive slowdown. Unfortunately, the thread affinities for OpenMP are set through environment variables. For Intel's OpenMP, the thread affinities are defined through the environment variable KMP_AFFINITY. See this document: http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm

Try setting the affinities via KMP_AFFINITY and let us know if it works.
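As a minimal sketch of what that could look like in a bash-like shell: this assumes the executable is linked against Intel's OpenMP runtime (other OpenMP runtimes ignore KMP_AFFINITY), the affinity string is only one reasonable starting point rather than a tuned setting, and the existence check around the launch line is my own addition so the snippet is harmless where the binary is missing.

```shell
# Assumption: Intel's OpenMP runtime is in use; other runtimes ignore KMP_AFFINITY.
#   verbose          - print the thread-to-core binding at startup
#   granularity=fine - bind each OpenMP thread to a single hardware thread context
#   scatter          - spread the threads as evenly as possible across the machine
export KMP_AFFINITY=verbose,granularity=fine,scatter

echo "KMP_AFFINITY=$KMP_AFFINITY"

# Launch line reused from the quoted run; executed only if the binary and
# mpiexec are actually available in the current environment.
if [ -x ./ksp_inhm_d ] && command -v mpiexec >/dev/null 2>&1; then
    mpiexec -n 1 ./ksp_inhm_d -threadcomm_type openmp -threadcomm_nthreads 4 \
        -log_summary log_openmp_petsc_dev.log
fi
```

With verbose in the setting, the runtime should report where each thread is bound when the program starts, which makes it easy to check whether the 4 threads land on distinct cores.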
Shri

On Sep 21, 2013, at 11:06 PM, Danyang Su wrote:

> Hi Shri,
>
> Thanks for your info. It can work with the option -threadcomm_type openmp.
> But another problem arises, as described as follows.
>
> The sparse matrix is 53760*53760 with 1067392 non-zero entries. If the codes
> is compiled using PETSc-3.4.2, it works fine, the equations can be solved
> quickly and I can see the speedup. But if the code is compiled using
> PETSc-dev with OpenMP option, it takes a long time in solving the equations
> and I cannot see any speedup when more processors are used.
>
> For PETSc-3.4.2, run by "mpiexec -n 4 ksp_inhm_d -log_summary
> log_mpi4_petsc3.4.2.log", the iteration and runtime are:
> Iterations 6 time_assembly 0.4137E-01 time_ksp 0.9296E-01
>
> For PETSc-dev, run by "mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp
> -threadcomm_nthreads 4 -log_summary log_openmp_petsc_dev.log", the iteration
> and runtime are:
> Iterations 6 time_assembly 0.3595E+03 time_ksp 0.2907E+00
>
> Most of the time 'time_assembly 0.3595E+03' is spent on the following codes
>     do i = istart, iend - 1
>        ii = ia_in(i+1)
>        jj = ia_in(i+2)
>        call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1,
>             a_in(ii:jj-1), Insert_Values, ierr)
>     end do
>
> The log files for both PETSc-3.4.2 and PETSc-dev are attached.
>
> Is there anything wrong with my codes or with running option? The above codes
> works fine when using MPICH.
>
> Thanks and regards,
>
> Danyang
>
> On 21/09/2013 2:09 PM, Shri wrote:
>> There are three thread communicator types in PETSc. The default is "no
>> thread" which is basically a non-threaded version. The other two types are
>> "openmp" and "pthread". If you want to use OpenMP then use the option
>> -threadcomm_type openmp.
>>
>> Shri
>>
>> On Sep 21, 2013, at 3:46 PM, Danyang Su <[email protected]> wrote:
>>
>>> Hi Barry,
>>>
>>> Thanks for the quick reply.
>>>
>>> After changing
>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>> to
>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>> and comment out
>>> #elif defined(PETSC_HAVE_OPENMP)
>>> PETSC_EXTERN PetscStack *petscstack;
>>>
>>> It can be compiled and validated with "make test".
>>>
>>> But I still have questions on running the examples. After rebuild the codes
>>> (e.g., ksp_ex2f.f), I can run it with "mpiexec -n 1 ksp_ex2f", or "mpiexec
>>> -n 4 ksp_ex2f", or "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 1", but if I
>>> run it with "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 4", there will be a
>>> lot of error information (attached).
>>>
>>> The codes is not modified and there is no OpenMP routines in it. For the
>>> current development in my project, I want to keep the OpenMP codes in
>>> calculating matrix values, but want to solve it with PETSc (OpenMP). Is it
>>> possible?
>>>
>>> Thanks and regards,
>>>
>>> Danyang
>>>
>>>
>>>
>>> On 21/09/2013 7:26 AM, Barry Smith wrote:
>>>> Danyang,
>>>>
>>>> I don't think the || defined (PETSC_HAVE_OPENMP) belongs in the
>>>> code below.
>>>>
>>>> /* Linux functions CPU_SET and others don't work if sched.h is not
>>>> included before
>>>> including pthread.h. Also, these functions are active only if either
>>>> _GNU_SOURCE
>>>> or __USE_GNU is not set (see /usr/include/sched.h and
>>>> /usr/include/features.h), hence
>>>> set these first.
>>>> */
>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>
>>>> Edit include/petscerror.h and locate these lines and remove that part and
>>>> then rerun make all. Let us know if it works or not.
>>>>
>>>> Barry
>>>>
>>>> i.e. replace
>>>>
>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>
>>>> with
>>>>
>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>>
>>>> On Sep 21, 2013, at 6:53 AM, Matthew Knepley <[email protected]>
>>>> wrote:
>>>>
>>>>> On Sat, Sep 21, 2013 at 12:18 AM, Danyang Su <[email protected]> wrote:
>>>>> Hi All,
>>>>>
>>>>> I got error information in compiling petsc-dev with openmp in cygwin.
>>>>> Before, I have successfully compiled petsc-3.4.2 and it works fine.
>>>>> The log files have been attached.
>>>>>
>>>>> The OpenMP configure test is wrong. It clearly fails to find pthread.h,
>>>>> but the test passes. Then in petscerror.h
>>>>> we guard pthread.h using PETSC_HAVE_OPENMP. Can someone who knows OpenMP
>>>>> fix this?
>>>>>
>>>>> Matt
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Danyang
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which
>>>>> their experiments lead.
>>>>> -- Norbert Wiener
>>>
>>> <error.txt>
>
> <log_mpi4_petsc3.4.2.log><log_openmp_petsc_dev.log>
