On Sep 23, 2013, at 2:13 PM, Danyang Su <[email protected]> wrote:

> Hi Barry,
> 
> Another strange problem:
> 
> Currently I have the PETSc-3.4.2 MPI version and the PETSc-dev OpenMP version 
> on my computer, with different settings of the environment variables PETSC_ARCH 
> and PETSC_DIR. Before installing the PETSc-dev OpenMP version, the PETSc-3.4.2 
> MPI version worked fine. But after installing the PETSc-dev OpenMP version, the 
> same problem exists in the PETSc-3.4.2 MPI version when run with 1 processor, 
> but there is no problem with 2 or more processors.

  Are you sure some environment variable related to OpenMP is not set? What 
happens if you run on one process with mpiexec and then immediately afterwards on 
two processes with mpiexec in the same window?
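
   For example, something like the following in one window (just a sketch; the 
executable name and the -log_summary option are taken from later in this thread, 
substitute your own):

     env | grep -i -E "OMP|KMP"                          # any OpenMP-related variables set?
     mpiexec -n 1 ksp_inhm_d -log_summary log_np1.log    # one process
     mpiexec -n 2 ksp_inhm_d -log_summary log_np2.log    # then two processes, same window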

   Barry

> 
> Thanks,
> 
> Danyang
> 
> On 23/09/2013 12:01 PM, Danyang Su wrote:
>> Hi Barry,
>> 
>> Sorry, I missed this question in the previous email. It is still slow when run 
>> without "-threadcomm_type openmp -threadcomm_nthreads 1".
>> 
>> Thanks,
>> 
>> Danyang
>> 
>> On 23/09/2013 11:43 AM, Barry Smith wrote:
>>>    You did not answer my question from yesterday:
>>> 
>>>  If you run the OpenMP-compiled version WITHOUT the
>>> 
>>> -threadcomm_nthreads 1
>>> -threadcomm_type openmp
>>> 
>>>  command line options, is it still slow?
>>> 
>>> 
>>> On Sep 23, 2013, at 1:33 PM, Danyang Su <[email protected]> wrote:
>>> 
>>>> Hi Shri,
>>>> 
>>>> It seems that the problem does not result from the affinity settings for the 
>>>> threads. I have tried several settings that bind the threads to different 
>>>> cores, but there is no improvement.
>>>> 
>>>> Here is the package, core and thread map information:
>>>> 
>>>> OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
>>>> OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 
>>>> info
>>>> OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 
>>>> {0,1,2,3,4,5,6,7,8,9,10,11}
>>>> OMP: Info #156: KMP_AFFINITY: 12 available OS procs
>>>> OMP: Info #157: KMP_AFFINITY: Uniform topology
>>>> OMP: Info #179: KMP_AFFINITY: 1 packages x 6 cores/pkg x 2 threads/core (6 
>>>> total cores)
>>>> OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 4 thread 0
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 4 thread 1
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 5 thread 0
>>>> OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 5 thread 1
>>>> OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost 
>>>> levels of machine
>>>> 
>>>> 
>>>> And here is the internal thread binding with different KMP_AFFINITY 
>>>> settings:
>>>> 
>>>> 1. KMP_AFFINITY=verbose,granularity=thread,compact
>>>> 
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}
>>>> 
>>>> 2. KMP_AFFINITY=verbose,granularity=fine,compact
>>>> 
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}
>>>> 
>>>> 3. KMP_AFFINITY=verbose,granularity=fine,compact,1,0
>>>> 
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6}
>>>> 
>>>> 4. KMP_AFFINITY=verbose,scatter
>>>> 
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2,3}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4,5}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6,7}
>>>> 
>>>> 5. KMP_AFFINITY=verbose,compact (For this setting, two threads are 
>>>> assigned to the same core)
>>>> 
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,3}
>>>> 
>>>> 6. KMP_AFFINITY=verbose,granularity=core,compact  (For this setting, two 
>>>> threads are assigned to the same core)
>>>> 
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3}
>>>> OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,3}
>>>> 
>>>> The first 4 settings assign each thread to a distinct core, but the 
>>>> problem is not solved.
>>>> 
>>>> Thanks,
>>>> 
>>>> Danyang
>>>> 
>>>> 
>>>> 
>>>> On 22/09/2013 8:00 PM, Shri wrote:
>>>>> I think this is definitely an issue with setting the affinities for 
>>>>> threads, i.e., the assignment of threads to cores. Ideally each thread 
>>>>> should be assigned to a distinct core but in your case all the 4 threads 
>>>>> are getting pinned to the same core resulting in such a massive slowdown. 
>>>>> Unfortunately, the thread affinities for OpenMP are set through 
>>>>> environment variables. For Intel's OpenMP one needs to define the thread 
>>>>> affinities through the environment variable KMP_AFFINITY. See this 
>>>>> document here 
>>>>> http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm.
>>>>>  Try setting the affinities via KMP_AFFINITY and let us know if it works.
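>>>>> For example, something like this before the run (a sketch in bash/cygwin 
>>>>> syntax; the executable and PETSc options are the ones from your earlier 
>>>>> message, so adjust as needed):
>>>>> 
>>>>>    export KMP_AFFINITY=verbose,granularity=fine,compact
>>>>>    mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp -threadcomm_nthreads 4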
>>>>> 
>>>>> Shri
>>>>> On Sep 21, 2013, at 11:06 PM, Danyang Su wrote:
>>>>> 
>>>>>> Hi Shri,
>>>>>> 
>>>>>> Thanks for your info. It works with the option -threadcomm_type openmp. 
>>>>>> But another problem arises, as described below.
>>>>>> 
>>>>>> The sparse matrix is 53760*53760 with 1067392 non-zero entries. If the 
>>>>>> code is compiled using PETSc-3.4.2, it works fine: the equations are 
>>>>>> solved quickly and I can see the speedup. But if the code is compiled 
>>>>>> using PETSc-dev with the OpenMP option, it takes a long time to solve the 
>>>>>> equations and I cannot see any speedup when more processors are used.
>>>>>> 
>>>>>> For PETSc-3.4.2, run with "mpiexec -n 4 ksp_inhm_d -log_summary 
>>>>>> log_mpi4_petsc3.4.2.log", the iteration count and runtimes are:
>>>>>> Iterations     6 time_assembly  0.4137E-01 time_ksp 0.9296E-01
>>>>>> 
>>>>>> For PETSc-dev, run with "mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp 
>>>>>> -threadcomm_nthreads 4 -log_summary log_openmp_petsc_dev.log", the 
>>>>>> iteration count and runtimes are:
>>>>>> Iterations     6 time_assembly  0.3595E+03 time_ksp 0.2907E+00
>>>>>> 
>>>>>> Most of the time ('time_assembly  0.3595E+03') is spent in the following 
>>>>>> code:
>>>>>>                 do i = istart, iend - 1
>>>>>>                    ii = ia_in(i+1)
>>>>>>                    jj = ia_in(i+2)
>>>>>>                    call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, &
>>>>>>                                      a_in(ii:jj-1), Insert_Values, ierr)
>>>>>>                 end do
>>>>>> 
>>>>>> The log files for both PETSc-3.4.2 and PETSc-dev are attached.
>>>>>> 
>>>>>> Is there anything wrong with my code or with the run options? The above 
>>>>>> code works fine when using MPICH.
>>>>>> 
>>>>>> Thanks and regards,
>>>>>> 
>>>>>> Danyang
>>>>>> 
>>>>>> On 21/09/2013 2:09 PM, Shri wrote:
>>>>>>> There are three thread communicator types in PETSc. The default is "no 
>>>>>>> thread" which is basically a non-threaded version. The other two types 
>>>>>>> are "openmp" and "pthread". If you want to use OpenMP then use the 
>>>>>>> option -threadcomm_type openmp.
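>>>>>>> For example (a sketch based on the ksp_ex2f run you describe; adjust the 
>>>>>>> thread count for your machine):
>>>>>>> 
>>>>>>>    mpiexec -n 1 ksp_ex2f -threadcomm_type openmp -threadcomm_nthreads 4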
>>>>>>> 
>>>>>>> Shri
>>>>>>> 
>>>>>>> On Sep 21, 2013, at 3:46 PM, Danyang Su <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi Barry,
>>>>>>>> 
>>>>>>>> Thanks for the quick reply.
>>>>>>>> 
>>>>>>>> After changing
>>>>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>>>> to
>>>>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>>>>>> and commenting out
>>>>>>>> #elif defined(PETSC_HAVE_OPENMP)
>>>>>>>> PETSC_EXTERN PetscStack *petscstack;
>>>>>>>> 
>>>>>>>> It can be compiled and validated with "make test".
>>>>>>>> 
>>>>>>>> But I still have questions on running the examples. After rebuilding the 
>>>>>>>> code (e.g., ksp_ex2f.f), I can run it with "mpiexec -n 1 ksp_ex2f", 
>>>>>>>> "mpiexec -n 4 ksp_ex2f", or "mpiexec -n 1 ksp_ex2f 
>>>>>>>> -threadcomm_nthreads 1", but if I run it with "mpiexec -n 1 ksp_ex2f 
>>>>>>>> -threadcomm_nthreads 4", there is a lot of error output (attached).
>>>>>>>> 
>>>>>>>> The code is not modified and there are no OpenMP routines in it. For the 
>>>>>>>> current development in my project, I want to keep my OpenMP code for 
>>>>>>>> calculating the matrix values, but solve the system with PETSc (OpenMP). 
>>>>>>>> Is that possible?
>>>>>>>> 
>>>>>>>> Thanks and regards,
>>>>>>>> 
>>>>>>>> Danyang
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 21/09/2013 7:26 AM, Barry Smith wrote:
>>>>>>>>>   Danyang,
>>>>>>>>> 
>>>>>>>>>      I don't think the  || defined (PETSC_HAVE_OPENMP)   belongs in 
>>>>>>>>> the code below.
>>>>>>>>> 
>>>>>>>>> /*  Linux functions CPU_SET and others don't work if sched.h is not 
>>>>>>>>> included before
>>>>>>>>>     including pthread.h. Also, these functions are active only if 
>>>>>>>>> either _GNU_SOURCE
>>>>>>>>>     or __USE_GNU is not set (see /usr/include/sched.h and 
>>>>>>>>> /usr/include/features.h), hence
>>>>>>>>>     set these first.
>>>>>>>>> */
>>>>>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>>>>> 
>>>>>>>>> Edit include/petscerror.h, locate these lines, remove that part, and 
>>>>>>>>> then rerun make all.  Let us know if it works or not.
>>>>>>>>> 
>>>>>>>>>    Barry
>>>>>>>>> 
>>>>>>>>> i.e. replace
>>>>>>>>> 
>>>>>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>>>>> 
>>>>>>>>> with
>>>>>>>>> 
>>>>>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
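>>>>>>>>> 
>>>>>>>>> (If it is easier, the same replacement can be made with a one-liner; this 
>>>>>>>>> is only a sketch, assuming the GNU sed that ships with cygwin, run from 
>>>>>>>>> PETSC_DIR, and it keeps a .bak copy of the original header:)
>>>>>>>>> 
>>>>>>>>>   sed -i.bak 's/#if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)/#if defined(PETSC_HAVE_PTHREADCLASSES)/' include/petscerror.h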
>>>>>>>>> 
>>>>>>>>> On Sep 21, 2013, at 6:53 AM, Matthew Knepley
>>>>>>>>> <[email protected]>
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Sat, Sep 21, 2013 at 12:18 AM, Danyang Su <[email protected]>
>>>>>>>>>>  wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>> I got errors when compiling petsc-dev with OpenMP in cygwin. 
>>>>>>>>>> Previously, I successfully compiled petsc-3.4.2 and it works fine.
>>>>>>>>>> The log files have been attached.
>>>>>>>>>> 
>>>>>>>>>> The OpenMP configure test is wrong. It clearly fails to find 
>>>>>>>>>> pthread.h, but the test passes. Then in petscerror.h
>>>>>>>>>> we guard pthread.h using PETSC_HAVE_OPENMP. Can someone who knows 
>>>>>>>>>> OpenMP fix this?
>>>>>>>>>> 
>>>>>>>>>>     Matt
>>>>>>>>>>  Thanks,
>>>>>>>>>> 
>>>>>>>>>> Danyang
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -- 
>>>>>>>>>> What most experimenters take for granted before they begin their 
>>>>>>>>>> experiments is infinitely more interesting than any results to which 
>>>>>>>>>> their experiments lead.
>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>> 
>>>>>>>> <error.txt>
>>>>>> <log_mpi4_petsc3.4.2.log><log_openmp_petsc_dev.log>
>> 
> 
