Re: [petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Junchao Zhang
I don't know OpenMP, but I saw these in your configure:

  OMP_PROC_BIND = 'TRUE'
  OMP_PLACES = '{0:24}'

Try not doing any binding and let the OS schedule threads freely.
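
For example, something along these lines before launching (a sketch only; the
executable name and rank count are placeholders):

  unset OMP_PROC_BIND
  unset OMP_PLACES
  export OMP_NUM_THREADS=24
  mpiexec -n <nranks> ./your_app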

--Junchao Zhang


On Fri, Apr 7, 2023 at 7:17 PM Astor Piaz  wrote:

> Thanks for your reply, Matt.
>
> I just realized the problem seems to be the MKL threads.
>
> Inside the MatShell I call:
>
> call omp_set_nested(.true.)
> call omp_set_dynamic(.false.)
> call mkl_set_dynamic(0)
>
> Then, inside the omp single region I use:
>
> nMkl0 = mkl_set_num_threads_local(nMkl)
>
> where nMkl is set to 24
>
> MKL_VERBOSE shows that the calls have access to 24 threads, but the
> timings are the same as with 1 thread:
>
> MKL_VERBOSE
> ZGEMV(N,12544,12544,0x7ffde9edc800,0x14e4662d2010,12544,0x14985e610,1,0x7ffde9edc7f0,0x189faaa90,1)
> 117.09ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:24
> MKL_VERBOSE
> ZGEMV(N,12544,12544,0x7ffe00355700,0x14c8ec1e4010,12544,0x16959c830,1,0x7ffe003556f0,0x17dd7da70,1)
> 117.37ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:1
>
> The OpenMP configuration under which these MKL calls are launched is as
> follows:
>
> OPENMP DISPLAY ENVIRONMENT BEGIN
>   _OPENMP = '201511'
>   OMP_DYNAMIC = 'FALSE'
>   OMP_NESTED = 'TRUE'
>   OMP_NUM_THREADS = '24'
>   OMP_SCHEDULE = 'DYNAMIC'
>   OMP_PROC_BIND = 'TRUE'
>   OMP_PLACES = '{0:24}'
>   OMP_STACKSIZE = '0'
>   OMP_WAIT_POLICY = 'PASSIVE'
>   OMP_THREAD_LIMIT = '4294967295'
>   OMP_MAX_ACTIVE_LEVELS = '255'
>   OMP_CANCELLATION = 'FALSE'
>   OMP_DEFAULT_DEVICE = '0'
>   OMP_MAX_TASK_PRIORITY = '0'
>   OMP_DISPLAY_AFFINITY = 'FALSE'
>   OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
>   OMP_ALLOCATOR = 'omp_default_mem_alloc'
>   OMP_TARGET_OFFLOAD = 'DEFAULT'
>   GOMP_CPU_AFFINITY = ''
>   GOMP_STACKSIZE = '0'
>   GOMP_SPINCOUNT = '30'
> OPENMP DISPLAY ENVIRONMENT END
>
>
>
> On Fri, Apr 7, 2023 at 1:25 PM Matthew Knepley  wrote:
>
>> On Fri, Apr 7, 2023 at 2:26 PM Astor Piaz  wrote:
>>
>>> Hi Matthew, Junchao,
>>> Thank you for your advice. The code still does not work; I give more
>>> details about it below and can provide more as needed.
>>>
>>> I am implementing a spectral method resulting in a block matrix where
>>> the off-diagonal blocks are Poincare-Steklov operators of
>>> impedance-to-impedance type.
>>> Those Poincare-Steklov operators have been created by hierarchically
>>> merging subdomain operators (the HPS method), and I have a well-tuned (but
>>> rather complex) OpenMP+MKL code that can apply this operator very fast.
>>> I would like to use PETSc's MPI-parallel GMRES solver with a MatShell
>>> that calls my OpenMP+MKL code, while each block can be in a different MPI
>>> process.
>>>
>>> At the moment the code runs correctly, except that PETSc is not letting
>>> my OpenMP+MKL code schedule threads as I choose.
>>>
>>
>> PETSc does not say anything about OpenMP threads. However, maybe you need
>> to launch the executable with the correct OMP env variables?
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> I am using
>>> ./configure --with-scalar-type=complex --prefix=../install/fast/
>>> --with-debugging=0 --with-openmp=1 --with-blaslapack-dir=${MKLROOT}
>>> --with-mkl_cpardiso-dir=${MKLROOT} --with-threadsafety --with-log=0
>>> COPTFLAGS="-g -Ofast" CXXOPTFLAGS="-g -Ofast" FOPTFLAGS="-g -Ofast"
>>>
>>> Attached is an image of htop showing that the MKL threads are indeed
>>> being spawned, but they remain unused by the code. Previous runs of the
>>> code show that it is capable of using OpenMP and MKL; only when the PETSc
>>> KSPSolve is called does MKL seem to be turned off.
>>>
>>> On Fri, Apr 7, 2023 at 8:10 AM Matthew Knepley 
>>> wrote:
>>>
 On Fri, Apr 7, 2023 at 10:06 AM Astor Piaz 
 wrote:

> Hello petsc-users,
> I am trying to use a code that is parallelized with a combination of
> OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
> processes.
> I have carefully scheduled the processes so that the right number is
> launched at the right time.
> When trying to use my code inside a MatShell (for later use in an
> FGMRES KSPSolver), MKL processes are not being used.
>
> I am sorry if this has been asked before.
> What configuration should I use in order to profit from MPI+OpenMP+MKL
> parallelism?
>

 You should configure using --with-threadsafety

   Thanks,

  Matt


> Thank you!
> --
> Astor
>


 --
 What most experimenters take for granted before they begin their
 experiments is infinitely more interesting than any results to which their
 experiments lead.
 -- Norbert Wiener

 https://www.cse.buffalo.edu/~knepley/
 

>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/

Re: [petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Astor Piaz
I'm sorry, I meant that OpenMP threads are able to spawn MKL processes.

On Fri, Apr 7, 2023 at 8:29 PM Dave May  wrote:

>
>
> On Fri 7. Apr 2023 at 07:06, Astor Piaz  wrote:
>
>> Hello petsc-users,
>> I am trying to use a code that is parallelized with a combination of
>> OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
>> processes.
>>
>
> Is this really the correct way to go?
>
>
> Would it not be more suitable (or simpler) to run your application on an
> MPI sub-communicator which maps one rank to, say, one compute node, and then
> within each rank of the sub-comm you use your threaded OpenMP/MKL code with
> as many physical threads as there are cores per node (and/or hyperthreads,
> if those are effective for you)?
>
> Thanks,
> Dave
>
>> I have carefully scheduled the processes so that the right number is
>> launched at the right time.
>> When trying to use my code inside a MatShell (for later use in an FGMRES
>> KSPSolver), MKL processes are not being used.
>>
>> I am sorry if this has been asked before.
>> What configuration should I use in order to profit from MPI+OpenMP+MKL
>> parallelism?
>>
>> Thank you!
>> --
>> Astor
>>
>


Re: [petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Dave May
On Fri 7. Apr 2023 at 07:06, Astor Piaz  wrote:

> Hello petsc-users,
> I am trying to use a code that is parallelized with a combination of
> OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
> processes.
>

Is this really the correct way to go?


Would it not be more suitable (or simpler) to run your application on an
MPI sub-communicator which maps one rank to, say, one compute node, and then
within each rank of the sub-comm you use your threaded OpenMP/MKL code with
as many physical threads as there are cores per node (and/or hyperthreads,
if those are effective for you)?
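
A minimal sketch of that splitting (Fortran with mpi_f08; the names are
placeholders, and handing solver_comm to PETSc is left out):

  program node_subcomm
    use mpi_f08
    implicit none
    type(MPI_Comm) :: node_comm, solver_comm
    integer :: node_rank, color
    call MPI_Init()
    ! one communicator per shared-memory domain, i.e. per compute node
    call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                             MPI_INFO_NULL, node_comm)
    call MPI_Comm_rank(node_comm, node_rank)
    ! keep only rank 0 of each node in the solver communicator
    color = MPI_UNDEFINED
    if (node_rank == 0) color = 0
    call MPI_Comm_split(MPI_COMM_WORLD, color, 0, solver_comm)
    ! rank 0 on each node would create its PETSc objects on solver_comm and
    ! drive the threaded OpenMP/MKL kernels; the other ranks get MPI_COMM_NULL
    call MPI_Finalize()
  end program node_subcomm

Each surviving rank can then use all the cores of its node for OpenMP/MKL.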

Thanks,
Dave

> I have carefully scheduled the processes so that the right number is
> launched at the right time.
> When trying to use my code inside a MatShell (for later use in an FGMRES
> KSPSolver), MKL processes are not being used.
>
> I am sorry if this has been asked before.
> What configuration should I use in order to profit from MPI+OpenMP+MKL
> parallelism?
>
> Thank you!
> --
> Astor
>


Re: [petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Astor Piaz
Thanks for your reply, Matt.

I just realized the problem seems to be the MKL threads.

Inside the MatShell I call:

call omp_set_nested(.true.)
call omp_set_dynamic(.false.)
call mkl_set_dynamic(0)

Then, inside the omp single region I use:

nMkl0 = mkl_set_num_threads_local(nMkl)

where nMkl is set to 24
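
Put together, the pattern inside the shell apply is roughly the following (a
sketch only; the subroutine name and the work done inside the single region
are placeholders):

  subroutine shell_apply_sketch(nMkl)
    use omp_lib
    implicit none
    integer, intent(in) :: nMkl
    integer :: nMkl0
    integer, external :: mkl_set_num_threads_local
    call omp_set_nested(.true.)      ! allow nested parallel regions
    call omp_set_dynamic(.false.)    ! keep the requested OpenMP thread counts
    call mkl_set_dynamic(0)          ! and the requested MKL thread counts
    !$omp parallel
    !$omp single
    nMkl0 = mkl_set_num_threads_local(nMkl)  ! e.g. nMkl = 24 for this thread
    ! ... apply the block operator here (the ZGEMV calls shown below) ...
    nMkl0 = mkl_set_num_threads_local(0)     ! restore the global MKL setting
    !$omp end single
    !$omp end parallel
  end subroutine shell_apply_sketch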

MKL_VERBOSE shows that the calls have access to 24 threads, but the
timings are the same as with 1 thread:

MKL_VERBOSE
ZGEMV(N,12544,12544,0x7ffde9edc800,0x14e4662d2010,12544,0x14985e610,1,0x7ffde9edc7f0,0x189faaa90,1)
117.09ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:24
MKL_VERBOSE
ZGEMV(N,12544,12544,0x7ffe00355700,0x14c8ec1e4010,12544,0x16959c830,1,0x7ffe003556f0,0x17dd7da70,1)
117.37ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:1

The OpenMP configuration under which these MKL calls are launched is as
follows:

OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP = '201511'
  OMP_DYNAMIC = 'FALSE'
  OMP_NESTED = 'TRUE'
  OMP_NUM_THREADS = '24'
  OMP_SCHEDULE = 'DYNAMIC'
  OMP_PROC_BIND = 'TRUE'
  OMP_PLACES = '{0:24}'
  OMP_STACKSIZE = '0'
  OMP_WAIT_POLICY = 'PASSIVE'
  OMP_THREAD_LIMIT = '4294967295'
  OMP_MAX_ACTIVE_LEVELS = '255'
  OMP_CANCELLATION = 'FALSE'
  OMP_DEFAULT_DEVICE = '0'
  OMP_MAX_TASK_PRIORITY = '0'
  OMP_DISPLAY_AFFINITY = 'FALSE'
  OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
  OMP_ALLOCATOR = 'omp_default_mem_alloc'
  OMP_TARGET_OFFLOAD = 'DEFAULT'
  GOMP_CPU_AFFINITY = ''
  GOMP_STACKSIZE = '0'
  GOMP_SPINCOUNT = '30'
OPENMP DISPLAY ENVIRONMENT END



On Fri, Apr 7, 2023 at 1:25 PM Matthew Knepley  wrote:

> On Fri, Apr 7, 2023 at 2:26 PM Astor Piaz  wrote:
>
>> Hi Matthew, Junchao,
>> Thank you for your advice. The code still does not work; I give more
>> details about it below and can provide more as needed.
>>
>> I am implementing a spectral method resulting in a block matrix where the
>> off-diagonal blocks are Poincare-Steklov operators of
>> impedance-to-impedance type.
>> Those Poincare-Steklov operators have been created by hierarchically merging
>> subdomain operators (the HPS method), and I have a well-tuned (but rather
>> complex) OpenMP+MKL code that can apply this operator very fast.
>> I would like to use PETSc's MPI-parallel GMRES solver with a MatShell
>> that calls my OpenMP+MKL code, while each block can be in a different MPI
>> process.
>>
>> At the moment the code runs correctly, except that PETSc is not letting
>> my OpenMP+MKL code schedule threads as I choose.
>>
>
> PETSc does not say anything about OpenMP threads. However, maybe you need
> to launch the executable with the correct OMP env variables?
>
>   Thanks,
>
>  Matt
>
>
>> I am using
>> ./configure --with-scalar-type=complex --prefix=../install/fast/
>> --with-debugging=0 --with-openmp=1 --with-blaslapack-dir=${MKLROOT}
>> --with-mkl_cpardiso-dir=${MKLROOT} --with-threadsafety --with-log=0
>> COPTFLAGS="-g -Ofast" CXXOPTFLAGS="-g -Ofast" FOPTFLAGS="-g -Ofast"
>>
>> Attached is an image of htop showing that the MKL threads are indeed
>> being spawned, but they remain unused by the code. Previous runs of the
>> code show that it is capable of using OpenMP and MKL; only when the PETSc
>> KSPSolve is called does MKL seem to be turned off.
>>
>> On Fri, Apr 7, 2023 at 8:10 AM Matthew Knepley  wrote:
>>
>>> On Fri, Apr 7, 2023 at 10:06 AM Astor Piaz 
>>> wrote:
>>>
 Hello petsc-users,
 I am trying to use a code that is parallelized with a combination of
 OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
 processes.
 I have carefully scheduled the processes so that the right number is
 launched at the right time.
 When trying to use my code inside a MatShell (for later use in an
 FGMRES KSPSolver), MKL processes are not being used.

 I am sorry if this has been asked before.
 What configuration should I use in order to profit from MPI+OpenMP+MKL
 parallelism?

>>>
>>> You should configure using --with-threadsafety
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 Thank you!
 --
 Astor

>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> 
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


Re: [petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Matthew Knepley
On Fri, Apr 7, 2023 at 2:26 PM Astor Piaz  wrote:

> Hi Matthew, Junchao,
> Thank you for your advice. The code still does not work; I give more
> details about it below and can provide more as needed.
>
> I am implementing a spectral method resulting in a block matrix where the
> off-diagonal blocks are Poincare-Steklov operators of
> impedance-to-impedance type.
> Those Poincare-Steklov operators have been created by hierarchically merging
> subdomain operators (the HPS method), and I have a well-tuned (but rather
> complex) OpenMP+MKL code that can apply this operator very fast.
> I would like to use PETSc's MPI-parallel GMRES solver with a MatShell that
> calls my OpenMP+MKL code, while each block can be in a different MPI
> process.
>
> At the moment the code runs correctly, except that PETSc is not letting my
> OpenMP+MKL code schedule threads as I choose.
>

PETSc does not say anything about OpenMP threads. However, maybe you need
to launch the executable with the correct OMP env variables?
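
For example, something along these lines in the launch environment (a sketch
only; the exact values, and whether your MPI launcher forwards them to the
ranks, are assumptions):

  export OMP_NUM_THREADS=24
  export OMP_MAX_ACTIVE_LEVELS=2   # or OMP_NESTED=TRUE with older runtimes
  export OMP_DYNAMIC=FALSE
  export MKL_DYNAMIC=FALSE
  mpiexec -n <nranks> ./your_app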

  Thanks,

 Matt


> I am using
> ./configure --with-scalar-type=complex --prefix=../install/fast/
> --with-debugging=0 --with-openmp=1 --with-blaslapack-dir=${MKLROOT}
> --with-mkl_cpardiso-dir=${MKLROOT} --with-threadsafety --with-log=0
> COPTFLAGS="-g -Ofast" CXXOPTFLAGS="-g -Ofast" FOPTFLAGS="-g -Ofast"
>
> Attached is an image of htop showing that the MKL threads are indeed being
> spawned, but they remain unused by the code. Previous runs of the code show
> that it is capable of using OpenMP and MKL; only when the PETSc KSPSolve is
> called does MKL seem to be turned off.
>
> On Fri, Apr 7, 2023 at 8:10 AM Matthew Knepley  wrote:
>
>> On Fri, Apr 7, 2023 at 10:06 AM Astor Piaz  wrote:
>>
>>> Hello petsc-users,
>>> I am trying to use a code that is parallelized with a combination of
>>> OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
>>> processes.
>>> I have carefully scheduled the processes so that the right number is
>>> launched at the right time.
>>> When trying to use my code inside a MatShell (for later use in an FGMRES
>>> KSPSolver), MKL processes are not being used.
>>>
>>> I am sorry if this has been asked before.
>>> What configuration should I use in order to profit from MPI+OpenMP+MKL
>>> parallelism?
>>>
>>
>> You should configure using --with-threadsafety
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Thank you!
>>> --
>>> Astor
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Junchao Zhang
> OpenMP threads are able to spawn MPI processes

I am curious why you have this usage. Is it because you want a pure
OpenMP code (i.e., not MPI-capable) to call PETSc?

--Junchao Zhang


On Fri, Apr 7, 2023 at 9:06 AM Astor Piaz  wrote:

> Hello petsc-users,
> I am trying to use a code that is parallelized with a combination of
> OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
> processes.
> I have carefully scheduled the processes so that the right number is
> launched at the right time.
> When trying to use my code inside a MatShell (for later use in an FGMRES
> KSPSolver), MKL processes are not being used.
>
> I am sorry if this has been asked before.
> What configuration should I use in order to profit from MPI+OpenMP+MKL
> parallelism?
>
> Thank you!
> --
> Astor
>


Re: [petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Matthew Knepley
On Fri, Apr 7, 2023 at 10:06 AM Astor Piaz  wrote:

> Hello petsc-users,
> I am trying to use a code that is parallelized with a combination of
> OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
> processes.
> I have carefully scheduled the processes so that the right number is
> launched at the right time.
> When trying to use my code inside a MatShell (for later use in an FGMRES
> KSPSolver), MKL processes are not being used.
>
> I am sorry if this has been asked before.
> What configuration should I use in order to profit from MPI+OpenMP+MKL
> parallelism?
>

You should configure using --with-threadsafety
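
For example, on top of whatever else your build needs, the relevant options
would look roughly like this (a sketch; it assumes MKL provides BLAS/LAPACK
and that MKLROOT is set in the environment):

  ./configure --with-openmp --with-threadsafety --with-blaslapack-dir=${MKLROOT}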

  Thanks,

 Matt


> Thank you!
> --
> Astor
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


[petsc-users] MPI+OpenMP+MKL

2023-04-07 Thread Astor Piaz
Hello petsc-users,
I am trying to use a code that is parallelized with a combination of OpenMP
and MKL parallelism, where OpenMP threads are able to spawn MPI processes.
I have carefully scheduled the processes so that the right number is
launched at the right time.
When trying to use my code inside a MatShell (for later use in an FGMRES
KSPSolver), MKL processes are not being used.

I am sorry if this has been asked before.
What configuration should I use in order to profit from MPI+OpenMP+MKL
parallelism?

Thank you!
--
Astor