> On Jan 21, 2021, at 5:37 PM, Mark Adams <[email protected]> wrote:
>
> This did not work. I verified that MPI_Init_thread is being called correctly
> and that MPI reports that it supports the highest level of thread safety.
>
> I am going to ask ORNL.
>
> And if I use:
>
> -fieldsplit_i1_ksp_norm_type none
> -fieldsplit_i1_ksp_max_it 300
>
> for all 9 "i" variables, I can run normal iterations on the 10th variable, in
> a 10 species problem, and it works perfectly with 10 threads.
>
> So the problem is definitely that VecNorm is not thread safe.
>
> Also, I want to call SuperLU_DIST, which can use threads, but I don't want
> SuperLU_DIST itself to start using threads. Is there a way to tell SuperLU_DIST
> that there are no threads while still having PETSc use them?
My interpretation, and Satish's, for many years has been that SuperLU_DIST has to be
built with, and use, OpenMP in order to work with CUDA.
  def formCMakeConfigureArgs(self):
    args = config.package.CMakePackage.formCMakeConfigureArgs(self)
    if self.openmp.found:
      self.usesopenmp = 'yes'
    else:
      args.append('-DCMAKE_DISABLE_FIND_PACKAGE_OpenMP=TRUE')
    if self.cuda.found:
      if not self.openmp.found:
        raise RuntimeError('SuperLU_DIST GPU code currently requires OpenMP. Use --with-openmp=1')
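So if you want to try the CUDA path, the configure would be something along these lines (just a sketch; adjust the options for your machine):

  ./configure --with-cuda=1 --with-openmp=1 --download-superlu_dist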
But this could be OK: you use OpenMP, and SuperLU_DIST uses OpenMP internally, each
doing their own business (what could go wrong :-)).
Have you tried it?
Barry
>
> Thanks,
> Mark
>
> On Thu, Jan 21, 2021 at 5:19 PM Mark Adams <[email protected]> wrote:
> OK, the problem is probably:
>
> PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED;
>
> There is an example that sets:
>
> PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>
> This is what I need.
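> A minimal sketch of what I mean (the global just has to be set before PetscInitialize();
> error checking trimmed):
>
>   #include <petscsys.h>
>
>   int main(int argc, char **argv)
>   {
>     PetscErrorCode ierr;
>
>     /* ask PETSc to request MPI_THREAD_MULTIPLE from MPI_Init_thread() */
>     PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>     ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
>     /* ... threaded solves ... */
>     ierr = PetscFinalize();
>     return ierr;
>   }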
>
>
>
>
> On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <[email protected]> wrote:
>
>
> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <[email protected]> wrote:
> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <[email protected]> wrote:
> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <[email protected]> wrote:
> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <[email protected]> wrote:
> Yes, the problem is that each KSP solver is running in an OpenMP thread (so at
> this point it only works for COMM_SELF, and since this is Landau that is all I need). It
> looks like MPI reductions called with a COMM_SELF are not thread safe (e.g.,
> they could notice that this is one proc and just copy send --> recv, but they
> don't).
>
> Instead of using SELF, how about Comm_dup() for each thread?
>
> OK, raw MPI_Comm_dup. I tried PetscCommDup. Let me try this.
> Thanks,
>
> You would have to dup them all outside the OMP section, since the dup itself is not
> thread safe. Then each thread uses one, I think.
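> Something like this sketch (illustrative only; assumes mpi.h and omp.h are included and
> NUM_THREADS is known up front):
>
>   MPI_Comm comms[NUM_THREADS];
>   /* dup serially, before any threads exist */
>   for (int i = 0; i < NUM_THREADS; i++) MPI_Comm_dup(MPI_COMM_SELF, &comms[i]);
>
>   #pragma omp parallel num_threads(NUM_THREADS)
>   {
>     int tid = omp_get_thread_num();
>     /* each thread only ever reduces on its own communicator, i.e. its KSP
>        was created with KSPCreate(comms[tid],...) */
>   }
>
>   for (int i = 0; i < NUM_THREADS; i++) MPI_Comm_free(&comms[i]);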
>
> Yea sure. I do it in SetUp.
>
> Well, that worked to get different Comms, but I still get the same
> problem: the number of iterations differs wildly. This is two species and two
> threads (13 SNES iterations, and it is not deterministic). Way below is one
> thread (8 iterations) and fairly uniform iteration counts.
>
> Maybe this MPI is just not thread safe at all. Let me look into it.
> Thanks anyway,
>
> 0 SNES Function norm 4.974994975313e-03
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x80017c60. Comms pc=0x67ad27c0 ksp=0x7ffe1600 newcomm=0x8014b6e0
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7ffdabc0. Comms pc=0x67ad27c0 ksp=0x7fff70d0 newcomm=0x7ffe9980
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
> 1 SNES Function norm 1.836376279964e-05
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 19
> 2 SNES Function norm 3.059930074740e-07
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 15
> 3 SNES Function norm 4.744275398121e-08
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
> 4 SNES Function norm 4.014828563316e-08
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 456
> 5 SNES Function norm 5.670836337808e-09
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 2
> 6 SNES Function norm 2.410421401323e-09
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 18
> 7 SNES Function norm 6.533948191791e-10
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 458
> 8 SNES Function norm 1.008133815842e-10
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 9
> 9 SNES Function norm 1.690450876038e-11
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
> 10 SNES Function norm 1.336383986009e-11
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 463
> 11 SNES Function norm 1.873022410774e-12
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 113
> 12 SNES Function norm 1.801834606518e-13
> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 1
> 13 SNES Function norm 1.004397317339e-13
> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 13
>
>
>
>
> 0 SNES Function norm 4.974994975313e-03
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e265010. Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e25bc40. Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
> 1 SNES Function norm 1.836376279963e-05
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 380
> 2 SNES Function norm 3.018499983019e-07
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 387
> 3 SNES Function norm 1.826353175637e-08
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 391
> 4 SNES Function norm 1.378600599548e-09
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 392
> 5 SNES Function norm 1.077289085611e-10
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 394
> 6 SNES Function norm 8.571891727748e-12
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
> 7 SNES Function norm 6.897647643450e-13
> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
> 8 SNES Function norm 5.606434614114e-14
> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 8
>
>
>
>
>
>
>
>
>
> Matt
>
> Matt
>
> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <[email protected]> wrote:
> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <[email protected]> wrote:
> It looks like PETSc is just too clever for me. I am trying to get a different
> MPI_Comm into each block, but PETSc is thwarting me:
>
> It looks like you are using SELF. Is that what you want? Do you want a bunch
> of comms with the same group, but independent somehow? I am confused.
>
> Matt
>
>   if (jac->use_openmp) {
>     ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>     PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>   } else {
>     ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>   }
>
> produces:
>
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>
> How can I work around this?
>
>
> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <[email protected]> wrote:
>
>
> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <[email protected]> wrote:
>
>
>> On Jan 20, 2021, at 3:09 PM, Mark Adams <[email protected]> wrote:
>>
>> So I put in a temporary hack to get the first Fieldsplit apply to NOT use
>> OMP and it sort of works.
>>
>> Preonly/lu is fine. GMRES calls vector creates/dups in every solve so that
>> is a big problem.
>
> It should definitely not be creating vectors "in every" solve. But it does
> do lazy allocation of the needed restart vectors, which may make it look like
> it is creating vectors in every solve. You can use
> -ksp_gmres_preallocate to force it to create all the restart vectors up front
> at KSPSetUp().
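> For this problem that would be something like (illustrative; the exact prefix depends on the split):
>
>   -fieldsplit_e_ksp_gmres_preallocate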
>
> Well, I run the first solve w/o OMP and I see Vec dups of cuSparse Vecs in
> the 2nd solve.
>
>
> Why is creating vectors "at every solve" a problem? It is not thread safe, I
> guess?
>
> It dies when it looks at the options database, in a Free in the get-options
> method, to be exact (see the stack trace below).
>
> ======= Backtrace: =========
> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>
>
>
>> Richardson works, except the convergence test gets confused, presumably
>> because MPI reductions with PETSC_COMM_SELF are not thread safe.
>
>>
>> One fix for the norms might be to create each subdomain solver with a
>> different communicator.
>
> Yes, you could do that. It might actually be the correct thing to do also:
> if you have multiple threads calling MPI reductions on the same communicator,
> that would be a problem. Each KSP should get a new MPI_Comm.
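> Roughly something like this in the setup (a sketch; the dup'ed comm would also need
> to be freed when the split is destroyed):
>
>   MPI_Comm subcomm;
>   ierr = MPI_Comm_dup(PETSC_COMM_SELF,&subcomm);CHKERRQ(ierr);
>   ierr = KSPCreate(subcomm,&ilink->ksp);CHKERRQ(ierr);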
>
> OK. I will just do this.
>
>
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/