On Thu, Apr 16, 2020 at 8:42 AM Mark Adams <[email protected]> wrote:

> Yea, GPU assembly would be great. I was figuring OMP might be simpler.
>
> As far as the interface, I am flexible. The simplest way to do it would be
> to take an array of element matrices and a DMPlex and call
> DMPlexMatSetClosure. You can see this code in
> mark/feature-xgc-interface-rebase, at the bottom of
> src/vec/vec/impls/seq/seqcuda/landau.cu.
>
> I was shy about putting a version of DMPlexMatSetClosure in CUDA, but
> maybe that is easier: just plow through it and cut the stuff that we don't
> need. OMP broke because there are some temp arrays that Matt caches that
> need to be made "private" or dealt with in some way.
>
We should refactor so that all temp arrays are sized and constructed up
front, and then the work is done in an internal function which is passed
those arrays. I tried to do this, but might have crapped out here. Then you
can just call the internal function directly with your arrays.

  Matt

> Coloring is not attractive to me because GPUs demand a lot of parallelism,
> and this serial (velocity space) solver would be embedded in a full 3D code
> that does not use a huge amount of MPI parallelism. For instance, if the
> app code were to use 6 (or 7 max on SUMMIT) cores per GPU (or even 4x that
> with hardware threads), then *I could imagine* there would be enough
> parallelism, with coloring, to fuse the element construction and assembly,
> assembling each element matrix right after it is created. That would be
> great in terms of not storing all these matrices and then assembling them
> all at once. The app that I am targeting does not use that much MPI
> parallelism, though. But we could explore that coloring space, and my
> mental model could be inaccurate. (Note: I did recently add 8x more
> parallelism to my code this week and got a 25% speedup, using one whole
> GPU.)
>
> Or, if you have some sort of lower-level synchronization that could allow
> for fusing the assembly with the element creation, then by all means we
> can explore that.
>
> I'd be happy to work with you on this.
>
> Thanks,
> Mark
>
> On Mon, Apr 13, 2020 at 7:08 PM Junchao Zhang <[email protected]>
> wrote:
>
>> Probably matrix assembly on GPU is more important. Do you have an example
>> for me to play with to see what GPU interface we should have?
>> --Junchao Zhang
>>
>> On Mon, Apr 13, 2020 at 5:44 PM Mark Adams <[email protected]> wrote:
>>
>>> I was looking into assembling matrices with threads. I have a coloring
>>> to avoid conflicts.
>>>
>>> Turning off all the logging seems way overkill. For methods that can
>>> get called in a thread we could use PETSC_HAVE_THREADSAFETY to protect
>>> the logging functions. So one can still get timings for the whole
>>> assembly process, just not for MatSetValues. Few people are going to do
>>> this. I don't think it will be a time sink, and if it is we just revert
>>> back to saying 'turn logging off'. I don't see a good argument for
>>> insisting on turning off logging (it is pretty important) if we just
>>> say that we are going to protect methods as needed.
>>>
>>> It is not a big deal; I am just exploring this idea. It is such a basic
>>> concept in shared-memory sparse linear algebra that it seems like a good
>>> thing to be able to support, and to have an example to say we can
>>> assemble matrices in threads (not that it is a great idea). We have all
>>> the tools (e.g., coloring methods), so it is just a matter of protecting
>>> a few methods. I use DMPlexMatSetClosure instead of MatSetValues, and
>>> this is where I die now with non-thread-safe code. We have an idea, from
>>> Jed, on how to fix it.
>>>
>>> Anyway, thanks for your help, but I think we should hold off on doing
>>> anything until we have some consensus that it would be a good idea to
>>> put some effort into getting a thread-safe PETSc that can support OMP
>>> matrix assembly, with a nice compact example.
>>>
>>> Thanks again,
>>> Mark
>>>
>>> On Mon, Apr 13, 2020 at 5:44 PM Junchao Zhang <[email protected]>
>>> wrote:
>>>
>>>> Mark,
>>>> I saw you had "--with-threadsafety --with-log=0". Do you really want
>>>> to call PETSc from multiple threads (in contrast to letting PETSc call
>>>> other libraries, e.g., BLAS, that do multithreading)? If not, you can
>>>> drop --with-threadsafety.
>>>> I have https://gitlab.com/petsc/petsc/-/merge_requests/2714 that
>>>> should fix your original compilation errors.
>>>>
>>>> --Junchao Zhang
>>>>
>>>> On Mon, Apr 13, 2020 at 2:07 PM Mark Adams <[email protected]> wrote:
>>>>
>>>>> https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
>>>>>
>>>>> and I see this on my Mac:
>>>>>
>>>>> 14:23 1 mark/feature-xgc-interface-rebase *= ~/Codes/petsc$ ../arch-macosx-gnu-O-omp.py
>>>>> ===============================================================================
>>>>>              Configuring PETSc to compile on your system
>>>>> ===============================================================================
>>>>> ===============================================================================
>>>>> Warning: PETSC_ARCH from environment does not match command-line or name of script.
>>>>> Warning: Using from command-line or name of script: arch-macosx-gnu-O-omp,
>>>>> ignoring environment: arch-macosx-gnu-g
>>>>> ===============================================================================
>>>>> TESTING: configureLibraryOptions from
>>>>> PETSc.options.libraryOptions(config/PETSc/options/libraryOptions.py:37)
>>>>> *******************************************************************************
>>>>>    UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
>>>>> -------------------------------------------------------------------------------
>>>>> Must use --with-log=0 with --with-threadsafety
>>>>> *******************************************************************************
>>>>>
>>>>> On Mon, Apr 13, 2020 at 2:54 PM Junchao Zhang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Mon, Apr 13, 2020 at 12:06 PM Mark Adams <[email protected]> wrote:
>>>>>>
>>>>>>> BTW, I can build on SUMMIT with logging and OMP, apparently. I also
>>>>>>> seem to be able to build with debugging, both of which are not
>>>>>>> allowed according to the docs. I am puzzled.
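[Editor's note: the constraint in the configure failure above amounts to lines like the following; the arch name is taken from Mark's script, and the exact option set of his arch-macosx-gnu-O-omp.py script is not shown in the thread, so this is a sketch.]

```shell
# Fails at configure time with:
#   "Must use --with-log=0 with --with-threadsafety"
./configure PETSC_ARCH=arch-macosx-gnu-O-omp --with-openmp --with-threadsafety

# What configure currently requires instead:
./configure PETSC_ARCH=arch-macosx-gnu-O-omp --with-openmp --with-threadsafety --with-log=0
```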
>>>>>>>
>>>>>> What are "the docs"?
>>>>>>
>>>>>>> On Mon, Apr 13, 2020 at 12:05 PM Mark Adams <[email protected]> wrote:
>>>>>>>
>>>>>>>> I think the problem is that you have to turn off logging with
>>>>>>>> OpenMP, and the (newish) GPU timers did not protect their timers.
>>>>>>>>
>>>>>>>> I don't see a good reason to require that logging be turned off
>>>>>>>> with OMP. We could use PETSC_HAVE_THREADSAFETY to protect the logs
>>>>>>>> that we care about (e.g., in MatSetValues), and as users discover
>>>>>>>> more things that they want to call in an OMP thread block, tell
>>>>>>>> them to turn logging off and we will fix it when we can.
>>>>>>>>
>>>>>>>> Any thoughts on the idea of letting users keep logging with OpenMP?
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2020 at 11:40 AM Junchao Zhang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes. Looks like we need to include petsclog.h. I don't know why
>>>>>>>>> OMP triggered the error.
>>>>>>>>> --Junchao Zhang
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2020 at 9:59 AM Mark Adams <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Should I do an MR to fix this?

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead. -- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
