> On 26 Feb 2023, at 8:07 PM, Mike Michell <[email protected]> wrote:
> 
> I cannot agree with this argument unless you also test with the petsc 3.18.4 
> tarball from https://petsc.org/release/install/download/. 
> If the library had an issue, I would trivially see an error from my code. 
> 
> I ran my code under valgrind and see no error with petsc 3.18.4. You can test 
> my code under valgrind or address sanitizer with this version of 
> petsc-3.18.4.tar.gz from https://petsc.org/release/install/download/. I 
> expect you will see no error.
> 
> 
> Let me ask my question differently: 
> Has any change been made to DMPlexMarkBoundaryFaces() recently?

Yes, and it may break your application if you are not careful about it: 
https://gitlab.com/petsc/petsc/-/commit/a29bf4df3e5335fbd3b27b552a624c7f2a5a2f0a
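
For reference, a minimal Fortran sketch of the usual call sequence (the 
subroutine and variable names here are illustrative, not taken from your code). 
With that change, the marking on a distributed DM follows the physical boundary 
only, which is consistent with what you describe below:

      ! Sketch only: create a label and mark boundary faces with the value 1.
      ! Error checking is omitted for brevity.
      subroutine mark_boundary(dm, ierr)
#include <petsc/finclude/petscdmplex.h>
      use petscdmplex
      implicit none
      DM             :: dm
      PetscErrorCode :: ierr
      DMLabel        :: label
      PetscInt       :: one

      one = 1
      call DMCreateLabel(dm, 'boundary', ierr)
      call DMGetLabel(dm, 'boundary', label, ierr)
      ! With the commit above, faces that lie only on a parallel (process)
      ! boundary of a distributed dm are no longer marked here; only faces
      ! on the physical domain boundary receive the value.
      call DMPlexMarkBoundaryFaces(dm, one, label, ierr)
      end subroutine mark_boundary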

Thanks,
Pierre

> I found that the latest petsc does not recognize a parallel (but not physical) 
> boundary as a boundary for a distributed dm (line 235 of my example code). 
> Because of this, you saw the error from these arrays:
> 
> ! midpoint of median-dual face for inner face
>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
> 
> and these were allocated here
> 
>  allocate(xc(ncell))
>  allocate(yc(ncell))
> 
> as you pointed out. Or has any change been made to the distribution of the dm 
> over procs?
> 
> Thanks,
> Mike
> 
>> 
>> On Sun, Feb 26, 2023 at 11:32 AM Mike Michell <[email protected]> wrote:
>>> This is what I get from petsc main, which is not correct:
>>> Overall volume computed from median-dual ...
>>>    6.37050098781844     
>>> Overall volume computed from PETSc ...
>>>    3.15470053800000
>>> 
>>> 
>>> This is what I get from petsc 3.18.4, which is correct:
>>> Overall volume computed from median-dual ...
>>>    3.15470053800000     
>>> Overall volume computed from PETSc ...
>>>    3.15470053800000
>>> 
>>> 
>>> If there is a problem in the code, it is also strange to me that petsc 
>>> 3.18.4 gives the correct answer.
>> 
>> As I said, this can happen due to different layouts in memory. If you run it 
>> under valgrind or address sanitizer, you will see
>> that there is a problem.
>> 
>>   Thanks,
>> 
>>      Matt
>>  
>>> Thanks,
>>> Mike
>>> 
>>>> 
>>>> On Sun, Feb 26, 2023 at 11:19 AM Mike Michell <[email protected]> wrote:
>>>>> Which version of petsc did you test? With petsc 3.18.4, the median-dual 
>>>>> volume gives the same value as the one PETSc computes via 
>>>>> DMPlexComputeCellGeometryFVM(). 
>>>> 
>>>> This is only an accident of the data layout. The code you sent writes over 
>>>> memory in the local Fortran arrays.
>>>> 
>>>>   Thanks,
>>>> 
>>>>      Matt
>>>>  
>>>>>> 
>>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell <[email protected]> wrote:
>>>>>>> My apologies for the late follow-up. There was a time conflict. 
>>>>>>> 
>>>>>>> A simple example code related to the issue I mentioned is attached 
>>>>>>> here. The sample code (1) loads the grid onto a dm, (2) computes a 
>>>>>>> vertex-wise control volume for each node in a median-dual way, (3) does 
>>>>>>> a halo exchange among procs so that the control volume values are 
>>>>>>> complete, and (4) writes the field to a .vtu file. As a check, the 
>>>>>>> computed control volume is also compared with the control volume PETSc 
>>>>>>> computes via DMPlexComputeCellGeometryFVM() (see lines 771-793). 
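>>>>>>> 
>>>>>>> For reference, the check is essentially a loop like the following (a 
>>>>>>> simplified sketch, not the exact code at lines 771-793; the names are 
>>>>>>> illustrative):
>>>>>>> 
>>>>>>>       ! Sketch only: sum the PETSc cell volumes on this rank.
>>>>>>>       subroutine petsc_volume(dm, volsum, ierr)
>>>>>>> #include <petsc/finclude/petscdmplex.h>
>>>>>>>       use petscdmplex
>>>>>>>       implicit none
>>>>>>>       DM             :: dm
>>>>>>>       PetscReal      :: volsum
>>>>>>>       PetscErrorCode :: ierr
>>>>>>>       PetscInt       :: c, cStart, cEnd, zero
>>>>>>>       PetscReal      :: vol, centroid(3), normal(3)
>>>>>>> 
>>>>>>>       zero = 0
>>>>>>>       ! cells are the height-0 points of the plex
>>>>>>>       call DMPlexGetHeightStratum(dm, zero, cStart, cEnd, ierr)
>>>>>>>       volsum = 0.0d0
>>>>>>>       do c = cStart, cEnd - 1
>>>>>>>          ! volume and centroid of cell c from PETSc
>>>>>>>          call DMPlexComputeCellGeometryFVM(dm, c, vol, centroid, normal, ierr)
>>>>>>>          volsum = volsum + vol
>>>>>>>       end do
>>>>>>>       end subroutine petsc_volume
>>>>>>> 
>>>>>>> The per-rank sum is then reduced over MPI and compared with the summed 
>>>>>>> median-dual control volumes.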
>>>>>>> 
>>>>>>> Back to the original problem: I get a proper control volume field 
>>>>>>> with PETSc 3.18.4, which is the latest stable release. However, if I 
>>>>>>> use PETSc from the main repo, it gives a strange control volume field. 
>>>>>>> Something is certainly strange around the parallel boundaries, so I 
>>>>>>> think something went wrong with the halo communication. To help 
>>>>>>> understanding, a comparison snapshot is also attached. I guess a certain 
>>>>>>> part of my code is no longer compatible with PETSc, unless there is a 
>>>>>>> bug in the library. Could I get comments on it?
>>>>>> 
>>>>>> I can run your example. The numbers I get for "median-dual volume" do 
>>>>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover, 
>>>>>> the median-dual numbers change, which suggests a memory fault. I 
>>>>>> compiled it using address sanitizer, and it found an error:
>>>>>> 
>>>>>>  Number of physical boundary edge ...            4           0
>>>>>>  Number of physical and parallel boundary edge ...            4           0
>>>>>>  Number of parallel boundary edge ...            0           0
>>>>>>  Number of physical boundary edge ...            4           1
>>>>>>  Number of physical and parallel boundary edge ...            4           1
>>>>>>  Number of parallel boundary edge ...            0           1
>>>>>> =================================================================
>>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address 
>>>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8
>>>>>> READ of size 8 at 0x603000022d40 thread T0
>>>>>> =================================================================
>>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address 
>>>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8
>>>>>> READ of size 8 at 0x60300000f0f0 thread T0
>>>>>>     #0 0x10cf702a7 in MAIN__ test.F90:657
>>>>>>     #1 0x10cf768ee in main test.F90:43
>>>>>>     #0 0x1068e12a7 in MAIN__ test.F90:657
>>>>>>     #1 0x1068e78ee in main test.F90:43
>>>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>> 
>>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region 
>>>>>> [0x60300000f0d0,0x60300000f0f0)
>>>>>> allocated by thread T0 here:
>>>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>> 
>>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region 
>>>>>> [0x603000022d20,0x603000022d40)
>>>>>> allocated by thread T0 here:
>>>>>>     #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f)
>>>>>>     #1 0x1068dba71 in MAIN__ test.F90:499
>>>>>>     #2 0x1068e78ee in main test.F90:43
>>>>>>     #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>> 
>>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__
>>>>>> Shadow bytes around the buggy address:
>>>>>> 
>>>>>> which corresponds to
>>>>>> 
>>>>>>      ! midpoint of median-dual face for inner face
>>>>>>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>>>>>>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
>>>>>> 
>>>>>> and these were allocated here
>>>>>> 
>>>>>>  allocate(xc(ncell))
>>>>>>  allocate(yc(ncell))
>>>>>> 
>>>>>> Hopefully the error is straightforward to see now.
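>>>>>> 
>>>>>> In case it helps, one way to make that loop independent of the boundary 
>>>>>> label is to ask the local Plex how many cells a face actually has: with 
>>>>>> no overlap, a face on the parallel boundary has only its one local cell 
>>>>>> in its support, just like a physical boundary face. A sketch with 
>>>>>> illustrative names, not your code:
>>>>>> 
>>>>>>       ! Sketch only: a face with local support size 1 has no second
>>>>>>       ! cell, so the nc2 branch must not be taken for it.
>>>>>>       subroutine face_is_one_sided(dm, face, onesided, ierr)
>>>>>> #include <petsc/finclude/petscdmplex.h>
>>>>>>       use petscdmplex
>>>>>>       implicit none
>>>>>>       DM             :: dm
>>>>>>       PetscInt       :: face
>>>>>>       PetscBool      :: onesided
>>>>>>       PetscErrorCode :: ierr
>>>>>>       PetscInt       :: ssize
>>>>>> 
>>>>>>       call DMPlexGetSupportSize(dm, face, ssize, ierr)
>>>>>>       if (ssize == 1) then
>>>>>>          onesided = PETSC_TRUE
>>>>>>       else
>>>>>>          onesided = PETSC_FALSE
>>>>>>       end if
>>>>>>       end subroutine face_is_one_sided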
>>>>>> 
>>>>>>   Thanks,
>>>>>> 
>>>>>>     Matt
>>>>>>  
>>>>>>> Thanks,
>>>>>>> Mike
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley <[email protected]> wrote:
>>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell <[email protected]> wrote:
>>>>>>>>>> As a follow-up, I tested: 
>>>>>>>>>> 
>>>>>>>>>> (1) The v3.18.4 tarball downloaded from the petsc gitlab 
>>>>>>>>>> (https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue with 
>>>>>>>>>> DMPlex halo exchange. This version works as I expect.
>>>>>>>>>> (2) A clone of the main branch (git clone https://gitlab.com/petsc/petsc.git) 
>>>>>>>>>> has issues with DMPlex halo exchange. Something is suspicious in this 
>>>>>>>>>> main branch related to the DMPlex halo: the solution field I get is 
>>>>>>>>>> not correct, but it works okay with 1 proc. 
>>>>>>>>>> 
>>>>>>>>>> Does anyone have any comments on this issue? I am curious whether 
>>>>>>>>>> other DMPlex users have any problem with halo exchange. FYI, I do not 
>>>>>>>>>> declare ghost layers for the halo exchange. 
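>>>>>>>>>> 
>>>>>>>>>> (For context, the distribution step is essentially the standard one 
>>>>>>>>>> with zero overlap, i.e. no ghost-cell layer. This is a simplified 
>>>>>>>>>> sketch, not my exact code, and the names are illustrative:)
>>>>>>>>>> 
>>>>>>>>>>       ! Sketch only: distribute the mesh with overlap = 0.
>>>>>>>>>>       subroutine distribute_mesh(dm, ierr)
>>>>>>>>>> #include <petsc/finclude/petscdmplex.h>
>>>>>>>>>>       use petscdmplex
>>>>>>>>>>       implicit none
>>>>>>>>>>       DM             :: dm      ! replaced by the distributed dm
>>>>>>>>>>       PetscErrorCode :: ierr
>>>>>>>>>>       DM             :: dmDist
>>>>>>>>>>       PetscInt       :: overlap
>>>>>>>>>> 
>>>>>>>>>>       overlap = 0               ! no ghost layer is requested
>>>>>>>>>>       call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr)
>>>>>>>>>>       if (dmDist .ne. PETSC_NULL_DM) then
>>>>>>>>>>          call DMDestroy(dm, ierr)
>>>>>>>>>>          dm = dmDist
>>>>>>>>>>       end if
>>>>>>>>>>       end subroutine distribute_mesh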
>>>>>>>>> 
>>>>>>>>> There should not have been any changes there, and there are definitely 
>>>>>>>>> tests for this.
>>>>>>>>> 
>>>>>>>>> It would be great if you could send something that failed. I could 
>>>>>>>>> fix it and add it as a test.
>>>>>>>> 
>>>>>>>> Just to follow up, we have tests of the low-level communication (Plex 
>>>>>>>> tests ex1, ex12, ex18, ex29, ex31), and then we have
>>>>>>>> tests that use halo exchange for PDE calculations, for example SNES 
>>>>>>>> tutorials ex12, ex13, and ex62. The convergence rates
>>>>>>>> would be off if the halo exchange were wrong. Is there any example 
>>>>>>>> similar to your code that is failing on your installation?
>>>>>>>> Or is there a way to run your code?
>>>>>>>> 
>>>>>>>>   Thanks,
>>>>>>>> 
>>>>>>>>      Matt
>>>>>>>>  
>>>>>>>>>   Thanks,
>>>>>>>>> 
>>>>>>>>>      Matt
>>>>>>>>>  
>>>>>>>>>> Thanks,
>>>>>>>>>> Mike
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>> 
>>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this 
>>>>>>>>>>> version of PETSc:
>>>>>>>>>>> >>git rev-parse origin
>>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1
>>>>>>>>>>> >>git rev-parse FETCH_HEAD
>>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b
>>>>>>>>>>> 
>>>>>>>>>>> There had been no issue before the one with the VTK viewer, which 
>>>>>>>>>>> Jed fixed today 
>>>>>>>>>>> (https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735).
>>>>>>>>>>> 
>>>>>>>>>>> Since that MR has been merged into the main repo, I pulled the 
>>>>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But 
>>>>>>>>>>> if I use the same configure options as before and run my code, 
>>>>>>>>>>> there is an issue with the halo exchange. The code runs without an 
>>>>>>>>>>> error message, but it gives a wrong solution field. I guess the 
>>>>>>>>>>> issue is related to the graph partitioner or the halo exchange, 
>>>>>>>>>>> because if I run the code with 1 proc, the solution is correct. 
>>>>>>>>>>> I only updated the version of PETSc; there was no change in my 
>>>>>>>>>>> own code. Could I get any comments on the issue? I was wondering 
>>>>>>>>>>> whether there have been many changes in the halo exchange or the 
>>>>>>>>>>> graph partitioning & distribution parts of DMPlex.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Mike
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> What most experimenters take for granted before they begin their 
>>>>>>>>> experiments is infinitely more interesting than any results to which 
>>>>>>>>> their experiments lead.
>>>>>>>>> -- Norbert Wiener
>>>>>>>>> 
>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
