Re: [petsc-users] [petsc-maint] DMSwarm on multiple processors

2023-12-18 Thread Joauma Marichal via petsc-users
Hello,

Sorry for the delay. I have attached the output that I obtain when running the 
code in debug mode.

Thanks for your help.

Best regards,

Joauma

From: Matthew Knepley
Date: Thursday, 23 November 2023 at 15:32
To: Joauma Marichal
Cc: petsc-ma...@mcs.anl.gov, petsc-users@mcs.anl.gov
Subject: Re: [petsc-maint] DMSwarm on multiple processors

On Thu, Nov 23, 2023 at 9:01 AM Joauma Marichal <joauma.maric...@uclouvain.be> wrote:
Hello,

My problem persists… Is there anything I could try?

Yes. It appears to be failing from a call inside PetscSFSetUpRanks(). That 
routine allocates memory, the failure is in libc, and it only happens on 
larger examples, so I suspect an allocation problem. Can you rebuild with 
debugging and run this example? Then we can see whether the allocation fails.

  Thanks,

 Matt
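
A note on why a debug build can help with this kind of failure: glibc aborts such as "free(): invalid size" and "munmap_chunk(): invalid pointer" are raised when free() finds inconsistent bookkeeping for the block it was given. That commonly means an earlier write ran past the end of some allocation, or the pointer did not come from malloc at all. A debugging allocator can detect such damage itself and report which block was hit. The sketch below illustrates the general guard-word idea in plain C. It is not PETSc's implementation, and the names guarded_malloc, guarded_check, guarded_free and GUARD_WORD are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard value; any unlikely bit pattern works. */
#define GUARD_WORD 0xDEADBEEFCAFEF00DULL

/* Bookkeeping stored directly in front of every user block. */
typedef struct {
    size_t   size;   /* number of user bytes requested */
    uint64_t guard;  /* front guard word */
} guard_header;

/* Allocate `size` user bytes with one guard word on each side. */
void *guarded_malloc(size_t size)
{
    uint64_t g = GUARD_WORD;
    guard_header *h = malloc(sizeof *h + size + sizeof g);
    if (!h) return NULL;
    h->size  = size;
    h->guard = g;
    /* The trailing guard may be unaligned, so copy it bytewise. */
    memcpy((unsigned char *)(h + 1) + size, &g, sizeof g);
    return h + 1;
}

/* Return 0 if both guards are intact, 1 if the front guard is
 * damaged, 2 if the trailing guard is damaged. */
int guarded_check(const void *p)
{
    const guard_header *h = (const guard_header *)p - 1;
    uint64_t tail;
    if (h->guard != GUARD_WORD) return 1;
    memcpy(&tail, (const unsigned char *)p + h->size, sizeof tail);
    return tail == GUARD_WORD ? 0 : 2;
}

/* Check the guards, release the block, and hand the check result
 * back so the caller can report corruption instead of crashing. */
int guarded_free(void *p)
{
    int status;
    if (!p) return 0;
    status = guarded_check(p);
    free((guard_header *)p - 1);
    return status;
}
```

With this scheme, writing one byte past the end of an 8-byte block makes guarded_check return 2 for that block, so the overrun is attributed to a specific allocation rather than surfacing later as an abort inside free().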

Thanks a lot.

Best regards,

Joauma

From: Matthew Knepley <knep...@gmail.com>
Date: Wednesday, 25 October 2023 at 14:45
To: Joauma Marichal <joauma.maric...@uclouvain.be>
Cc: petsc-ma...@mcs.anl.gov, petsc-users@mcs.anl.gov
Subject: Re: [petsc-maint] DMSwarm on multiple processors

On Wed, Oct 25, 2023 at 8:32 AM Joauma Marichal via petsc-maint <petsc-ma...@mcs.anl.gov> wrote:
Hello,

I am using DMSwarm in an Eulerian-Lagrangian approach to model vapor bubbles 
in water.
I recently obtained good results and wanted to run larger simulations. 
Unfortunately, when I increase the number of processors used to run the 
simulation, I get the following error:


free(): invalid size
[cns136:590327] *** Process received signal ***
[cns136:590327] Signal: Aborted (6)
[cns136:590327] Signal code:  (-6)
[cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20]
[cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f]
[cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05]
[cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037]
[cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c]
[cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac]
[cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64]
[cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642]
[cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e]
[cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde]
[cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8]
[cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448]
[cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20]
[cns136:590327] [13] ./cobpor[0x4418dc]
[cns136:590327] [14] ./cobpor[0x408b63]
[cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3]
[cns136:590327] [16] ./cobpor[0x40bdee]
[cns136:590327] *** End of error message ***
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
--
mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted).
--

When I reduce the number of processors, the error disappears. The code also 
runs correctly without the vapor bubbles.
The problem seems to occur at this point:

DMCreate(PETSC_COMM_WORLD,swarm);
DMSetType(*swarm,DMSWARM);
DMSetDimension(*swarm,3);
DMSwarmSetType(*swarm,DMSWARM_PIC);
DMSwarmSetCellDM(*swarm,*dmcell);


Thanks a lot for your help.

Things that would help us track this down:

1) The smallest example where it fails

2) The smallest number of processes where it fails

3) A stack trace of the failure

4) A simple example that we can run that also fails

  Thanks,

 Matt

Best regards,

Joauma


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/




Re: [petsc-users] [petsc-maint] DMSwarm on multiple processors

2023-11-23 Thread Joauma Marichal via petsc-users
Hello,

My problem persists… Is there anything I could try?

Thanks a lot.

Best regards,

Joauma


[petsc-users] Joauma Marichal shared the folder "marha" with you

2023-10-26 Thread Joauma Marichal via petsc-users
Joauma Marichal shared a folder with you.

marha
This link only works for the direct recipients of this message.


Re: [petsc-users] [petsc-maint] DMSwarm on multiple processors

2023-10-26 Thread Joauma Marichal via petsc-users
Hello,

Here is a very simple version that reproduces the issue.

I run it as follows:

cd Grid_generation
make clean
make all
./grid_generation
cd ..
make clean
make all
./cobpor # on 1 proc
# OR
mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs

The error that I get is the following:

munmap_chunk(): invalid pointer
[cns266:2552391] *** Process received signal ***
[cns266:2552391] Signal: Aborted (6)
[cns266:2552391] Signal code:  (-6)
[cns266:2552391] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7fd7fd194b20]
[cns266:2552391] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fd7fd194a9f]
[cns266:2552391] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fd7fd167e05]
[cns266:2552391] [ 3] /lib64/libc.so.6(+0x91037)[0x7fd7fd1d7037]
[cns266:2552391] [ 4] /lib64/libc.so.6(+0x9819c)[0x7fd7fd1de19c]
[cns266:2552391] [ 5] /lib64/libc.so.6(+0x9844c)[0x7fd7fd1de44c]
[cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e]
[cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetMatType+0x3d)[0x7fd7feab87ad]
[cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59]
[cns266:2552391] [ 9] ./cobpor[0x402df9]
[cns266:2552391] [10] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7fd7fd180cf3]
[cns266:2552391] [11] ./cobpor[0x40304e]
[cns266:2552391] *** End of error message ***

Thanks a lot for your help.

Best regards,

Joauma
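
For reading traces like the ones in this thread: each frame has the form module(symbol+offset)[address]. Frames with no exported symbol (some libc internals and the ./cobpor frames here) show only an offset or a bare address, so the first named PETSc frame is usually the most informative line. Below is a minimal sketch that pulls the symbol name out of one frame line, assuming that format. frame_symbol is a hypothetical helper and is not part of PETSc or the MPI runtime.

```c
#include <stddef.h>
#include <string.h>

/* Copy the symbol name from a frame such as
 *   ".../libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e]"
 * into `out`. Returns 1 on success, or 0 when the frame carries no
 * symbol name (for example "(+0x4eb20)" or "./cobpor[0x4418dc]")
 * or when the name does not fit into `out`. */
int frame_symbol(const char *line, char *out, size_t outlen)
{
    const char *open = strrchr(line, '(');  /* start of "(symbol+offset)" */
    const char *plus;
    size_t n;

    if (!open) return 0;                    /* bare address, no parentheses */
    plus = strchr(open, '+');
    if (!plus || plus == open + 1) return 0; /* offset only, no name */

    n = (size_t)(plus - open - 1);          /* length of the symbol name */
    if (n >= outlen) return 0;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 1;
}
```

Applied to frame 6 of the first trace in this thread, the helper yields PetscSFSetUpRanks. For the unnamed ./cobpor frames it returns 0, and resolving those addresses would need the executable's debug symbols.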





[petsc-users] DMSwarm on multiple processors

2023-10-25 Thread Joauma Marichal via petsc-users
Hello,

I am using DMSwarm in an Eulerian-Lagrangian approach to model vapor bubbles 
in water.
I recently obtained good results and wanted to run larger simulations. 
Unfortunately, when I increase the number of processors used to run the 
simulation, I get the following error:


free(): invalid size
[cns136:590327] *** Process received signal ***
[cns136:590327] Signal: Aborted (6)
[cns136:590327] Signal code:  (-6)
[cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20]
[cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f]
[cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05]
[cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037]
[cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c]
[cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac]
[cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64]
[cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642]
[cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e]
[cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde]
[cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8]
[cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448]
[cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20]
[cns136:590327] [13] ./cobpor[0x4418dc]
[cns136:590327] [14] ./cobpor[0x408b63]
[cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3]
[cns136:590327] [16] ./cobpor[0x40bdee]
[cns136:590327] *** End of error message ***
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
--
mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted).
--

When I reduce the number of processors, the error disappears. The code also 
runs correctly without the vapor bubbles.
The problem seems to occur at this point:

DMCreate(PETSC_COMM_WORLD,swarm);
DMSetType(*swarm,DMSWARM);
DMSetDimension(*swarm,3);
DMSwarmSetType(*swarm,DMSWARM_PIC);
DMSwarmSetCellDM(*swarm,*dmcell);


Thanks a lot for your help.

Best regards,

Joauma