Re: [petsc-users] Performance problem using COO interface

2023-01-17 Thread Zhang, Junchao via petsc-users
Hi, Philip,
  Could you add -log_view and see what functions are used in the solve? Since 
it is CPU-only, perhaps with -log_view of different runs, we can easily see 
which functions slowed down.

--Junchao Zhang

From: Fackler, Philip 
Sent: Tuesday, January 17, 2023 4:13 PM
To: xolotl-psi-developm...@lists.sourceforge.net; petsc-users@mcs.anl.gov
Cc: Mills, Richard Tran; Zhang, Junchao; Blondel, Sophie; Roth, Philip
Subject: Performance problem using COO interface

In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's 
COO interface for creating the Jacobian matrix (and the Kokkos interface for 
interacting with Vec entries). As the attached plots show for one case, while 
the code for computing the RHSFunction and RHSJacobian performs similarly (or 
slightly better) after the port, the performance for the solve as a whole is 
significantly worse.

Note:
This is all CPU-only (so kokkos and kokkos-kernels are built with only the 
serial backend).
The dev version is using MatSetValuesStencil with the default implementations 
for Mat and Vec.
The port version is using MatSetValuesCOO and is run with -dm_mat_type 
aijkokkos -dm_vec_type kokkos.
The port/def version is using MatSetValuesCOO and is run with -dm_vec_type 
kokkos (using the default Mat implementation).

So, this seems to be due to a performance difference in the petsc 
implementations. Please advise. Is this a known issue? Or am I missing 
something?
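For reference, the COO path being compared boils down to roughly the sketch below. This is only a sketch: the function and array names are illustrative, and A is assumed to be the DM-created matrix (aij or aijkokkos depending on -dm_mat_type).

#include <petscmat.h>

/* Illustrative sketch of COO assembly: ncoo, coo_i, coo_j, coo_v stand in for
   whatever the application computes; A must already have the right size and type. */
static PetscErrorCode AssembleJacobianCOO(Mat A, PetscCount ncoo, PetscInt coo_i[], PetscInt coo_j[], const PetscScalar coo_v[])
{
  PetscFunctionBeginUser;
  PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j)); /* one-time: register the (i,j) pattern */
  PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));      /* each assembly: values in the same (i,j) order */
  PetscFunctionReturn(0);
}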

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory


Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Zhang, Junchao via petsc-users

MPI rank distribution (e.g., 8 ranks per node or 16 ranks per node) is usually 
managed by workload managers like Slurm or PBS through your job scripts, which is 
out of PETSc's control.

From: Amin Sadeghi 
Date: Wednesday, March 25, 2020 at 4:40 PM
To: Junchao Zhang 
Cc: Mark Adams , PETSc users list 
Subject: Re: [petsc-users] Poor speed up for KSP example 45

Junchao, thank you for doing the experiment. I guess TACC Frontera nodes have 
higher memory bandwidth (maybe a more modern CPU architecture, although I'm not 
familiar with which hardware factors affect memory bandwidth) than Compute Canada's 
Graham.

Mark, I did as you suggested. As you suspected, running make streams yielded 
the same results, indicating that the memory bandwidth saturated at around 8 
MPI processes. I ran the experiment on multiple nodes but only requested 8 
cores per node, and here is the result:

1 node (8 cores total): 17.5s, 6X speedup
2 nodes (16 cores total): 13.5s, 7X speedup
3 nodes (24 cores total): 9.4s, 10X speedup
4 nodes (32 cores total): 8.3s, 12X speedup
5 nodes (40 cores total): 7.0s, 14X speedup
6 nodes (48 cores total): 61.4s, 2X speedup [!!!]
7 nodes (56 cores total): 4.3s, 23X speedup
8 nodes (64 cores total): 3.7s, 27X speedup

Note: as you can see, the experiment with 6 nodes showed extremely poor 
scaling, which I guess was an outlier, maybe due to some connection problem?

I also ran another experiment, requesting 2 full nodes, i.e. 64 cores, and 
here's the result:

2 nodes (64 cores total): 6.0s, 16X speedup [32 cores each node]

So, it turns out that given a fixed number of cores, i.e. 64 in our case, much 
better speedups (27X vs. 16X in our case) can be achieved if they are 
distributed among separate nodes.

Anyways, I really appreciate all your inputs.

One final question: from what I understand from Mark's comment, PETSc at the 
moment is blind to the memory hierarchy. Is it feasible to make PETSc aware of the 
inter- and intra-node communication so that partitioning is done to maximize 
performance? Or, to put it differently, is this something that PETSc devs have 
their eyes on for the future?


Sincerely,
Amin


On Wed, Mar 25, 2020 at 3:51 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
I repeated your experiment on one node of TACC Frontera,
1 rank: 85.0s
16 ranks: 8.2s, 10x speedup
32 ranks: 5.7s, 15x speedup

--Junchao Zhang


On Wed, Mar 25, 2020 at 1:18 PM Mark Adams <mfad...@lbl.gov> wrote:
Also, a better test is to see where streams pretty much saturates, then run that 
many processes per node and do the same test while increasing the number of nodes. 
This will tell you how well your network communication is doing.

But this result has a lot of stuff in "network communication" that can be 
further evaluated. The worst thing about this, I would think, is that the 
partitioning is blind to the memory hierarchy of inter- and intra-node 
communication. The next thing to do is run with an initial grid that puts one 
cell per node and then do uniform refinement until you have one cell per 
process (e.g., one refinement step using 8 processes per node), partition to get 
one cell per process, then do uniform refinement to get a reasonably sized 
local problem. Alas, this is not easy to do, but it is doable.

On Wed, Mar 25, 2020 at 2:04 PM Mark Adams <mfad...@lbl.gov> wrote:
I would guess that you are saturating the memory bandwidth. After you make 
PETSc (make all) it will suggest that you test it (make test) and suggest that 
you run streams (make streams).

I see Matt answered, but let me add that when you make streams you will see the 
memory rate for 1, 2, 3, ... NP processes. If your machine is decent you should 
see very good speedup at the beginning and then it will start to saturate. You 
are seeing about 50% of perfect speedup at 16 processes. I would expect that you 
will see something similar with streams. Without knowing your machine, your 
results look typical.

On Wed, Mar 25, 2020 at 1:05 PM Amin Sadeghi <aminthefr...@gmail.com> wrote:
Hi,

I ran KSP example 45 on a single node with 32 cores and 125GB memory using 1, 
16 and 32 MPI processes. Here's a comparison of the time spent during KSP.solve:

- 1 MPI process: ~98 sec, speedup: 1X
- 16 MPI processes: ~12 sec, speedup: ~8X
- 32 MPI processes: ~11 sec, speedup: ~9X

Since the problem size is large enough (8M unknowns), I expected a speedup much 
closer to 32X, rather than 9X. Is this expected? If yes, how can it be improved?

I've attached three log files for more details.

Sincerely,
Amin


Re: [petsc-users] Choosing VecScatter Method in Matrix-Vector Product

2020-01-27 Thread Zhang, Junchao via petsc-users

--Junchao Zhang


On Mon, Jan 27, 2020 at 10:09 AM Felix Huber <st107...@stud.uni-stuttgart.de> wrote:
Thank you all for you reply!

> Are you using a KSP/PC configuration which should weak scale?
Yes the system is solved with KSPSolve. There is no preconditioner yet,
but I fixed the number of CG iterations to 3 to ensure an apples to
apples comparison during the scaling measurements.

>> VecScatter has been greatly refactored (and the default implementation
>> is entirely new) since 3.7.

I now tried to use PETSc 3.11 and the code runs fine. The communication
seems to show a better weak scaling behavior now.

I'll see if we can just upgrade to 3.11.



> Anyway, I'm curious about your
> configuration and how you determine that MPI_Alltoallv/MPI_Alltoallw is
> being used.
I used the Extrae profiler which intercepts all MPI calls and logs them
into a file. This showed that Alltoall is being used for the
communication, which I found surprising. With PETSc 3.11 the Alltoall
calls are replaced by MPI_Start(all) and MPI_Wait(all), which sounds
more reasonable to me.
> This has never been a default code path, so I suspect
> something in your environment or code making this happen.

I attached some log files for some PETSc 3.7 runs on 1, 19 and 115 nodes
(24 cores each) which suggest polynomial scaling (vs logarithmic
scaling). Could it be some installation setting of the PETSc version? (I
use a preinstalled PETSc.)
I checked petsc 3.7.6 and do not think the vecscatter type could be set at 
configure time. Anyway, upgrading petsc is preferred. If that is not possible, 
we can work together to see what happened.

> Can you please send representative log files which characterize the
> lack of scaling (include the full log_view)?

"Stage 1: activation" is the stage of interest, as it wraps the
KSPSolve. The number of unkowns per rank is very small in the
measurement, so most of the time should be communication. However, I
just noticed, that the stage also contains an additional setup step
which might be the reason why the MatMul takes longer than the KSPSolve.
I can repeat the measurements if necessary.
I should add, that I put a MPI_Barrier before the KSPSolve, to avoid any
previous work imbalance to effect the KSPSolve call.

You can use -log_sync, which adds an MPI_Barrier at the beginning of each 
event. Compare log_view files with and without -log_sync. If an event has a much 
higher %T without -log_sync than with it, it means the code is not 
balanced. Alternatively, you can look at the Ratio column in the log file without 
-log_sync.
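As an aside, one way to keep the extra setup out of the solve numbers is to give the KSPSolve its own -log_view stage; a minimal sketch (the stage and variable names are illustrative) is:

#include <petscksp.h>

/* Sketch: time only the KSPSolve in a dedicated -log_view stage. */
static PetscErrorCode SolveInOwnStage(KSP ksp, Vec b, Vec x)
{
  PetscLogStage  solve_stage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("KSPSolveOnly", &solve_stage);CHKERRQ(ierr);
  ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRQ(ierr);  /* barrier before the solve, as described above */
  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  return 0;
}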

Best regards,
Felix



Re: [petsc-users] DMDA Error

2020-01-24 Thread Zhang, Junchao via petsc-users
fs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/libfabric.so.1
 (0x2afd30344000)
libXau.so.6 => /lib64/libXau.so.6 (0x2afd3057c000)

--Junchao Zhang


On Tue, Jan 21, 2020 at 2:25 AM Anthony Jourdon <jourdon_anth...@hotmail.fr> wrote:
Hello,

I made a test to try to reproduce the error.
To do so I modified the file $PETSC_DIR/src/dm/examples/tests/ex35.c
I attach the file in case of need.

The same error is reproduced for 1024 MPI ranks. I tested two problem sizes 
(2*512+1 x 2*64+1 x 2*256+1 and 2*1024+1 x 2*128+1 x 2*512+1) and the error occurred 
for both cases; the first case is also the one I used to run before the OS and MPI 
updates.
I also ran the code with -malloc_debug and nothing more appeared.

I attached the configure command I used to build a debug version of petsc.

Thank you for your time,
Sincerely,
Anthony Jourdon



From: Zhang, Junchao <jczh...@mcs.anl.gov>
Sent: Thursday, January 16, 2020, 16:49
To: Anthony Jourdon <jourdon_anth...@hotmail.fr>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] DMDA Error

It seems the problem is triggered by DMSetUp. You can write a small test 
creating the DMDA with the same size as your code, to see if you can reproduce 
the problem. If yes, it would be much easier for us to debug it.
--Junchao Zhang
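A standalone test along those lines might look roughly like the following sketch. The grid sizes are taken from the report in this thread; the boundary types, stencil type/width and dof are assumptions that should be matched to the real application.

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM             da;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* 2*512+1 x 2*64+1 x 2*256+1 = 1025 x 129 x 513; dof = 1 and stencil width = 2 are guesses */
  ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                      DMDA_STENCIL_BOX, 1025, 129, 513,
                      PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                      1, 2, NULL, NULL, NULL, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);  /* the reported failure happens inside DMSetUp */
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}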


On Thu, Jan 16, 2020 at 7:38 AM Anthony Jourdon <jourdon_anth...@hotmail.fr> wrote:

Dear Petsc developer,


I need assistance with an error.


I run a code that uses the DMDA related functions. I'm using petsc-3.8.4.


This code used to run very well on a supercomputer with the OS SLES11.

Petsc was built using an Intel MPI 5.1.3.223 module and Intel MKL version 
2016.0.2.181.

The code was running with no problem on 1024 and more MPI ranks.

Recently, the OS of the computer has been updated to RHEL7.

I rebuilt Petsc using the newly available versions of Intel MPI (2019U5) and MKL 
(2019.0.5.281), which are the same versions used for the compilers and MKL.

Since then I have tested the exact same code on 8, 16, 24, 48, 512 and 1024 
MPI ranks.

Below 1024 MPI ranks there is no problem, but at 1024 an error related to DMDA 
appears. I include the first lines of the error stack here; the full error 
stack is attached.


[534]PETSC ERROR: #1 PetscGatherMessageLengths() line 120 in 
/scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/sys/utils/mpimesg.c

[534]PETSC ERROR: #2 VecScatterCreate_PtoS() line 2288 in 
/scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/vec/vec/utils/vpscat.c

[534]PETSC ERROR: #3 VecScatterCreate() line 1462 in 
/scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/vec/vec/utils/vscat.c

[534]PETSC ERROR: #4 DMSetUp_DA_3D() line 1042 in 
/scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/impls/da/da3.c

[534]PETSC ERROR: #5 DMSetUp_DA() line 25 in 
/scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/impls/da/dareg.c

[534]PETSC ERROR: #6 DMSetUp() line 720 in 
/scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/interface/dm.c



Thank you for your time,

Sincerely,


Anthony Jourdon


Re: [petsc-users] DMDA Error

2020-01-21 Thread Zhang, Junchao via petsc-users
I submitted a job and I am waiting for the result.
--Junchao Zhang


On Tue, Jan 21, 2020 at 3:03 AM Dave May <dave.mayhe...@gmail.com> wrote:
Hi Anthony,

On Tue, 21 Jan 2020 at 08:25, Anthony Jourdon <jourdon_anth...@hotmail.fr> wrote:
Hello,

I made a test to try to reproduce the error.
To do so I modified the file $PETSC_DIR/src/dm/examples/tests/ex35.c
I attach the file in case of need.

The same error is reproduced for 1024 MPI ranks. I tested two problem sizes 
(2*512+1 x 2*64+1 x 2*256+1 and 2*1024+1 x 2*128+1 x 2*512+1) and the error occurred 
for both cases; the first case is also the one I used to run before the OS and MPI 
updates.
I also ran the code with -malloc_debug and nothing more appeared.

I attached the configure command I used to build a debug version of petsc.

The error indicates the problem occurs on the bold line below (e.g. within 
MPI_Isend())


  /* Post the Isends with the message length-info */

  for (i=0,j=0; i<...

From: Zhang, Junchao <jczh...@mcs.anl.gov>
Sent: Thursday, January 16, 2020, 16:49
To: Anthony Jourdon <jourdon_anth...@hotmail.fr>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] DMDA Error

It seems the problem is triggered by DMSetUp. You can write a small test 
creating the DMDA with the same size as your code, to see if you can reproduce 
the problem. If yes, it would be much easier for us to debug it.
--Junchao Zhang




Re: [petsc-users] DMDA Error

2020-01-16 Thread Zhang, Junchao via petsc-users
It seems the problem is triggered by DMSetUp. You can write a small test 
creating the DMDA with the same size as your code, to see if you can reproduce 
the problem. If yes, it would be much easier for us to debug it.
--Junchao Zhang




Re: [petsc-users] error related to nested vector

2020-01-14 Thread Zhang, Junchao via petsc-users
Do you have a test example?
--Junchao Zhang

On Tue, Jan 14, 2020 at 4:44 AM Y. Shidi <ys...@cam.ac.uk> wrote:
Dear developers,

I have a 2x2 nested matrix and the corresponding nested vector.
When I run the code with field splitting, I get the following
errors:

[0]PETSC ERROR: PetscTrFreeDefault() called from VecRestoreArray_Nest()
line 678 in /home/ys453/Sources/petsc/src/vec/vec/impls/nest/vecnest.c
[0]PETSC ERROR: Block at address 0x3f95f60 is corrupted; cannot free;
may be block not allocated with PetscMalloc()
[0]PETSC ERROR: - Error Message
--
[0]PETSC ERROR: Memory corruption:
http://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind
[0]PETSC ERROR: Bad location or corrupted memory
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.9.3, unknown
[0]PETSC ERROR: 2DPetscSpuriousTest on a arch-linux2-c-debug named
merlin by ys453 Tue Jan 14 10:36:53 2020
[0]PETSC ERROR: Configure options --download-scalapack --download-mumps
--download-parmetis --download-metis --download-ptscotch
--download-superlu_dist --download-hypre
[0]PETSC ERROR: #1 PetscTrFreeDefault() line 269 in
/home/ys453/Sources/petsc/src/sys/memory/mtr.c
[0]PETSC ERROR: #2 VecRestoreArray_Nest() line 678 in
/home/ys453/Sources/petsc/src/vec/vec/impls/nest/vecnest.c
[0]PETSC ERROR: #3 VecRestoreArrayRead() line 1835 in
/home/ys453/Sources/petsc/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: #4 VecRestoreArrayPair() line 511 in
/home/ys453/Sources/petsc/include/petscvec.h
[0]PETSC ERROR: #5 VecScatterBegin_SSToSS() line 671 in
/home/ys453/Sources/petsc/src/vec/vscat/impls/vscat.c
[0]PETSC ERROR: #6 VecScatterBegin() line 1779 in
/home/ys453/Sources/petsc/src/vec/vscat/impls/vscat.c
[0]PETSC ERROR: #7 PCApply_FieldSplit() line 1010 in
/home/ys453/Sources/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c
[0]PETSC ERROR: #8 PCApply() line 457 in
/home/ys453/Sources/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #9 KSP_PCApply() line 276 in
/home/ys453/Sources/petsc/include/petsc/private/kspimpl.h
[0]PETSC ERROR: #10 KSPFGMRESCycle() line 166 in
/home/ys453/Sources/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
[0]PETSC ERROR: #11 KSPSolve_FGMRES() line 291 in
/home/ys453/Sources/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
[0]PETSC ERROR: #12 KSPSolve() line 669 in
/home/ys453/Sources/petsc/src/ksp/ksp/interface/itfunc.c

I am not sure why it happens.

Thank you for your time.

Kind Regards,
Shidi


Re: [petsc-users] PetscOptionsGetBool error

2020-01-08 Thread Zhang, Junchao
A deprecated option won't cause a segfault. From 
https://www.mcs.anl.gov/petsc/petsc-current/src/dm/label/examples/tutorials/ex1f90.F90.html,
it seems you missed the first PETSC_NULL_OPTIONS argument.

--Junchao Zhang
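For comparison, here is a minimal C sketch of the corresponding call; the Fortran interface takes the same leading options-database argument, which is what the missing PETSC_NULL_OPTIONS above refers to.

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscBool      flg_mumps_lu = PETSC_TRUE, set = PETSC_FALSE;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* first argument NULL = the global options database (PETSC_NULL_OPTIONS in Fortran) */
  ierr = PetscOptionsGetBool(NULL, NULL, "-use_mumps_lu", &flg_mumps_lu, &set);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "use_mumps_lu=%d set=%d\n", (int)flg_mumps_lu, (int)set);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}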


On Wed, Jan 8, 2020 at 4:02 PM Anthony Paul Haas <a...@email.arizona.edu> wrote:
Hello,

I am using Petsc 3.7.6.0 with Fortran code and I am getting a segmentation 
violation on the following line:

call 
PetscOptionsGetBool(PETSC_NULL_CHARACTER,"-use_mumps_lu",flg_mumps_lu,flg,self%ierr_ps)

in which:
flg_mumps_lu and flg are defined as PetscBool and
flg_mumps_lu = PETSC_TRUE

Is the option -use_mumps_lu deprecated?

Thanks,

Anthony



Re: [petsc-users] VecDuplicate for FFTW-Vec causes VecDestroy to fail conditionally on VecLoad

2019-11-05 Thread Zhang, Junchao via petsc-users
Fixed in https://gitlab.com/petsc/petsc/merge_requests/2262
--Junchao Zhang


On Fri, Nov 1, 2019 at 6:51 PM Sajid Ali <sajidsyed2...@u.northwestern.edu> wrote:
Hi Junchao/Barry,

It doesn't really matter what the h5 file contains, so I'm attaching a lightly 
edited script of src/vec/vec/examples/tutorials/ex10.c which should produce a 
vector to be used as input for the above test case. (I'm working with 
--with-scalar-type=complex.)

Now that I think of it, fixing this bug is not important, I can workaround the 
issue by creating a new vector with VecCreateMPI and accept the small loss in 
performance of VecPointwiseMult due to misaligned layouts. If it's a small fix 
it may be worth the time, but fixing this is not a big priority right now. If 
it's a complicated fix, this issue can serve as a note to future users.


Thank You,
Sajid Ali
Applied Physics
Northwestern University
s-sajid-ali.github.io


Re: [petsc-users] VecDuplicate for FFTW-Vec causes VecDestroy to fail conditionally on VecLoad

2019-11-01 Thread Zhang, Junchao via petsc-users
I know nothing about Vec FFTW, but if you can provide hdf5 files in your test, 
I will see if I can reproduce it.
--Junchao Zhang


On Fri, Nov 1, 2019 at 2:08 PM Sajid Ali via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hi PETSc-developers,

I'm unable to debug a crash with VecDestroy that seems to depend only on 
whether or not a VecLoad was performed on a vector that was generated by 
duplicating one generated by MatCreateVecsFFTW.

I'm attaching two examples ex1.c and ex2.c. The first one just creates vectors 
aligned as per FFTW layout, duplicates one of them and destroys all at the end. 
A bug related to this was fixed sometime between the 3.11 release and 3.12 
release. I've tested this code with the versions 3.11.1 and 3.12.1 and as 
expected it runs with no issues for 3.12.1 and fails with 3.11.1.

Now, the second one just adds a few lines which load a vector from memory to 
the duplicated vector before destroying all. For some reason, this code fails 
for both 3.11.1 and 3.12.1 versions. I'm lost as to what may cause this error 
and would appreciate any help in how to debug this. Thanks in advance for the 
help!

PS: I've attached the two codes, ex1.c/ex2.c, the log files for both make and 
run and finally a bash script that was run to compile/log and control the 
version of petsc used.


--
Sajid Ali
Applied Physics
Northwestern University
s-sajid-ali.github.io


Re: [petsc-users] Errors with ParMETIS

2019-10-18 Thread Zhang, Junchao via petsc-users
Usually this is due to uninitialized variables. You can try valgrind. Read the 
tutorial from page 3 of https://www.mcs.anl.gov/petsc/petsc-20/tutorial/PETSc1.pdf
--Junchao Zhang


On Fri, Oct 18, 2019 at 6:23 AM Shidi Yan via petsc-users <petsc-users@mcs.anl.gov> wrote:
Dear developers,

I am using ParMETIS to do dynamic load balancing for the mesh.
If my code is compiled with optimiser options (e.g., -O2, -O3), I get
the following errors when the code calls functions from ParMETIS:

***ASSERTION failed on line 176 of file 
externalpackages/git.parmetis/libparmetis/comm.c: j == nnbrs

externalpackages/git.parmetis/libparmetis/comm.c:176: libparmetis__CommSetup: 
Assertion `j == nnbrs' failed.

However, if the code is compiled in debugging mode (-g), I do
not have any errors.

I am wondering whether it is a bug on my part.

Thank you very much for your time.

Kind Regards,
Shidi


Re: [petsc-users] CUDA-Aware MPI & PETSc

2019-10-07 Thread Zhang, Junchao via petsc-users
Hello, David,
   It took a longer time than I expected to add the CUDA-aware MPI feature in 
PETSc. It is now in PETSc-3.12, released last week. I have a little fix after 
that, so you better use petsc master.  Use petsc option -use_gpu_aware_mpi to 
enable it. On Summit, you also need jsrun --smpiargs="-gpu" to enable IBM 
Spectrum MPI's CUDA support. If you run with multiple MPI ranks per GPU, you 
also need #BSUB -alloc_flags gpumps in your job script.
  My experiments (using a simple test doing repeated MatMult) on Summit are 
mixed. With one MPI rank per GPU, I saw very good performance improvement (up 
to 25%). But with multiple ranks per GPU, I did not see improvement. That 
sounds absurd since it should be easier for MPI ranks to communicate data on the 
same GPU. I'm investigating this issue.
  If you can also evaluate this feature with your production code, that would 
be helpful.
  Thanks.
--Junchao Zhang


On Thu, Aug 22, 2019 at 11:34 AM David Gutzwiller <david.gutzwil...@gmail.com> wrote:
Hello Junchao,

Spectacular news!

I have our production code running on Summit (Power9 + Nvidia V100) and on 
local x86 workstations, and I can definitely provide comparative benchmark data 
with this feature once it is ready.  Just let me know when it is available for 
testing and I'll be happy to contribute.

Thanks,

-David


On Thu, Aug 22, 2019 at 7:22 AM Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
This feature is under active development. I hope I can make it usable in a 
couple of weeks. Thanks.
--Junchao Zhang


On Wed, Aug 21, 2019 at 3:21 PM David Gutzwiller via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello,

I'm currently using PETSc for the GPU acceleration of a simple Krylov solver with 
GMRES, without preconditioning.   This is within the framework of our in-house 
multigrid solver.  I am getting a good GPU speedup on the finest grid level but 
progressively worse performance on each coarse level.   This is not surprising, 
but I still hope to squeeze out some more performance, hopefully making it 
worthwhile to run some or all of the coarse grids on the GPU.

I started investigating with nvprof / nsight and essentially came to the same 
conclusion that Xiangdong reported in a recent thread (July 16, "MemCpy (HtoD 
and DtoH) in Krylov solver").  My question is a follow-up to that thread:

The MPI communication is staged from the host, which results in some H<->D 
transfers for every mat-vec operation.   A CUDA-aware MPI implementation might 
avoid these transfers for communication between ranks that are assigned to the 
same accelerator.   Has this been implemented or tested?

In our solver we typically run with multiple MPI ranks all assigned to a single 
device, and running with a single rank is not really feasible as we still have 
a sizable amount of work for the CPU to chew through.  Thus, I think quite a 
lot of the H<->D transfers could be avoided if I can skip the MPI staging on 
the host. I am quite new to PETSc so I wanted to ask around before blindly 
digging into this.

Thanks for your help,

David



Re: [petsc-users] [petsc-maint] petsc ksp solver hangs

2019-09-28 Thread Zhang, Junchao via petsc-users
Does it hang with 2 or 4 processes? Which PETSc version do you use (using the 
latest is easier for us to debug)? Did you configure PETSc with 
--with-debugging=yes COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"?
After attaching gdb to one process, you can use bt to see its stack trace.

--Junchao Zhang


On Sat, Sep 28, 2019 at 5:33 AM Michael Wick <michael.wick.1...@gmail.com> wrote:
I attached a debugger to my run. The code just hangs without throwing an error 
message, interestingly. It uses 72 processors. I turned on the KSP monitor, and 
I can see it hangs either at the beginning or the end of a KSP iteration. I also 
used valgrind to debug my code on my local machine, which does not detect any 
issue. I use fgmres + fieldsplit, which is really a standard option.

Do you have any suggestions?

On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
How many MPI ranks did you use? If it is run on your desktop, you can just 
attach a debugger to an MPI process to see what is going on.

--Junchao Zhang


On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint <petsc-ma...@mcs.anl.gov> wrote:
Hi PETSc:

I have been experiencing code stagnation at certain KSP iterations. This 
happens rather randomly, which means the code may stop in the middle of a KSP 
solve and hang there.

I have used valgrind and detect nothing. I just wonder if you have any 
suggestions.

Thanks!!!
M


Re: [petsc-users] Clarification of INSERT_VALUES for vec with ghost nodes

2019-09-26 Thread Zhang, Junchao via petsc-users
With VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_REVERSE), the owner will get 
updated by ghost values. So in your case 1, proc0 gets either value1 or value2 
from proc1/2; in case 2, proc0 gets either value0 or value2 from proc1/2.
In short, you could not achieve your goal with INSERT_VALUES. Though you can do 
it with other interfaces in PETSc, e.g., PetscSFReduceBegin/End, I believe it 
is better to extend VecGhostUpdate to support MAX/MIN_VALUES, because it is a 
simpler interface for you and it is very easy to add.

Could you try branch jczhang/feature-vscat-min-values to see if it works for 
you?  See the end of src/vec/vec/examples/tutorials/ex9.c for an example of the 
new functionality. Use mpirun -n 2 ./ex9 -minvalues to test it; its expected 
output is output/ex9_2.out.
Petsc will have a new release this weekend. Let's see whether I can put it in 
the new release.

Thanks.
--Junchao Zhang
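In code, the intended usage would look roughly like the sketch below; MIN_VALUES here is the insert mode added in that branch, so treat it as tentative API, and the helper name is illustrative.

#include <petscvec.h>

/* Sketch: every process that sees the ghost node first writes its candidate value
   into its local (owned or ghost) slot; the owner then takes the minimum and
   broadcasts it back to all ghost copies. */
static PetscErrorCode GhostMin(Vec v)
{
  PetscErrorCode ierr;

  ierr = VecGhostUpdateBegin(v, MIN_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);    /* owner = min(owner, ghosts) */
  ierr = VecGhostUpdateEnd(v, MIN_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr); /* push the min back to every ghost */
  ierr = VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  return 0;
}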


On Thu, Sep 26, 2019 at 3:28 AM Aulisa, Eugenio <eugenio.aul...@ttu.edu> wrote:




On Wed, Sep 25, 2019 at 9:11 AM Aulisa, Eugenio via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hi,

I have a vector with ghost nodes where each process may or may not change the 
value of a specific ghost node  (using INSERT_VALUES).

At the end I would like each process that sees a particular ghost node to 
have the smallest of the set values.
Do you mean the owner of a ghost node gets the smallest value? That is, in your 
example below, proc0 gets Min(value0, value1, value2)?
If I can get the Min(value0, value1, value2) on the owner then I can scatter it 
forward with INSERT_VALUES to all processes that ghost it. And if there is an 
easy way to get Min(value0, value1, value2) on the owner (or on all processes) 
I would like to know.

Since I do not think there is a straightforward way to achieve that, I was 
looking at a workaround, and to do that I need to know the behavior of scatter 
reverse in the cases described below. Notice that I used the option 
INSERT_VALUES which I am not even sure is allowed.

I do not think there is a straightforward way to achieve this, but I would like 
to be wrong.

Any suggestion?



To build a workaround I need to understand better the behavior of 
VecGhostUpdateBegin(...);  VecGhostUpdateEnd(...).

In particular in the documentation I do not see the option

VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_REVERSE);
VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_REVERSE);

In case this is possible to be used, what is the behavior of this call in the 
following two cases?

1) Assume that node-i belongs to proc0, and is ghosted in proc1 and proc2, also
assume that the current value of node-i is value0 and proc0 does not modify it, 
but proc1 and proc2 do.

start with:
proc0 -> value0
proc1 -> value0
proc2 -> value0

change to:
proc0 -> value0
proc1 -> value1
proc2 -> value2

I assume that calling
VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_REVERSE);
VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_REVERSE);
will have an unpredictable behavior as

proc0 -> either value1 or value2
proc1 -> value1
proc2 -> value2

2) Assume now that node-i belongs to proc0, and is ghosted in proc1 and proc2, 
also
assume that the current value of node-i is value0 and proc0 and proc1 do not 
modify it, but proc2 does.

start with:
proc0 -> value0
proc1 -> value0
proc2 -> value0

change to:
proc0 -> value0
proc1 -> value0
proc2 -> value2

Is the call
VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_REVERSE);
VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_REVERSE);
still unpredictable?

proc0 -> either value0 or value2
proc1 -> value0
proc2 -> value2

or

proc0 -> value2  (since proc1 did not modify the original value, so it did not 
reverse scatter)
proc1 -> value0
proc2 -> value2

Thanks a lot for your help
Eugenio











Re: [petsc-users] Clarification of INSERT_VALUES for vec with ghost nodes

2019-09-25 Thread Zhang, Junchao via petsc-users

On Wed, Sep 25, 2019 at 9:11 AM Aulisa, Eugenio via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hi,

I have a vector with ghost nodes where each process may or may not change the 
value of a specific ghost node  (using INSERT_VALUES).

At the end I would like each process that sees a particular ghost node to 
have the smallest of the set values.
Do you mean the owner of a ghost node gets the smallest value? That is, in your 
example below, proc0 gets Min(value0, value1, value2)?


I do not think there is a straightforward way to achieve this, but I would like 
to be wrong.

Any suggestion?



To build a workaround I need to understand better the behavior of 
VecGhostUpdateBegin(...);  VecGhostUpdateEnd(...).

In particular in the documentation I do not see the option

VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_REVERSE);
VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_REVERSE);

In case this is possible to be used, what is the behavior of this call in the 
following two cases?

1) Assume that node-i belongs to proc0, and is ghosted in proc1 and proc2, also
assume that the current value of node-i is value0 and proc0 does not modify it, 
but proc1 and proc2 do.

start with:
proc0 -> value0
proc1 -> value0
proc2 -> value0

change to:
proc0 -> value0
proc1 -> value1
proc2 -> value2

I assume that calling
VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_REVERSE);
VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_REVERSE);
will have an unpredictable behavior as

proc0 -> either value1 or value2
proc1 -> value1
proc2 -> value2

2) Assume now that node-i belongs to proc0, and is ghosted in proc1 and proc2, 
also
assume that the current value of node-i is value0 and proc0 and proc1 do not 
modify it, but proc2 does.

start with:
proc0 -> value0
proc1 -> value0
proc2 -> value0

change to:
proc0 -> value0
proc1 -> value0
proc2 -> value2

Is the call
VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_REVERSE);
VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_REVERSE);
still unpredictable?

proc0 -> either value0 or value2
proc1 -> value0
proc2 -> value2

or

proc0 -> value2  (since proc1 did not modify the original value, so it did not 
reverse scatter)
proc1 -> value0
proc2 -> value2

Thanks a lot for your help
Eugenio









Re: [petsc-users] VecAssembly gets stuck

2019-09-13 Thread Zhang, Junchao via petsc-users
When processes get stuck, you can attach gdb to one process and backtrace its 
call stack to see what it is doing, so we can have a better understanding.

--Junchao Zhang


On Fri, Sep 13, 2019 at 11:31 AM José Lorenzo via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello,

I am solving a finite element problem with Dirichlet boundary conditions using 
PETSc. In the boundary conditions there are two terms: a first one that is 
known beforehand (normally zero) and a second term that depends linearly on 
the unknown variable itself in the whole domain. Therefore, at every time step 
I need to iterate, as the boundary condition depends on the field and the latter 
depends on the BC. Moreover, the problem is nonlinear and I use a ghosted 
vector to represent the field.

Every processor manages a portion of the domain and a portion of the boundary 
(if not interior). At every Newton iteration within the time loop the way I set 
the boundary conditions is as follows:

First, each processor computes the known term of the BC (first term) and 
inserts the values into the vector

call VecSetValues(H, nedge_own, edglocglo(diredg_loc) - 1, Hdir, INSERT_VALUES, 
ierr)
call VecAssemblyBegin(H, ierr)
call VecAssemblyEnd(H, ierr)

As far as I understand, at this stage VecAssembly will not need to communicate 
with other processors, as each processor only sets values for components that 
belong to it.

Then, each processor computes its own contribution to the field-dependent term 
of the BC for the whole domain boundary as

call VecSetValues(H, nedge_all, edgappglo(diredg_app) - 1, Hself, ADD_VALUES, 
ierr)
call VecAssemblyBegin(H, ierr)
call VecAssemblyEnd(H, ierr)

In this case communication will be needed as each processor will add values to 
vector components that are not stored by it, and I guess it might get very busy 
as all the processors will need to communicate with each other.

When using this strategy I don't find any issue for problems using a small 
number of processors, but recently I've been solving with 90 processors and 
the simulation always hangs at the second VecSetValues at some random time 
step. It works fine for some time steps but at some point it just gets stuck 
and I have to cancel the simulation.

I have managed to overcome this by making each processor contribute to its own 
components using first MPI_Reduce and then doing

call VecSetValues(H, nedge_own, edgappglo(diredg_app_loc), Hself_own, 
ADD_VALUES, ierr)
call VecAssemblyBegin(H, ierr)
call VecAssemblyEnd(H, ierr)

However I would like to understand whether there is something wrong in the code 
above.

Thank you.



Re: [petsc-users] CUDA-Aware MPI & PETSc

2019-08-22 Thread Zhang, Junchao via petsc-users
Definitely I will do. Thanks.
--Junchao Zhang


On Thu, Aug 22, 2019 at 11:34 AM David Gutzwiller <david.gutzwil...@gmail.com> wrote:
Hello Junchao,

Spectacular news!

I have our production code running on Summit (Power9 + Nvidia V100) and on 
local x86 workstations, and I can definitely provide comparative benchmark data 
with this feature once it is ready.  Just let me know when it is available for 
testing and I'll be happy to contribute.

Thanks,

-David


On Thu, Aug 22, 2019 at 7:22 AM Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
This feature is under active development. I hope I can make it usable in a 
couple of weeks. Thanks.
--Junchao Zhang


On Wed, Aug 21, 2019 at 3:21 PM David Gutzwiller via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello,

I'm currently using PETSc for the GPU acceleration of a simple Krylov solver with 
GMRES, without preconditioning.   This is within the framework of our in-house 
multigrid solver.  I am getting a good GPU speedup on the finest grid level but 
progressively worse performance on each coarse level.   This is not surprising, 
but I still hope to squeeze out some more performance, hopefully making it 
worthwhile to run some or all of the coarse grids on the GPU.

I started investigating with nvprof / nsight and essentially came to the same 
conclusion that Xiangdong reported in a recent thread (July 16, "MemCpy (HtoD 
and DtoH) in Krylov solver").  My question is a follow-up to that thread:

The MPI communication is staged from the host, which results in some H<->D 
transfers for every mat-vec operation.   A CUDA-aware MPI implementation might 
avoid these transfers for communication between ranks that are assigned to the 
same accelerator.   Has this been implemented or tested?

In our solver we typically run with multiple MPI ranks all assigned to a single 
device, and running with a single rank is not really feasible as we still have 
a sizable amount of work for the CPU to chew through.  Thus, I think quite a 
lot of the H<->D transfers could be avoided if I can skip the MPI staging on 
the host. I am quite new to PETSc so I wanted to ask around before blindly 
digging into this.

Thanks for your help,

David



Re: [petsc-users] CUDA-Aware MPI & PETSc

2019-08-22 Thread Zhang, Junchao via petsc-users
This feature is under active development. I hope I can make it usable in a 
couple of weeks. Thanks.
--Junchao Zhang


On Wed, Aug 21, 2019 at 3:21 PM David Gutzwiller via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello,

I'm currently using PETSc for the GPU acceleration of a simple Krylov solver with 
GMRES, without preconditioning.   This is within the framework of our in-house 
multigrid solver.  I am getting a good GPU speedup on the finest grid level but 
progressively worse performance on each coarse level.   This is not surprising, 
but I still hope to squeeze out some more performance, hopefully making it 
worthwhile to run some or all of the coarse grids on the GPU.

I started investigating with nvprof / nsight and essentially came to the same 
conclusion that Xiangdong reported in a recent thread (July 16, "MemCpy (HtoD 
and DtoH) in Krylov solver").  My question is a follow-up to that thread:

The MPI communication is staged from the host, which results in some H<->D 
transfers for every mat-vec operation.   A CUDA-aware MPI implementation might 
avoid these transfers for communication between ranks that are assigned to the 
same accelerator.   Has this been implemented or tested?

In our solver we typically run with multiple MPI ranks all assigned to a single 
device, and running with a single rank is not really feasible as we still have 
a sizable amount of work for the CPU to chew through.  Thus, I think quite a 
lot of the H<->D transfers could be avoided if I can skip the MPI staging on 
the host. I am quite new to PETSc so I wanted to ask around before blindly 
digging into this.

Thanks for your help,

David



Re: [petsc-users] Different behavior of code on different machines

2019-07-20 Thread Zhang, Junchao via petsc-users
Did you use the same number of MPI ranks and the same build options on your pc 
and on the cluster? If not, you can try to align the options on your pc with those 
on your cluster to see if you can reproduce the error on your pc. You can also try 
valgrind to see if there are memory errors, like use of uninitialized variables, 
etc.

--Junchao Zhang


On Sat, Jul 20, 2019 at 11:35 AM Yuyun Yang <yyan...@stanford.edu> wrote:
I already tested on my pc with multiple processors and it works fine. I used 
the command $PETSC_DIR/$PETSC_ARCH/bin/mpiexec -n 2 since I configured my PETSc 
with MPICH, but my local computer has openmpi.

Best,
Yuyun

From: Zhang, Junchao <jczh...@mcs.anl.gov>
Sent: Saturday, July 20, 2019 9:14 AM
To: Yuyun Yang <yyan...@stanford.edu>
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Different behavior of code on different machines

You need to test on your personal computer with multiple MPI processes (e.g., 
mpirun -n 2 ...) before moving to big machines. You may also need to configure 
petsc with --with-debugging=1 --COPTFLAGS="-O0 -g" etc. to ease debugging.
--Junchao Zhang


On Sat, Jul 20, 2019 at 11:03 AM Yuyun Yang via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello team,

I’m encountering a problem with my code’s behavior on multiple processors. When 
I run it on my personal computer it works just fine, but when I use it on our 
computing cluster it produces an error (in one of the root-finding functions, 
an assert statement is not satisfied) and aborts.

If I just run on one processor then both machines can run the code just fine, 
but they give different results (maybe due to roundoff errors).

I’m not sure how to proceed with debugging (since I usually do it on my own 
computer which didn’t seem to encounter a bug) and would appreciate your 
advice. Thank you!

Best regards,
Yuyun


Re: [petsc-users] Different behavior of code on different machines

2019-07-20 Thread Zhang, Junchao via petsc-users
You need to test on your personal computer with multiple MPI processes (e.g., 
mpirun -n 2 ...) before moving to big machines. You may also need to configure 
petsc with --with-debugging=1 --COPTFLAGS="-O0 -g" etc. to ease debugging.
--Junchao Zhang


On Sat, Jul 20, 2019 at 11:03 AM Yuyun Yang via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello team,

I’m encountering a problem with my code’s behavior on multiple processors. When 
I run it on my personal computer it works just fine, but when I use it on our 
computing cluster it produces an error (in one of the root-finding functions, 
an assert statement is not satisfied) and aborts.

If I just run on one processor then both machines can run the code just fine, 
but they give different results (maybe due to roundoff errors).

I’m not sure how to proceed with debugging (since I usually do it on my own 
computer which didn’t seem to encounter a bug) and would appreciate your 
advice. Thank you!

Best regards,
Yuyun


Re: [petsc-users] VecGhostRestoreLocalForm

2019-07-20 Thread Zhang, Junchao via petsc-users



On Sat, Jul 20, 2019 at 5:47 AM José Lorenzo via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello,

I am not sure I understand the function VecGhostRestoreLocalForm. If I proceed 
as stated in the manual,



VecGhostUpdateBegin(x,INSERT_VALUES,SCATTER_FORWARD);
VecGhostUpdateEnd(x,INSERT_VALUES,SCATTER_FORWARD);
VecGhostGetLocalForm(x,&xlocal);
VecGetArray(xlocal,&xvalues);
   // access the non-ghost values in locations xvalues[0:n-1] and ghost values in locations xvalues[n:n+nghost];
VecRestoreArray(xlocal,&xvalues);
VecGhostRestoreLocalForm(x,&xlocal);


Does VecRestoreArray update the values in the local vector xlocal, and then 
VecGhostRestoreLocalForm update the values of the global vector x?

Yes, you can think of VecRestoreArray as finalizing the updates to xlocal. 
VecGhostRestoreLocalForm does not update the global vector; it is for bookkeeping 
purposes.
x and xlocal share the same memory that contains the actual vector data. If you 
changed ghost points through xvalues[], then to get the global vector x updated you 
have to call VecGhostUpdateBegin/End after the above code, for example to ADD the 
two ghost contributions.
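A minimal sketch of that follow-up step (the helper name is illustrative, and ADD_VALUES is used here just as an example combining mode):

#include <petscvec.h>

/* Sketch: after changing ghost slots through the local form above, make the
   owned values and all ghost copies consistent again. */
static PetscErrorCode SyncGhosts(Vec x)
{
  PetscErrorCode ierr;

  ierr = VecGhostUpdateBegin(x, ADD_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);    /* accumulate ghost contributions onto owners */
  ierr = VecGhostUpdateEnd(x, ADD_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecGhostUpdateBegin(x, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr); /* owners re-broadcast the new values to ghosts */
  ierr = VecGhostUpdateEnd(x, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  return 0;
}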


Does one need to call these two functions?

Yes.  In PETSc, *Get and *Restore have to be paired.


Re: [petsc-users] Communication during MatAssemblyEnd

2019-07-01 Thread Zhang, Junchao via petsc-users
Jose & Ale,
   -ds_method 2 fixed the problem.   I used PETSc master (f1480a5c) and slepc 
master(675b89d7) through --download-slepc. I used MKL 
/opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/
   I got the following results with 2048 processors.  MatAssemblyEnd looks 
expensive to me. I am looking into it.

--- Event Stage 5: Offdiag

BuildTwoSidedF 1 1.0 1.5201e+007345.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  13  0  0  0  0 0
MatAssemblyBegin   1 1.0 1.5201e+005371.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  13  0  0  0  0 0
MatAssemblyEnd 1 1.0 1.7720e+00 1.0 0.00e+00 0.0 5.7e+04 1.3e+05 
8.0e+00  2  0  0  0  1  39  0100100100 0
VecSet 1 1.0 2.4695e-02 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0

--Junchao Zhang


On Mon, Jul 1, 2019 at 4:28 AM Jose E. Roman <jro...@dsic.upv.es> wrote:
You can try the following:
- Try with a different DS method: -ds_method 1  or  -ds_method 2  (see 
DSSetMethod)
- Run with -ds_parallel synchronized (see DSSetParallel)
If it does not help, send a reproducible code to slepc-maint

Jose


> On 1 Jul 2019, at 11:10, Ale Foggia via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> Oh, I also got the same error when I switched to the newest version of SLEPc 
> (using OpenBlas), and I don't know where it is coming from.
> Can you tell me which version of SLEPc and PETSc are you using? And, are you 
> using MKL?
> Thanks for trying :)
>
> On Fri, 28 Jun 2019 at 16:57, Zhang, Junchao (<jczh...@mcs.anl.gov>) wrote:
> Ran with 64 nodes and 32 ranks/node, met  slepc errors and did not know how 
> to proceed :(
>
> [363]PETSC ERROR: - Error Message 
> --
> [363]PETSC ERROR: Error in external library
> [363]PETSC ERROR: Error in LAPACK subroutine steqr: info=0
> [363]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
> for trouble shooting.
> [363]PETSC ERROR: Petsc Development GIT revision: v3.11.2-1052-gf1480a5c  GIT 
> Date: 2019-06-22 21:39:54 +
> [363]PETSC ERROR: /tmp/main.x on a arch-cray-xc40-knl-opt named nid03387 by 
> jczhang Fri Jun 28 07:26:59 2019
> [1225]PETSC ERROR: #2 DSSolve() line 586 in 
> /global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/sys/classes/ds/interface/dsops.c
> [1225]PETSC ERROR: #3 EPSSolve_KrylovSchur_Symm() line 55 in 
> /global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/impls/krylov/krylovschur/ks-symm.c
> [1225]PETSC ERROR: #4 EPSSolve() line 149 in 
> /global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/interface/epssolve.c
> [240]PETSC ERROR: #2 DSSolve() line 586 in 
> /global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/sys/classes/ds/interface/dsops.c
> [240]PETSC ERROR: #3 EPSSolve_KrylovSchur_Symm() line 55 in 
> /global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/impls/krylov/krylovschur/ks-symm.c
> [240]PETSC ERROR: #4 EPSSolve() line 149 in 
> /global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/interface/epssolve.c
>
> --Junchao Zhang
>
>
On Fri, Jun 28, 2019 at 4:02 AM Ale Foggia <amfog...@gmail.com> wrote:
> Junchao,
> I'm sorry for the late response.
>
> On Wed, 26 Jun 2019 at 16:39, Zhang, Junchao (<jczh...@mcs.anl.gov>) wrote:
> Ale,
> The job got a chance to run but failed with out-of-memory, "Some of your 
> processes may have been killed by the cgroup out-of-memory handler."
>
> I mentioned that I used 1024 nodes and 32 processes on each node because the 
> application needs a lot of memory. I think that for a system of size 38, one 
> needs above 256 nodes for sure (assuming only 32 procs per node). I would try 
> with 512 if it's possible.
>
> I also tried with 128 cores with ./main.x 2 ... and got a weird error message
> "The size of the basis has to be at least equal to the number of MPI processes used."
>
> The error comes from the fact that you put a system size of only 2 which is 
> too small.
> I can also see the problem in the assembly with system sizes smaller than 38, 
> so you can try with like 30 (for which I also have a log). In that case I run 
> with 64 nodes and 32 processes per node. I think the problem may also fit in 
> 32 nodes.
>
> --Junchao Zhang
>
>
> On Tue, Jun 25, 2019 at 11:24 PM Junchao Zhang <jczh...@mcs.anl.gov> wrote:
> Ale,
>   I successfully built your code and submitted a job to the NE

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-28 Thread Zhang, Junchao via petsc-users
Ran with 64 nodes and 32 ranks/node, met  slepc errors and did not know how to 
proceed :(

[363]PETSC ERROR: - Error Message 
--
[363]PETSC ERROR: Error in external library
[363]PETSC ERROR: Error in LAPACK subroutine steqr: info=0
[363]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[363]PETSC ERROR: Petsc Development GIT revision: v3.11.2-1052-gf1480a5c  GIT 
Date: 2019-06-22 21:39:54 +
[363]PETSC ERROR: /tmp/main.x on a arch-cray-xc40-knl-opt named nid03387 by 
jczhang Fri Jun 28 07:26:59 2019
[1225]PETSC ERROR: #2 DSSolve() line 586 in 
/global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/sys/classes/ds/interface/dsops.c
[1225]PETSC ERROR: #3 EPSSolve_KrylovSchur_Symm() line 55 in 
/global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/impls/krylov/krylovschur/ks-symm.c
[1225]PETSC ERROR: #4 EPSSolve() line 149 in 
/global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/interface/epssolve.c
[240]PETSC ERROR: #2 DSSolve() line 586 in 
/global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/sys/classes/ds/interface/dsops.c
[240]PETSC ERROR: #3 EPSSolve_KrylovSchur_Symm() line 55 in 
/global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/impls/krylov/krylovschur/ks-symm.c
[240]PETSC ERROR: #4 EPSSolve() line 149 in 
/global/u1/j/jczhang/petsc/arch-cray-xc40-knl-opt/externalpackages/git.slepc/src/eps/interface/epssolve.c

--Junchao Zhang


On Fri, Jun 28, 2019 at 4:02 AM Ale Foggia <amfog...@gmail.com> wrote:
Junchao,
I'm sorry for the late response.

On Wed, 26 Jun 2019 at 16:39, Zhang, Junchao (<jczh...@mcs.anl.gov>) wrote:
Ale,
The job got a chance to run but failed with out-of-memory, "Some of your 
processes may have been killed by the cgroup out-of-memory handler."

I mentioned that I used 1024 nodes and 32 processes on each node because the 
application needs a lot of memory. I think that for a system of size 38, one 
needs above 256 nodes for sure (assuming only 32 procs per node). I would try 
with 512 if it's possible.

I also tried with 128 cores with ./main.x 2 ... and got a weird error message 
"The size of the basis has to be at least equal to the number of MPI processes used."

The error comes from the fact that you put a system size of only 2 which is too 
small.
I can also see the problem in the assembly with system sizes smaller than 38, 
so you can try with like 30 (for which I also have a log). In that case I run 
with 64 nodes and 32 processes per node. I think the problem may also fit in 32 
nodes.

--Junchao Zhang


On Tue, Jun 25, 2019 at 11:24 PM Junchao Zhang <jczh...@mcs.anl.gov> wrote:
Ale,
  I successfully built your code and submitted a job to the NERSC Cori machine 
requiring 32768 KNL cores and one and a half hours. It is estimated to run in 3 
days. If you also observed the same problem with less cores, what is your input 
arguments?  Currently, I use what in your log file, ./main.x 38 -nn -j1 1.0 -d1 
1.0 -eps_type krylovschur -eps_tol 1e-9 -log_view
  The smaller the better. Thanks.
--Junchao Zhang


On Mon, Jun 24, 2019 at 6:20 AM Ale Foggia <amfog...@gmail.com> wrote:
Yes, I used KNL nodes. If you can perform the test, that would be great. Could it be 
that I'm not using the correct configuration of the KNL nodes? These are the 
environment variables I set:
MKL_NUM_THREADS=1
OMP_NUM_THREADS=1
KMP_HW_SUBSET=1t
KMP_AFFINITY=compact
I_MPI_PIN_DOMAIN=socket
I_MPI_PIN_PROCESSOR_LIST=0-63
MKL_DYNAMIC=0

The code is in https://github.com/amfoggia/LSQuantumED and it has a readme to 
compile it and run it. When I ran the test I used only 32 processors per node, 
and I used 1024 nodes in total, and it's for nspins=38.
Thank you

El vie., 21 jun. 2019 a las 20:03, Zhang, Junchao 
(mailto:jczh...@mcs.anl.gov>>) escribió:
Ale,
  Did you use Intel KNL nodes?  Mr. Hong (cc'ed) did experiments on KNL nodes  
one year ago. He used 32768 processors and called MatAssemblyEnd 118 times and 
it used only 1.5 seconds in total.  So I guess something was wrong with your 
test. If you can share your code, I can have a test on our machine to see how 
it goes.
 Thanks.
--Junchao Zhang


On Fri, Jun 21, 2019 at 11:00 AM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
MatAssembly was called once (in stage 5) and cost 2.5% of the total time.  Look 
at stage 5. It says MatAssemblyBegin calls BuildTwoSidedF, which does global 
synchronization. The high max/min ratio means load imbalance. What I do not 
understand is MatAssemblyEnd. The ratio is 1.0. It means processors are already 
synchronized. With 32768 processors, there are 1.2e+06 messages with average 
length 1.9e+06 bytes. So each processor sends 3

Re: [petsc-users] DMPlexDistributeField

2019-06-27 Thread Zhang, Junchao via petsc-users


On Thu, Jun 27, 2019 at 4:50 PM Adrian Croucher 
mailto:a.crouc...@auckland.ac.nz>> wrote:
hi

On 28/06/19 3:14 AM, Zhang, Junchao wrote:
> You can dump relevant SFs to make sure their graph is correct.


Yes, I'm doing that, and the graphs don't look correct.
Check how the graph is created and then whether the parameters to 
PetscSFSetGraph() are correct.


- Adrian

--
Dr Adrian Croucher
Senior Research Fellow
Department of Engineering Science
University of Auckland, New Zealand
email: a.crouc...@auckland.ac.nz<mailto:a.crouc...@auckland.ac.nz>
tel: +64 (0)9 923 4611



Re: [petsc-users] DMPlexDistributeField

2019-06-27 Thread Zhang, Junchao via petsc-users


On Wed, Jun 26, 2019 at 11:12 PM Adrian Croucher 
mailto:a.crouc...@auckland.ac.nz>> wrote:

hi

On 27/06/19 4:07 PM, Zhang, Junchao wrote:

 Adrian, I am working on SF but know nothing about DMPlexDistributeField. Do 
you think SF creation or communication is wrong? If yes, I'd like to know the 
detail.  I have a branch jczhang/sf-more-opts, which adds some optimizations to 
SF.  It probably won't solve your problem. But since it changes SF a lot, it's 
better to have a try.


My suspicion is that there may be a problem in DMPlexDistribute(), so that the 
distribution SF entries for DMPlex faces are not correct when overlap > 0.

You can dump relevant SFs to make sure their graph is correct.


So it's probably not a problem with SF as such.

- Adrian

--
Dr Adrian Croucher
Senior Research Fellow
Department of Engineering Science
University of Auckland, New Zealand
email: a.crouc...@auckland.ac.nz<mailto:a.crouc...@auckland.ac.nz>
tel: +64 (0)9 923 4611



Re: [petsc-users] DMPlexDistributeField

2019-06-26 Thread Zhang, Junchao via petsc-users


On Mon, Jun 24, 2019 at 6:23 PM Adrian Croucher via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

hi

Thanks Matt for the explanation about this.

I have been trying a test which does the following:

1) read in DMPlex from file

2) distribute it, with overlap = 1, using DMPlexDistribute()

3) create FVM cell and face geometry vectors using DMPlexComputeGeometryFVM()

4) re-distribute, again with overlap = 1, using DMPlexDistribute()

5) distribute the cell and face geometry vectors using DMPlexDistributeField()


Steps 4) and 5) should do essentially nothing, because the mesh has already 
been distributed (but in my actual non-test code, there is additional stuff 
between steps 3) and 4) where dual porosity cells are added to the DM).

So I expect the cell and face geometry vectors to be essentially unchanged from 
the redistribution. And the redistribution SF (from the second distribution) 
should be just an identity mapping on the cell and face points (except for the 
overlap ghost points).

This is true for the cells, but not the faces. I've attached the example code 
and mesh. It is a simple mesh with 10 cells in a horizontal line, each cell 
50x50x50 m.

If I run on 2 processes, there are 5 cells (points 0 - 4) on each rank, with 
centroids at 25, 75, 125, 175 and 225 m on rank 0, and 275, 325, 375, 425 and 
475 m on rank 1. The internal faces are the points 36, 42, 47 and 52 on rank 0, 
and 34, 37, 42, 47 and 52 on rank 1. On rank 0 these should have centroids at 
50, 100, 150 and 200 m respectively; on rank 1 they should be at 250, 300, 350 
and 400 m. This is true before redistribution.

After redistribution, the cells centroids are still correct, and the face data 
on rank 1 are OK, but the face data on rank 0 are all wrong.

If you look at the redistribution SF the entries for the rank 0 face data are 
36 <- (0,40), 42 <- (0,46), 47 <- (0,51), 52 <- (0,56), instead of the expected 
36 <- (0,36), 42 <- (0,42), 47 <- (0,47), 52 <- (0,52). The SF for the rank 1 
faces is OK.

 Adrian, I am working on SF but know nothing about DMPlexDistributeField. Do 
you think SF creation or communication is wrong? If yes, I'd like to know the 
detail.  I have a branch jczhang/sf-more-opts, which adds some optimizations to 
SF.  It probably won't solve your problem. But since it changes SF a lot, it's 
better to have a try.

If you change the overlap from 1 to 0, it works as expected. So it looks to me 
like something isn't quite right with the SF for faces when there is overlap. 
On rank 0 all the entries seem to be shifted up by 4.

I know you originally recommended using overlap = 0 for the initial 
distribution and only adding overlap for the redistribution. But then Stefano 
indicated that it should work with overlap now. And it would simplify my code 
if I could use overlap for the initial distribution (because if dual porosity 
cells are not being used, then there is no second redistribution).

Is this a bug or is there something I'm doing wrong?

- Adrian

On 23/06/19 4:39 PM, Matthew Knepley wrote:
On Fri, Jun 21, 2019 at 12:49 AM Adrian Croucher via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
I have been trying to get this FVM geometry data re-distribution to work
using DMPlexDistributeField().

It seems to be working OK for the cell geometry data (cell volumes and
centroids). But it is making a mess of the face geometry data (face
normals and centroids).

Should I even expect DMPlexDistributeField() to work for redistributing
a vector of data defined on mesh faces? Or is there some reason I
haven't thought of, which means that will never work?

Sorry this took a long time. The place I was in France did not have internet.
Here is how this stuff works:

  1) You start with a Section and local vector. The Section describes layout of 
data in
   the local vector by mapping mesh points to {# dof, offset}. For a small 
example,
   suppose I had two triangles sharing an edge on a sequential mesh for 2 
procs.
   The mesh points would be

 [0, 1]:  Cells
 [2, 5]:  Vertices, where 3,4 are shared
 [6, 10]: Edges, where 8 is shared

   A Section for face normals would then look like

Process 0
[0, 5]: {0, 0}   Meaning no variables lie on cells or vertices
6:   {2, 0}   One vector per face
7:   {2, 2}
8:   {2, 4}
9:   {2, 6}
10:  {2, 8}
Process 1
empty

 The vector would just have the face normal values in the canonical order. 
You can use
 PetscSectionView() to check that yours looks similar.

  2) Now we add a PetscSF describing the redistribution. An SF is a map from a 
given set of
  integers (leaves) to pairs (int, rank) called roots. Many leaves can 
point to one root. To begin,
  we provide an SF mapping mesh points to the new distribution

Process 0
0 -> {0, 0}
1 -> {2, 0}
2 -> {3, 0}
3 -> {4, 0}
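
For checking such a redistribution SF ("dump relevant SFs", as suggested elsewhere in this thread), a minimal sketch of building and viewing an SF in C follows. The leaf-to-root mapping simply mirrors the four process-0 entries listed above and is purely illustrative; it is not taken from any actual DMPlexDistribute() output.

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    PetscSF        sf;
    PetscSFNode    remote[4];
    PetscInt       nroots = 11, nleaves = 4, i;
    const PetscInt rootidx[4] = {0, 2, 3, 4};  /* leaf i maps to root rootidx[i] on rank 0 */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    for (i = 0; i < nleaves; ++i) {remote[i].rank = 0; remote[i].index = rootidx[i];}
    ierr = PetscSFCreate(PETSC_COMM_WORLD, &sf);CHKERRQ(ierr);
    /* ilocal = NULL means the leaves are 0..nleaves-1 in order */
    ierr = PetscSFSetGraph(sf, nroots, nleaves, NULL, PETSC_COPY_VALUES, remote, PETSC_COPY_VALUES);CHKERRQ(ierr);
    ierr = PetscSFView(sf, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* compare against the expected mapping */
    ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Viewing the SF this way is the quickest check that the PetscSFSetGraph() parameters match the intended point mapping.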

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-26 Thread Zhang, Junchao via petsc-users
Ale,
The job got a chance to run but failed with out-of-memory, "Some of your 
processes may have been killed by the cgroup out-of-memory handler."
I also tried with 128 cores with ./main.x 2 ... and got a weird error message:
"The size of the basis has to be at least equal to the number of MPI processes used."
--Junchao Zhang


On Tue, Jun 25, 2019 at 11:24 PM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
Ale,
  I successfully built your code and submitted a job to the NERSC Cori machine 
requiring 32768 KNL cores and one and a half hours. It is estimated to run in 3 
days. If you also observed the same problem with fewer cores, what are your input 
arguments? Currently, I use what is in your log file: ./main.x 38 -nn -j1 1.0 -d1 
1.0 -eps_type krylovschur -eps_tol 1e-9 -log_view
  The smaller the better. Thanks.
--Junchao Zhang


On Mon, Jun 24, 2019 at 6:20 AM Ale Foggia 
mailto:amfog...@gmail.com>> wrote:
Yes, I used KNL nodes. If you can perform the test, that would be great. Could it be 
that I'm not using the correct configuration of the KNL nodes? These are the 
environment variables I set:
MKL_NUM_THREADS=1
OMP_NUM_THREADS=1
KMP_HW_SUBSET=1t
KMP_AFFINITY=compact
I_MPI_PIN_DOMAIN=socket
I_MPI_PIN_PROCESSOR_LIST=0-63
MKL_DYNAMIC=0

The code is in https://github.com/amfoggia/LSQuantumED and it has a readme to 
compile it and run it. When I ran the test I used only 32 processors per node, 
and I used 1024 nodes in total, and it's for nspins=38.
Thank you

El vie., 21 jun. 2019 a las 20:03, Zhang, Junchao 
(mailto:jczh...@mcs.anl.gov>>) escribió:
Ale,
  Did you use Intel KNL nodes?  Mr. Hong (cc'ed) did experiments on KNL nodes  
one year ago. He used 32768 processors and called MatAssemblyEnd 118 times and 
it used only 1.5 seconds in total.  So I guess something was wrong with your 
test. If you can share your code, I can have a test on our machine to see how 
it goes.
 Thanks.
--Junchao Zhang


On Fri, Jun 21, 2019 at 11:00 AM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
MatAssembly was called once (in stage 5) and cost 2.5% of the total time.  Look 
at stage 5. It says MatAssemblyBegin calls BuildTwoSidedF, which does global 
synchronization. The high max/min ratio means load imbalance. What I do not 
understand is MatAssemblyEnd. The ratio is 1.0. It means processors are already 
synchronized. With 32768 processors, there are 1.2e+06 messages with average 
length 1.9e+06 bytes. So each processor sends 36 (1.2e+06/32768) ~2MB messages 
and it takes 54 seconds. Another chance is the reduction at  MatAssemblyEnd. I 
don't know why it needs 8 reductions. In my mind, one is enough. I need to look 
at the code.

Summary of Stages:   - Time --  - Flop --  --- Messages ---  -- 
Message Lengths --  -- Reductions --
Avg %Total Avg %TotalCount   %Total 
Avg %TotalCount   %Total
 0:  Main Stage: 8.5045e+02  13.0%  3.0633e+15  14.0%  8.196e+07  13.1%  
7.768e+06   13.1%  2.530e+02  13.0%
 1:Create Basis: 7.9234e-02   0.0%  0.e+00   0.0%  0.000e+00   0.0%  
0.000e+000.0%  0.000e+00   0.0%
 2:  Create Lattice: 8.3944e-05   0.0%  0.e+00   0.0%  0.000e+00   0.0%  
0.000e+000.0%  0.000e+00   0.0%
 3:   Create Hamilt: 1.0694e+02   1.6%  0.e+00   0.0%  0.000e+00   0.0%  
0.000e+000.0%  2.000e+00   0.1%
 5: Offdiag: 1.6525e+02   2.5%  0.e+00   0.0%  1.188e+06   0.2%  
1.942e+060.0%  8.000e+00   0.4%
 6: Phys quantities: 5.4045e+03  82.8%  1.8866e+16  86.0%  5.417e+08  86.7%  
7.768e+06   86.8%  1.674e+03  86.1%

--- Event Stage 5: Offdiag
BuildTwoSidedF 1 1.0 7.1565e+01 148448.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  28  0  0  0  0 0
MatAssemblyBegin   1 1.0 7.1565e+01 127783.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  28  0  0  0  0 0
MatAssemblyEnd 1 1.0 5.3762e+01 1.0  0.00e+00 0.0 1.2e+06 1.9e+06 
8.0e+00  1  0  0  0  0  33  0100100100 0
VecSet 1 1.0 7.5533e-02 9.0  0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0


--Junchao Zhang


On Fri, Jun 21, 2019 at 10:34 AM Smith, Barry F. 
mailto:bsm...@mcs.anl.gov>> wrote:

   The load balance is definitely out of whack.



BuildTwoSidedF 1 1.0 1.6722e-0241.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
MatMult  138 1.0 2.6604e+02 7.4 3.19e+10 2.1 8.2e+07 7.8e+06 
0.0e+00  2  4 13 13  0  15 25100100  0 2935476
MatAssemblyBegin   1 1.0 1.6807e-0236.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
MatAssemblyEnd 1 1.0 3.5680e-01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
VecNorm2 1.0 4.4252e+0174.8 1.73e+07 1.0 0.0e+00 0.0e+00 
2.0e+00  1  0  0  0  0   5  0  0  0  1 12780
VecCopy6 1.0

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-25 Thread Zhang, Junchao via petsc-users
Ale,
  I successfully built your code and submitted a job to the NERSC Cori machine 
requiring 32768 KNL cores and one and a half hours. It is estimated to run in 3 
days. If you also observed the same problem with fewer cores, what are your input 
arguments? Currently, I use what is in your log file: ./main.x 38 -nn -j1 1.0 -d1 
1.0 -eps_type krylovschur -eps_tol 1e-9 -log_view
  The smaller the better. Thanks.
--Junchao Zhang


On Mon, Jun 24, 2019 at 6:20 AM Ale Foggia 
mailto:amfog...@gmail.com>> wrote:
Yes, I used KNL nodes. If you can perform the test, that would be great. Could it be 
that I'm not using the correct configuration of the KNL nodes? These are the 
environment variables I set:
MKL_NUM_THREADS=1
OMP_NUM_THREADS=1
KMP_HW_SUBSET=1t
KMP_AFFINITY=compact
I_MPI_PIN_DOMAIN=socket
I_MPI_PIN_PROCESSOR_LIST=0-63
MKL_DYNAMIC=0

The code is in https://github.com/amfoggia/LSQuantumED and it has a readme to 
compile it and run it. When I ran the test I used only 32 processors per node, 
and I used 1024 nodes in total, and it's for nspins=38.
Thank you

El vie., 21 jun. 2019 a las 20:03, Zhang, Junchao 
(mailto:jczh...@mcs.anl.gov>>) escribió:
Ale,
  Did you use Intel KNL nodes?  Mr. Hong (cc'ed) did experiments on KNL nodes  
one year ago. He used 32768 processors and called MatAssemblyEnd 118 times and 
it used only 1.5 seconds in total.  So I guess something was wrong with your 
test. If you can share your code, I can have a test on our machine to see how 
it goes.
 Thanks.
--Junchao Zhang


On Fri, Jun 21, 2019 at 11:00 AM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
MatAssembly was called once (in stage 5) and cost 2.5% of the total time.  Look 
at stage 5. It says MatAssemblyBegin calls BuildTwoSidedF, which does global 
synchronization. The high max/min ratio means load imbalance. What I do not 
understand is MatAssemblyEnd. The ratio is 1.0. It means processors are already 
synchronized. With 32768 processors, there are 1.2e+06 messages with average 
length 1.9e+06 bytes. So each processor sends 36 (1.2e+06/32768) ~2MB messages 
and it takes 54 seconds. Another chance is the reduction at  MatAssemblyEnd. I 
don't know why it needs 8 reductions. In my mind, one is enough. I need to look 
at the code.

Summary of Stages:   - Time --  - Flop --  --- Messages ---  -- 
Message Lengths --  -- Reductions --
Avg %Total Avg %TotalCount   %Total 
Avg %TotalCount   %Total
 0:  Main Stage: 8.5045e+02  13.0%  3.0633e+15  14.0%  8.196e+07  13.1%  
7.768e+06   13.1%  2.530e+02  13.0%
 1:Create Basis: 7.9234e-02   0.0%  0.e+00   0.0%  0.000e+00   0.0%  
0.000e+000.0%  0.000e+00   0.0%
 2:  Create Lattice: 8.3944e-05   0.0%  0.e+00   0.0%  0.000e+00   0.0%  
0.000e+000.0%  0.000e+00   0.0%
 3:   Create Hamilt: 1.0694e+02   1.6%  0.e+00   0.0%  0.000e+00   0.0%  
0.000e+000.0%  2.000e+00   0.1%
 5: Offdiag: 1.6525e+02   2.5%  0.e+00   0.0%  1.188e+06   0.2%  
1.942e+060.0%  8.000e+00   0.4%
 6: Phys quantities: 5.4045e+03  82.8%  1.8866e+16  86.0%  5.417e+08  86.7%  
7.768e+06   86.8%  1.674e+03  86.1%

--- Event Stage 5: Offdiag
BuildTwoSidedF 1 1.0 7.1565e+01 148448.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  28  0  0  0  0 0
MatAssemblyBegin   1 1.0 7.1565e+01 127783.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  28  0  0  0  0 0
MatAssemblyEnd 1 1.0 5.3762e+01 1.0  0.00e+00 0.0 1.2e+06 1.9e+06 
8.0e+00  1  0  0  0  0  33  0100100100 0
VecSet 1 1.0 7.5533e-02 9.0  0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0


--Junchao Zhang


On Fri, Jun 21, 2019 at 10:34 AM Smith, Barry F. 
mailto:bsm...@mcs.anl.gov>> wrote:

   The load balance is definitely out of whack.



BuildTwoSidedF 1 1.0 1.6722e-0241.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
MatMult  138 1.0 2.6604e+02 7.4 3.19e+10 2.1 8.2e+07 7.8e+06 
0.0e+00  2  4 13 13  0  15 25100100  0 2935476
MatAssemblyBegin   1 1.0 1.6807e-0236.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
MatAssemblyEnd 1 1.0 3.5680e-01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
VecNorm2 1.0 4.4252e+0174.8 1.73e+07 1.0 0.0e+00 0.0e+00 
2.0e+00  1  0  0  0  0   5  0  0  0  1 12780
VecCopy6 1.0 6.5655e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
VecAXPY2 1.0 1.3793e-02 2.7 1.73e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 41000838
VecScatterBegin  138 1.0 1.1653e+0285.8 0.00e+00 0.0 8.2e+07 7.8e+06 
0.0e+00  1  0 13 13  0   4  0100100  0 0
VecScatterEnd138 1.0 1.3653e+0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   4  0  0  0  0 0
VecSetRandom   1 

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Zhang, Junchao via petsc-users
> ..." or "Event Stage 5: Offdiag").
>
> El vie., 21 jun. 2019 a las 16:09, Zhang, Junchao 
> (mailto:jczh...@mcs.anl.gov>>) escribió:
>
>
> On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia 
> mailto:amfog...@gmail.com>> wrote:
> Thanks both of you for your answers,
>
> El jue., 20 jun. 2019 a las 22:20, Smith, Barry F. 
> (mailto:bsm...@mcs.anl.gov>>) escribió:
>
>   Note that this is a one time cost if the nonzero structure of the matrix 
> stays the same. It will not happen in future MatAssemblies.
>
> > On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users 
> > mailto:petsc-users@mcs.anl.gov>> wrote:
> >
> > Those messages were used to build MatMult communication pattern for the 
> > matrix. They were not part of the matrix entries-passing you imagined, but 
> > indeed happened in MatAssemblyEnd. If you want to make sure processors do 
> > not set remote entries, you can use 
> > MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate an 
> > error when an off-proc entry is set.
>
> I started being concerned about this when I saw that the assembly was taking 
> a few hundreds of seconds in my code, like 180 seconds, which for me is a 
> considerable time. Do you think (or maybe you need more information to answer 
> this) that this time is "reasonable" for communicating the pattern for the 
> matrix? I already checked that I'm not setting any remote entries.
> It is not reasonable. Could you send log view of that test with 180 seconds 
> MatAssembly?
>
> Also I see (in my code) that even if there are no messages being passed 
> during MatAssemblyBegin, it is taking time and the "ratio" is very big.
>
> >
> >
> > --Junchao Zhang
> >
> >
> > On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users 
> > mailto:petsc-users@mcs.anl.gov>> wrote:
> > Hello all!
> >
> > During the conference I showed you a problem happening during 
> > MatAssemblyEnd in a particular code that I have. Now, I tried the same with 
> > a simple code (a symmetric problem corresponding to the Laplacian operator 
> > in 1D, from the SLEPc Hands-On exercises). As I understand (and please, 
> > correct me if I'm wrong), in this case the elements of the matrix are 
> > computed locally by each process so there should not be any communication 
> > during the assembly. However, in the log I get that there are messages 
> > being passed. Also, the number of messages changes with the number of 
> > processes used and the size of the matrix. Could you please help me 
> > understand this?
> >
> > I attach the code I used and the log I get for a small problem.
> >
> > Cheers,
> > Ale
> >
>
> 



Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Zhang, Junchao via petsc-users


On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia 
mailto:amfog...@gmail.com>> wrote:
Thanks both of you for your answers,

El jue., 20 jun. 2019 a las 22:20, Smith, Barry F. 
(mailto:bsm...@mcs.anl.gov>>) escribió:

  Note that this is a one time cost if the nonzero structure of the matrix 
stays the same. It will not happen in future MatAssemblies.

> On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users 
> mailto:petsc-users@mcs.anl.gov>> wrote:
>
> Those messages were used to build MatMult communication pattern for the 
> matrix. They were not part of the matrix entries-passing you imagined, but 
> indeed happened in MatAssemblyEnd. If you want to make sure processors do not 
> set remote entries, you can use 
> MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate an 
> error when an off-proc entry is set.

I started being concerned about this when I saw that the assembly was taking a 
few hundreds of seconds in my code, like 180 seconds, which for me is a 
considerable time. Do you think (or maybe you need more information to answer 
this) that this time is "reasonable" for communicating the pattern for the 
matrix? I already checked that I'm not setting any remote entries.
It is not reasonable. Could you send log view of that test with 180 seconds 
MatAssembly?

Also I see (in my code) that even if there are no messages being passed during 
MatAssemblyBegin, it is taking time and the "ratio" is very big.

>
>
> --Junchao Zhang
>
>
> On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users 
> mailto:petsc-users@mcs.anl.gov>> wrote:
> Hello all!
>
> During the conference I showed you a problem happening during MatAssemblyEnd 
> in a particular code that I have. Now, I tried the same with a simple code (a 
> symmetric problem corresponding to the Laplacian operator in 1D, from the 
> SLEPc Hands-On exercises). As I understand (and please, correct me if I'm 
> wrong), in this case the elements of the matrix are computed locally by each 
> process so there should not be any communication during the assembly. 
> However, in the log I get that there are messages being passed. Also, the 
> number of messages changes with the number of processes used and the size of 
> the matrix. Could you please help me understand this?
>
> I attach the code I used and the log I get for a small problem.
>
> Cheers,
> Ale
>



Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-20 Thread Zhang, Junchao via petsc-users
Those messages were used to build MatMult communication pattern for the matrix. 
They were not part of the matrix entries-passing you imagined, but indeed 
happened in MatAssemblyEnd. If you want to make sure processors do not set 
remote entries, you can use MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), 
which will generate an error when an off-proc entry is set.
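
For reference, a minimal sketch of setting that option (the matrix size and the diagonal-only fill are made up for illustration):

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscInt       i, rstart, rend, N = 100;
    PetscScalar    v = 2.0;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, N, N, 1, NULL, 0, NULL, &A);CHKERRQ(ierr);
    /* Error out immediately if any rank sets an entry owned by another rank */
    ierr = MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
    for (i = rstart; i < rend; ++i) {  /* purely local (diagonal) entries */
      ierr = MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

With the option set, any MatSetValues() call on a row owned by another rank aborts with an error instead of being stashed and communicated during assembly.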


--Junchao Zhang


On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello all!

During the conference I showed you a problem happening during MatAssemblyEnd in 
a particular code that I have. Now, I tried the same with a simple code (a 
symmetric problem corresponding to the Laplacian operator in 1D, from the SLEPc 
Hands-On exercises). As I understand (and please, correct me if I'm wrong), in 
this case the elements of the matrix are computed locally by each process so 
there should not be any communication during the assembly. However, in the log 
I get that there are messages being passed. Also, the number of messages 
changes with the number of processes used and the size of the matrix. Could you 
please help me understand this?

I attach the code I used and the log I get for a small problem.

Cheers,
Ale



Re: [petsc-users] Memory growth issue

2019-06-05 Thread Zhang, Junchao via petsc-users
Sanjay,
   You have one more reason to use VecScatter, which is heavily used and 
well-tested.
--Junchao Zhang


On Wed, Jun 5, 2019 at 5:47 PM Sanjay Govindjee 
mailto:s...@berkeley.edu>> wrote:
I found the bug (naturally in my own code).  When I made the MPI_Wait( )
changes, I missed one location where this
was needed.   See the attached graphs for openmpi and mpich using
CG+Jacobi and GMRES+BJacobi.

Interesting that openmpi did not care about this but mpich did. Also 
interesting that the memory was growing so much when the data packets 
going back and forth were just a few hundred bytes.

Thanks for your efforts and patience.

-sanjay


On 6/5/19 2:38 PM, Smith, Barry F. wrote:
>Are you reusing the same KSP the whole time, just making calls to 
> KSPSolve, or are you creating a new KSP object?
>
>Do you make any calls to KSPReset()?
>
>Are you doing any MPI_Comm_dup()?
>
>Are you attaching any attributes to MPI communicators?
>
> Thanks
>
>> On Jun 5, 2019, at 1:18 AM, Sanjay Govindjee 
>> mailto:s...@berkeley.edu>> wrote:
>>
>> Junchao,
>>
>>Attached is a graph of total RSS from my Mac using openmpi and mpich 
>> (installed with --download-openmpi and --download-mpich).
>>
>>The difference is pretty stark!  The WaitAll( ) in my part of the code 
>> fixed the run away memory
>> problem using openmpi but definitely not with mpich.
>>
>>Tomorrow I hope to get my linux box set up; unfortunately it needs an OS 
>> update :(
>> Then I can try to run there and reproduce the same (or find out it is a Mac 
>> quirk, though the
>> reason I started looking at this was that a use on an HPC system pointed it 
>> out to me).
>>
>> -sanjay
>>
>> PS: To generate the data, all I did was place a call to 
>> PetscMemoryGetCurrentUsage( ) right after KSPSolve( ), followed by an 
>> MPI_AllReduce( ) to sum across the job (4 processors).
>>
>> On 6/4/19 4:27 PM, Zhang, Junchao wrote:
>>> Hi, Sanjay,
>>>I managed to use Valgrind massif + MPICH master + PETSc master. I ran 
>>> ex5 500 time steps with "mpirun -n 4 valgrind --tool=massif 
>>> --max-snapshots=200 --detailed-freq=1 ./ex5 -da_grid_x 512 -da_grid_y 512 
>>> -ts_type beuler -ts_max_steps 500 -malloc"
>>>I visualized the output with massif-visualizer. From the attached 
>>> picture, we can see the total heap size keeps constant most of the time and 
>>> is NOT monotonically increasing.  We can also see MPI only allocated memory 
>>> at initialization time and kept it. So it is unlikely that MPICH keeps 
>>> allocating memory in each KSPSolve call.
>>>From graphs you sent, I can only see RSS is randomly increased after 
>>> KSPSolve, but that does not mean heap size keeps increasing.  I recommend 
>>> you also profile your code with valgrind massif and visualize it. I failed 
>>> to install massif-visualizer on MacBook and CentOS. But I easily got it 
>>> installed on Ubuntu.
>>>I want you to confirm that with the MPI_Waitall fix, you still run out 
>>> of memory with MPICH (but not OpenMPI).  If needed, I can hack MPICH to get 
>>> its current memory usage so that we can calculate its difference after each 
>>> KSPSolve call.
>>>
>>> 
>>>
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Mon, Jun 3, 2019 at 6:36 PM Sanjay Govindjee 
>>> mailto:s...@berkeley.edu>> wrote:
>>> Junchao,
>>>    It won't be feasible to share the code but I will run a similar test as 
>>> you have done (large problem); I will
>>> try with both MPICH and OpenMPI.  I also agree that deltas are not ideal as 
>>> they do not account for latency in the freeing of memory
>>> etc.  But I will note that when we have the memory growth issue, latency 
>>> associated with free( ) appears not to be in play since the total
>>> memory footprint grows monotonically.
>>>
>>>I'll also have a look at massif.  If you figure out the interface, and 
>>> can send me the lines to instrument the code with that will save me
>>> some time.
>>> -sanjay
>>> On 6/3/19 3:17 PM, Zhang, Junchao wrote:
>>>> Sanjay & Barry,
>>>>Sorry, I made a mistake when I said I could reproduce Sanjay's 
>>>> experiments. I found 1) to correctly use PetscMallocGetCurrentUsage() when 
>>>> petsc is configured without debugging, I have to add -malloc to run the 
>>>> program. 2) I have to instrument the code outside of KS

Re: [petsc-users] Memory growth issue

2019-06-05 Thread Zhang, Junchao via petsc-users
OK, I see. I mistakenly read  PetscMemoryGetCurrentUsage as 
PetscMallocGetCurrentUsage.  You should also do PetscMallocGetCurrentUsage(), 
so that we know whether the increased memory is allocated by PETSc.
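
A minimal sketch of that experiment (the wrapper name is made up; note that PetscMallocGetCurrentUsage() needs the -malloc option when PETSc is configured without debugging, as mentioned later in this thread):

  #include <petsc.h>

  /* Report the RSS and PETSc-malloc deltas of one KSPSolve, summed over all ranks. */
  static PetscErrorCode KSPSolveWithMemReport(KSP ksp, Vec b, Vec x)
  {
    PetscLogDouble rss0, rss1, mal0, mal1, loc[2], glob[2];
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PetscMemoryGetCurrentUsage(&rss0);CHKERRQ(ierr);
    ierr = PetscMallocGetCurrentUsage(&mal0);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscMemoryGetCurrentUsage(&rss1);CHKERRQ(ierr);
    ierr = PetscMallocGetCurrentUsage(&mal1);CHKERRQ(ierr);
    loc[0] = rss1 - rss0;
    loc[1] = mal1 - mal0;
    ierr = MPI_Allreduce(loc, glob, 2, MPI_DOUBLE, MPI_SUM, PetscObjectComm((PetscObject)ksp));CHKERRQ(ierr);
    ierr = PetscPrintf(PetscObjectComm((PetscObject)ksp), "RSS Delta=%g, Malloc Delta=%g\n", glob[0], glob[1]);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }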

On Wed, Jun 5, 2019, 9:58 AM Sanjay GOVINDJEE 
mailto:s...@berkeley.edu>> wrote:
PetscMemoryGetCurrentUsage( ) is just a cover for rgetusage( ), so the use of 
the function is unrelated to Petsc.  The only difference here is mpich versus 
openmpi.
Notwithstanding, I can make a plot of the sum of the deltas around kspsolve.

Sent from my iPad

On Jun 5, 2019, at 7:22 AM, Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:

Sanjay,
  It sounds like the memory is allocated by PETSc, since you call 
PetscMemoryGetCurrentUsage().  Make sure you use the latest PETSc version. You 
can also do an experiment that puts two PetscMemoryGetCurrentUsage() before & 
after KSPSolve(), calculates the delta, and then sums over processes, so we 
know whether the memory is allocated in KSPSolve().

--Junchao Zhang


On Wed, Jun 5, 2019 at 1:19 AM Sanjay Govindjee 
mailto:s...@berkeley.edu>> wrote:
Junchao,

  Attached is a graph of total RSS from my Mac using openmpi and mpich 
(installed with --download-openmpi and --download-mpich).

  The difference is pretty stark!  The WaitAll( ) in my part of the code fixed 
the run away memory
problem using openmpi but definitely not with mpich.

  Tomorrow I hope to get my linux box set up; unfortunately it needs an OS 
update :(
Then I can try to run there and reproduce the same (or find out it is a Mac 
quirk, though the
reason I started looking at this was that a user on an HPC system pointed it out 
to me).

-sanjay

PS: To generate the data, all I did was place a call to 
PetscMemoryGetCurrentUsage( ) right after KSPSolve( ), followed by an 
MPI_AllReduce( ) to sum across the job (4 processors).

On 6/4/19 4:27 PM, Zhang, Junchao wrote:
Hi, Sanjay,
  I managed to use Valgrind massif + MPICH master + PETSc master. I ran ex5 500 
time steps with "mpirun -n 4 valgrind --tool=massif --max-snapshots=200 
--detailed-freq=1 ./ex5 -da_grid_x 512 -da_grid_y 512 -ts_type beuler 
-ts_max_steps 500 -malloc"
  I visualized the output with massif-visualizer. From the attached picture, we 
can see the total heap size keeps constant most of the time and is NOT 
monotonically increasing.  We can also see MPI only allocated memory at 
initialization time and kept it. So it is unlikely that MPICH keeps allocating 
memory in each KSPSolve call.
  From graphs you sent, I can only see RSS is randomly increased after 
KSPSolve, but that does not mean heap size keeps increasing.  I recommend you 
also profile your code with valgrind massif and visualize it. I failed to 
install massif-visualizer on MacBook and CentOS. But I easily got it installed 
on Ubuntu.
  I want you to confirm that with the MPI_Waitall fix, you still run out of 
memory with MPICH (but not OpenMPI).  If needed, I can hack MPICH to get its 
current memory usage so that we can calculate its difference after each 
KSPSolve call.




--Junchao Zhang


On Mon, Jun 3, 2019 at 6:36 PM Sanjay Govindjee 
mailto:s...@berkeley.edu>> wrote:
Junchao,
  It won't be feasible to share the code but I will run a similar test as you 
have done (large problem); I will
try with both MPICH and OpenMPI.  I also agree that deltas are not ideal as 
they do not account for latency in the freeing of memory
etc.  But I will note that when we have the memory growth issue, latency associated 
with free( ) appears not to be in play since the total
memory footprint grows monotonically.

  I'll also have a look at massif.  If you figure out the interface, and can 
send me the lines to instrument the code with that will save me
some time.
-sanjay
On 6/3/19 3:17 PM, Zhang, Junchao wrote:
Sanjay & Barry,
  Sorry, I made a mistake when I said I could reproduce Sanjay's experiments. 
I found 1) to correctly use PetscMallocGetCurrentUsage() when petsc is 
configured without debugging, I have to add -malloc to run the program. 2) I 
have to instrument the code outside of KSPSolve(). In my case, it is in 
SNESSolve_NEWTONLS. In old experiments, I did it inside KSPSolve. Since 
KSPSolve can recursively call KSPSolve, the old results were misleading.
 With these fixes, I measured differences of RSS and Petsc malloc before/after 
KSPSolve. I did experiments on MacBook using 
src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c with commands like 
mpirun -n 4 ./ex5 -da_grid_x 64 -da_grid_y 64 -ts_type beuler -ts_max_steps 500 
-malloc.
 I find if the grid size is small, I can see a non-zero RSS-delta randomly, 
either with one mpi rank or multiple ranks, with MPICH or OpenMPI. If I 
increase grid sizes, e.g., -da_grid_x 256 -da_grid_y 256, I only see non-zero 
RSS-delta randomly at the first few iterations (with MPICH or OpenMPI). When 
the computer workload is high by simultaneously running ex5-openmpi and 
ex5-mpich, th

Re: [petsc-users] Memory growth issue

2019-06-03 Thread Zhang, Junchao via petsc-users


On Mon, Jun 3, 2019 at 5:23 PM Stefano Zampini 
mailto:stefano.zamp...@gmail.com>> wrote:


On Jun 4, 2019, at 1:17 AM, Zhang, Junchao via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Sanjay & Barry,
  Sorry, I made a mistake when I said I could reproduce Sanjay's experiments. 
I found 1) to correctly use PetscMallocGetCurrentUsage() when petsc is 
configured without debugging, I have to add -malloc to run the program. 2) I 
have to instrument the code outside of KSPSolve(). In my case, it is in 
SNESSolve_NEWTONLS. In old experiments, I did it inside KSPSolve. Since 
KSPSolve can recursively call KSPSolve, the old results were misleading.
 With these fixes, I measured differences of RSS and Petsc malloc before/after 
KSPSolve. I did experiments on MacBook using 
src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c with commands like 
mpirun -n 4 ./ex5 -da_grid_x 64 -da_grid_y 64 -ts_type beuler -ts_max_steps 500 
-malloc.
 I find if the grid size is small, I can see a non-zero RSS-delta randomly, 
either with one mpi rank or multiple ranks, with MPICH or OpenMPI. If I 
increase grid sizes, e.g., -da_grid_x 256 -da_grid_y 256, I only see non-zero 
RSS-delta randomly at the first few iterations (with MPICH or OpenMPI). When 
the computer workload is high by simultaneously running ex5-openmpi and 
ex5-mpich, the MPICH one pops up much more non-zero RSS-delta. But "Malloc 
Delta" behavior is stable across all runs. There is only one nonzero malloc 
delta value in the first KSPSolve call. All remaining are zero. Something like 
this:
mpirun -n 4 ./ex5-mpich -da_grid_x 256 -da_grid_y 256 -ts_type beuler 
-ts_max_steps 500 -malloc
RSS Delta=   32489472, Malloc Delta=   26290304, RSS End=  136114176
RSS Delta=  32768, Malloc Delta=  0, RSS End=  138510336
RSS Delta=  0, Malloc Delta=  0, RSS End=  138522624
RSS Delta=  0, Malloc Delta=  0, RSS End=  138539008
So I think I can conclude there is no unfreed memory in KSPSolve() allocated by 
PETSc.  Has MPICH allocated unfreed memory in KSPSolve? That is possible and I 
am trying to find a way like PetscMallocGetCurrentUsage() to measure that. 
Also, I think RSS delta is not a good way to measure memory allocation. It is 
dynamic and depends on state of the computer (swap, shared libraries loaded 
etc) when running the code. We should focus on malloc instead.  If there was a 
valgrind tool, like performance profiling tools,  that can let users measure 
memory allocated but not freed in a user specified code segment, that would be 
very helpful in this case. But I have not found one.


Junchao

Have you ever tried Massif? http://valgrind.org/docs/manual/ms-manual.html

No. I came across it but not familiar with it.  I did not find APIs to call to 
get current memory usage. Will look at it further. Thanks.


Sanjay, did you say currently you could run with OpenMPI without running out of memory, 
but with MPICH, you ran out of memory?  Is it feasible to share your code so 
that I can test with? Thanks.

--Junchao Zhang

On Sat, Jun 1, 2019 at 3:21 AM Sanjay Govindjee 
mailto:s...@berkeley.edu>> wrote:
Barry,

If you look at the graphs I generated (on my Mac),  you will see that
OpenMPI and MPICH have very different values (along with the fact that
MPICH does not seem to adhere
to the standard (for releasing MPI_ISend resources following and MPI_Wait).

-sanjay

PS: I agree with Barry's assessment; this is really not that acceptable.

On 6/1/19 1:00 AM, Smith, Barry F. wrote:
>Junchao,
>
>   This is insane. Either the OpenMPI library or something in the OS 
> underneath related to sockets and interprocess communication is grabbing 
> additional space for each round of MPI communication!  Does MPICH have the 
> same values or different values than OpenMP? When you run on Linux do you get 
> the same values as Apple or different. --- Same values seem to indicate the 
> issue is inside OpenMPI/MPICH different values indicates problem is more 
> likely at the OS level. Does this happen only with the default VecScatter 
> that uses blocking MPI, what happens with PetscSF under Vec? Is it somehow 
> related to PETSc's use of nonblocking sends and receives? One could 
> presumably use valgrind to see exactly what lines in what code are causing 
> these increases. I don't think we can just shrug and say this is the way it 
> is, we need to track down and understand the cause (and if possible fix).
>
>Barry
>
>
>> On May 31, 2019, at 2:53 PM, Zhang, Junchao 
>> mailto:jczh...@mcs.anl.gov>> wrote:
>>
>> Sanjay,
>> I tried petsc with MPICH and OpenMPI on my Macbook. I inserted 
>> PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage at the beginning and 
>> end of KSPSolve and then computed the delta and summed over processes. Th

Re: [petsc-users] Memory growth issue

2019-06-03 Thread Zhang, Junchao via petsc-users
Sanjay & Barry,
  Sorry, I made a mistake when I said I could reproduce Sanjay's experiments. 
I found 1) to correctly use PetscMallocGetCurrentUsage() when petsc is 
configured without debugging, I have to add -malloc to run the program. 2) I 
have to instrument the code outside of KSPSolve(). In my case, it is in 
SNESSolve_NEWTONLS. In old experiments, I did it inside KSPSolve. Since 
KSPSolve can recursively call KSPSolve, the old results were misleading.
 With these fixes, I measured differences of RSS and Petsc malloc before/after 
KSPSolve. I did experiments on MacBook using 
src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c with commands like 
mpirun -n 4 ./ex5 -da_grid_x 64 -da_grid_y 64 -ts_type beuler -ts_max_steps 500 
-malloc.
 I find if the grid size is small, I can see a non-zero RSS-delta randomly, 
either with one mpi rank or multiple ranks, with MPICH or OpenMPI. If I 
increase grid sizes, e.g., -da_grid_x 256 -da_grid_y 256, I only see non-zero 
RSS-delta randomly at the first few iterations (with MPICH or OpenMPI). When 
the computer workload is high by simultaneously running ex5-openmpi and 
ex5-mpich, the MPICH one pops up much more non-zero RSS-delta. But "Malloc 
Delta" behavior is stable across all runs. There is only one nonzero malloc 
delta value in the first KSPSolve call. All remaining are zero. Something like 
this:
mpirun -n 4 ./ex5-mpich -da_grid_x 256 -da_grid_y 256 -ts_type beuler 
-ts_max_steps 500 -malloc
RSS Delta=   32489472, Malloc Delta=   26290304, RSS End=  136114176
RSS Delta=  32768, Malloc Delta=  0, RSS End=  138510336
RSS Delta=  0, Malloc Delta=  0, RSS End=  138522624
RSS Delta=  0, Malloc Delta=  0, RSS End=  138539008
So I think I can conclude there is no unfreed memory in KSPSolve() allocated by 
PETSc.  Has MPICH allocated unfreed memory in KSPSolve? That is possible and I 
am trying to find a way like PetscMallocGetCurrentUsage() to measure that. 
Also, I think RSS delta is not a good way to measure memory allocation. It is 
dynamic and depends on state of the computer (swap, shared libraries loaded 
etc) when running the code. We should focus on malloc instead.  If there was a 
valgrind tool, like performance profiling tools,  that can let users measure 
memory allocated but not freed in a user specified code segment, that would be 
very helpful in this case. But I have not found one.

Sanjay, did you say currently you could run with OpenMPI without running out of memory, 
but with MPICH, you ran out of memory?  Is it feasible to share your code so 
that I can test with? Thanks.

--Junchao Zhang

On Sat, Jun 1, 2019 at 3:21 AM Sanjay Govindjee 
mailto:s...@berkeley.edu>> wrote:
Barry,

If you look at the graphs I generated (on my Mac),  you will see that
OpenMPI and MPICH have very different values (along with the fact that
MPICH does not seem to adhere
to the standard (for releasing MPI_ISend resources following an MPI_Wait).

-sanjay

PS: I agree with Barry's assessment; this is really not that acceptable.

On 6/1/19 1:00 AM, Smith, Barry F. wrote:
>Junchao,
>
>   This is insane. Either the OpenMPI library or something in the OS 
> underneath related to sockets and interprocess communication is grabbing 
> additional space for each round of MPI communication!  Does MPICH have the 
> same values or different values than OpenMPI? When you run on Linux do you get 
> the same values as Apple or different. --- Same values seem to indicate the 
> issue is inside OpenMPI/MPICH different values indicates problem is more 
> likely at the OS level. Does this happen only with the default VecScatter 
> that uses blocking MPI, what happens with PetscSF under Vec? Is it somehow 
> related to PETSc's use of nonblocking sends and receives? One could 
> presumably use valgrind to see exactly what lines in what code are causing 
> these increases. I don't think we can just shrug and say this is the way it 
> is, we need to track down and understand the cause (and if possible fix).
>
>Barry
>
>
>> On May 31, 2019, at 2:53 PM, Zhang, Junchao 
>> mailto:jczh...@mcs.anl.gov>> wrote:
>>
>> Sanjay,
>> I tried petsc with MPICH and OpenMPI on my Macbook. I inserted 
>> PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage at the beginning and 
>> end of KSPSolve and then computed the delta and summed over processes. Then 
>> I tested with src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c
>> With OpenMPI,
>> mpirun -n 4 ./ex5 -da_grid_x 128 -da_grid_y 128 -ts_type beuler 
>> -ts_max_steps 500 > 128.log
>> grep -n -v "RSS Delta= 0, Malloc Delta= 0" 128.log
>> 1:RSS Delta= 69632, Malloc Delta= 0
>> 2:RSS Delta= 69632, Malloc Delta= 0
>> 3:

Re: [petsc-users] Memory growth issue

2019-06-01 Thread Zhang, Junchao via petsc-users


On Sat, Jun 1, 2019 at 3:21 AM Sanjay Govindjee 
mailto:s...@berkeley.edu>> wrote:
Barry,

If you look at the graphs I generated (on my Mac),  you will see that
OpenMPI and MPICH have very different values (along with the fact that
MPICH does not seem to adhere
to the standard (for releasing MPI_ISend resources following an MPI_Wait).

-sanjay
PS: I agree with Barry's assessment; this is really not that acceptable.

I also agree. I am doing various experiments to know why.

On 6/1/19 1:00 AM, Smith, Barry F. wrote:
>Junchao,
>
>   This is insane. Either the OpenMPI library or something in the OS 
> underneath related to sockets and interprocess communication is grabbing 
> additional space for each round of MPI communication!  Does MPICH have the 
> same values or different values than OpenMPI? When you run on Linux do you get 
> the same values as Apple or different. --- Same values seem to indicate the 
> issue is inside OpenMPI/MPICH different values indicates problem is more 
> likely at the OS level. Does this happen only with the default VecScatter 
> that uses blocking MPI, what happens with PetscSF under Vec? Is it somehow 
> related to PETSc's use of nonblocking sends and receives? One could 
> presumably use valgrind to see exactly what lines in what code are causing 
> these increases. I don't think we can just shrug and say this is the way it 
> is, we need to track down and understand the cause (and if possible fix).
>
>Barry
>
>
>> On May 31, 2019, at 2:53 PM, Zhang, Junchao 
>> mailto:jczh...@mcs.anl.gov>> wrote:
>>
>> Sanjay,
>> I tried petsc with MPICH and OpenMPI on my Macbook. I inserted 
>> PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage at the beginning and 
>> end of KSPSolve and then computed the delta and summed over processes. Then 
>> I tested with src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c
>> With OpenMPI,
>> mpirun -n 4 ./ex5 -da_grid_x 128 -da_grid_y 128 -ts_type beuler 
>> -ts_max_steps 500 > 128.log
>> grep -n -v "RSS Delta= 0, Malloc Delta= 0" 128.log
>> 1:RSS Delta= 69632, Malloc Delta= 0
>> 2:RSS Delta= 69632, Malloc Delta= 0
>> 3:RSS Delta= 69632, Malloc Delta= 0
>> 4:RSS Delta= 69632, Malloc Delta= 0
>> 9:RSS Delta=9.25286e+06, Malloc Delta= 0
>> 22:RSS Delta= 49152, Malloc Delta= 0
>> 44:RSS Delta= 20480, Malloc Delta= 0
>> 53:RSS Delta= 49152, Malloc Delta= 0
>> 66:RSS Delta=  4096, Malloc Delta= 0
>> 97:RSS Delta= 16384, Malloc Delta= 0
>> 119:RSS Delta= 20480, Malloc Delta= 0
>> 141:RSS Delta= 53248, Malloc Delta= 0
>> 176:RSS Delta= 16384, Malloc Delta= 0
>> 308:RSS Delta= 16384, Malloc Delta= 0
>> 352:RSS Delta= 16384, Malloc Delta= 0
>> 550:RSS Delta= 16384, Malloc Delta= 0
>> 572:RSS Delta= 16384, Malloc Delta= 0
>> 669:RSS Delta= 40960, Malloc Delta= 0
>> 924:RSS Delta= 32768, Malloc Delta= 0
>> 1694:RSS Delta= 20480, Malloc Delta= 0
>> 2099:RSS Delta= 16384, Malloc Delta= 0
>> 2244:RSS Delta= 20480, Malloc Delta= 0
>> 3001:RSS Delta= 16384, Malloc Delta= 0
>> 5883:RSS Delta= 16384, Malloc Delta= 0
>>
>> If I increased the grid
>> mpirun -n 4 ./ex5 -da_grid_x 512 -da_grid_y 512 -ts_type beuler 
>> -ts_max_steps 500 -malloc_test >512.log
>> grep -n -v "RSS Delta= 0, Malloc Delta= 0" 512.log
>> 1:RSS Delta=1.05267e+06, Malloc Delta= 0
>> 2:RSS Delta=1.05267e+06, Malloc Delta= 0
>> 3:RSS Delta=1.05267e+06, Malloc Delta= 0
>> 4:RSS Delta=1.05267e+06, Malloc Delta= 0
>> 13:RSS Delta=1.24932e+08, Malloc Delta= 0
>>
>> So we did see RSS increase in 4k-page sizes after KSPSolve. As long as there 
>> are no memory leaks, why do you care about it? Is it because you run out of memory?
>>
>> On Thu, May 30, 2019 at 1:59 PM Smith, Barry F. 
>> mailto:bsm...@mcs.anl.gov>> wrote:
>>
>> Thanks for the update. So the current conclusions are that using the 
>> Waitall in your code
>>
>> 1) solves the memory issue with OpenMPI in your code
>>
>> 2) does not solve the memory issue with PETSc KSPSolve
>>
>> 3) MPICH has memory issues both for your code and PETSc KSPSolve (despite 
>> the wait all fix)?
>>
>> If you literally just comment out the call to KSPSolve() with OpenMPI is 
>> there no growth in memory usage?

Re: [petsc-users] Memory growth issue

2019-05-31 Thread Zhang, Junchao via petsc-users


On Fri, May 31, 2019 at 3:48 PM Sanjay Govindjee via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Thanks Stefano.

Reading the manual pages a bit more carefully,
I think I can see what I should be doing.  Which should be roughly to

1. Set up target Seq vectors on PETSC_COMM_SELF
2. Use ISCreateGeneral to create ISs for the target Vecs  and the source Vec 
which will be MPI on PETSC_COMM_WORLD.
3. Create the scatter context with VecScatterCreate
4. Call VecScatterBegin/End on each process (instead of using my prior routine).

Lingering questions:

a. Is there any performance advantage/disadvantage to creating a single 
parallel target Vec instead
of multiple target Seq Vecs (in terms of the scatter operation)?
No performance difference. But pay attention, if you use seq vec, the indices 
in IS are locally numbered; if you use MPI vec, the indices are globally 
numbered.
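
A minimal sketch of steps 1-4 above, with a sequential target Vec on each rank (the index array idx_global stands in for the "complex but easily computable mapping" and is an assumption):

  #include <petsc.h>

  /* Gather globally numbered entries of the MPI Vec sol into a local Seq Vec on each rank. */
  static PetscErrorCode GatherEntries(Vec sol, PetscInt n, const PetscInt idx_global[], Vec *loc)
  {
    IS             from, to;
    VecScatter     sct;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = VecCreateSeq(PETSC_COMM_SELF, n, loc);CHKERRQ(ierr);                             /* 1. target Seq Vec */
    ierr = ISCreateGeneral(PETSC_COMM_SELF, n, idx_global, PETSC_COPY_VALUES, &from);CHKERRQ(ierr); /* 2. source IS, global numbering */
    ierr = ISCreateStride(PETSC_COMM_SELF, n, 0, 1, &to);CHKERRQ(ierr);                     /* 2. target IS, local numbering */
    ierr = VecScatterCreate(sol, from, *loc, to, &sct);CHKERRQ(ierr);                       /* 3. scatter context */
    ierr = VecScatterBegin(sct, sol, *loc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);   /* 4. move the data */
    ierr = VecScatterEnd(sct, sol, *loc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = VecScatterDestroy(&sct);CHKERRQ(ierr);
    ierr = ISDestroy(&from);CHKERRQ(ierr);
    ierr = ISDestroy(&to);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

If the target were a single parallel Vec instead, the target IS would use global indices, per the note above.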


b. The data that ends up in the target on each processor needs to be in an 
application
array.  Is there a clever way to 'move' the data from the scatter target to the 
array (short
of just running a loop over it and copying)?

See VecGetArray, VecGetArrayRead etc, which pull the data out of Vecs without 
memory copying.
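
For example, continuing the sketch above (the application array locarr is hypothetical):

  #include <petsc.h>

  /* Copy the gathered values from the Seq Vec into a plain application array. */
  static PetscErrorCode CopyToAppArray(Vec loc, PetscScalar locarr[])
  {
    const PetscScalar *a;
    PetscInt           n, i;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecGetLocalSize(loc, &n);CHKERRQ(ierr);
    ierr = VecGetArrayRead(loc, &a);CHKERRQ(ierr);   /* read-only pointer, no copy made */
    for (i = 0; i < n; ++i) locarr[i] = a[i];        /* or work on a[] in place and skip the copy */
    ierr = VecRestoreArrayRead(loc, &a);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }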
 -sanjay



On 5/31/19 12:02 PM, Stefano Zampini wrote:


On May 31, 2019, at 9:50 PM, Sanjay Govindjee via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Matt,
  Here is the process as it currently stands:

1) I have a PETSc Vec (sol), which come from a KSPSolve

2) Each processor grabs its section of sol via VecGetOwnershipRange and 
VecGetArrayReadF90
and inserts parts of its section of sol in a local array (locarr) using a 
complex but easily computable mapping.

3) The routine you are looking at then exchanges various parts of the locarr 
between the processors.


You need a VecScatter object 
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate

4) Each processor then does computations using its updated locarr.

Typing it out this way, I guess the answer to your question is "yes."  I have a 
global Vec and I want its values
sent in a complex but computable way to local vectors on each process.

-sanjay
On 5/31/19 3:37 AM, Matthew Knepley wrote:
On Thu, May 30, 2019 at 11:55 PM Sanjay Govindjee via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hi Junchao,
Thanks for the hints below, they will take some time to absorb as the vectors 
that are being  moved around
are actually partly petsc vectors and partly local process vectors.

Is this code just doing a global-to-local map? Meaning, does it just map all 
the local unknowns to some global
unknown on some process? We have an even simpler interface for that, where we 
make the VecScatter
automatically,

  
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate

Then you can use it with Vecs, Mats, etc.
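
A minimal sketch of that interface (the local-to-global index array, rows, and values are all placeholders):

  #include <petsc.h>

  /* Attach a local-to-global map to v so entries can be set with local numbering;
     PETSc handles the off-process communication during VecAssemblyBegin/End. */
  static PetscErrorCode SetWithLocalIndices(Vec v, PetscInt nlocal, const PetscInt ltog_idx[],
                                            PetscInt nvals, const PetscInt rows[], const PetscScalar vals[])
  {
    ISLocalToGlobalMapping ltog;
    PetscErrorCode         ierr;

    PetscFunctionBeginUser;
    ierr = ISLocalToGlobalMappingCreate(PetscObjectComm((PetscObject)v), 1, nlocal, ltog_idx, PETSC_COPY_VALUES, &ltog);CHKERRQ(ierr);
    ierr = VecSetLocalToGlobalMapping(v, ltog);CHKERRQ(ierr);
    ierr = VecSetValuesLocal(v, nvals, rows, vals, INSERT_VALUES);CHKERRQ(ierr);
    ierr = VecAssemblyBegin(v);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(v);CHKERRQ(ierr);
    ierr = ISLocalToGlobalMappingDestroy(&ltog);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }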

  Thanks,

 Matt

Attached is the modified routine that now works (no leaking memory) with 
openmpi.

-sanjay
On 5/30/19 8:41 PM, Zhang, Junchao wrote:

Hi, Sanjay,
  Could you send your modified data exchange code (psetb.F) with MPI_Waitall? 
See other inlined comments below. Thanks.

On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Lawrence,
Thanks for taking a look!  This is what I had been wondering about -- my
knowledge of MPI is pretty minimal and
this origins of the routine were from a programmer we hired a decade+
back from NERSC.  I'll have to look into
VecScatter.  It will be great to dispense with our roll-your-own
routines (we even have our own reduceALL scattered around the code).
Petsc VecScatter has a very simple interface and you definitely should go with it. 
 With VecScatter, you can think in familiar vectors and indices instead of the 
low level MPI_Send/Recv. Besides that, PETSc has optimized VecScatter so that 
communication is efficient.

Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI
but it still persists with MPICH.  Graphs attached.
I'm going to run with openmpi for now (but I guess I really still need
to figure out what is wrong with MPICH and WaitALL;
I'll try Barry's suggestion of
--download-mpich-configure-arguments="--enable-error-messages=all
--enable-g" later today and report back).

Regarding MPI_Barrier, it was put in due to a problem that some processes
were finishing up sending and receiving and exiting the subroutine
before the receiving processes had completed (which resulted in data
loss as the buffers are freed after the call to the routine).
MPI_Barrier was the solution proposed
to us.  I don't think I can dispense with it, but will think about some
more.
After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), you can safely 
free the send buffer without worry that the receive has not completed. MPI 
guarantees the receiver can get the data, for examp

Re: [petsc-users] Memory growth issue

2019-05-31 Thread Zhang, Junchao via petsc-users
Sanjay,
I tried petsc with MPICH and OpenMPI on my Macbook. I inserted 
PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage at the beginning and end 
of KSPSolve and then computed the delta and summed over processes. Then I 
tested with src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c
With OpenMPI,
mpirun -n 4 ./ex5 -da_grid_x 128 -da_grid_y 128 -ts_type beuler -ts_max_steps 
500 > 128.log
grep -n -v "RSS Delta= 0, Malloc Delta= 0" 128.log
1:RSS Delta= 69632, Malloc Delta= 0
2:RSS Delta= 69632, Malloc Delta= 0
3:RSS Delta= 69632, Malloc Delta= 0
4:RSS Delta= 69632, Malloc Delta= 0
9:RSS Delta=9.25286e+06, Malloc Delta= 0
22:RSS Delta= 49152, Malloc Delta= 0
44:RSS Delta= 20480, Malloc Delta= 0
53:RSS Delta= 49152, Malloc Delta= 0
66:RSS Delta=  4096, Malloc Delta= 0
97:RSS Delta= 16384, Malloc Delta= 0
119:RSS Delta= 20480, Malloc Delta= 0
141:RSS Delta= 53248, Malloc Delta= 0
176:RSS Delta= 16384, Malloc Delta= 0
308:RSS Delta= 16384, Malloc Delta= 0
352:RSS Delta= 16384, Malloc Delta= 0
550:RSS Delta= 16384, Malloc Delta= 0
572:RSS Delta= 16384, Malloc Delta= 0
669:RSS Delta= 40960, Malloc Delta= 0
924:RSS Delta= 32768, Malloc Delta= 0
1694:RSS Delta= 20480, Malloc Delta= 0
2099:RSS Delta= 16384, Malloc Delta= 0
2244:RSS Delta= 20480, Malloc Delta= 0
3001:RSS Delta= 16384, Malloc Delta= 0
5883:RSS Delta= 16384, Malloc Delta= 0

If I increased the grid
mpirun -n 4 ./ex5 -da_grid_x 512 -da_grid_y 512 -ts_type beuler -ts_max_steps 
500 -malloc_test >512.log
grep -n -v "RSS Delta= 0, Malloc Delta= 0" 512.log
1:RSS Delta=1.05267e+06, Malloc Delta= 0
2:RSS Delta=1.05267e+06, Malloc Delta= 0
3:RSS Delta=1.05267e+06, Malloc Delta= 0
4:RSS Delta=1.05267e+06, Malloc Delta= 0
13:RSS Delta=1.24932e+08, Malloc Delta= 0

So we did see RSS increase in 4k-page sizes after KSPSolve. As long as there are 
no memory leaks, why do you care about it? Is it because you run out of memory?

On Thu, May 30, 2019 at 1:59 PM Smith, Barry F. 
mailto:bsm...@mcs.anl.gov>> wrote:

   Thanks for the update. So the current conclusions are that using the Waitall 
in your code

1) solves the memory issue with OpenMPI in your code

2) does not solve the memory issue with PETSc KSPSolve

3) MPICH has memory issues both for your code and PETSc KSPSolve (despite the 
wait all fix)?

If you literally just comment out the call to KSPSolve() with OpenMPI is there 
no growth in memory usage?


Both 2 and 3 are concerning; they indicate possible memory leak bugs in MPICH and 
not freeing all MPI resources in KSPSolve()

Junchao, can you please investigate 2 and 3 with, for example, a TS example 
that uses the linear solver (like with -ts_type beuler)? Thanks


  Barry



> On May 30, 2019, at 1:47 PM, Sanjay Govindjee 
> mailto:s...@berkeley.edu>> wrote:
>
> Lawrence,
> Thanks for taking a look!  This is what I had been wondering about -- my 
> knowledge of MPI is pretty minimal and
> this origins of the routine were from a programmer we hired a decade+ back 
> from NERSC.  I'll have to look into
> VecScatter.  It will be great to dispense with our roll-your-own routines (we 
> even have our own reduceALL scattered around the code).
>
> Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but 
> it still persists with MPICH.  Graphs attached.
> I'm going to run with openmpi for now (but I guess I really still need to 
> figure out what is wrong with MPICH and WaitALL;
> I'll try Barry's suggestion of 
> --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" 
> later today and report back).
>
> Regarding MPI_Barrier, it was put in due to a problem that some processes were 
> finishing up sending and receiving and exiting the subroutine
> before the receiving processes had completed (which resulted in data loss as 
> the buffers are freed after the call to the routine). MPI_Barrier was the 
> solution proposed
> to us.  I don't think I can dispense with it, but will think about some more.
>
> I'm not so sure about using MPI_IRecv as it will require a bit of rewriting 
> since right now I process the received
> data sequentially after each blocking MPI_Recv -- clearly slower but easier 
> to code.
>
> Thanks again for the help.
>
> -sanjay
>
> On 5/30/19 4:48 AM, Lawrence Mitchell wrote:
>> Hi Sanjay,
>>
>>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users 
>>> mailto:petsc-users@mcs.anl.gov>> wrote:
>>>
>>> The problem seems to persist but with a different signature.  Graphs 
>>> attached as before.
>>>
>>> Totals with MPICH (NB: single run)
>>>
>>> For the CG/Jacobi  data_exchange_total = 41,385,984; kspsolve_total 
>>> = 38,289,408
>>> For the GMRES/BJACOBI  data_exchange_total = 41,324,544; kspsolve_total = 41,324,544

Re: [petsc-users] Memory growth issue

2019-05-30 Thread Zhang, Junchao via petsc-users

Hi, Sanjay,
  Could you send your modified data exchange code (psetb.F) with MPI_Waitall? 
See other inlined comments below. Thanks.

On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users wrote:
Lawrence,
Thanks for taking a look!  This is what I had been wondering about -- my
knowledge of MPI is pretty minimal and
the origins of the routine trace to a programmer we hired a decade+
back from NERSC.  I'll have to look into
VecScatter.  It will be great to dispense with our roll-your-own
routines (we even have our own reduceALL scattered around the code).
PETSc's VecScatter has a very simple interface and you should definitely go with it. 
With VecScatter you can think in terms of familiar vectors and indices instead of the 
low-level MPI_Send/Recv. Besides that, PETSc has optimized VecScatter so that 
communication is efficient.
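
For reference, a minimal sketch of such a VecScatter-based exchange (illustrative only, not the poster's routine; the caller supplies the number of needed remote entries and their global indices):

#include <petscvec.h>

/* Sketch: gather the entries of a parallel vector xglobal listed in ghosts[]
 * (nghost global indices on this rank) into a new sequential vector xlocal. */
PetscErrorCode GatherGhosts(Vec xglobal, PetscInt nghost, const PetscInt ghosts[], Vec *xlocal)
{
  IS             is_from, is_to;
  VecScatter     scat;
  PetscErrorCode ierr;

  ierr = VecCreateSeq(PETSC_COMM_SELF, nghost, xlocal);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_SELF, nghost, ghosts, PETSC_COPY_VALUES, &is_from);CHKERRQ(ierr); /* global indices into xglobal */
  ierr = ISCreateStride(PETSC_COMM_SELF, nghost, 0, 1, &is_to);CHKERRQ(ierr);                         /* local indices into xlocal  */
  ierr = VecScatterCreate(xglobal, is_from, *xlocal, is_to, &scat);CHKERRQ(ierr);
  ierr = VecScatterBegin(scat, xglobal, *xlocal, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scat, xglobal, *xlocal, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
  ierr = ISDestroy(&is_from);CHKERRQ(ierr);
  ierr = ISDestroy(&is_to);CHKERRQ(ierr);
  return 0;
}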

Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI
but it still persists with MPICH.  Graphs attached.
I'm going to run with openmpi for now (but I guess I really still need
to figure out what is wrong with MPICH and WaitALL;
I'll try Barry's suggestion of
--download-mpich-configure-arguments="--enable-error-messages=all
--enable-g" later today and report back).

Regarding MPI_Barrier, it was put in due to a problem where some processes
were finishing up sending and receiving and exiting the subroutine
before the receiving processes had completed (which resulted in data
loss as the buffers are freed after the call to the routine).
MPI_Barrier was the solution proposed
to us.  I don't think I can dispense with it, but will think about some
more.
After MPI_Send(), or after MPI_Isend(.., req) followed by MPI_Wait(req), you can safely 
free the send buffer without worrying about whether the receive has completed. MPI 
guarantees the receiver will still get the data, for example through internal 
buffering.
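
To make the nonblocking pattern concrete, here is a hedged sketch in plain C (not the poster's Fortran psetb.F; the buffer and count arrays are illustrative). All sends and receives are posted first, then a single MPI_Waitall completes them; after that the send buffers may be freed and no barrier is needed.

#include <mpi.h>
#include <stdlib.h>

void exchange(double **sendbuf, const int *sendlen, const int *sendto, int nsend,
              double **recvbuf, const int *recvlen, const int *recvfrom, int nrecv,
              MPI_Comm comm)
{
  MPI_Request *req = malloc((size_t)(nsend + nrecv) * sizeof(*req));
  int i;

  for (i = 0; i < nsend; i++)      /* post all nonblocking sends */
    MPI_Isend(sendbuf[i], sendlen[i], MPI_DOUBLE, sendto[i], 0, comm, &req[i]);
  for (i = 0; i < nrecv; i++)      /* post all nonblocking receives */
    MPI_Irecv(recvbuf[i], recvlen[i], MPI_DOUBLE, recvfrom[i], 0, comm, &req[nsend + i]);

  MPI_Waitall(nsend + nrecv, req, MPI_STATUSES_IGNORE);  /* completes every send and receive */
  free(req);
  /* send buffers can now be reused or freed; no MPI_Barrier is required */
}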

I'm not so sure about using MPI_IRecv as it will require a bit of
rewriting since right now I process the received
data sequentially after each blocking MPI_Recv -- clearly slower but
easier to code.

Thanks again for the help.

-sanjay

On 5/30/19 4:48 AM, Lawrence Mitchell wrote:
> Hi Sanjay,
>
>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users wrote:
>>
>> The problem seems to persist but with a different signature.  Graphs 
>> attached as before.
>>
>> Totals with MPICH (NB: single run)
>>
>> For the CG/Jacobi  data_exchange_total = 41,385,984; kspsolve_total 
>> = 38,289,408
>> For the GMRES/BJACOBI  data_exchange_total = 41,324,544; kspsolve_total 
>> = 41,324,544
>>
>> Just reading the MPI docs I am wondering if I need some sort of 
>> MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine?
>> I would have thought that with the blocking receives and the MPI_Barrier 
>> that everything will have fully completed and cleaned up before
>> all processes exited the routine, but perhaps I am wrong on that.
>
> Skimming the fortran code you sent you do:
>
> for i in ...:
> call MPI_Isend(..., req, ierr)
>
> for i in ...:
> call MPI_Recv(..., ierr)
>
> But you never call MPI_Wait on the request you got back from the Isend. So 
> the MPI library will never free the data structures it created.
>
> The usual pattern for these non-blocking communications is to allocate an 
> array for the requests of length nsend+nrecv and then do:
>
> for i in nsend:
> call MPI_Isend(..., req[i], ierr)
> for j in nrecv:
> call MPI_Irecv(..., req[nsend+j], ierr)
>
> call MPI_Waitall(req, ..., ierr)
>
> I note also there's no need for the Barrier at the end of the routine, this 
> kind of communication does neighbourwise synchronisation, no need to add 
> (unnecessary) global synchronisation too.
>
> As an aside, is there a reason you don't use PETSc's VecScatter to manage 
> this global to local exchange?
>
> Cheers,
>
> Lawrence



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Zhang, Junchao via petsc-users
Yes, see MatGetRow 
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRow.html
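
A minimal sketch of using it on the locally owned rows of a parallel AIJ matrix (illustrative, not from the thread):

#include <petscmat.h>

/* Sketch: print the nonzero (i, j) locations of the locally owned rows. */
PetscErrorCode PrintLocalNonzeros(Mat A)
{
  MPI_Comm        comm;
  PetscInt        rstart, rend, row, ncols, j;
  const PetscInt *cols;
  PetscErrorCode  ierr;

  ierr = PetscObjectGetComm((PetscObject)A, &comm);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    ierr = MatGetRow(A, row, &ncols, &cols, NULL);CHKERRQ(ierr);   /* column indices only, no values */
    for (j = 0; j < ncols; j++) {
      ierr = PetscSynchronizedPrintf(comm, "(%D, %D)\n", row, cols[j]);CHKERRQ(ierr);
    }
    ierr = MatRestoreRow(A, row, &ncols, &cols, NULL);CHKERRQ(ierr);
  }
  ierr = PetscSynchronizedFlush(comm, PETSC_STDOUT);CHKERRQ(ierr);
  return 0;
}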
--Junchao Zhang


On Wed, May 29, 2019 at 2:28 PM Manav Bhatia via petsc-users wrote:
Hi,

   Once an MPIAIJ matrix has been assembled, is there a method to get the 
nonzero I-J locations? I see one for sequential matrices here: 
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
 , but not for parallel matrices.

Regards,
Manav




Re: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS

2019-05-28 Thread Zhang, Junchao via petsc-users
Also works with PathScale EKOPath Compiler Suite installed on MCS machines.

$ pathcc -c check-pic.c -fPIC
$ pathcc -c check-pic.c
check-pic.c:2:2: error: "no-PIC"
#error "no-PIC"
 ^
1 error generated.

--Junchao Zhang


On Tue, May 28, 2019 at 1:54 PM Smith, Barry F. via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

  Works for Intel and PGI compiles (the version I checked)

bsmith@es:~$ pgcc check-pic.c  -PIC
pgcc-Error-Unknown switch: -PIC
bsmith@es:~$ pgcc check-pic.c  -fPIC
bsmith@es:~$ pgcc check-pic.c
PGC-F-0249-#error --  "no-PIC" (check-pic.c: 2)
PGC/x86-64 Linux 19.3-0: compilation aborted
bsmith@es:~$ icc check-pic.c
check-pic.c(2): error: #error directive: "no-PIC"
  #error "no-PIC"
   ^

compilation aborted for check-pic.c (code 2)
bsmith@es:~$ icc check-pic.c -PIC
icc: command line warning #10006: ignoring unknown option '-PIC'
check-pic.c(2): error: #error directive: "no-PIC"
  #error "no-PIC"
   ^

compilation aborted for check-pic.c (code 2)
bsmith@es:~$ icc check-pic.c -fPIC
bsmith@es:~$


You are the man!


> On May 28, 2019, at 12:29 PM, Lisandro Dalcin via petsc-users 
> mailto:petsc-users@mcs.anl.gov>> wrote:
>
>
>
> On Tue, 28 May 2019 at 18:19, Jed Brown 
> mailto:j...@jedbrown.org>> wrote:
> Lisandro Dalcin via petsc-users 
> mailto:petsc-users@mcs.anl.gov>> writes:
>
> > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users <
> > petsc-users@mcs.anl.gov> wrote:
> >
> >> Configure.log shows '--with-pic=1' - hence this error.
> >>
> >> Remove '--with-pic=1' and retry.
> >>
> >>
> > Nonsense. Why this behavior? Building a static library with PIC code is a
> > perfectly valid use case.
>
> And that's what will happen because Inge passed -fPIC in CFLAGS et al.
>
> Do you know how we could confirm that PIC code is generated without
> attempting to use shared libraries?
>
>
> I know how to do it with the `readelf` command for ELF objects. I even know 
> how to do it compile-time for GCC and clang. Maybe Intel also works this way. 
> I do not know about a general solution, though.
>
> $ cat check-pic.c
> #ifndef __PIC__
> #error "no-PIC"
> #endif
>
> $ gcc -c check-pic.c -fPIC
>
> $ clang -c check-pic.c -fPIC
>
> $ gcc -c check-pic.c
> check-pic.c:2:2: error: #error "no-PIC"
> 2 | #error "no-PIC"
>   |  ^
>
> $ clang -c check-pic.c
> check-pic.c:2:2: error: "no-PIC"
> #error "no-PIC"
>  ^
> 1 error generated.
>
> --
> Lisandro Dalcin
> 
> Research Scientist
> Extreme Computing Research Center (ECRC)
> King Abdullah University of Science and Technology (KAUST)
> http://ecrc.kaust.edu.sa/



Re: [petsc-users] Question about parallel Vectors and communicators

2019-05-13 Thread Zhang, Junchao via petsc-users
 The index sets provide possible i, j in scatter "y[j] = x[i]". Each process 
provides a portion of the i and j of the whole scatter. The only requirement of 
VecScatterCreate is that on each process, local sizes of ix and iy must be 
equal (a process can provide empty ix and iy).  A process's i and j can point 
to anywhere in its vector (it is not constrained to the vector's local part).
 The interpretation of ix and iy does not depend on their communicator; it depends 
on their associated vector. Let P and S stand for parallel and sequential vectors 
respectively; there are four combinations of vecscatters: PtoP, PtoS, StoP and StoS. 
The convention is: if x is parallel, then ix contains global indices of x; if x is 
sequential, ix contains local indices of x. Similarly for y and iy.
 So, index sets created with PETSC_COMM_SELF can perfectly include global 
indices. That is why I always use PETSC_COMM_SELF to create index sets for 
VecScatter. It makes things easier to understand.
 The quote you gave is also confusing to me. If you use PETSC_COMM_SELF, it 
means only the current process uses the IS. That sounds ok since other 
processes can not get a reference to this IS.
 Maybe, other petsc developers can explain when parallel communicators are 
useful for index sets.  My feeling is that they are useless at least for 
VecScatter.

--Junchao Zhang


On Mon, May 13, 2019 at 9:07 AM GIRET Jean-Christophe wrote:
Hello,

Thank you all for your answers and examples, it's now very clear: the trick is 
to alias a Vec on a subcomm with a Vec on the parent comm, and to do the communication 
through a Scatter on the parent comm. I have also been able to implement it with 
petsc4py.
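
For readers of the archive, a hedged sketch of that aliasing trick (illustrative only; vsub is the sub-communicator vector and is NULL on ranks outside the subcomm, and vdest is assumed to be a parent-comm vector with the same global size):

#include <petscvec.h>

PetscErrorCode SubCommToParentComm(MPI_Comm parent, Vec vsub, Vec vdest)
{
  Vec                valias;
  VecScatter         scat;
  const PetscScalar *array  = NULL;
  PetscInt           nlocal = 0;
  PetscErrorCode     ierr;

  if (vsub) {                       /* only ranks that belong to the subcomm contribute entries */
    ierr = VecGetLocalSize(vsub, &nlocal);CHKERRQ(ierr);
    ierr = VecGetArrayRead(vsub, &array);CHKERRQ(ierr);
  }
  /* wrap the subcomm vector's local array in a vector that lives on the parent communicator */
  ierr = VecCreateMPIWithArray(parent, 1, nlocal, PETSC_DECIDE, array, &valias);CHKERRQ(ierr);
  ierr = VecScatterCreate(valias, NULL, vdest, NULL, &scat);CHKERRQ(ierr);  /* NULL IS = all entries, in order */
  ierr = VecScatterBegin(scat, valias, vdest, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scat, valias, vdest, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
  ierr = VecDestroy(&valias);CHKERRQ(ierr);
  if (vsub) { ierr = VecRestoreArrayRead(vsub, &array);CHKERRQ(ierr); }
  return 0;
}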

Junchao, thank you for your example. It is indeed very clear. Although I 
understand how the exchanges are made through the Vecs defined on the parent 
comms, I am wondering why ISCreateStride is defined on the communicator 
PETSC_COMM_SELF and not on the parent communicator spanning the Vecs used for 
the Scatter operations.

When I read the documentation, I see: “The communicator, comm, should consist 
of all processes that will be using the IS.” I would say in that case that it 
is the same communicator used for the ‘exchange’ vectors.

I am surely misunderstanding something here, but I didn’t find any answer while 
googling. Any hint on that?

Again, thank you all for your great support,
Best,
JC



From: Zhang, Junchao [mailto:jczh...@mcs.anl.gov]
Sent: Friday, May 10, 2019 22:01
To: GIRET Jean-Christophe
Cc: Mark Adams; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Question about parallel Vectors and communicators

Jean-Christophe,
  I added a petsc example at 
https://bitbucket.org/petsc/petsc/pull-requests/1652/add-an-example-to-show-transfer-vectors/diff#chg-src/vec/vscat/examples/ex9.c
  It shows how to transfer vectors from a parent communicator to vectors on a 
child communicator. It also shows how to transfer vectors from a subcomm to 
vectors on another subcomm. The two subcomms are not required to cover all 
processes in PETSC_COMM_WORLD.
  Hope it helps you better understand Vec and VecScatter.
--Junchao Zhang


On Thu, May 9, 2019 at 11:34 AM GIRET Jean-Christophe via petsc-users wrote:
Hello,

Thanks Mark and Jed for your quick answers.

So the idea is to define all the Vecs on the world communicator, and perform 
the communications using traditional scatter objects? The data would still be 
accessible on the two sub-communicators as they are both subsets of the 
MPI_COMM_WORLD communicator, but they would be used while creating the Vecs or 
the IS for the scatter. Is that right?

I’m currently trying, without success, to perform a Scatter from a MPI Vec 
defined on a subcomm to another Vec defined on the world comm, and vice-versa. 
But I don’t know if it’s possible.

I can imagine that trying to do that seems a bit strange. However, I'm dealing 
with code coupling (and linear algebra for the main part of the code), and my 
idea was trying to use the Vec data structures to perform data exchange between 
some parts of the software which would have their own communicator. It would 
eliminate the need to re-implement an ad-hoc solution.

One option would be to stick with the world communicator for all the PETSc part, 
but I could face situations where my Vecs are small while I would have to run the 
whole simulation on a large number of cores for the coupled part. I imagine that 
may not really serve the linear-system-solving part in terms of performance. Another 
option would be to perform all the PETSc operations on a sub-communicator and use 
"raw" MPI communication between the communicators to perform the data exchange for 
the coupling part.

Thanks again for your support,
Best regards,
Jean-Christophe

From: Mark Adams [

Re: [petsc-users] Question about parallel Vectors and communicators

2019-05-10 Thread Zhang, Junchao via petsc-users
Jean-Christophe,
  I added a petsc example at 
https://bitbucket.org/petsc/petsc/pull-requests/1652/add-an-example-to-show-transfer-vectors/diff#chg-src/vec/vscat/examples/ex9.c
  It shows how to transfer vectors from a parent communicator to vectors on a 
child communicator. It also shows how to transfer vectors from a subcomm to 
vectors on another subcomm. The two subcomms are not required to cover all 
processes in PETSC_COMM_WORLD.
  Hope it helps you better understand Vec and VecScatter.
--Junchao Zhang


On Thu, May 9, 2019 at 11:34 AM GIRET Jean-Christophe via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

Thanks Mark and Jed for your quick answers.

So the idea is to define all the Vecs on the world communicator, and perform 
the communications using traditional scatter objects? The data would still be 
accessible on the two sub-communicators as they are both subsets of the 
MPI_COMM_WORLD communicator, but they would be used while creating the Vecs or 
the IS for the scatter. Is that right?

I’m currently trying, without success, to perform a Scatter from a MPI Vec 
defined on a subcomm to another Vec defined on the world comm, and vice-versa. 
But I don’t know if it’s possible.

I can imagine that trying to do that seems a bit strange. However, I'm dealing 
with code coupling (and linear algebra for the main part of the code), and my 
idea was trying to use the Vec data structures to perform data exchange between 
some parts of the software which would have their own communicator. It would 
eliminate the need to re-implement an ad-hoc solution.

One option would be to stick with the world communicator for all the PETSc part, 
but I could face situations where my Vecs are small while I would have to run the 
whole simulation on a large number of cores for the coupled part. I imagine that 
may not really serve the linear-system-solving part in terms of performance. Another 
option would be to perform all the PETSc operations on a sub-communicator and use 
"raw" MPI communication between the communicators to perform the data exchange for 
the coupling part.

Thanks again for your support,
Best regards,
Jean-Christophe

From: Mark Adams [mailto:mfad...@lbl.gov]
Sent: Tuesday, May 7, 2019 21:39
To: GIRET Jean-Christophe
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Question about parallel Vectors and communicators



On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Dear PETSc users,

I would like to use Petsc4Py for a project extension, which consists mainly of:

-  Storing data and matrices on several rank/nodes which could not fit 
on a single node.

-  Performing some linear algebra in a parallel fashion (solving sparse 
linear system for instance)

-  Exchanging those data structures (parallel vectors) between 
non-overlapping MPI communicators, created for instance by splitting 
MPI_COMM_WORLD.

While the two first items seems to be well addressed by PETSc, I am wondering 
about the last one.

Is it possible to access the data of a vector, defined on a communicator from 
another, non-overlapping communicator? From what I have seen from the 
documentation and the several threads on the user mailing-list, I would say no. 
But maybe I am missing something? If not, is it possible to transfer a vector 
defined on a given communicator on a communicator which is a subset of the 
previous one?

If you are sending to a subset of processes then VecGetSubVec + Jed's tricks 
might work.

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html


Best regards,
Jean-Christophe




Re: [petsc-users] Command line option -memory_info

2019-05-07 Thread Zhang, Junchao via petsc-users
https://www.mcs.anl.gov/petsc/documentation/changes/37.html has
  PetscMemoryShowUsage() and -memory_info changed to PetscMemoryView() and 
-memory_view
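
A minimal sketch of the programmatic counterpart (illustrative):

#include <petscsys.h>

/* Print a memory-usage summary wherever it is needed, mirroring -memory_view. */
PetscErrorCode ReportMemory(void)
{
  PetscErrorCode ierr;
  ierr = PetscMemoryView(PETSC_VIEWER_STDOUT_WORLD, "Current memory usage:\n");CHKERRQ(ierr);
  return 0;
}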

--Junchao Zhang


On Tue, May 7, 2019 at 6:56 PM Sanjay Govindjee via petsc-users wrote:
I was trying to clean up some old scripts we have for running our codes
which include the command line option -memory_info.
I went digging in the manuals to try and figure out what this used to do
and what has replaced its functionality but I wasn't able
to figure it out.  Does anyone recall the earlier functionality for this
option? and/or know its "replacement"?
-sanjay



Re: [petsc-users] Quick question about ISCreateGeneral

2019-04-30 Thread Zhang, Junchao via petsc-users


On Tue, Apr 30, 2019 at 11:42 AM Sajid Ali via petsc-users wrote:
Hi PETSc Developers,

I see that in the examples for ISCreateGeneral, the index sets are created by 
copying values from int arrays (which were created by PetscMalloc1 which is not 
collective).

If ISCreateGeneral is called with PETSC_COMM_WORLD and the int arrays on 
each rank are independently created, does the index set created concatenate all 
the int-arrays into one ? If not, what needs to be done to get such an index 
set ?
From my understanding, they are independently created and not concatenated.  I 
like index sets created with PETSC_COMM_SELF. They are easy to understand.

PS: For context, I want to write a fftshift convenience function (like numpy, 
MATLAB) but for large distributed vectors. I thought that I could do this with 
VecScatter and two index sets, one shifted and one un-shifted.
To achieve this, index sets created with PETSC_COMM_SELF are enough. They just 
need to contain global indices to describe the MPI-vector-to-MPI-vector scatter. 
You can think of it as each process providing one piece of the scatter.
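
A hedged sketch of such an fftshift (illustrative; it assumes an even global length N and a pre-created destination vector y with the same size and layout as x):

#include <petscvec.h>

/* Sketch: entry j of y receives entry (j + N/2) mod N of x.  The index sets live
 * on PETSC_COMM_SELF but hold global indices, as described above. */
PetscErrorCode VecFFTShift(Vec x, Vec y)
{
  IS             isfrom, isto;
  VecScatter     scat;
  PetscInt       rstart, rend, N, n, i, *from;
  PetscErrorCode ierr;

  ierr = VecGetSize(x, &N);CHKERRQ(ierr);
  ierr = VecGetOwnershipRange(y, &rstart, &rend);CHKERRQ(ierr);
  n    = rend - rstart;
  ierr = PetscMalloc1(n, &from);CHKERRQ(ierr);
  for (i = 0; i < n; i++) from[i] = (rstart + i + N/2) % N;      /* shifted source index */
  ierr = ISCreateGeneral(PETSC_COMM_SELF, n, from, PETSC_OWN_POINTER, &isfrom);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, n, rstart, 1, &isto);CHKERRQ(ierr);
  ierr = VecScatterCreate(x, isfrom, y, isto, &scat);CHKERRQ(ierr);
  ierr = VecScatterBegin(scat, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scat, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
  ierr = ISDestroy(&isfrom);CHKERRQ(ierr);
  ierr = ISDestroy(&isto);CHKERRQ(ierr);
  return 0;
}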

Thank You,
Sajid Ali
Applied Physics
Northwestern University


Re: [petsc-users] questions regarding simple petsc matrix vector operation

2019-04-24 Thread Zhang, Junchao via petsc-users
How many MPI ranks do you use? The following line is suspicious.  I guess you 
do not want a vector of global length 1.
66   VecSetSizes(b,PETSC_DECIDE,1);

--Junchao Zhang


On Wed, Apr 24, 2019 at 4:14 PM Karl Lin via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hi, there

I have been trying to get a simple program run with the following code:

 12 int main(int argc,char **args)
 13 {
 14   PetscErrorCode ierr;
 15   MatA;
 16   Mat   AT;
 17   MatN;
 18   char   name[1024];
 19   char   vname[1024];
 20   char   pass[1024];
 21   PetscBool  flg;
 22   Vecb,x,u,Ab,Au;
 23   PetscViewerviewer;/* viewer */
 24   PetscMPIIntrank,size;
 25
 26   KSPQRsolver;
 27   PC pc;
 28   PetscInt   its;
 29   PetscReal  norm;
 30
 31   PetscInt   n1, n2, n3, np1, np2, np3, p, jj;
 32
 33   PetscInt   *cols, *dnz, *onz;
 34   PetscScalar*vals;
 35
 36   ierr = PetscInitialize(,,0,help);if (ierr) return ierr;
 37
 38   ierr = MPI_Comm_size(PETSC_COMM_WORLD,);CHKERRQ(ierr);
 39   ierr = MPI_Comm_rank(PETSC_COMM_WORLD,);CHKERRQ(ierr);
 40
 41   PetscMalloc1(1, );
 42   PetscMalloc1(1, );
 43
 44   dnz[0]=2;
 45   onz[0]=1;
 46
 47   MatCreateMPIAIJMKL(PETSC_COMM_WORLD, 1, 2, 1, 2, 2, dnz, 2, onz, ); 
CHKERRQ(ierr);
 48
 49   PetscMalloc1(2, );
 50   PetscMalloc1(2, );
 51
 52   jj = rank;
 53   cols[0]=0; cols[1]=1;
 54   vals[0]=1.0;vals[1]=1.0;
 55
 56   MatSetValues(A, 1, , 2, cols, vals, INSERT_VALUES);
 57
 58   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
 59   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
 60
 61   VecCreate(PETSC_COMM_WORLD,);
 62   VecSetSizes(x,PETSC_DECIDE,2);
 63   VecSetFromOptions(x);
 64
 65   VecCreate(PETSC_COMM_WORLD,);
 66   VecSetSizes(b,PETSC_DECIDE,1);
 67   VecSetFromOptions(b);
 68
 69   VecCreate(PETSC_COMM_WORLD,);
 70   VecSetSizes(u,PETSC_DECIDE,1);
 71   VecSetFromOptions(u);
 72
 73   VecSet(b, 2.0);
 74   VecSet(u, 0.0);
 75   VecSet(x, 0.0);
 76
 77   MatMult(A, x, u);
 78
 79   VecView(x, PETSC_VIEWER_STDOUT_WORLD);
 80   VecView(b, PETSC_VIEWER_STDOUT_WORLD);
 81   VecView(u, PETSC_VIEWER_STDOUT_WORLD);
 82
 83   VecAXPY(u,-1.0,b);

However, it always crashes at line 83 even with single process saying:
[0]PETSC ERROR: 

[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.4, Feb, 26, 2019

I can't figure out why this would happen. The printout from VecView shows every 
vec value is correct. I will greatly appreciate any tips.

Regards,
Karl



Re: [petsc-users] Preallocation of sequential matrix

2019-04-23 Thread Zhang, Junchao via petsc-users
The error message has
[0]PETSC ERROR: New nonzero at (61,124) caused a malloc
[0]PETSC ERROR: New nonzero at (124,186) caused a malloc
You can check your code to see if you allocated spots for these nonzeros.
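
As a generic illustration of exact preallocation (a 5-point-stencil sketch, not the poster's 9-point code): count the entries of each row first, pass exactly those counts as nnz, and later insert only entries that were counted.

#include <petscmat.h>

PetscErrorCode CreateLaplacian5pt(PetscInt m, PetscInt n, Mat *A)
{
  PetscInt      *nnz, i, j, k;
  PetscErrorCode ierr;

  ierr = PetscMalloc1(m*n, &nnz);CHKERRQ(ierr);
  for (j = 0; j < n; j++) {
    for (i = 0; i < m; i++) {
      k      = j*m + i;
      nnz[k] = 1;                    /* diagonal entry  */
      if (i > 0)   nnz[k]++;         /* west neighbor   */
      if (i < m-1) nnz[k]++;         /* east neighbor   */
      if (j > 0)   nnz[k]++;         /* south neighbor  */
      if (j < n-1) nnz[k]++;         /* north neighbor  */
    }
  }
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m*n, m*n, 0, nnz, A);CHKERRQ(ierr);  /* nz is ignored when nnz is given */
  ierr = PetscFree(nnz);CHKERRQ(ierr);
  /* ... then MatSetValues with exactly the counted (row, col) pattern and assemble ... */
  return 0;
}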

--Junchao Zhang


On Tue, Apr 23, 2019 at 8:57 PM Maahi Talukder via petsc-users wrote:
Dear All,


I am trying to preallocate the number of nonzeros in my matrix using the parameter 
'nnz'. Here 'row' is the array passed as 'nnz'. The part of the code that does that 
is the following-

..
Do j = 2,xmax-1

Do i = 2,ymax-1

a = (ymax-2)*(j-2)+i-1-1

If(j.eq.2 .and. i .ge. 3 .and. i .le. (ymax-2))then
row(a) = 6

else if (j.eq.(xmax-1) .and. i.ge.3 .and. i .le. (ymax-2)) then
row(a) = 6

else if(i.eq.2 .and. j.ge.3 .and. j.le.(xmax-2))then
row(a) = 6

else if(i.eq.(ymax-1) .and. j.ge.3 .and. j.le.(xmax-2)) then
row(a) = 6

else if(i.eq.2 .and. j.eq.2) then
row(a) = 4

else if (i.eq.2 .and. j .eq. (xmax-1)) then
row(a)= 4

else if (i.eq.(ymax-1) .and. j .eq. 2) then
row(a) = 4

else if (i .eq. (ymax-1) .and. j .eq. (xmax-1)) then
row(a) = 4

else
row(a) = 9

end if


end do

end do


call MatCreateSeqAIJ(PETSC_COMM_SELF,N,N,ze,row,Mp,ierr)

.

But I get the following error message :

[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Argument out of range
[0]PETSC ERROR: New nonzero at (61,124) caused a malloc
Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off 
this check
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[0]PETSC ERROR: ./Test5 on a arch-opt named CB272PP-THINK1 by maahi Tue Apr 23 
21:39:26 2019
[0]PETSC ERROR: Configure options --with-debugging=0 --download-fblaslapack=1 
PETSC_ARCH=arch-opt
[0]PETSC ERROR: #1 MatSetValues_SeqAIJ() line 481 in 
/home/maahi/petsc/src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: #2 MatSetValues() line 1349 in 
/home/maahi/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Argument out of range
[0]PETSC ERROR: New nonzero at (124,186) caused a malloc
Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off 
this check
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[0]PETSC ERROR: ./Test5 on a arch-opt named CB272PP-THINK1 by maahi Tue Apr 23 
21:39:26 2019
[0]PETSC ERROR: Configure options --with-debugging=0 --download-fblaslapack=1 
PETSC_ARCH=arch-opt
[0]PETSC ERROR: #3 MatSetValues_SeqAIJ() line 481 in 
/home/maahi/petsc/src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: #4 MatSetValues() line 1349 in 
/home/maahi/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: - Error Message 
--
.

But instead of using 'nnz', if I put an upper bound for 'nz', the code works 
fine.

Any idea what went wrong?

Thanks,
Maahi Talukder



Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-04 Thread Zhang, Junchao via petsc-users
I updated the branch and made a PR. I tried to do MPI_SUM on MPI_CHAR. We do 
not have UnpackAdd on this type (which is correct). But unfortunately, MPICH's 
MPI_Reduce_local did not report an error (it should have), so we did not generate 
an error either.

--Junchao Zhang


On Thu, Apr 4, 2019 at 10:37 AM Jed Brown wrote:
Fande Kong via petsc-users writes:

> Hi Jed,
>
> One more question. Is it fine to use the same SF to exchange two groups of
> data at the same time? What is the better way to do this

This should work due to the non-overtaking property defined by MPI.

> Fande Kong,
>
>  ierr =
> PetscSFReduceBegin(ptap->sf,MPIU_INT,rmtspace,space,MPIU_REPLACE);CHKERRQ(ierr);
>  ierr =
> PetscSFReduceBegin(ptap->sf,MPI_CHAR,rmtspace2,space2,MPIU_REPLACE);CHKERRQ(ierr);
>  Doing some calculations
>  ierr =
> PetscSFReduceEnd(ptap->sf,MPIU_INT,rmtspace,space,MPIU_REPLACE);CHKERRQ(ierr);
>  ierr =
> PetscSFReduceEnd(ptap->sf,MPI_CHAR,rmtspace2,space2,MPIU_REPLACE);CHKERRQ(ierr);


Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-03 Thread Zhang, Junchao via petsc-users


On Wed, Apr 3, 2019, 10:29 PM Fande Kong 
mailto:fdkong...@gmail.com>> wrote:
Thanks for the reply.  It is not necessary for me to use MPI_SUM.  I think the 
better choice is MPIU_REPLACE. Doesn’t MPIU_REPLACE work for any mpi_datatype?
Yes.
Fande


On Apr 3, 2019, at 9:15 PM, Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:


On Wed, Apr 3, 2019 at 3:41 AM Lisandro Dalcin via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
IIRC, MPI_CHAR is for ASCII text data. Also, remember that in C the signedness 
of plain `char` is implementation (or platform?) dependent.
 I'm not sure MPI_Reduce() is supposed to / should  handle MPI_CHAR, you should 
use MPI_{SIGNED|UNSIGNED}_CHAR for that. Note however that MPI_SIGNED_CHAR is 
from MPI 2.0.

MPI standard chapter 5.9.3, says "MPI_CHAR, MPI_WCHAR, and MPI_CHARACTER (which 
represent printable characters) cannot be used in reduction operations"
So Fande's code and Jed's branch have problems. To fix that, we have to add 
support for signed char, unsigned char, and char in PetscSF.  The first two 
types support add, mult, logical and bitwise operations; the last supports only 
pack/unpack. With this fix, PetscSF/MPI would raise an error on Fande's code. 
I can come up with a fix tomorrow.


On Wed, 3 Apr 2019 at 07:01, Fande Kong via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hi All,

There were some error messages when using PetscSFReduceBegin with MPI_CHAR.

ierr = 
PetscSFReduceBegin(ptap->sf,MPI_CHAR,rmtspace,space,MPI_SUM);CHKERRQ(ierr);


My question would be: Does PetscSFReduceBegin suppose work with MPI_CHAR? If 
not, should we document somewhere?

Thanks

Fande,


[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: No support for this operation for this object type
[0]PETSC ERROR: No support for type size not divisible by 4
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e  GIT 
Date: 2019-04-02 17:37:18 -0600
[0]PETSC ERROR: [1]PETSC ERROR: - Error Message 
--
[1]PETSC ERROR: No support for this operation for this object type
[1]PETSC ERROR: No support for type size not divisible by 4
[1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e  GIT 
Date: 2019-04-02 17:37:18 -0600
[1]PETSC ERROR: ./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once named 
fn605731.local by kongf Tue Apr  2 21:48:41 2019
[1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes 
--with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 
--download-parmetis=1 --download-superlu_dist=1 
PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch 
--download-party --download-chaco --with-cxx-dialect=C++11
[1]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[1]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[1]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once named fn605731.local by 
kongf Tue Apr  2 21:48:41 2019
[0]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes 
--with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 
--download-parmetis=1 --download-superlu_dist=1 
PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch 
--download-party --download-chaco --with-cxx-dialect=C++11
[0]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[0]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[0]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[0]PETSC ERROR: #4 PetscSFReduceBegin() line 1208 in 
/Users/kongf/projects/petsc/src/vec/is/sf/interface/sf.c
[0]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIAIJ_allatonce() line 850 in 
/Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c
[0]PETSC ERROR: #6 MatPtAP_MPIAIJ_MPIAIJ() line 202 in 
/Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c
[0]PETSC ERROR: #7 MatPtAP() line 9429 in 
/Users/kongf/projects/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #8 main() line 58 in 
/Users/kongf/projects/petsc/src/mat/examples/tests/ex90.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -matptap_via allatonce
[0]PETSC ERROR: End of Error Message ---send entire error 
message to petsc-ma...@mcs.a

Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-03 Thread Zhang, Junchao via petsc-users

On Wed, Apr 3, 2019 at 3:41 AM Lisandro Dalcin via petsc-users wrote:
IIRC, MPI_CHAR is for ASCII text data. Also, remember that in C the signedness 
of plain `char` is implementation (or platform?) dependent.
 I'm not sure MPI_Reduce() is supposed to / should  handle MPI_CHAR, you should 
use MPI_{SIGNED|UNSIGNED}_CHAR for that. Note however that MPI_SIGNED_CHAR is 
from MPI 2.0.

MPI standard chapter 5.9.3, says "MPI_CHAR, MPI_WCHAR, and MPI_CHARACTER (which 
represent printable characters) cannot be used in reduction operations"
So Fande's code and Jed's branch have problems. To fix that, we have to add 
support for signed char, unsigned char, and char in PetscSF.  The first two 
types support add, mult, logical and bitwise operations; the last supports only 
pack/unpack. With this fix, PetscSF/MPI would raise an error on Fande's code. 
I can come up with a fix tomorrow.
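
A small illustration of the restriction (hedged sketch; per the standard, a conforming MPI implementation should raise an error for the commented-out MPI_CHAR reduction):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  signed char local = 1, sum = 0;
  MPI_Init(&argc, &argv);
  MPI_Allreduce(&local, &sum, 1, MPI_SIGNED_CHAR, MPI_SUM, MPI_COMM_WORLD);   /* valid: integer type */
  /* MPI_Allreduce(&local, &sum, 1, MPI_CHAR, MPI_SUM, MPI_COMM_WORLD);          invalid per MPI 5.9.3 */
  printf("sum = %d\n", (int)sum);
  MPI_Finalize();
  return 0;
}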


On Wed, 3 Apr 2019 at 07:01, Fande Kong via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hi All,

There were some error messages when using PetscSFReduceBegin with MPI_CHAR.

ierr = 
PetscSFReduceBegin(ptap->sf,MPI_CHAR,rmtspace,space,MPI_SUM);CHKERRQ(ierr);


My question would be: Does PetscSFReduceBegin suppose work with MPI_CHAR? If 
not, should we document somewhere?

Thanks

Fande,


[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: No support for this operation for this object type
[0]PETSC ERROR: No support for type size not divisible by 4
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e  GIT 
Date: 2019-04-02 17:37:18 -0600
[0]PETSC ERROR: [1]PETSC ERROR: - Error Message 
--
[1]PETSC ERROR: No support for this operation for this object type
[1]PETSC ERROR: No support for type size not divisible by 4
[1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e  GIT 
Date: 2019-04-02 17:37:18 -0600
[1]PETSC ERROR: ./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once named 
fn605731.local by kongf Tue Apr  2 21:48:41 2019
[1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes 
--with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 
--download-parmetis=1 --download-superlu_dist=1 
PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch 
--download-party --download-chaco --with-cxx-dialect=C++11
[1]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[1]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[1]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once named fn605731.local by 
kongf Tue Apr  2 21:48:41 2019
[0]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes 
--with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 
--download-parmetis=1 --download-superlu_dist=1 
PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch 
--download-party --download-chaco --with-cxx-dialect=C++11
[0]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[0]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[0]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in 
/Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
[0]PETSC ERROR: #4 PetscSFReduceBegin() line 1208 in 
/Users/kongf/projects/petsc/src/vec/is/sf/interface/sf.c
[0]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIAIJ_allatonce() line 850 in 
/Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c
[0]PETSC ERROR: #6 MatPtAP_MPIAIJ_MPIAIJ() line 202 in 
/Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c
[0]PETSC ERROR: #7 MatPtAP() line 9429 in 
/Users/kongf/projects/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #8 main() line 58 in 
/Users/kongf/projects/petsc/src/mat/examples/tests/ex90.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -matptap_via allatonce
[0]PETSC ERROR: End of Error Message ---send entire error 
message to petsc-ma...@mcs.anl.gov--
[1]PETSC ERROR: #4 PetscSFReduceBegin() line 1208 in 
/Users/kongf/projects/petsc/src/vec/is/sf/interface/sf.c
[1]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIAIJ_allatonce() line 850 in 
/Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c
[1]PETSC ERROR: #6 MatPtAP_MPIAIJ_MPIAIJ() line 202 in 

Re: [petsc-users] MPI Communication times

2019-03-23 Thread Zhang, Junchao via petsc-users
Before further looking into it, can you try these:
 * It seems you used petsc 3.9.4. Could you update to petsc master branch? We 
have an optimization (after 3.9.4) that is very useful for VecScatter on DMDA 
vectors.
* To measure performance, you do not want that many printfs.
* Only measure the parallel part of your program, i.e., skip the init and I/O 
part. You can use petsc log stages (see the sketch after this list); src/vec/vscat/examples/ex4.c also has an example
* Since your grid is 3000 x 200 x 100, so can you measure with 60 and 240 
processors? It is easy to do analysis with balanced partition.
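
A minimal sketch of the log-stage suggestion (illustrative; ksp, b, x and nsteps are assumed to come from the caller):

#include <petscksp.h>

/* Push a named stage around the part you want -log_view to report separately. */
PetscErrorCode TimedSolveLoop(KSP ksp, Vec b, Vec x, PetscInt nsteps)
{
  PetscLogStage  stage;
  PetscInt       i;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("SolveOnly", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  for (i = 0; i < nsteps; i++) {
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  }
  ierr = PetscLogStagePop();CHKERRQ(ierr);   /* everything in between is reported under "SolveOnly" */
  return 0;
}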

Thanks.
--Junchao Zhang


On Fri, Mar 22, 2019 at 6:53 PM Manuel Valera wrote:
This is a 3D fluid dynamics code, it uses arakawa C type grids and curvilinear 
coordinates in nonhydrostatic navier stokes, we also add realistic 
stratification (Temperature / Density) and subgrid scale for turbulence. What 
we are solving here is just a seamount with a velocity forcing from one side 
and is just 5 pressure solvers or iterations.

PETSc is used via the DMDAs to set up the grids and arrays and do (almost) 
every calculation in a distributed manner, the pressure solver is implicit and 
carried out with the KSP module. I/O is still serial.

I am attaching the run outputs with the format 60mNP.txt with NP the number of 
processors used. These are large files you can read with tail -n 140 [filename] 
for the -log_view part

Thanks for your help,



On Fri, Mar 22, 2019 at 3:40 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:

On Fri, Mar 22, 2019 at 4:55 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
No, is the same problem running with different number of processors, i have 
data from 1 to 20 processors in increments of 20 processors/1 node, and 
additionally for 1 processor.

That means you used strong scaling. If we combine VecScatterBegin/End, it took 2%, 
13%, and 18% of the execution time at 20, 100, and 200 cores respectively. That 
looks very unscalable; I do not know why.
VecScatterBegin took the same time with 100 and 200 cores. My explanation is that 
VecScatterBegin just packs data and then calls non-blocking MPI_Isend, whereas 
VecScatterEnd has to wait for the data to arrive.
Could you tell us more about your problem, for example, is it 2D or 3D, what is 
the communication pattern, how many neighbors each rank has. Also attach the 
whole log files for -log_view so that we can know the problem better.
Thanks.

On Fri, Mar 22, 2019 at 2:48 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
Did you change problem size with different runs?

On Fri, Mar 22, 2019 at 4:09 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Hello,

I repeated the timings with the -log_sync option and now i get for 200 
processors / 20 nodes:


Event Count  Time (sec) Flop
 --- Global ---  --- Stage ---   Total
   Max Ratio  Max Ratio   Max  Ratio  Mess   
Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s


VecScatterBarrie3014 1.0 5.6771e+01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  5  0  0  0  0   5  0  0  0  0 0
VecScatterBegin3014 1.0 3.1684e+01 2.0 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01 
 4  0 63 56  0   4  0 63 56  0 0
VecScatterEnd   2976 1.0 1.1383e+02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0 0

With 100 processors / 10 nodes:

VecScatterBarrie3010 1.0 7.4430e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  7  0  0  0  0   7  0  0  0  0 0
VecScatterBegin3010 1.0 3.8504e+01 2.4 0.00e+00 0.0 1.6e+06 2.0e+06 2.8e+01 
 4  0 71 66  0   4  0 71 66  0 0
VecScatterEnd   2972 1.0 8.5158e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  9  0  0  0  0   9  0  0  0  0 0

And with 20 processors / 1 node:

VecScatterBarrie2596 1.0 4.0614e+01 7.3 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  4  0  0  0  0   4  0  0  0  0 0
VecScatterBegin 2596 1.0 1.4970e+01 1.3 0.00e+00 0.0 1.2e+05 4.0e+06 
3.0e+01  1  0 81 61  0   1  0 81 61  0 0
VecScatterEnd   2558 1.0 1.4903e+01 1.3 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0 0

Can you help me interpret this? what i see is the End portion taking more 
relative time and Begin staying the same beyond one node, also Barrier and 
Begin counts are the same every time, but how do i estimate communication times 
from here?

Thanks,


On Wed, Mar 20, 2019 at 3:24 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
Forgot to mention that a long VecScatter time might also be due to local memory copies. 
If the communication pattern has a large local-to-local (self-to-self) scatter, 
which often happens thanks to locality, then the memory copy time is counted in 
VecScatter. You can analyze your code's communication pattern to see if that is the case.

Re: [petsc-users] MPI Communication times

2019-03-22 Thread Zhang, Junchao via petsc-users

On Fri, Mar 22, 2019 at 4:55 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
No, is the same problem running with different number of processors, i have 
data from 1 to 20 processors in increments of 20 processors/1 node, and 
additionally for 1 processor.

That means you used strong scaling. If we combine VecScatterBegin/End, it took 2%, 
13%, and 18% of the execution time at 20, 100, and 200 cores respectively. That 
looks very unscalable; I do not know why.
VecScatterBegin took the same time with 100 and 200 cores. My explanation is that 
VecScatterBegin just packs data and then calls non-blocking MPI_Isend, whereas 
VecScatterEnd has to wait for the data to arrive.
Could you tell us more about your problem, for example, is it 2D or 3D, what is 
the communication pattern, how many neighbors each rank has. Also attach the 
whole log files for -log_view so that we can know the problem better.
Thanks.

On Fri, Mar 22, 2019 at 2:48 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
Did you change problem size with different runs?

On Fri, Mar 22, 2019 at 4:09 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Hello,

I repeated the timings with the -log_sync option and now i get for 200 
processors / 20 nodes:


Event Count  Time (sec) Flop
 --- Global ---  --- Stage ---   Total
   Max Ratio  Max Ratio   Max  Ratio  Mess   
Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s


VecScatterBarrie3014 1.0 5.6771e+01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  5  0  0  0  0   5  0  0  0  0 0
VecScatterBegin3014 1.0 3.1684e+01 2.0 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01 
 4  0 63 56  0   4  0 63 56  0 0
VecScatterEnd   2976 1.0 1.1383e+02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0 0

With 100 processors / 10 nodes:

VecScatterBarrie3010 1.0 7.4430e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  7  0  0  0  0   7  0  0  0  0 0
VecScatterBegin3010 1.0 3.8504e+01 2.4 0.00e+00 0.0 1.6e+06 2.0e+06 2.8e+01 
 4  0 71 66  0   4  0 71 66  0 0
VecScatterEnd   2972 1.0 8.5158e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  9  0  0  0  0   9  0  0  0  0 0

And with 20 processors / 1 node:

VecScatterBarrie2596 1.0 4.0614e+01 7.3 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  4  0  0  0  0   4  0  0  0  0 0
VecScatterBegin 2596 1.0 1.4970e+01 1.3 0.00e+00 0.0 1.2e+05 4.0e+06 
3.0e+01  1  0 81 61  0   1  0 81 61  0 0
VecScatterEnd   2558 1.0 1.4903e+01 1.3 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0 0

Can you help me interpret this? what i see is the End portion taking more 
relative time and Begin staying the same beyond one node, also Barrier and 
Begin counts are the same every time, but how do i estimate communication times 
from here?

Thanks,


On Wed, Mar 20, 2019 at 3:24 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
Forgot to mention that a long VecScatter time might also be due to local memory copies. 
If the communication pattern has large local to local (self to self)  scatter, 
which often happens thanks to locality, then the memory copy time is counted in 
VecScatter. You can analyze your code's communication pattern to see if it is 
the case.

--Junchao Zhang


On Wed, Mar 20, 2019 at 4:44 PM Zhang, Junchao via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:


On Wed, Mar 20, 2019 at 4:18 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Thanks for your answer, so for example i have a log for 200 cores across 10 
nodes that reads:


Event   Count  Time (sec) Flop  
   --- Global ---  --- Stage ---   Total
Max Ratio  Max Ratio   Max  Ratio  Mess   
Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
--
VecScatterBegin 3014 1.0 4.5550e+01 2.6 0.00e+00 0.0 4.2e+06 1.1e+06 
2.8e+01  4  0 63 56  0   4  0 63 56  0 0
VecScatterEnd   2976 1.0 1.2143e+02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0 0

While for 20 nodes at one node i have:
 What does that mean?
VecScatterBegin 2596 1.0 2.9142e+01 2.1 0.00e+00 0.0 1.2e+05 4.0e+06 
3.0e+01  2  0 81 61  0   2  0 81 61  0 0
VecScatterEnd   2558 1.0 8.0344e+01 7.9 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0   3  0  0  0  0 0

Where do i see the max/min ratio in here? and why End step is all 0.

Re: [petsc-users] MPI Communication times

2019-03-22 Thread Zhang, Junchao via petsc-users
Did you change problem size with different runs?

On Fri, Mar 22, 2019 at 4:09 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Hello,

I repeated the timings with the -log_sync option and now i get for 200 
processors / 20 nodes:


Event Count  Time (sec) Flop
 --- Global ---  --- Stage ---   Total
   Max Ratio  Max Ratio   Max  Ratio  Mess   
Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s


VecScatterBarrie3014 1.0 5.6771e+01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  5  0  0  0  0   5  0  0  0  0 0
VecScatterBegin3014 1.0 3.1684e+01 2.0 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01 
 4  0 63 56  0   4  0 63 56  0 0
VecScatterEnd   2976 1.0 1.1383e+02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0 0

With 100 processors / 10 nodes:

VecScatterBarrie3010 1.0 7.4430e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  7  0  0  0  0   7  0  0  0  0 0
VecScatterBegin3010 1.0 3.8504e+01 2.4 0.00e+00 0.0 1.6e+06 2.0e+06 2.8e+01 
 4  0 71 66  0   4  0 71 66  0 0
VecScatterEnd   2972 1.0 8.5158e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  9  0  0  0  0   9  0  0  0  0 0

And with 20 processors / 1 node:

VecScatterBarrie2596 1.0 4.0614e+01 7.3 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  4  0  0  0  0   4  0  0  0  0 0
VecScatterBegin 2596 1.0 1.4970e+01 1.3 0.00e+00 0.0 1.2e+05 4.0e+06 
3.0e+01  1  0 81 61  0   1  0 81 61  0 0
VecScatterEnd   2558 1.0 1.4903e+01 1.3 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0 0

Can you help me interpret this? what i see is the End portion taking more 
relative time and Begin staying the same beyond one node, also Barrier and 
Begin counts are the same every time, but how do i estimate communication times 
from here?

Thanks,


On Wed, Mar 20, 2019 at 3:24 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
Forgot to mention that a long VecScatter time might also be due to local memory copies. 
If the communication pattern has large local to local (self to self)  scatter, 
which often happens thanks to locality, then the memory copy time is counted in 
VecScatter. You can analyze your code's communication pattern to see if it is 
the case.

--Junchao Zhang


On Wed, Mar 20, 2019 at 4:44 PM Zhang, Junchao via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:


On Wed, Mar 20, 2019 at 4:18 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Thanks for your answer, so for example i have a log for 200 cores across 10 
nodes that reads:


Event   Count  Time (sec) Flop  
   --- Global ---  --- Stage ---   Total
Max Ratio  Max Ratio   Max  Ratio  Mess   
Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
--
VecScatterBegin 3014 1.0 4.5550e+01 2.6 0.00e+00 0.0 4.2e+06 1.1e+06 
2.8e+01  4  0 63 56  0   4  0 63 56  0 0
VecScatterEnd   2976 1.0 1.2143e+02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0 0

While for 20 nodes at one node i have:
 What does that mean?
VecScatterBegin 2596 1.0 2.9142e+01 2.1 0.00e+00 0.0 1.2e+05 4.0e+06 
3.0e+01  2  0 81 61  0   2  0 81 61  0 0
VecScatterEnd   2558 1.0 8.0344e+01 7.9 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0   3  0  0  0  0 0

Where do i see the max/min ratio in here? and why End step is all 0.0e00 in 
both but still grows from 3% to 14% of total time? It seems i would need to run 
again with the -log_sync option, is this correct?

The max/min ratios are the columns such as 2.1 and 7.9. The MPI send/recv calls are 
issued in VecScatterBegin(); VecScatterEnd() only does MPI_Wait, which is why it shows 
zero messages. Yes, run with -log_sync and see what happens.

Different question, can't i estimate the total communication time if i had a 
typical communication time per MPI message times the number of MPI messages 
reported in the log? or it doesn't work like that?

That probably won't work, because you have multiple processes doing send/recv at the 
same time and they might saturate the bandwidth. PETSc also overlaps computation 
and communication.

Thanks.





On Wed, Mar 20, 2019 at 2:02 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
See the "Mess   AvgLen  Reduct" number in each log stage.  Mess is the total 
number of messages sent in an event over all processes.  AvgLen is average 
message len. Reduct is the num

Re: [petsc-users] Valgrind Issue With Ghosted Vectors

2019-03-21 Thread Zhang, Junchao via petsc-users


On Thu, Mar 21, 2019 at 1:57 PM Derek Gaston via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
It sounds like you already tracked this down... but for completeness here is 
what track-origins gives:

==262923== Conditional jump or move depends on uninitialised value(s)
==262923==at 0x73C6548: VecScatterMemcpyPlanCreate_Index (vscat.c:294)
==262923==by 0x73DBD97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)
==262923==by 0x73DE6AE: VecScatterCreateCommon_PtoS_MPI1 
(vpscat_mpi1.c:2328)
==262923==by 0x73DFFEA: VecScatterCreateLocal_PtoS_MPI1 (vpscat_mpi1.c:2202)
==262923==by 0x73C7A51: VecScatterCreate_PtoS (vscat.c:608)
==262923==by 0x73C9E8A: VecScatterSetUp_vectype_private (vscat.c:857)
==262923==by 0x73CBE5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
==262923==by 0x7413D39: VecScatterSetUp (vscatfce.c:212)
==262923==by 0x7412D73: VecScatterCreateWithData (vscreate.c:333)
==262923==by 0x747A232: VecCreateGhostWithArray (pbvec.c:685)
==262923==by 0x747A90D: VecCreateGhost (pbvec.c:741)
==262923==by 0x5C7FFD6: libMesh::PetscVector::init(unsigned long, 
unsigned long, std::vector > 
const&, bool, libMesh::ParallelType) (petsc_vector.h:752)
==262923==  Uninitialised value was created by a heap allocation

I checked the code but could not figure out what was wrong.  Perhaps you should 
use 64-bit integers and see whether the warning still exists.  Please remember 
to incorporate Stefano's bug fix.

==262923==at 0x402DDC6: memalign (vg_replace_malloc.c:899)
==262923==by 0x7359702: PetscMallocAlign (mal.c:41)
==262923==by 0x7359C70: PetscMallocA (mal.c:390)
==262923==by 0x73DECF0: VecScatterCreateLocal_PtoS_MPI1 (vpscat_mpi1.c:2061)
==262923==by 0x73C7A51: VecScatterCreate_PtoS (vscat.c:608)
==262923==by 0x73C9E8A: VecScatterSetUp_vectype_private (vscat.c:857)
==262923==by 0x73CBE5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
==262923==by 0x7413D39: VecScatterSetUp (vscatfce.c:212)
==262923==by 0x7412D73: VecScatterCreateWithData (vscreate.c:333)
==262923==by 0x747A232: VecCreateGhostWithArray (pbvec.c:685)
==262923==by 0x747A90D: VecCreateGhost (pbvec.c:741)
==262923==by 0x5C7FFD6: libMesh::PetscVector::init(unsigned long, 
unsigned long, std::vector > 
const&, bool, libMesh::ParallelType) (petsc_vector.h:752)


BTW: This turned out not to be my actual problem.  My actual problem was just 
some stupidity on my part... just a simple input parameter issue to my code 
(should have had better error checking!).

But: It sounds like my digging may have uncovered something real here... so it 
wasn't completely useless :-)

Thanks for your help everyone!

Derek



On Thu, Mar 21, 2019 at 10:38 AM Stefano Zampini 
mailto:stefano.zamp...@gmail.com>> wrote:


Il giorno mer 20 mar 2019 alle ore 23:40 Derek Gaston via petsc-users 
mailto:petsc-users@mcs.anl.gov>> ha scritto:
Trying to track down some memory corruption I'm seeing on larger scale runs 
(3.5B+ unknowns).

Uhm are you using 32bit indices? is it possible there's integer overflow 
somewhere?


Was able to run Valgrind on it... and I'm seeing quite a lot of uninitialized 
value errors coming from ghost updating.  Here are some of the traces:

==87695== Conditional jump or move depends on uninitialised value(s)
==87695==at 0x73236D3: PetscMallocAlign (mal.c:28)
==87695==by 0x7323C70: PetscMallocA (mal.c:390)
==87695==by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)
==87695==by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)
==64730==by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)
==64730==by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
==64730==by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)
==64730==by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)
==64730==by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)
==64730==by 0x744490D: VecCreateGhost (pbvec.c:741)

==133582== Conditional jump or move depends on uninitialised value(s)
==133582==at 0x4030384: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
==133582==by 0x739E4F9: PetscMemcpy (petscsys.h:1649)
==133582==by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack 
(vecscatterimpl.h:150)
==133582==by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)
==133582==by 0x73DD964: VecScatterBegin (vscatfce.c:110)
==133582==by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)

This is from a Git checkout of PETSc... the hash I branched from is: 
0e667e8fea4aa from December 23rd (updating would be really hard at this point 
as I've completed 90% of my dissertation with this version... and changing 
PETSc now would be pretty painful!).

Any ideas?  Is it possible it's in my code?  Is it possible that there are 
later PETSc commits that already fix this?

Thanks for any help,
Derek



--
Stefano


Re: [petsc-users] Valgrind Issue With Ghosted Vectors

2019-03-21 Thread Zhang, Junchao via petsc-users
Yes, it does.  It is a bug.
--Junchao Zhang


On Thu, Mar 21, 2019 at 11:16 AM Balay, Satish 
mailto:ba...@mcs.anl.gov>> wrote:
Does maint also need this fix?

Satish

On Thu, 21 Mar 2019, Stefano Zampini via petsc-users wrote:

> Derek
>
> I have fixed the optimized plan few weeks ago
>
> https://bitbucket.org/petsc/petsc/commits/c3caad8634d376283f7053f3b388606b45b3122c
>
> Maybe this will fix your problem too?
>
> Stefano
>
>
> Il Gio 21 Mar 2019, 04:21 Zhang, Junchao via petsc-users <
> petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> ha scritto:
>
> > Hi, Derek,
> >   Try to apply this tiny (but dirty) patch on your version of PETSc to
> > disable the VecScatterMemcpyPlan optimization to see if it helps.
> >   Thanks.
> > --Junchao Zhang
> >
> > On Wed, Mar 20, 2019 at 6:33 PM Junchao Zhang 
> > mailto:jczh...@mcs.anl.gov>> wrote:
> >
> >> Did you see the warning with small scale runs?  Is it possible to provide
> >> a test code?
> >> You mentioned "changing PETSc now would be pretty painful". Is it because
> >> it will affect your performance (but not your code)?  If yes, could you try
> >> PETSc master and run you code with or without -vecscatter_type sf.  I want
> >> to isolate the problem and see if it is due to possible bugs in VecScatter.
> >> If the above suggestion is not feasible, I will disable VecScatterMemcpy.
> >> It is an optimization I added. Sorry I did not have an option to turn off
> >> it because I thought it was always useful:)  I will provide you a patch
> >> later to disable it. With that you can run again to isolate possible bugs
> >> in VecScatterMemcpy.
> >> Thanks.
> >> --Junchao Zhang
> >>
> >>
> >> On Wed, Mar 20, 2019 at 5:40 PM Derek Gaston via petsc-users <
> >> petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote:
> >>
> >>> Trying to track down some memory corruption I'm seeing on larger scale
> >>> runs (3.5B+ unknowns).  Was able to run Valgrind on it... and I'm seeing
> >>> quite a lot of uninitialized value errors coming from ghost updating.  
> >>> Here
> >>> are some of the traces:
> >>>
> >>> ==87695== Conditional jump or move depends on uninitialised value(s)
> >>> ==87695==at 0x73236D3: PetscMallocAlign (mal.c:28)
> >>> ==87695==by 0x7323C70: PetscMallocA (mal.c:390)
> >>> ==87695==by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)
> >>> ==87695==by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP
> >>> (vpscat_mpi1.c:312)
> >>> ==64730==by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)
> >>> ==64730==by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
> >>> ==64730==by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)
> >>> ==64730==by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)
> >>> ==64730==by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)
> >>> ==64730==by 0x744490D: VecCreateGhost (pbvec.c:741)
> >>>
> >>> ==133582== Conditional jump or move depends on uninitialised value(s)
> >>> ==133582==at 0x4030384: memcpy@@GLIBC_2.14
> >>> (vg_replace_strmem.c:1034)
> >>> ==133582==by 0x739E4F9: PetscMemcpy (petscsys.h:1649)
> >>> ==133582==by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack
> >>> (vecscatterimpl.h:150)
> >>> ==133582==by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)
> >>> ==133582==by 0x73DD964: VecScatterBegin (vscatfce.c:110)
> >>> ==133582==by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)
> >>>
> >>> This is from a Git checkout of PETSc... the hash I branched from is:
> >>> 0e667e8fea4aa from December 23rd (updating would be really hard at this
> >>> point as I've completed 90% of my dissertation with this version... and
> >>> changing PETSc now would be pretty painful!).
> >>>
> >>> Any ideas?  Is it possible it's in my code?  Is it possible that there
> >>> are later PETSc commits that already fix this?
> >>>
> >>> Thanks for any help,
> >>> Derek
> >>>
> >>>
>



Re: [petsc-users] Valgrind Issue With Ghosted Vectors

2019-03-21 Thread Zhang, Junchao via petsc-users
Thanks to Stefano for fixing this bug.  His fix is easy to apply (two-line 
change) and therefore should be tried first.

--Junchao Zhang


On Thu, Mar 21, 2019 at 3:02 AM Stefano Zampini 
mailto:stefano.zamp...@gmail.com>> wrote:
Derek

I have fixed the optimized plan few weeks ago

https://bitbucket.org/petsc/petsc/commits/c3caad8634d376283f7053f3b388606b45b3122c

Maybe this will fix your problem too?

Stefano


Il Gio 21 Mar 2019, 04:21 Zhang, Junchao via petsc-users 
mailto:petsc-users@mcs.anl.gov>> ha scritto:
Hi, Derek,
  Try to apply this tiny (but dirty) patch on your version of PETSc to disable 
the VecScatterMemcpyPlan optimization to see if it helps.
  Thanks.
--Junchao Zhang

On Wed, Mar 20, 2019 at 6:33 PM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
Did you see the warning with small scale runs?  Is it possible to provide a 
test code?
You mentioned "changing PETSc now would be pretty painful". Is it because it 
will affect your performance (but not your code)?  If yes, could you try PETSc 
master and run your code with or without -vecscatter_type sf.  I want to isolate 
the problem and see if it is due to possible bugs in VecScatter.
If the above suggestion is not feasible, I will disable VecScatterMemcpy. It is 
an optimization I added. Sorry I did not have an option to turn it off because 
I thought it was always useful:)  I will provide you a patch later to disable 
it. With that you can run again to isolate possible bugs in VecScatterMemcpy.
Thanks.
--Junchao Zhang


On Wed, Mar 20, 2019 at 5:40 PM Derek Gaston via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Trying to track down some memory corruption I'm seeing on larger scale runs 
(3.5B+ unknowns).  Was able to run Valgrind on it... and I'm seeing quite a lot 
of uninitialized value errors coming from ghost updating.  Here are some of the 
traces:

==87695== Conditional jump or move depends on uninitialised value(s)
==87695==at 0x73236D3: PetscMallocAlign (mal.c:28)
==87695==by 0x7323C70: PetscMallocA (mal.c:390)
==87695==by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)
==87695==by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)
==64730==by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)
==64730==by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
==64730==by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)
==64730==by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)
==64730==by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)
==64730==by 0x744490D: VecCreateGhost (pbvec.c:741)

==133582== Conditional jump or move depends on uninitialised value(s)
==133582==at 0x4030384: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
==133582==by 0x739E4F9: PetscMemcpy (petscsys.h:1649)
==133582==by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack 
(vecscatterimpl.h:150)
==133582==by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)
==133582==by 0x73DD964: VecScatterBegin (vscatfce.c:110)
==133582==by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)

This is from a Git checkout of PETSc... the hash I branched from is: 
0e667e8fea4aa from December 23rd (updating would be really hard at this point 
as I've completed 90% of my dissertation with this version... and changing 
PETSc now would be pretty painful!).

Any ideas?  Is it possible it's in my code?  Is it possible that there are 
later PETSc commits that already fix this?

Thanks for any help,
Derek



Re: [petsc-users] Valgrind Issue With Ghosted Vectors

2019-03-20 Thread Zhang, Junchao via petsc-users
Hi, Derek,
  Try to apply this tiny (but dirty) patch on your version of PETSc to disable 
the VecScatterMemcpyPlan optimization to see if it helps.
  Thanks.
--Junchao Zhang

On Wed, Mar 20, 2019 at 6:33 PM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
Did you see the warning with small scale runs?  Is it possible to provide a 
test code?
You mentioned "changing PETSc now would be pretty painful". Is it because it 
will affect your performance (but not your code)?  If yes, could you try PETSc 
master and run your code with or without -vecscatter_type sf.  I want to isolate 
the problem and see if it is due to possible bugs in VecScatter.
If the above suggestion is not feasible, I will disable VecScatterMemcpy. It is 
an optimization I added. Sorry I did not have an option to turn it off because 
I thought it was always useful:)  I will provide you a patch later to disable 
it. With that you can run again to isolate possible bugs in VecScatterMemcpy.
Thanks.
--Junchao Zhang


On Wed, Mar 20, 2019 at 5:40 PM Derek Gaston via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Trying to track down some memory corruption I'm seeing on larger scale runs 
(3.5B+ unknowns).  Was able to run Valgrind on it... and I'm seeing quite a lot 
of uninitialized value errors coming from ghost updating.  Here are some of the 
traces:

==87695== Conditional jump or move depends on uninitialised value(s)
==87695==at 0x73236D3: PetscMallocAlign (mal.c:28)
==87695==by 0x7323C70: PetscMallocA (mal.c:390)
==87695==by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)
==87695==by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)
==64730==by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)
==64730==by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
==64730==by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)
==64730==by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)
==64730==by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)
==64730==by 0x744490D: VecCreateGhost (pbvec.c:741)

==133582== Conditional jump or move depends on uninitialised value(s)
==133582==at 0x4030384: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
==133582==by 0x739E4F9: PetscMemcpy (petscsys.h:1649)
==133582==by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack 
(vecscatterimpl.h:150)
==133582==by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)
==133582==by 0x73DD964: VecScatterBegin (vscatfce.c:110)
==133582==by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)

This is from a Git checkout of PETSc... the hash I branched from is: 
0e667e8fea4aa from December 23rd (updating would be really hard at this point 
as I've completed 90% of my dissertation with this version... and changing 
PETSc now would be pretty painful!).

Any ideas?  Is it possible it's in my code?  Is it possible that there are 
later PETSc commits that already fix this?

Thanks for any help,
Derek



vscat.patch
Description: vscat.patch


Re: [petsc-users] Valgrind Issue With Ghosted Vectors

2019-03-20 Thread Zhang, Junchao via petsc-users
Did you see the warning with small scale runs?  Is it possible to provide a 
test code?
You mentioned "changing PETSc now would be pretty painful". Is it because it 
will affect your performance (but not your code)?  If yes, could you try PETSc 
master and run your code with or without -vecscatter_type sf.  I want to isolate 
the problem and see if it is due to possible bugs in VecScatter.
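For example, something along these lines (the executable name and rank count here are only placeholders):

  mpirun -n 8 ./your_app <your usual options> -vecscatter_type sf

and then the same run without -vecscatter_type sf, to see whether the valgrind warnings change.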
If the above suggestion is not feasible, I will disable VecScatterMemcpy. It is 
an optimization I added. Sorry I did not have an option to turn it off because 
I thought it was always useful:)  I will provide you a patch later to disable 
it. With that you can run again to isolate possible bugs in VecScatterMemcpy.
Thanks.
--Junchao Zhang


On Wed, Mar 20, 2019 at 5:40 PM Derek Gaston via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Trying to track down some memory corruption I'm seeing on larger scale runs 
(3.5B+ unknowns).  Was able to run Valgrind on it... and I'm seeing quite a lot 
of uninitialized value errors coming from ghost updating.  Here are some of the 
traces:

==87695== Conditional jump or move depends on uninitialised value(s)
==87695==at 0x73236D3: PetscMallocAlign (mal.c:28)
==87695==by 0x7323C70: PetscMallocA (mal.c:390)
==87695==by 0x739048E: VecScatterMemcpyPlanCreate_Index (vscat.c:284)
==87695==by 0x73A5D97: VecScatterMemcpyPlanCreate_PtoP (vpscat_mpi1.c:312)
==64730==by 0x7393E8A: VecScatterSetUp_vectype_private (vscat.c:857)
==64730==by 0x7395E5D: VecScatterSetUp_MPI1 (vpscat_mpi1.c:2543)
==64730==by 0x73DDD39: VecScatterSetUp (vscatfce.c:212)
==64730==by 0x73DCD73: VecScatterCreateWithData (vscreate.c:333)
==64730==by 0x7444232: VecCreateGhostWithArray (pbvec.c:685)
==64730==by 0x744490D: VecCreateGhost (pbvec.c:741)

==133582== Conditional jump or move depends on uninitialised value(s)
==133582==at 0x4030384: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
==133582==by 0x739E4F9: PetscMemcpy (petscsys.h:1649)
==133582==by 0x739E4F9: VecScatterMemcpyPlanExecute_Pack 
(vecscatterimpl.h:150)
==133582==by 0x739E4F9: VecScatterBeginMPI1_1 (vpscat_mpi1.h:69)
==133582==by 0x73DD964: VecScatterBegin (vscatfce.c:110)
==133582==by 0x744E195: VecGhostUpdateBegin (commonmpvec.c:225)

This is from a Git checkout of PETSc... the hash I branched from is: 
0e667e8fea4aa from December 23rd (updating would be really hard at this point 
as I've completed 90% of my dissertation with this version... and changing 
PETSc now would be pretty painful!).

Any ideas?  Is it possible it's in my code?  Is it possible that there are 
later PETSc commits that already fix this?

Thanks for any help,
Derek



Re: [petsc-users] MPI Communication times

2019-03-20 Thread Zhang, Junchao via petsc-users
Forgot to mention that long VecScatter time might also be due to local memory copies. 
If the communication pattern has a large local-to-local (self-to-self) scatter, 
which often happens thanks to locality, then the memory copy time is counted in 
VecScatter. You can analyze your code's communication pattern to see if it is 
the case.

--Junchao Zhang


On Wed, Mar 20, 2019 at 4:44 PM Zhang, Junchao via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:


On Wed, Mar 20, 2019 at 4:18 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Thanks for your answer, so for example i have a log for 200 cores across 10 
nodes that reads:


Event   Count  Time (sec) Flop  
   --- Global ---  --- Stage ---   Total
Max Ratio  Max Ratio   Max  Ratio  Mess   
Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
--
VecScatterBegin 3014 1.0 4.5550e+01 2.6 0.00e+00 0.0 4.2e+06 1.1e+06 
2.8e+01  4  0 63 56  0   4  0 63 56  0 0
VecScatterEnd   2976 1.0 1.2143e+02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0 0

While for 20 nodes at one node i have:
 What does that mean?
VecScatterBegin 2596 1.0 2.9142e+01 2.1 0.00e+00 0.0 1.2e+05 4.0e+06 
3.0e+01  2  0 81 61  0   2  0 81 61  0 0
VecScatterEnd   2558 1.0 8.0344e+01 7.9 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0   3  0  0  0  0 0

Where do i see the max/min ratio in here? and why End step is all 0.0e00 in 
both but still grows from 3% to 14% of total time? It seems i would need to run 
again with the -log_sync option, is this correct?

e.g., the 2.1 and 7.9 in your output are the max/min ratios. MPI send/recv are in 
VecScatterBegin(); VecScatterEnd() only does MPI_Wait, which is why it has zero 
messages. Yes, run with -log_sync and see what happens.

Different question, can't i estimate the total communication time if i had a 
typical communication time per MPI message times the number of MPI messages 
reported in the log? or it doesn't work like that?

That probably won't work, because you have multiple processes doing send/recv at the 
same time and they might saturate the bandwidth. PETSc also does 
computation/communication overlapping.

Thanks.





On Wed, Mar 20, 2019 at 2:02 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
See the "Mess   AvgLen  Reduct" number in each log stage.  Mess is the total 
number of messages sent in an event over all processes.  AvgLen is average 
message length. Reduct is the number of global reductions.
Each event like VecScatterBegin/End has a maximal execution time over all 
processes, and a max/min ratio.  %T is sum(execution time of the event on each 
process)/sum(execution time of the stage on each process). %T indicates how 
expensive the event is. It is a number you should pay attention to.
If your code is imbalanced (i.e., with a big max/min ratio), then the 
performance number is skewed and becomes misleading because some processes are 
just waiting for others. Then, besides -log_view, you can add -log_sync, which 
adds an extra MPI_Barrier for each event to let them start at the same time. 
With that, it is easier to interpret the number.
src/vec/vscat/examples/ex4.c is a tiny example for VecScatter logging.

--Junchao Zhang


On Wed, Mar 20, 2019 at 2:58 PM Manuel Valera via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

I am working on timing my model, which we made MPI scalable using petsc DMDAs, 
i want to know more about the output log and how to calculate a total 
communication times for my runs, so far i see we have "MPI Messages" and "MPI 
Messages Lengths" in the log, along VecScatterEnd and VecScatterBegin reports.

My question is, how do i interpret these number to get a rough estimate on how 
much overhead we have just from MPI communications times in my model runs?

Thanks,




Re: [petsc-users] MPI Communication times

2019-03-20 Thread Zhang, Junchao via petsc-users


On Wed, Mar 20, 2019 at 4:18 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Thanks for your answer, so for example i have a log for 200 cores across 10 
nodes that reads:


Event   Count  Time (sec) Flop  
   --- Global ---  --- Stage ---   Total
Max Ratio  Max Ratio   Max  Ratio  Mess   
Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
--
VecScatterBegin 3014 1.0 4.5550e+01 2.6 0.00e+00 0.0 4.2e+06 1.1e+06 
2.8e+01  4  0 63 56  0   4  0 63 56  0 0
VecScatterEnd   2976 1.0 1.2143e+02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0 0

While for 20 nodes at one node i have:
 What does that mean?
VecScatterBegin 2596 1.0 2.9142e+01 2.1 0.00e+00 0.0 1.2e+05 4.0e+06 
3.0e+01  2  0 81 61  0   2  0 81 61  0 0
VecScatterEnd   2558 1.0 8.0344e+01 7.9 0.00e+00 0.0  0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0   3  0  0  0  0 0

Where do i see the max/min ratio in here? and why End step is all 0.0e00 in 
both but still grows from 3% to 14% of total time? It seems i would need to run 
again with the -log_sync option, is this correct?

e.g., the 2.1 and 7.9 in your output are the max/min ratios. MPI send/recv are in 
VecScatterBegin(); VecScatterEnd() only does MPI_Wait, which is why it has zero 
messages. Yes, run with -log_sync and see what happens.

Different question, can't i estimate the total communication time if i had a 
typical communication time per MPI message times the number of MPI messages 
reported in the log? or it doesn't work like that?

That probably won't work, because you have multiple processes doing send/recv at the 
same time and they might saturate the bandwidth. PETSc also does 
computation/communication overlapping.

Thanks.





On Wed, Mar 20, 2019 at 2:02 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
See the "Mess   AvgLen  Reduct" number in each log stage.  Mess is the total 
number of messages sent in an event over all processes.  AvgLen is average 
message length. Reduct is the number of global reductions.
Each event like VecScatterBegin/End has a maximal execution time over all 
processes, and a max/min ratio.  %T is sum(execution time of the event on each 
process)/sum(execution time of the stage on each process). %T indicates how 
expensive the event is. It is a number you should pay attention to.
If your code is imbalanced (i.e., with a big max/min ratio), then the 
performance number is skewed and becomes misleading because some processes are 
just waiting for others. Then, besides -log_view, you can add -log_sync, which 
adds an extra MPI_Barrier for each event to let them start at the same time. 
With that, it is easier to interpret the number.
src/vec/vscat/examples/ex4.c is a tiny example for VecScatter logging.

--Junchao Zhang


On Wed, Mar 20, 2019 at 2:58 PM Manuel Valera via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

I am working on timing my model, which we made MPI scalable using petsc DMDAs, 
i want to know more about the output log and how to calculate a total 
communication times for my runs, so far i see we have "MPI Messages" and "MPI 
Messages Lengths" in the log, along VecScatterEnd and VecScatterBegin reports.

My question is, how do i interpret these number to get a rough estimate on how 
much overhead we have just from MPI communications times in my model runs?

Thanks,




Re: [petsc-users] MPI Communication times

2019-03-20 Thread Zhang, Junchao via petsc-users
See the "Mess   AvgLen  Reduct" number in each log stage.  Mess is the total 
number of messages sent in an event over all processes.  AvgLen is average 
message length. Reduct is the number of global reductions.
Each event like VecScatterBegin/End has a maximal execution time over all 
processes, and a max/min ratio.  %T is sum(execution time of the event on each 
process)/sum(execution time of the stage on each process). %T indicates how 
expensive the event is. It is a number you should pay attention to.
If your code is imbalanced (i.e., with a big max/min ratio), then the 
performance number is skewed and becomes misleading because some processes are 
just waiting for others. Then, besides -log_view, you can add -log_sync, which 
adds an extra MPI_Barrier for each event to let them start at the same time. 
With that, it is easier to interpret the number.
src/vec/vscat/examples/ex4.c is a tiny example for VecScatter logging.
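For example, a run along these lines gives the synchronized log (the executable name is a placeholder; both options are standard PETSc runtime options):

  mpirun -n 8 ./your_app -log_view -log_sync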

--Junchao Zhang


On Wed, Mar 20, 2019 at 2:58 PM Manuel Valera via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

I am working on timing my model, which we made MPI scalable using petsc DMDAs, 
i want to know more about the output log and how to calculate a total 
communication times for my runs, so far i see we have "MPI Messages" and "MPI 
Messages Lengths" in the log, along VecScatterEnd and VecScatterBegin reports.

My question is, how do i interpret these number to get a rough estimate on how 
much overhead we have just from MPI communications times in my model runs?

Thanks,




Re: [petsc-users] PCFieldSplit with MatNest

2019-03-13 Thread Zhang, Junchao via petsc-users
Manuel,
  Could you try to add this line
 sbaij->free_imax_ilen = PETSC_TRUE;
 after line 2431 in 
/opt/PETSc_library/petsc-3.10.4/src/mat/impls/sbaij/seq/sbaij.c

 PS: Matt, this bug looks unrelated to my VecRestoreArrayRead_Nest fix.

--Junchao Zhang


On Wed, Mar 13, 2019 at 9:05 AM Matthew Knepley 
mailto:knep...@gmail.com>> wrote:
On Wed, Mar 13, 2019 at 9:44 AM Manuel Colera Rico via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Yes:

[ 0]8416 bytes MatCreateSeqSBAIJWithArrays() line 2431 in
/opt/PETSc_library/petsc-3.10.4/src/mat/impls/sbaij/seq/sbaij.c
[ 0]8416 bytes MatCreateSeqSBAIJWithArrays() line 2431 in
/opt/PETSc_library/petsc-3.10.4/src/mat/impls/sbaij/seq/sbaij.c
[ 0]4544 bytes MatCreateSeqSBAIJWithArrays() line 2431 in
/opt/PETSc_library/petsc-3.10.4/src/mat/impls/sbaij/seq/sbaij.c
[ 0]4544 bytes MatCreateSeqSBAIJWithArrays() line 2431 in
/opt/PETSc_library/petsc-3.10.4/src/mat/impls/sbaij/seq/sbaij.c

Junchao, do imax and ilen get missed in the Destroy when the user provides 
arrays?

  
https://bitbucket.org/petsc/petsc/src/06a3e802b3873ffbfd04b71a0821522327dd9b04/src/mat/impls/sbaij/seq/sbaij.c#lines-2431

Matt

I have checked that I have destroyed all the MatNest matrices and all
the submatrices individually.

Manuel

---

On 3/13/19 2:28 PM, Jed Brown wrote:
> Is there any output if you run with -malloc_dump?
>
> Manuel Colera Rico via petsc-users 
> mailto:petsc-users@mcs.anl.gov>> writes:
>
>> Hi, Junchao,
>>
>> I have installed the newest version of PETSc and it works fine. I just
>> get the following memory leak warning:
>>
>> Direct leak of 28608 byte(s) in 12 object(s) allocated from:
>>   #0 0x7f1ddd5caa38 in __interceptor_memalign
>> ../../../../gcc-8.1.0/libsanitizer/asan/asan_malloc_linux.cc:111
>>   #1 0x7f1ddbef1213 in PetscMallocAlign
>> (/opt/PETSc_library/petsc-3.10.4/mcr_20190313/lib/libpetsc.so.3.10+0x150213)
>>
>> Thank you,
>>
>> Manuel
>>
>> ---
>>
>> On 3/12/19 7:08 PM, Zhang, Junchao wrote:
>>> Hi, Manuel,
>>>I recently fixed a problem in VecRestoreArrayRead. Basically, I
>>> added VecRestoreArrayRead_Nest. Could you try the master branch of
>>> PETSc to see if it fixes your problem?
>>>Thanks.
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Mon, Mar 11, 2019 at 6:56 AM Manuel Colera Rico via petsc-users
>>> mailto:petsc-users@mcs.anl.gov> 
>>> <mailto:petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>> wrote:
>>>
>>>  Hello,
>>>
>>>  I need to solve a 2*2 block linear system. The matrices A_00, A_01,
>>>  A_10, A_11 are constructed separately via
>>>  MatCreateSeqAIJWithArrays and
>>>  MatCreateSeqSBAIJWithArrays. Then, I construct the full system matrix
>>>  with MatCreateNest, and use MatNestGetISs and PCFieldSplitSetIS to
>>>  set
>>>  up the PC, trying to follow the procedure described here:
>>>  
>>> https://www.mcs.anl.gov/petsc/petsc-current/src/snes/examples/tutorials/ex70.c.html.
>>>
>>>  However, when I run the code with Leak Sanitizer, I get the
>>>  following error:
>>>
>>>  =
>>>  ==54927==ERROR: AddressSanitizer: attempting free on address which
>>>  was
>>>  not malloc()-ed: 0x62751ab8 in thread T0
>>>   #0 0x7fbd95c08f30 in __interceptor_free
>>>  ../../../../gcc-8.1.0/libsanitizer/asan/asan_malloc_linux.cc:66
>>>   #1 0x7fbd92b99dcd in PetscFreeAlign
>>>  
>>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x146dcd)
>>>   #2 0x7fbd92ce0178 in VecRestoreArray_Nest
>>>  
>>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x28d178)
>>>   #3 0x7fbd92cd627d in VecRestoreArrayRead
>>>  
>>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x28327d)
>>>   #4 0x7fbd92d1189e in VecScatterBegin_SSToSS
>>>  
>>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x2be89e)
>>>   #5 0x7fbd92d1a414 in VecScatterBegin
>>>  
>>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x2c7414)
>>>   #6 0x7fbd934a999c in PCApply_FieldSplit
>>>  
>>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xa5699c)
>>>   #7 0x7fbd93369071 in PCApply
>>>  
>>> (/opt/

Re: [petsc-users] PetscScatterCreate type mismatch after update.

2019-03-12 Thread Zhang, Junchao via petsc-users
Maybe you should delete your PETSC_ARCH directory and recompile?  I tested 
my branch; it should not fail that easily :)

--Junchao Zhang


On Tue, Mar 12, 2019 at 8:20 PM Manuel Valera 
mailto:mvaler...@sdsu.edu>> wrote:
Hi Mr Zhang, thanks for your reply,

I just checked your branch out, reconfigured and recompiled and i am still 
getting the same error from my last email (null argument, when expected a valid 
pointer), do you have any idea why this can be happening?

Thanks so much,

Manuel

On Tue, Mar 12, 2019 at 6:09 PM Zhang, Junchao 
mailto:jczh...@mcs.anl.gov>> wrote:
Manuel,
I was working on a branch to revert the VecScatterCreate to 
VecScatterCreateWithData change. The change broke the PETSc API and I think we do 
not need it. I had planned to do a pull request after another PR of mine is merged.
But since it already affects you,  you can try this branch now, which is 
jczhang/fix-vecscattercreate-api

Thanks.
--Junchao Zhang


On Tue, Mar 12, 2019 at 5:58 PM Jed Brown via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Did you just update to 'master'?  See VecScatter changes:

https://www.mcs.anl.gov/petsc/documentation/changes/dev.html

Manuel Valera via petsc-users 
mailto:petsc-users@mcs.anl.gov>> writes:

> Hello,
>
> I just updated petsc from the repo to the latest master branch version, and
> a compilation problem popped up, it seems like the variable types are not
> being acknowledged properly, what i have in a minimum working example
> fashion is:
>
> #include <petsc/finclude/petscvec.h>
>> #include <petsc/finclude/petscdmda.h>
>> #include <petsc/finclude/petscdm.h>
>> #include <petsc/finclude/petscis.h>
>> #include <petsc/finclude/petscksp.h>
>> USE petscvec
>> USE petscdmda
>> USE petscdm
>> USE petscis
>> USE petscksp
>> IS :: ScalarIS
>> IS :: DummyIS
>> VecScatter :: LargerToSmaller,to0,from0
>> VecScatter :: SmallerToLarger
>> PetscInt, ALLOCATABLE  :: pScalarDA(:), pDummyDA(:)
>> PetscScalar:: rtol
>> Vec:: Vec1
>> Vec:: Vec2
>> ! Create index sets
>> allocate( pScalarDA(0:(gridx-1)*(gridy-1)*(gridz-1)-1) ,
>> pDummyDA(0:(gridx-1)*(gridy-1)*(gridz-1)-1) )
>> iter=0
>> do k=0,gridz-2
>> kplane = k*gridx*gridy
>> do j=0,gridy-2
>> do i=0,gridx-2
>> pScalarDA(iter) = kplane + j*(gridx) + i
>> iter = iter+1
>> enddo
>> enddo
>> enddo
>> pDummyDA = (/ (ind, ind=0,((gridx-1)*(gridy-1)*(gridz-1))-1) /)
>> call
>> ISCreateGeneral(PETSC_COMM_WORLD,(gridx-1)*(gridy-1)*(gridz-1), &
>>
>>  pScalarDA,PETSC_COPY_VALUES,ScalarIS,ierr)
>> call
>> ISCreateGeneral(PETSC_COMM_WORLD,(gridx-1)*(gridy-1)*(gridz-1), &
>>
>>  pDummyDA,PETSC_COPY_VALUES,DummyIS,ierr)
>> deallocate(pScalarDA,pDummyDA, STAT=ierr)
>> ! Create VecScatter contexts: LargerToSmaller & SmallerToLarger
>> call DMDACreateNaturalVector(daScalars,Vec1,ierr)
>> call DMDACreateNaturalVector(daDummy,Vec2,ierr)
>> call
>> VecScatterCreate(Vec1,ScalarIS,Vec2,DummyIS,LargerToSmaller,ierr)
>> call
>> VecScatterCreate(Vec2,DummyIS,Vec1,ScalarIS,SmallerToLarger,ierr)
>> call VecDestroy(Vec1,ierr)
>> call VecDestroy(Vec2,ierr)
>
>
> And the error i get is the part i cannot really understand:
>
> matrixobjs.f90:99.34:
>> call
>> VecScatterCreate(Vec1,ScalarIS,Vec2,DummyIS,LargerToSmaller,ie
>>  1
>> Error: Type mismatch in argument 'a' at (1); passed TYPE(tvec) to
>> INTEGER(4)
>> matrixobjs.f90:100.34:
>> call
>> VecScatterCreate(Vec2,DummyIS,Vec1,ScalarIS,SmallerToLarger,ie
>>  1
>> Error: Type mismatch in argument 'a' at (1); passed TYPE(tvec) to
>> INTEGER(4)
>> make[1]: *** [matrixobjs.o] Error 1
>> make[1]: Leaving directory `/usr/scratch/valera/ParGCCOM-Master/Src'
>> make: *** [gcmSeamount] Error 2
>
>
> What i find hard to understand is why/where my code is finding an integer
> type? as you can see from the MWE header the variables types look correct,
>
> Any help is appreaciated,
>
> Thanks,


Re: [petsc-users] PetscScatterCreate type mismatch after update.

2019-03-12 Thread Zhang, Junchao via petsc-users
Manuel,
I was working on a branch to revert the VecScatterCreate to 
VecScatterCreateWithData change. The change broke the PETSc API and I think we do 
not need it. I had planned to do a pull request after another PR of mine is merged.
But since it already affects you,  you can try this branch now, which is 
jczhang/fix-vecscattercreate-api

Thanks.
--Junchao Zhang


On Tue, Mar 12, 2019 at 5:58 PM Jed Brown via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Did you just update to 'master'?  See VecScatter changes:

https://www.mcs.anl.gov/petsc/documentation/changes/dev.html

Manuel Valera via petsc-users 
mailto:petsc-users@mcs.anl.gov>> writes:

> Hello,
>
> I just updated petsc from the repo to the latest master branch version, and
> a compilation problem popped up, it seems like the variable types are not
> being acknowledged properly, what i have in a minimum working example
> fashion is:
>
> #include <petsc/finclude/petscvec.h>
>> #include <petsc/finclude/petscdmda.h>
>> #include <petsc/finclude/petscdm.h>
>> #include <petsc/finclude/petscis.h>
>> #include <petsc/finclude/petscksp.h>
>> USE petscvec
>> USE petscdmda
>> USE petscdm
>> USE petscis
>> USE petscksp
>> IS :: ScalarIS
>> IS :: DummyIS
>> VecScatter :: LargerToSmaller,to0,from0
>> VecScatter :: SmallerToLarger
>> PetscInt, ALLOCATABLE  :: pScalarDA(:), pDummyDA(:)
>> PetscScalar:: rtol
>> Vec:: Vec1
>> Vec:: Vec2
>> ! Create index sets
>> allocate( pScalarDA(0:(gridx-1)*(gridy-1)*(gridz-1)-1) ,
>> pDummyDA(0:(gridx-1)*(gridy-1)*(gridz-1)-1) )
>> iter=0
>> do k=0,gridz-2
>> kplane = k*gridx*gridy
>> do j=0,gridy-2
>> do i=0,gridx-2
>> pScalarDA(iter) = kplane + j*(gridx) + i
>> iter = iter+1
>> enddo
>> enddo
>> enddo
>> pDummyDA = (/ (ind, ind=0,((gridx-1)*(gridy-1)*(gridz-1))-1) /)
>> call
>> ISCreateGeneral(PETSC_COMM_WORLD,(gridx-1)*(gridy-1)*(gridz-1), &
>>
>>  pScalarDA,PETSC_COPY_VALUES,ScalarIS,ierr)
>> call
>> ISCreateGeneral(PETSC_COMM_WORLD,(gridx-1)*(gridy-1)*(gridz-1), &
>>
>>  pDummyDA,PETSC_COPY_VALUES,DummyIS,ierr)
>> deallocate(pScalarDA,pDummyDA, STAT=ierr)
>> ! Create VecScatter contexts: LargerToSmaller & SmallerToLarger
>> call DMDACreateNaturalVector(daScalars,Vec1,ierr)
>> call DMDACreateNaturalVector(daDummy,Vec2,ierr)
>> call
>> VecScatterCreate(Vec1,ScalarIS,Vec2,DummyIS,LargerToSmaller,ierr)
>> call
>> VecScatterCreate(Vec2,DummyIS,Vec1,ScalarIS,SmallerToLarger,ierr)
>> call VecDestroy(Vec1,ierr)
>> call VecDestroy(Vec2,ierr)
>
>
> And the error i get is the part i cannot really understand:
>
> matrixobjs.f90:99.34:
>> call
>> VecScatterCreate(Vec1,ScalarIS,Vec2,DummyIS,LargerToSmaller,ie
>>  1
>> Error: Type mismatch in argument 'a' at (1); passed TYPE(tvec) to
>> INTEGER(4)
>> matrixobjs.f90:100.34:
>> call
>> VecScatterCreate(Vec2,DummyIS,Vec1,ScalarIS,SmallerToLarger,ie
>>  1
>> Error: Type mismatch in argument 'a' at (1); passed TYPE(tvec) to
>> INTEGER(4)
>> make[1]: *** [matrixobjs.o] Error 1
>> make[1]: Leaving directory `/usr/scratch/valera/ParGCCOM-Master/Src'
>> make: *** [gcmSeamount] Error 2
>
>
> What i find hard to understand is why/where my code is finding an integer
> type? as you can see from the MWE header the variables types look correct,
>
> Any help is appreaciated,
>
> Thanks,


Re: [petsc-users] PCFieldSplit with MatNest

2019-03-12 Thread Zhang, Junchao via petsc-users
Hi, Manuel,
  I recently fixed a problem in VecRestoreArrayRead. Basically, I added 
VecRestoreArrayRead_Nest. Could you try the master branch of PETSc to see if it 
fixes your problem?
  Thanks.

--Junchao Zhang


On Mon, Mar 11, 2019 at 6:56 AM Manuel Colera Rico via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

I need to solve a 2*2 block linear system. The matrices A_00, A_01,
A_10, A_11 are constructed separately via MatCreateSeqAIJWithArrays and
MatCreateSeqSBAIJWithArrays. Then, I construct the full system matrix
with MatCreateNest, and use MatNestGetISs and PCFieldSplitSetIS to set
up the PC, trying to follow the procedure described here:
https://www.mcs.anl.gov/petsc/petsc-current/src/snes/examples/tutorials/ex70.c.html.

However, when I run the code with Leak Sanitizer, I get the following error:

=
==54927==ERROR: AddressSanitizer: attempting free on address which was
not malloc()-ed: 0x62751ab8 in thread T0
 #0 0x7fbd95c08f30 in __interceptor_free
../../../../gcc-8.1.0/libsanitizer/asan/asan_malloc_linux.cc:66
 #1 0x7fbd92b99dcd in PetscFreeAlign
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x146dcd)
 #2 0x7fbd92ce0178 in VecRestoreArray_Nest
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x28d178)
 #3 0x7fbd92cd627d in VecRestoreArrayRead
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x28327d)
 #4 0x7fbd92d1189e in VecScatterBegin_SSToSS
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x2be89e)
 #5 0x7fbd92d1a414 in VecScatterBegin
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x2c7414)
 #6 0x7fbd934a999c in PCApply_FieldSplit
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xa5699c)
 #7 0x7fbd93369071 in PCApply
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x916071)
 #8 0x7fbd934efe77 in KSPInitialResidual
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xa9ce77)
 #9 0x7fbd9350272c in KSPSolve_GMRES
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xaaf72c)
 #10 0x7fbd934e3c01 in KSPSolve
(/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xa90c01)

Disabling Leak Sanitizer also outputs an "invalid pointer" error.

Did I forget something when writing the code?

Thank you,

Manuel

---



Re: [petsc-users] Compute the sum of the absolute values of the off-block diagonal entries of each row

2019-03-04 Thread Zhang, Junchao via petsc-users


On Mon, Mar 4, 2019 at 10:39 AM Matthew Knepley via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
On Mon, Mar 4, 2019 at 11:28 AM Cyrill Vonplanta via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Dear Petsc Users,

I am trying to implement a variant of the $l^1$-Gauss-Seidel smoother from 
https://doi.org/10.1137/100798806 (eq. 6.1 and below). One of the main issues 
is that I need to compute the sum $\sum_j |a_{ij}|$ of the matrix entries 
that are not part of the local diagonal block. I was looking for something like 
MatGetRowSumAbs but it looks like it hasn't been made yet.

I guess I have to come up with something myself, but would you know of some 
workaround for this without going too deep into PETSc?

PetscInt           r, rS, rE, ncols, c;
const PetscInt    *cols;
const PetscScalar *vals;
PetscReal          sum;

MatGetOwnershipRange(A, &rS, &rE);
for (r = rS; r < rE; ++r) {
  sum = 0.0;
  MatGetRow(A, r, &ncols, &cols, &vals);
  for (c = 0; c < ncols; ++c) if ((cols[c] < rS) || (cols[c] >= rE)) sum += PetscAbsScalar(vals[c]);
  MatRestoreRow(A, r, &ncols, &cols, &vals);
}
Perhaps PETSc should have a MatGetRemoteRow (or MatGetRowOffDiagonalBlock)(A, 
r, &ncols, &cols, &vals).  MatGetRow() internally has to allocate memory and 
sort indices and values from the local diagonal block and the off-diagonal block. It is 
totally a waste in this case -- users do not care about the column indices or the local 
block.  With MatGetRemoteRow(A, r, &ncols, NULL, &vals), PETSc would just need to 
set an integer and a pointer.
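In the meantime, a minimal sketch that skips the diagonal block entirely, assuming the matrix is stored as MATMPIAIJ (Ao below is the sequential matrix holding only the off-diagonal-block part of the local rows):

Mat                Ad, Ao;
const PetscInt    *garray;
PetscInt           r, m, ncols, c;
const PetscScalar *vals;
PetscReal          sum;

MatMPIAIJGetSeqAIJ(A, &Ad, &Ao, &garray);  /* Ad: diagonal block, Ao: off-diagonal block */
MatGetLocalSize(Ao, &m, NULL);             /* Ao has one row per local row of A */
for (r = 0; r < m; ++r) {
  sum = 0.0;
  MatGetRow(Ao, r, &ncols, NULL, &vals);   /* column indices are not needed here */
  for (c = 0; c < ncols; ++c) sum += PetscAbsScalar(vals[c]);
  MatRestoreRow(Ao, r, &ncols, NULL, &vals);
  /* sum now holds the sum of |a_ij| over the off-block-diagonal entries of local row r */
}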


  Thanks,

 Matt

Best Cyrill
--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] Problem in loading Matrix Market format

2019-02-28 Thread Zhang, Junchao via petsc-users
Eda,
  An update to ex72 was merged into the PETSc master branch just now. It can now read 
matrices, either symmetric or non-symmetric, in Matrix Market format, and output 
a PETSc binary matrix in MATSBAIJ format (for symmetric) or MATAIJ format (for 
non-symmetric). See the help in the source code for usage.
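Once you have the binary file, reading it back in a PETSc program is just a MatLoad() call, roughly like this (the file name is only an illustration):

Mat         A;
PetscViewer viewer;

PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix.petsc", FILE_MODE_READ, &viewer);
MatCreate(PETSC_COMM_WORLD, &A);
MatLoad(A, viewer);
PetscViewerDestroy(&viewer);
/* ... use A ... */
MatDestroy(&A);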
--Junchao Zhang


On Tue, Feb 12, 2019 at 1:50 AM Eda Oktay via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

I am trying to load matrix in Matrix Market format. I found an example on mat 
file  (ex78) whih can be tested by using .dat file. Since .dat file and .mtx 
file are similar  in structure (specially afiro_A.dat file is similar to 
amesos2_test_mat0.mtx since they both have 3 columns and the columns represent 
the same properties), I tried to run ex78 by using amesos2_test_mat0.mtx 
instead of afiro_A.dat. However, I got the error "Badly formatted input file". 
Here is the full error message:

[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Badly formatted input file

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
[0]PETSC ERROR: ./ex78 on a arch-linux2-c-debug named 
7330.wls.metu.edu.tr by edaoktay Tue Feb 12 
10:47:58 2019
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
--with-fc=gfortran --with-cxx-dialect=C++11 --download-openblas 
--download-metis --download-parmetis --download-superlu_dist --download-slepc 
--download-mpich
[0]PETSC ERROR: #1 main() line 73 in 
/home/edaoktay/petsc-3.10.3/src/mat/examples/tests/ex78.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -Ain 
/home/edaoktay/petsc-3.10.3/share/petsc/datafiles/matrices/amesos2_test_mat0.mtx
[0]PETSC ERROR: End of Error Message ---send entire error 
message to petsc-ma...@mcs.anl.gov--
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor

I know there is also an example (ex72) for Matrix Market format but in 
description, it is only proper for symmmetric and lower triangle, so I decided 
to use ex78.

Best regards,

Eda


Re: [petsc-users] AddressSanitizer: attempting free on address which was not malloc()-ed

2019-02-27 Thread Zhang, Junchao via petsc-users
Try the following to see if you can catch the bug easily: 1) Get error code for 
each petsc function and check it with CHKERRQ; 2) Link your code with a petsc 
library with debugging enabled (configured with --with-debugging=1); 3) Run 
your code with valgrind

--Junchao Zhang


On Wed, Feb 27, 2019 at 9:04 PM Yuyun Yang 
mailto:yyan...@stanford.edu>> wrote:
Hi Junchao,

This code actually involves a lot of classes and is pretty big. Might be an 
overkill for me to send everything to you. I'd like to know if I see this sort 
of error message, which points to this domain file, is it possible that the 
problem happens in another file (whose operations are linked to this one)? If 
so, I'll debug a little more and maybe send you more useful information later.

Best regards,
Yuyun

____
From: Zhang, Junchao mailto:jczh...@mcs.anl.gov>>
Sent: Wednesday, February 27, 2019 6:24:13 PM
To: Yuyun Yang
Cc: Matthew Knepley; petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
was not malloc()-ed

Could you provide a compilable and runnable test so I can try it?
--Junchao Zhang


On Wed, Feb 27, 2019 at 7:34 PM Yuyun Yang 
mailto:yyan...@stanford.edu>> wrote:
Thanks, I fixed that, but I’m not actually calling the testScatters() function 
in my implementation (in the constructor, the only functions I called are 
setFields and setScatters). So the problem couldn’t have been that?

Best,
Yuyun

From: Zhang, Junchao mailto:jczh...@mcs.anl.gov>>
Sent: Wednesday, February 27, 2019 10:50 AM
To: Yuyun Yang mailto:yyan...@stanford.edu>>
Cc: Matthew Knepley mailto:knep...@gmail.com>>; 
petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
was not malloc()-ed


On Wed, Feb 27, 2019 at 10:41 AM Yuyun Yang via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
I called VecDestroy() in the destructor for this object – is that not the right 
way to do it?
In Domain::testScatters(), you have many VecDuplicate() calls. You need to 
VecDestroy() the previous vector before doing a new VecDuplicate() into it.
How do I implement CHECK ALL RETURN CODES?
For each PETSc function, do ierr = ...;  CHKERRQ(ierr);

From: Matthew Knepley mailto:knep...@gmail.com>>
Sent: Wednesday, February 27, 2019 7:24 AM
To: Yuyun Yang mailto:yyan...@stanford.edu>>
Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
was not malloc()-ed

You call VecDuplicate() a bunch, but VecDestroy() only once in the bottom 
function. This is wrong.
Also, CHECK ALL RETURN CODES. This is the fastest way to find errors.

   Matt

On Wed, Feb 27, 2019 at 2:06 AM Yuyun Yang via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello team,

I ran into the address sanitizer error that I hope you could help me with. I 
don’t really know what’s wrong with the way the code frees memory. The relevant 
code file is attached. The line number following domain.cpp specifically 
referenced to the vector _q, which seems a little odd, since some other vectors 
are constructed and freed the same way.

==1719==ERROR: AddressSanitizer: attempting free on address which was not 
malloc()-ed: 0x61f076c0 in thread T0
#0 0x7fbf195282ca in __interceptor_free 
(/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x7fbf1706f895 in PetscFreeAlign 
/home/yyy910805/petsc/src/sys/memory/mal.c:87
#2 0x7fbf1731a898 in VecDestroy_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec2.c:788
#3 0x7fbf1735f795 in VecDestroy 
/home/yyy910805/petsc/src/vec/vec/interface/vector.c:408
#4 0x40dd0a in Domain::~Domain() 
/home/yyy910805/scycle/source/domain.cpp:132
#5 0x40b479 in main /home/yyy910805/scycle/source/main.cpp:242
#6 0x7fbf14d2082f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#7 0x4075d8 in _start (/home/yyy910805/scycle/source/main+0x4075d8)

0x61f076c0 is located 1600 bytes inside of 3220-byte region 
[0x61f07080,0x61f07d14)
allocated by thread T0 here:
#0 0x7fbf19528b32 in __interceptor_memalign 
(/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98b32)
#1 0x7fbf1706f7e0 in PetscMallocAlign 
/home/yyy910805/petsc/src/sys/memory/mal.c:41
#2 0x7fbf17073022 in PetscTrMallocDefault 
/home/yyy910805/petsc/src/sys/memory/mtr.c:183
#3 0x7fbf170710a1 in PetscMallocA 
/home/yyy910805/petsc/src/sys/memory/mal.c:397
#4 0x7fbf17326fb0 in VecCreate_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec3.c:35
#5 0x7fbf1736f560 in VecSetType 
/home/yyy910805/petsc/src/vec/vec/interface/vecreg.c:51
#6 0x7fbf1731afae in VecDuplicate_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec2.c:807
#7 0x7fbf1735eff7 in VecDuplicate 
/home/yyy910805/petsc/src/vec/vec/i

Re: [petsc-users] AddressSanitizer: attempting free on address which was not malloc()-ed

2019-02-27 Thread Zhang, Junchao via petsc-users
Could you provide a compilable and runnable test so I can try it?
--Junchao Zhang


On Wed, Feb 27, 2019 at 7:34 PM Yuyun Yang 
mailto:yyan...@stanford.edu>> wrote:
Thanks, I fixed that, but I’m not actually calling the testScatters() function 
in my implementation (in the constructor, the only functions I called are 
setFields and setScatters). So the problem couldn’t have been that?

Best,
Yuyun

From: Zhang, Junchao mailto:jczh...@mcs.anl.gov>>
Sent: Wednesday, February 27, 2019 10:50 AM
To: Yuyun Yang mailto:yyan...@stanford.edu>>
Cc: Matthew Knepley mailto:knep...@gmail.com>>; 
petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
was not malloc()-ed


On Wed, Feb 27, 2019 at 10:41 AM Yuyun Yang via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
I called VecDestroy() in the destructor for this object – is that not the right 
way to do it?
In Domain::testScatters(), you have many VecDuplicate() calls. You need to 
VecDestroy() the previous vector before doing a new VecDuplicate() into it.
How do I implement CHECK ALL RETURN CODES?
For each PETSc function, do ierr = ...;  CHKERRQ(ierr);

From: Matthew Knepley mailto:knep...@gmail.com>>
Sent: Wednesday, February 27, 2019 7:24 AM
To: Yuyun Yang mailto:yyan...@stanford.edu>>
Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
was not malloc()-ed

You call VecDuplicate() a bunch, but VecDestroy() only once in the bottom 
function. This is wrong.
Also, CHECK ALL RETURN CODES. This is the fastest way to find errors.

   Matt

On Wed, Feb 27, 2019 at 2:06 AM Yuyun Yang via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello team,

I ran into the address sanitizer error that I hope you could help me with. I 
don’t really know what’s wrong with the way the code frees memory. The relevant 
code file is attached. The line number following domain.cpp specifically 
referenced to the vector _q, which seems a little odd, since some other vectors 
are constructed and freed the same way.

==1719==ERROR: AddressSanitizer: attempting free on address which was not 
malloc()-ed: 0x61f076c0 in thread T0
#0 0x7fbf195282ca in __interceptor_free 
(/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x7fbf1706f895 in PetscFreeAlign 
/home/yyy910805/petsc/src/sys/memory/mal.c:87
#2 0x7fbf1731a898 in VecDestroy_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec2.c:788
#3 0x7fbf1735f795 in VecDestroy 
/home/yyy910805/petsc/src/vec/vec/interface/vector.c:408
#4 0x40dd0a in Domain::~Domain() 
/home/yyy910805/scycle/source/domain.cpp:132
#5 0x40b479 in main /home/yyy910805/scycle/source/main.cpp:242
#6 0x7fbf14d2082f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#7 0x4075d8 in _start (/home/yyy910805/scycle/source/main+0x4075d8)

0x61f076c0 is located 1600 bytes inside of 3220-byte region 
[0x61f07080,0x61f07d14)
allocated by thread T0 here:
#0 0x7fbf19528b32 in __interceptor_memalign 
(/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98b32)
#1 0x7fbf1706f7e0 in PetscMallocAlign 
/home/yyy910805/petsc/src/sys/memory/mal.c:41
#2 0x7fbf17073022 in PetscTrMallocDefault 
/home/yyy910805/petsc/src/sys/memory/mtr.c:183
#3 0x7fbf170710a1 in PetscMallocA 
/home/yyy910805/petsc/src/sys/memory/mal.c:397
#4 0x7fbf17326fb0 in VecCreate_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec3.c:35
#5 0x7fbf1736f560 in VecSetType 
/home/yyy910805/petsc/src/vec/vec/interface/vecreg.c:51
#6 0x7fbf1731afae in VecDuplicate_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec2.c:807
#7 0x7fbf1735eff7 in VecDuplicate 
/home/yyy910805/petsc/src/vec/vec/interface/vector.c:379
#8 0x4130de in Domain::setFields() 
/home/yyy910805/scycle/source/domain.cpp:431
#9 0x40c60a in Domain::Domain(char const*) 
/home/yyy910805/scycle/source/domain.cpp:57
#10 0x40b433 in main /home/yyy910805/scycle/source/main.cpp:242
#11 0x7fbf14d2082f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

SUMMARY: AddressSanitizer: bad-free ??:0 __interceptor_free
==1719==ABORTING

Thanks very much!
Yuyun


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>


Re: [petsc-users] Direct PETSc to use MCDRAM on KNL and other optimizations for KNL

2019-02-27 Thread Zhang, Junchao via petsc-users



On Wed, Feb 27, 2019 at 7:03 PM Sajid Ali 
mailto:sajidsyed2...@u.northwestern.edu>> 
wrote:

Hi Junchao,

I’m confused with the syntax. If I submit the following as my job script, I get 
an error :

#!/bin/bash
#SBATCH --job-name=petsc_test
#SBATCH -N 1
#SBATCH -C knl,quad,flat
#SBATCH -p apsxrmd
#SBATCH --time=1:00:00

module load intel/18.0.3-d6gtsxs
module load intel-parallel-studio/cluster.2018.3-xvnfrfz
module load numactl-2.0.12-intel-18.0.3-wh44iog
srun -n 64 -c 64 --cpu_bind=cores numactl -m 1 aps ./ex_modify -ts_type cn 
-prop_steps 25 -pc_type gamg -ts_monitor -log_view


The error is :
srun: cluster configuration lacks support for cpu binding

This cluster does not support cpu binding.  You need to remove 
--cpu_bind=cores. In addition, I don't know what the 'aps' argument is.


srun: error: Unable to create step for job 916208: More processors requested 
than permitted

I remember the product of -n and -c has to be 256.  You can try srun -n 64 -c 4 
numactl -m 1 ./ex_modify ...


I’m following the advice as given at slide 33 of 
https://www.nersc.gov/assets/Uploads/02-using-cori-knl-nodes-20170609.pdf

For further info, I’m using LCRC at ANL.

Thank You,
Sajid Ali
Applied Physics
Northwestern University


Re: [petsc-users] Direct PETSc to use MCDRAM on KNL and other optimizations for KNL

2019-02-27 Thread Zhang, Junchao via petsc-users
Use srun  numactl -m 1 ./app OR srun  numactl -p 1 
./app
See bottom of 
https://www.nersc.gov/users/computational-systems/cori/configuration/knl-processor-modes/
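For example, on a flat-mode KNL node (assuming MCDRAM shows up as NUMA node 1, which numactl -H will confirm; the executable name is a placeholder):

  srun -n 64 numactl -m 1 ./your_app -log_view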

--Junchao Zhang


On Wed, Feb 27, 2019 at 4:16 PM Sajid Ali via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hi,

I ran a TS integrator for 25 steps on a Broadwell-Xeon and Xeon-Phi (KNL). The 
problem size is 5000x5000 and I'm using scalar=complex.

The program takes 125 seconds to run on Xeon and 451 seconds on KNL !

The first thing I want to change is to convert the memory access for the 
program on KNL from DRAM to MCDRAM. I did run the problem in an interactive 
SLURM job and specified -C quad,flat and yet I see DRAM is being used.

I'm attaching the PETSc log files and Intel APS reports as well. Any help on 
how I should change my runtime parameters on KNL will be highly appreciated. 
Thanks in advance.

--
Sajid Ali
Applied Physics
Northwestern University


Re: [petsc-users] AddressSanitizer: attempting free on address which was not malloc()-ed

2019-02-27 Thread Zhang, Junchao via petsc-users

On Wed, Feb 27, 2019 at 10:41 AM Yuyun Yang via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
I called VecDestroy() in the destructor for this object – is that not the right 
way to do it?
In Domain::testScatters(), you have many VecDuplicate() calls. You need to 
VecDestroy() the previous vector before doing a new VecDuplicate() into it.
How do I implement CHECK ALL RETURN CODES?
For each PETSc function, do ierr = ...;  CHKERRQ(ierr);
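A minimal sketch of that pattern (x is an existing Vec; the names here are only illustrative):

PetscErrorCode ierr;
Vec            y;

ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
/* ... use y ... */
ierr = VecDestroy(&y);CHKERRQ(ierr);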

From: Matthew Knepley mailto:knep...@gmail.com>>
Sent: Wednesday, February 27, 2019 7:24 AM
To: Yuyun Yang mailto:yyan...@stanford.edu>>
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
was not malloc()-ed

You call VecDuplicate() a bunch, but VecDestroy() only once in the bottom 
function. This is wrong.
Also, CHECK ALL RETURN CODES. This is the fastest way to find errors.

   Matt

On Wed, Feb 27, 2019 at 2:06 AM Yuyun Yang via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello team,

I ran into the address sanitizer error that I hope you could help me with. I 
don’t really know what’s wrong with the way the code frees memory. The relevant 
code file is attached. The line number following domain.cpp specifically 
referenced to the vector _q, which seems a little odd, since some other vectors 
are constructed and freed the same way.

==1719==ERROR: AddressSanitizer: attempting free on address which was not 
malloc()-ed: 0x61f076c0 in thread T0
#0 0x7fbf195282ca in __interceptor_free 
(/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x7fbf1706f895 in PetscFreeAlign 
/home/yyy910805/petsc/src/sys/memory/mal.c:87
#2 0x7fbf1731a898 in VecDestroy_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec2.c:788
#3 0x7fbf1735f795 in VecDestroy 
/home/yyy910805/petsc/src/vec/vec/interface/vector.c:408
#4 0x40dd0a in Domain::~Domain() 
/home/yyy910805/scycle/source/domain.cpp:132
#5 0x40b479 in main /home/yyy910805/scycle/source/main.cpp:242
#6 0x7fbf14d2082f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#7 0x4075d8 in _start (/home/yyy910805/scycle/source/main+0x4075d8)

0x61f076c0 is located 1600 bytes inside of 3220-byte region 
[0x61f07080,0x61f07d14)
allocated by thread T0 here:
#0 0x7fbf19528b32 in __interceptor_memalign 
(/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98b32)
#1 0x7fbf1706f7e0 in PetscMallocAlign 
/home/yyy910805/petsc/src/sys/memory/mal.c:41
#2 0x7fbf17073022 in PetscTrMallocDefault 
/home/yyy910805/petsc/src/sys/memory/mtr.c:183
#3 0x7fbf170710a1 in PetscMallocA 
/home/yyy910805/petsc/src/sys/memory/mal.c:397
#4 0x7fbf17326fb0 in VecCreate_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec3.c:35
#5 0x7fbf1736f560 in VecSetType 
/home/yyy910805/petsc/src/vec/vec/interface/vecreg.c:51
#6 0x7fbf1731afae in VecDuplicate_Seq 
/home/yyy910805/petsc/src/vec/vec/impls/seq/bvec2.c:807
#7 0x7fbf1735eff7 in VecDuplicate 
/home/yyy910805/petsc/src/vec/vec/interface/vector.c:379
#8 0x4130de in Domain::setFields() 
/home/yyy910805/scycle/source/domain.cpp:431
#9 0x40c60a in Domain::Domain(char const*) 
/home/yyy910805/scycle/source/domain.cpp:57
#10 0x40b433 in main /home/yyy910805/scycle/source/main.cpp:242
#11 0x7fbf14d2082f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

SUMMARY: AddressSanitizer: bad-free ??:0 __interceptor_free
==1719==ABORTING

Thanks very much!
Yuyun


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] Problem in loading Matrix Market format

2019-02-12 Thread Zhang, Junchao via petsc-users
Sure.
--Junchao Zhang


On Tue, Feb 12, 2019 at 9:47 AM Matthew Knepley 
mailto:knep...@gmail.com>> wrote:
Hi Junchao,

Could you fix the MM example in PETSc to have this full support? That way we 
will always have it.

 Thanks,

Matt

On Tue, Feb 12, 2019 at 10:27 AM Zhang, Junchao via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Eda,
  I have a code that can read in Matrix Market and write out PETSc binary 
files.  Usage:  mpirun -n 1 ./mm2petsc -fin <Matrix Market file> -fout <PETSc binary file>.  You can 
have a try.
--Junchao Zhang


On Tue, Feb 12, 2019 at 1:50 AM Eda Oktay via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

I am trying to load matrix in Matrix Market format. I found an example on mat 
file  (ex78) whih can be tested by using .dat file. Since .dat file and .mtx 
file are similar  in structure (specially afiro_A.dat file is similar to 
amesos2_test_mat0.mtx since they both have 3 columns and the columns represent 
the same properties), I tried to run ex78 by using amesos2_test_mat0.mtx 
instead of afiro_A.dat. However, I got the error "Badly formatted input file". 
Here is the full error message:

[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Badly formatted input file

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
[0]PETSC ERROR: ./ex78 on a arch-linux2-c-debug named 
7330.wls.metu.edu.tr<http://7330.wls.metu.edu.tr> by edaoktay Tue Feb 12 
10:47:58 2019
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
--with-fc=gfortran --with-cxx-dialect=C++11 --download-openblas 
--download-metis --download-parmetis --download-superlu_dist --download-slepc 
--download-mpich
[0]PETSC ERROR: #1 main() line 73 in 
/home/edaoktay/petsc-3.10.3/src/mat/examples/tests/ex78.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -Ain 
/home/edaoktay/petsc-3.10.3/share/petsc/datafiles/matrices/amesos2_test_mat0.mtx
[0]PETSC ERROR: End of Error Message ---send entire error 
message to petsc-ma...@mcs.anl.gov--
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor

I know there is also an example (ex72) for Matrix Market format but in 
description, it is only proper for symmmetric and lower triangle, so I decided 
to use ex78.

Best regards,

Eda


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>


Re: [petsc-users] Problem in loading Matrix Market format

2019-02-12 Thread Zhang, Junchao via petsc-users
Eda,
  I have a code that can read in Matrix Market and write out PETSc binary 
files.  Usage:  mpirun -n 1 ./mm2petsc -fin <Matrix Market file> -fout <PETSc binary file>.  You can 
have a try.
--Junchao Zhang


On Tue, Feb 12, 2019 at 1:50 AM Eda Oktay via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hello,

I am trying to load matrix in Matrix Market format. I found an example on mat 
file  (ex78) whih can be tested by using .dat file. Since .dat file and .mtx 
file are similar  in structure (specially afiro_A.dat file is similar to 
amesos2_test_mat0.mtx since they both have 3 columns and the columns represent 
the same properties), I tried to run ex78 by using amesos2_test_mat0.mtx 
instead of afiro_A.dat. However, I got the error "Badly formatted input file". 
Here is the full error message:

[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Badly formatted input file

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
[0]PETSC ERROR: ./ex78 on a arch-linux2-c-debug named 
7330.wls.metu.edu.tr by edaoktay Tue Feb 12 
10:47:58 2019
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
--with-fc=gfortran --with-cxx-dialect=C++11 --download-openblas 
--download-metis --download-parmetis --download-superlu_dist --download-slepc 
--download-mpich
[0]PETSC ERROR: #1 main() line 73 in 
/home/edaoktay/petsc-3.10.3/src/mat/examples/tests/ex78.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -Ain 
/home/edaoktay/petsc-3.10.3/share/petsc/datafiles/matrices/amesos2_test_mat0.mtx
[0]PETSC ERROR: End of Error Message ---send entire error 
message to petsc-ma...@mcs.anl.gov--
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor

I know there is also an example (ex72) for the Matrix Market format, but according 
to its description it only handles symmetric, lower-triangular input, so I decided 
to use ex78.

Best regards,

Eda


matrixmarket2petsc.tgz
Description: matrixmarket2petsc.tgz


Re: [petsc-users] Slow linear solver via MUMPS

2019-01-26 Thread Zhang, Junchao via petsc-users



On Fri, Jan 25, 2019 at 8:07 PM Mohammad Gohardoust via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Hi,

I am trying to modify a "pure MPI" code for solving the water movement equation in 
soils, which employs KSP iterative solvers. This code gets really slow on the HPC 
system I am testing as I increase the number of compute nodes (each node 
has 28 cores), even from 1 to 2. So I went for a hybrid "MPI-OpenMP" solution 
like MUMPS. I did this inside PETSc with:

KSPSetType(ksp, KSPPREONLY);
PCSetType(pc, PCLU);
PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);
KSPSolve(ksp, ...

and I run it through:

export OMP_NUM_THREADS=16 && mpirun -n 2 ~/Programs/my_programs

In some cases, I saw multithreaded MUMPS improve performance by about 
30%. I guess something was wrong in your tests. You need to compile MUMPS with 
OpenMP support. If you installed MUMPS through PETSc, you need the PETSc configure 
option --with-openmp=1. In addition, you need a multithreaded BLAS; you can get 
one through --download-openblas.

But first of all, you should add --log_view to report your performance results.
The code is working (on my own PC) but it is too slow (maybe about 50 times 
slower). Since I am not an expert, I would like to know whether this is what I 
should expect from MUMPS.

Thanks,
Mohammad
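
For reference, here is a minimal sketch of the direct-solve setup discussed above, 
with an options hook and an explicit convergence check (A, b and x are assumed to 
be an already assembled Mat and Vecs; the function name is illustrative):

#include <petscksp.h>

/* Sketch: solve A x = b with MUMPS through PETSc's LU factorization interface. */
PetscErrorCode SolveWithMUMPS(Mat A, Vec b, Vec x)
{
  KSP                ksp;
  PC                 pc;
  KSPConvergedReason reason;
  PetscErrorCode     ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);    /* one application of the preconditioner */
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU); CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS); CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);         /* picks up -ksp_view, -ksp_error_if_not_converged, ... */
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
  ierr = KSPGetConvergedReason(ksp, &reason); CHKERRQ(ierr);
  if (reason < 0) { ierr = PetscPrintf(PETSC_COMM_WORLD, "Solve failed: reason %d\n", (int)reason); CHKERRQ(ierr); }
  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  return 0;
}

Running with -log_view (and OMP_NUM_THREADS set) then shows where the time goes.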



Re: [petsc-users] MPI Iterative solver crash on HPC

2019-01-25 Thread Zhang, Junchao via petsc-users
Hi, Sal Am,
 I did some tests with your matrix and vector. It is a complex matrix with 
N=4.7M and nnz=417M. First, I tested on a machine with 36 cores and 128GB of 
memory on each compute node. I tried both a direct solver and an iterative solver, 
but both failed. For example, with 36 ranks on one compute node, I got
[9]PETSC ERROR: [9] SuperLU_DIST:pzgssvx line 465 
/blues/gpfs/home/jczhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[9]PETSC ERROR: [9] MatLUFactorNumeric_SuperLU_DIST line 314 
/blues/gpfs/home/jczhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
  With 16 nodes, 576 ranks. I got
SUPERLU_MALLOC fails for GAstore->rnzval[] at line 240 in file 
/blues/gpfs/home/jczhang/petsc/bdw-dbg-complex/externalpackages/git.superlu_dist/SRC/pzutil.c

  Next, I moved to another single-node machine with 1.5TB of memory. It did not 
fail this time. It ran overnight and is still doing the SuperLU factorization. 
Using the top command, I found that at peak it consumed almost all of the memory. 
In the stable period, with 36 ranks, each rank consumed about 20GB of memory. When 
I switched to an iterative solver with -ksp_type bcgs -pc_type gamg 
-mattransposematmult_via scalable, I did not hit the errors seen on the 
smaller-memory machine, but the residual did not converge.
  So, I think the errors you met were simply out-of-memory errors, either in 
SuperLU or in PETSc. If you have machines with large memory, you can try them. 
Otherwise, I will let other PETSc developers suggest better iterative solvers to 
you.
  Thanks.

--Junchao Zhang


On Wed, Jan 23, 2019 at 2:52 AM Sal Am <tempoho...@gmail.com> wrote:
Sorry it took long; I had to see if I could shrink the problem files down from 
50GB to something smaller (now ~10GB).
Can you compress your matrix and upload it to google drive, so we can try to 
reproduce the error.

How I ran the problem: mpiexec valgrind --tool=memcheck 
--suppressions=$HOME/valgrind/valgrind-openmpi.supp -q --num-callers=20 
--log-file=valgrind.log-DS.%p ./solveCSys -malloc off -ksp_type gmres -pc_type 
lu -pc_factor_mat_solver_type superlu_dist -ksp_max_it 1 
-ksp_monitor_true_residual -log_view -ksp_error_if_not_converged

here is the link to matrix A and vector b: 
https://drive.google.com/drive/folders/16YQPTK6TfXC6pV5RMdJ9g7X-ZiqbvwU8?usp=sharing

I redid the problem (twice) by trying to solve a 1M-finite-element problem, 
corresponding to ~4M unknowns and 417M nonzero matrix elements, on the login node, 
which has ~550GB of memory, but it failed. The first time it failed with a bus 
error; the second time it was killed. I have attached the valgrind files from both 
runs.

OpenMPI is not my favorite. You need to use a suppressions file to get rid of 
all of that noise. Here is one:

Thanks, I have been using it, but sometimes I still see the same amount of errors.



On Fri, Jan 18, 2019 at 3:12 AM Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
Usually when I meet a SEGV error, I will run it again with a parallel debugger 
like DDT and wait for it to segfault, and then examine the stack trace to see 
what is wrong.
Can you compress your matrix and upload it to google drive, so we can try to 
reproduce the error.
--Junchao Zhang


On Thu, Jan 17, 2019 at 10:44 AM Sal Am via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
I did two runs, one with SuperLU_DIST and one with bcgs using jacobi; attached are 
the results of one of the valgrind reports from one random processor (out of the 
128 files).

DS = direct solver
IS = iterative solver

There are an awful lot of errors.

how I initiated the two runs:
mpiexec valgrind --tool=memcheck -q --num-callers=20 
--log-file=valgrind.log-IS.%p ./solveCSys -malloc off -ksp_type bcgs -pc_type 
jacobi -mattransposematmult_via scalable -build_twosided allreduce -ksp_monitor 
-log_view

mpiexec valgrind --tool=memcheck -q --num-callers=20 
--log-file=valgrind.log-DS.%p ./solveCSys -malloc off -ksp_type gmres -pc_type 
lu -pc_factor_mat_solver_type superlu_dist -ksp_max_it 1 
-ksp_monitor_true_residual -log_view -ksp_error_if_not_converged

Thank you

On Thu, Jan 17, 2019 at 4:24 PM Matthew Knepley <knep...@gmail.com> wrote:
On Thu, Jan 17, 2019 at 9:18 AM Sal Am <tempoho...@gmail.com> wrote:
1) Running out of memory

2) You passed an invalid array
I have select=4:ncpus=32:mpiprocs=32:mem=300GB in the job script, i.e. using 
300GB/node, a total of 1200GB memory, using 4 nodes and 32 processors per node 
(128 processors in total).
I am not sure what would constitute an invalid array or how I can check that. I 
am using the same procedure as when dealing with the smaller matrix, i.e. 
generate matrix A and vector b with the FEM software, convert the matrix and 
vector with a Python script into a form PETSc can read, then read them in PETSc 
and solve.

Are you running with 64-bit ints here?
Yes, I have PETSc configured with --with-64-bit-indices and in debugging mode, 
which is what this run used.

It sounds like you have enough memory, but the fact t

Re: [petsc-users] MPI Iterative solver crash on HPC

2019-01-17 Thread Zhang, Junchao via petsc-users
--
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch 
system) has told this process to end
[0]PETSC ERROR: [1]PETSC ERROR: 

[2]PETSC ERROR: 

[2]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch 
system) has told this process to end
[3]PETSC ERROR: 

[4]PETSC ERROR: 

[4]PETSC ERROR: [5]PETSC ERROR: [6]PETSC ERROR: 

[8]PETSC ERROR: 

[12]PETSC ERROR: 

[12]PETSC ERROR: [14]PETSC ERROR: 

[14]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch 
system) has told this process to end
--
mpiexec noticed that process rank 10 with PID 0 on node r03n01 exited on signal 
9 (Killed).

Now I ran this with valgrind, as someone had previously suggested, and the 16 
files created all contain the same type of error:

Okay, it's possible that there are bugs in the MPI implementation. So

1) Try using -build_twosided allreduce on this run

2) Is it possible to get something that fails here but that we can run? None of 
our tests show this problem.

  Thanks,

 Matt

==25940== Invalid read of size 8
==25940==at 0x5103326: PetscCheckPointer (checkptr.c:81)
==25940==by 0x4F42058: PetscCommGetNewTag (tagm.c:77)
==25940==by 0x4FC952D: PetscCommBuildTwoSidedFReq_Ibarrier (mpits.c:373)
==25940==by 0x4FCB29B: PetscCommBuildTwoSidedFReq (mpits.c:572)
==25940==by 0x52BBFF4: VecAssemblyBegin_MPI_BTS (pbvec.c:251)
==25940==by 0x52D6B42: VecAssemblyBegin (vector.c:140)
==25940==by 0x5328C97: VecLoad_Binary (vecio.c:141)
==25940==by 0x5329051: VecLoad_Default (vecio.c:516)
==25940==by 0x52E0BAB: VecLoad (vector.c:933)
==25940==by 0x4013D5: main (solveCmplxLinearSys.cpp:31)
==25940==  Address 0x19f807fc is 12 bytes inside a block of size 16 alloc'd
==25940==at 0x4C2A603: memalign (vg_replace_malloc.c:899)
==25940==by 0x4FD0B0E: PetscMallocAlign (mal.c:41)
==25940==by 0x4FD23E7: PetscMallocA (mal.c:397)
==25940==by 0x4FC948E: PetscCommBuildTwoSidedFReq_Ibarrier (mpits.c:371)
==25940==by 0x4FCB29B: PetscCommBuildTwoSidedFReq (mpits.c:572)
==25940==by 0x52BBFF4: VecAssemblyBegin_MPI_BTS (pbvec.c:251)
==25940==by 0x52D6B42: VecAssemblyBegin (vector.c:140)
==25940==by 0x5328C97: VecLoad_Binary (vecio.c:141)
==25940==by 0x5329051: VecLoad_Default (vecio.c:516)
==25940==by 0x52E0BAB: VecLoad (vector.c:933)
==25940==by 0x4013D5: main (solveCmplxLinearSys.cpp:31)
==25940==


On Mon, Jan 14, 2019 at 7:29 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
Fande:
According to this PR 
https://bitbucket.org/petsc/petsc/pull-requests/1061/a_selinger-feature-faster-scalable/diff

Should we set the scalable algorithm as default?
Sure, we can. But I feel we need to do more tests to compare the scalable and 
non-scalable algorithms.
In theory, for small to medium matrices, the non-scalable matmatmult() algorithm 
enables more efficient data access. Andreas optimized the scalable implementation. 
Our non-scalable implementation might have room for further optimization.
Hong

On Fri, Jan 11, 2019 at 10:34 AM Zhang, Hong via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Add option '-mattransposematmult_via scalable'
Hong
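
If you prefer to set this in code rather than on the command line, one way (a 
sketch; it has to run before the preconditioner is set up) is:

/* Equivalent to passing -mattransposematmult_via scalable on the command line. */
ierr = PetscOptionsSetValue(NULL, "-mattransposematmult_via", "scalable"); CHKERRQ(ierr);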

On Fri, Jan 11, 2019 at 9:52 AM Zhang, Junchao via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
I saw the following error message in your first email.
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
Probably the matrix is too large. You can try with more compute nodes, for 
example, use 8 nodes instead of 2, and see what happens.

--Junchao Zhang


On Fri, Jan 11, 2019 at 7:45 AM Sal Am via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Using a larger problem set with 2B non-zero elements and a matrix of 25M x 25M 
I get the following error:
[4]PETSC ERROR: 

[4]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range
[4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[4]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
find memory 

Re: [petsc-users] MPI Iterative solver crash on HPC

2019-01-11 Thread Zhang, Junchao via petsc-users
I saw the following error message in your first email.
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
Probably the matrix is too large. You can try with more compute nodes, for 
example, use 8 nodes instead of 2, and see what happens.

--Junchao Zhang


On Fri, Jan 11, 2019 at 7:45 AM Sal Am via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Using a larger problem set with 2B non-zero elements and a matrix of 25M x 25M 
I get the following error:
[4]PETSC ERROR: 

[4]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range
[4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[4]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
find memory corruption errors
[4]PETSC ERROR: likely location of problem given in stack below
[4]PETSC ERROR: -  Stack Frames 

[4]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[4]PETSC ERROR:   INSTEAD the line number of the start of the function
[4]PETSC ERROR:   is given.
[4]PETSC ERROR: [4] MatCreateSeqAIJWithArrays line 4422 
/lustre/home/vef002/petsc/src/mat/impls/aij/seq/aij.c
[4]PETSC ERROR: [4] MatMatMultSymbolic_SeqAIJ_SeqAIJ line 747 
/lustre/home/vef002/petsc/src/mat/impls/aij/seq/matmatmult.c
[4]PETSC ERROR: [4] MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable line 
1256 /lustre/home/vef002/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c
[4]PETSC ERROR: [4] MatTransposeMatMult_MPIAIJ_MPIAIJ line 1156 
/lustre/home/vef002/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c
[4]PETSC ERROR: [4] MatTransposeMatMult line 9950 
/lustre/home/vef002/petsc/src/mat/interface/matrix.c
[4]PETSC ERROR: [4] PCGAMGCoarsen_AGG line 871 
/lustre/home/vef002/petsc/src/ksp/pc/impls/gamg/agg.c
[4]PETSC ERROR: [4] PCSetUp_GAMG line 428 
/lustre/home/vef002/petsc/src/ksp/pc/impls/gamg/gamg.c
[4]PETSC ERROR: [4] PCSetUp line 894 
/lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c
[4]PETSC ERROR: [4] KSPSetUp line 304 
/lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c
[4]PETSC ERROR: - Error Message 
--
[4]PETSC ERROR: Signal received
[4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[4]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[4]PETSC ERROR: ./solveCSys on a linux-cumulus-debug named r02g03 by vef002 Fri 
Jan 11 09:13:23 2019
[4]PETSC ERROR: Configure options PETSC_ARCH=linux-cumulus-debug 
--with-cc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicc 
--with-fc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpifort 
--with-cxx=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicxx 
--download-parmetis --download-metis --download-ptscotch 
--download-superlu_dist --download-mumps --with-scalar-type=complex 
--with-debugging=yes --download-scalapack --download-superlu 
--download-fblaslapack=1 --download-cmake
[4]PETSC ERROR: #1 User provided function() line 0 in  unknown file
--
MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
[0]PETSC ERROR: 

[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch 
system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind

Looking at just one of the valgrind files, the following error was reported:

==9053== Invalid read of size 4
==9053==at 0x5B8067E: MatCreateSeqAIJWithArrays (aij.c:4445)
==9053==by 0x5BC2608: MatMatMultSymbolic_SeqAIJ_SeqAIJ (matmatmult.c:790)
==9053==by 0x5D106F8: MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable 
(mpimatmatmult.c:1337)
==9053==by 0x5D0E84E: MatTransposeMatMult_MPIAIJ_MPIAIJ 
(mpimatmatmult.c:1186)
==9053==by 0x5457C57: MatTransposeMatMult (matrix.c:9984)
==9053==by 0x64DD99D: PCGAMGCoarsen_AGG (agg.c:882)
==9053==by 0x64C7527: PCSetUp_GAMG (gamg.c:522)
==9053==by 0x6592AA0: PCSetUp (precon.c:932)
==9053==by 0x66B1267: KSPSetUp (itfunc.c:391)
==9053==by 0x4019A2: main (solveCmplxLinearSys.cpp:68)
==9053==  Address 0x8386997f4 is not stack'd, malloc'd or (recently) free'd
==9053==


On Fri, Jan 11, 2019 

Re: [petsc-users] Dynamically resize the existing PetscVector

2018-12-17 Thread Zhang, Junchao via petsc-users
Or, you can have your own array and then create PETSc vectors with 
VecCreateGhostWithArray, so that the memory resizing is managed by yourself.
--Junchao Zhang
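
A rough sketch of that approach (names and sizes are illustrative; on the first 
call *v and *buf are assumed to be NULL, and the user-owned buffer must hold the 
owned entries plus the ghost entries):

#include <petscvec.h>

/* Sketch: keep the storage yourself and wrap it in a ghosted Vec.  When the mesh
   changes, destroy only the Vec wrapper, resize your own buffer, and wrap it
   again with the new local and ghost sizes. */
PetscErrorCode RebuildGhostVec(PetscInt nlocal, PetscInt nghost, const PetscInt ghosts[],
                               PetscScalar **buf, Vec *v)
{
  PetscErrorCode ierr;

  if (*v)   { ierr = VecDestroy(v); CHKERRQ(ierr); }        /* drop the old wrapper only */
  if (*buf) { ierr = PetscFree(*buf); CHKERRQ(ierr); }      /* resize the user-owned storage */
  ierr = PetscMalloc1(nlocal + nghost, buf); CHKERRQ(ierr); /* owned + ghost entries */
  ierr = VecCreateGhostWithArray(PETSC_COMM_WORLD, nlocal, PETSC_DECIDE,
                                 nghost, ghosts, *buf, v); CHKERRQ(ierr);
  return 0;
}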


On Mon, Dec 17, 2018 at 9:15 AM Shidi Yan via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Hello,

I am working on adaptive moving mesh problems. Therefore, the PETSc
vector size is constantly changing.
The way I am currently dealing with this change is to destroy the PETSc vector
first with VecDestroy() and then create a new vector with VecCreateGhost().
But I think this is not a very efficient way. So I am wondering whether there is
any way to resize an existing PETSc Vec dynamically.

Thank you for your time.

Kind Regards,
Shidi


Re: [petsc-users] MUMPS Error

2018-12-12 Thread Zhang, Junchao via petsc-users


On Wed, Dec 12, 2018 at 7:14 AM Sal Am via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Hi, I am getting an error using MUMPS.
This is how I run it:
bash-4.2$ mpiexec -n 16 ./solveCSys -ksp_type richardson -pc_type lu 
-pc_factor_mat_solver_type mumps -ksp_max_it 1 -ksp_monitor_true_residual 
-log_view -ksp_error_if_not_converged

The error output:
[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Error in external library
[0]PETSC ERROR: [1]PETSC ERROR: - Error Message 
--
[1]PETSC ERROR: Error in external library
[1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
INFOG(1)=-13, INFO(2)=0

[1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[6]PETSC ERROR: - Error Message 
--
[6]PETSC ERROR: Error in external library
[6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
INFOG(1)=-13, INFO(2)=0

[6]PETSC ERROR: [8]PETSC ERROR: - Error Message 
--
[8]PETSC ERROR: Error in external library
[8]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
INFOG(1)=-13, INFO(2)=0

[8]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[8]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[8]PETSC ERROR: [9]PETSC ERROR: [15]PETSC ERROR: - Error 
Message --
[15]PETSC ERROR: Error in external library
[15]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
INFOG(1)=-13, INFO(2)=-36536

[15]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[15]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[15]PETSC ERROR: ./solveCSys on a linux-opt named r02g03 by vef002 Wed Dec 12 
10:44:13 2018
[15]PETSC ERROR: Configure options PETSC_ARCH=linux-opt --with-cc=gcc 
--with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-superlu_dist 
--download-mumps --with-scalar-type=complex --with-debugging=no 
--download-scalapack --download-superlu --download-mpich 
--download-fblaslapack=1 --download-cmake
[15]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
INFOG(1)=-13, INFO(2)=-81813

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[1]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[1]PETSC ERROR: ./solveCSys on a linux-opt named r02g03 by vef002 Wed Dec 12 
10:44:13 2018
[1]PETSC ERROR: Configure options PETSC_ARCH=linux-opt --with-cc=gcc 
--with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-superlu_dist 
--download-mumps --with-scalar-type=complex --with-debugging=no 
--download-scalapack --download-superlu --download-mpich 
--download-fblaslapack=1 --download-cmake
[1]PETSC ERROR: [2]PETSC ERROR: - Error Message 
--
[2]PETSC ERROR: Error in external library
[2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
INFOG(1)=-13, INFO(2)=0

[2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[2]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[2]PETSC ERROR: ./solveCSys on a linux-opt named r02g03 by vef002 Wed Dec 12 
10:44:13 2018
[2]PETSC ERROR: Configure options PETSC_ARCH=linux-opt --with-cc=gcc 
--with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-superlu_dist 
--download-mumps --with-scalar-type=complex --with-debugging=no 
--download-scalapack --download-superlu --download-mpich 
--download-fblaslapack=1 --download-cmake
[2]PETSC ERROR: [3]PETSC ERROR: - Error Message 
--
[3]PETSC ERROR: Error in external library
[3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
INFOG(1)=-13, INFO(2)=-28194

[3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[3]PETSC ERROR: Petsc Release Version 3.10.2, unknown
[3]PETSC ERROR: ./solveCSys on a linux-opt named r02g03 by vef002 Wed Dec 12 
10:44:13 2018
[3]PETSC ERROR: Configure options PETSC_ARCH=linux-opt --with-cc=gcc 
--with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-superlu_dist 
--download-mumps --with-scalar-type=complex --with-debugging=no 
--download-scalapack --download-superlu --download-mpich 
--download-fblaslapack=1 --download-cmake
[3]PETSC ERROR: [4]PETSC ERROR: - Error Message 
--
[4]PETSC ERROR: Error in external library
[4]PETSC ERROR: Error 

Re: [petsc-users] Compile petsc using intel mpi

2018-11-27 Thread Zhang, Junchao via petsc-users


On Tue, Nov 27, 2018 at 5:25 AM Edoardo alinovi via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Dear users,

I have installed Intel Parallel Studio on my workstation, and thus I would like 
to take advantage of the Intel compiler.

Before messing up my installation, do you have any guidelines for surviving this 
attempt? I found the following instructions here in the mailing list:

--with-cc=icc --with-fc=ifort --with-mpi-include=/path-to-intel 
--with-mpi-lib=/path-to-intel

Are they correct?
I think so. But you may also add PETSC_DIR=/path-to-petsc 
PETSC_ARCH=name-for-this-build


Also, I have an existing, clean installation of PETSc using OpenMPI. 
I would like to retain this installation, since it is working very well, and be 
able to switch between the two somehow. Any tips on this?
Use PETSC_ARCH in PETSc configure to differentiate different builds, and export 
PETSC_ARCH in your environment to select the one you want to use. See more at 
https://www.mcs.anl.gov/petsc/documentation/installation.html
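
For example (the arch names are placeholders, and /path-to-intel is the same 
placeholder as above), the two builds can live side by side under one PETSC_DIR:

./configure PETSC_ARCH=arch-openmpi-opt --with-cc=mpicc --with-fc=mpif90 --with-debugging=0
./configure PETSC_ARCH=arch-intel-opt --with-cc=icc --with-fc=ifort --with-mpi-include=/path-to-intel --with-mpi-lib=/path-to-intel --with-debugging=0
export PETSC_ARCH=arch-intel-opt

Each configure (followed by the usual make for that PETSC_ARCH) builds into its 
own subdirectory, so switching back is just a matter of exporting the other 
PETSC_ARCH.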


I will never stop saying thank you for your precious support!

Edoardo

--

Edoardo Alinovi, Ph.D.

DICCA, Scuola Politecnica,
Universita' degli Studi di Genova,
1, via Montallegro,
16145 Genova, Italy




Re: [petsc-users] PetscInt overflow

2018-10-19 Thread Zhang, Junchao

On Fri, Oct 19, 2018 at 4:02 AM Jan Grießer <griesser@googlemail.com> wrote:
With more than 1 MPI process, do you mean I should use spectrum slicing to divide 
the full problem into smaller subproblems?
The --with-64-bit-indices option is not a possibility for me, since I configured 
PETSc with MUMPS, which does not allow using the 64-bit version (at least this was 
the error message when I tried to configure PETSc).

MUMPS 5.1.2 manual chapter 2.4.2 says it supports "Selective 64-bit integer 
feature" and "full 64-bit integer version" as well.
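
A quick check of the numbers in that message: 4001 x 768000 = 3,072,768,000, which 
exceeds the 32-bit PetscInt maximum of 2^31 - 1 = 2,147,483,647. With -bv_type vecs 
each column is a separate Vec of length 768000, so no single allocation of that 
product is needed; with --with-64-bit-indices the product fits in a PetscInt.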

On Wed, Oct 17, 2018 at 18:24, Jose E. Roman <jro...@dsic.upv.es> wrote:
To use BVVECS, just add the command-line option -bv_type vecs.
This causes it to use a separate Vec for each column, instead of a single long Vec 
of size n*m. But it is considerably slower than the default.

Anyway, for such large problems you should consider using more than 1 MPI 
process. In that case the error may disappear because the local size is smaller 
than 768000.

Jose


> On Oct 17, 2018, at 17:58, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Wed, Oct 17, 2018 at 11:54 AM Jan Grießer <griesser@googlemail.com> wrote:
> Hi all,
> I am using slepc4py and petsc4py to solve for the smallest real eigenvalues 
> and eigenvectors. For my test cases with a matrix A of size 30k x 30k, 
> solving for the smallest solutions works quite well, but when I increase the 
> dimension of my system to around A = 768000 x 768000 or 3 million x 3 million 
> and ask for the smallest 3000 real eigenvalues and eigenvectors (the number 
> increases with increasing system size), I get the output (for the 768000 case):
>  The product 4001 times 768000 overflows the size of PetscInt; consider 
> reducing the number of columns, or use BVVECS instead
> I understand that the requested number of eigenvectors and eigenvalues is 
> causing an overflow, but I do not understand the solution to the problem that 
> is stated in the error message. Can someone tell me what exactly BVVECS is 
> and how I can use it? Or is there any other solution to my problem?
>
> You can also reconfigure with 64-bit integers: --with-64-bit-indices
>
>   Thanks,
>
> Matt
>
> Thank you very much in advance,
> Jan
>
>
>
> --
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/



Re: [petsc-users] Failure of MUMPS

2018-10-09 Thread Zhang, Junchao
OK, I found -ksp_error_if_not_converged will trigger PETSc to fail in this case.

--Junchao Zhang


On Tue, Oct 9, 2018 at 3:38 PM Junchao Zhang <jczh...@mcs.anl.gov> wrote:
I met a case where MUMPS returned an out-of-memory code but PETSc continued to 
run.  When PETSc calls MUMPS, it checks if (A->erroriffailure). I added 
-mat_error_if_failure, but it did not work since it was overwritten by 
MatSetErrorIfFailure(pc->pmat,pc->erroriffailure)
Does it suggest we should add a new option -pc_factor_error_if_failure and 
check it in PCSetFromOptions_Factor()?

--Junchao Zhang
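
Until an option like that exists, here is a user-side sketch of checking for a 
failed MUMPS factorization after the solve (it assumes the PC is an LU/Cholesky 
factorization backed by MUMPS; the function name is illustrative):

#include <petscksp.h>

/* Sketch: after KSPSolve, query both the KSP convergence reason and MUMPS's own
   INFOG(1) status on the factored matrix, rather than relying on the
   erroriffailure flags. */
PetscErrorCode CheckMumpsStatus(KSP ksp)
{
  PC                 pc;
  Mat                F;
  KSPConvergedReason reason;
  PetscInt           infog1;
  PetscErrorCode     ierr;

  ierr = KSPGetConvergedReason(ksp, &reason); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCFactorGetMatrix(pc, &F); CHKERRQ(ierr);       /* the factored matrix held by the PC */
  ierr = MatMumpsGetInfog(F, 1, &infog1); CHKERRQ(ierr); /* MUMPS INFOG(1) < 0 signals an error, e.g. -13 */
  if (reason < 0 || infog1 < 0) {
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Solve failed: KSP reason %d, MUMPS INFOG(1) %d\n",
                       (int)reason, (int)infog1); CHKERRQ(ierr);
  }
  return 0;
}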

On Fri, Oct 5, 2018 at 8:12 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
Mike:
Hello PETSc team:

I am trying to solve a PDE problem with high-order finite elements. The matrix 
is getting denser and my experience is that MUMPS just outperforms iterative 
solvers.

For certain problems, MUMPS just fails in the middle for no clear reason. I just 
wonder if there is any suggestion to improve the robustness of MUMPS? Or, in 
general, any suggestion for an iterative solver with very high-order finite 
elements?

What error message do you get when MUMPS fails? Out of memory, zero pivoting, 
or something?
 Hong


Re: [petsc-users] Failure of MUMPS

2018-10-09 Thread Zhang, Junchao
I met a case where MUMPS returned an out-of-memory code but PETSc continued to 
run.  When PETSc calls MUMPS, it checks if (A->erroriffailure). I added 
-mat_error_if_failure, but it did not work since it was overwritten by 
MatSetErrorIfFailure(pc->pmat,pc->erroriffailure)
Does it suggest we should add a new option -pc_factor_error_if_failure and 
check it in PCSetFromOptions_Factor()?

--Junchao Zhang

On Fri, Oct 5, 2018 at 8:12 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
Mike:
Hello PETSc team:

I am trying to solve a PDE problem with high-order finite elements. The matrix 
is getting denser and my experience is that MUMPS just outperforms iterative 
solvers.

For certain problems, MUMPS just fails in the middle for no clear reason. I just 
wonder if there is any suggestion to improve the robustness of MUMPS? Or, in 
general, any suggestion for an iterative solver with very high-order finite 
elements?

What error message do you get when MUMPS fails? Out of memory, zero pivoting, 
or something?
 Hong