Re: [petsc-users] PetscSF Fortran interface

2022-11-16 Thread Junchao Zhang
Hi, Nicholas,
  I will have a look and get back to you.
  Thanks.
--Junchao Zhang


On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi <
narno...@umich.edu> wrote:

> Hi Petsc Users
>
> I'm in the process of adding some PETSc mesh-management functionality to an
> existing Fortran solver. It has been relatively straightforward so far, but
> I am running into an issue with the PetscSF routines. Some, like
> PetscSFGetGraph, work without a problem, but a few of my routines require
> PetscSFGetLeafRanks and PetscSFGetRootRanks, and those don't seem to be in
> the Fortran interface; I just get a linking error. I also don't see a
> PetscSF include file in finclude. Any clarification or assistance would be
> appreciated.
>
>
> Sincerely
> Nicholas
>
> --
> Nicholas Arnold-Medabalimi
>
> Ph.D. Candidate
> Computational Aeroscience Lab
> University of Michigan
>


[petsc-users] PetscSF Fortran interface

2022-11-16 Thread Nicholas Arnold-Medabalimi
Hi Petsc Users

I'm in the process of adding some PETSc mesh-management functionality to an
existing Fortran solver. It has been relatively straightforward so far, but
I am running into an issue with the PetscSF routines. Some, like
PetscSFGetGraph, work without a problem, but a few of my routines require
PetscSFGetLeafRanks and PetscSFGetRootRanks, and those don't seem to be in
the Fortran interface; I just get a linking error. I also don't see a
PetscSF include file in finclude. Any clarification or assistance would be
appreciated.
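
As a stopgap I have been considering a thin C helper of my own that the
Fortran code could bind to with ISO_C_BINDING; a minimal sketch (my own
hypothetical wrapper, not a PETSc routine, assuming the current C signature
of PetscSFGetRootRanks) would look something like:

#include <petscsf.h>

/* Hypothetical helper (not part of PETSc): returns the number of root ranks
 * and copies the rank list into caller-provided (Fortran-owned) storage so
 * that no C pointers cross the language boundary. */
PetscErrorCode mySFGetRootRanksCopy(PetscSF sf, PetscMPIInt *nranks, PetscMPIInt ranks_out[])
{
  PetscErrorCode     ierr;
  const PetscMPIInt *ranks;
  const PetscInt    *roffset, *rmine, *rremote;
  PetscMPIInt        i;

  ierr = PetscSFGetRootRanks(sf, nranks, &ranks, &roffset, &rmine, &rremote); CHKERRQ(ierr);
  for (i = 0; i < *nranks; i++) ranks_out[i] = ranks[i]; /* copy into Fortran-owned array */
  return 0;
}

A matching wrapper around PetscSFGetLeafRanks would follow the same pattern.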


Sincerely
Nicholas

-- 
Nicholas Arnold-Medabalimi

Ph.D. Candidate
Computational Aeroscience Lab
University of Michigan


Re: [petsc-users] Different solution while running in parallel

2022-11-16 Thread Zhang, Hong via petsc-users
Karthik,
Can you find out the condition number of your matrix?
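
For example, one rough way to get it (a sketch reusing the ksp, b, x from your
code below; it reports the extreme singular values of the preconditioned
operator, so you would temporarily switch from PREONLY+LU to a Krylov method,
or simply rerun with -ksp_monitor_singular_value):

PetscReal emax, emin;

ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);
ierr = KSPSetComputeSingularValues(ksp, PETSC_TRUE); CHKERRQ(ierr);
ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
ierr = KSPComputeExtremeSingularValues(ksp, &emax, &emin); CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD, "estimated condition number: %g\n", (double)(emax/emin)); CHKERRQ(ierr);
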
Hong


From: petsc-users on behalf of Karthikeyan Chockalingam - STFC UKRI via petsc-users
Sent: Wednesday, November 16, 2022 6:04 PM
To: petsc-users@mcs.anl.gov 
Subject: [petsc-users] Different solution while running in parallel


  Hello,



I tried to solve a (FE-discretized) Poisson equation using PCLU. For some
reason I am getting different solutions when running the problem on one and
two cores. I have attached the output file (out.txt) from both runs. I am
printing A, b, and x from both runs; while A and b are the same, the
solutions are different.

I am not sure what I am doing wrong.



Below is my matrix, vector, and solve setup.





Mat A;
Vec b, x;

ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr);
ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr);
ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);

KSP ksp;
PC  pc;

KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, A, A);
ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
ierr = PCSetType(pc, PCLU); CHKERRQ(ierr);
ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS); CHKERRQ(ierr);
KSPSolve(ksp, b, x);



Thank you for your help.



Karthik.





[petsc-users] Different solution while running in parallel

2022-11-16 Thread Karthikeyan Chockalingam - STFC UKRI via petsc-users
  Hello,



I tried to solve a (FE-discretized) Poisson equation using PCLU. For some
reason I am getting different solutions when running the problem on one and
two cores. I have attached the output file (out.txt) from both runs. I am
printing A, b, and x from both runs; while A and b are the same, the
solutions are different.

I am not sure what I am doing wrong.



Below is my matrix, vector, and solve setup.





Mat A;
Vec b, x;

ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr);
ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr);
ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);

KSP ksp;
PC  pc;

KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, A, A);
ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
ierr = PCSetType(pc, PCLU); CHKERRQ(ierr);
ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS); CHKERRQ(ierr);
KSPSolve(ksp, b, x);
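
To compare the two runs I can also check the residual norm after the solve; a
rough sketch using the objects above:

Vec       r;
PetscReal rnorm;

ierr = VecDuplicate(b, &r); CHKERRQ(ierr);
ierr = MatMult(A, x, r); CHKERRQ(ierr);        /* r = A*x     */
ierr = VecAYPX(r, -1.0, b); CHKERRQ(ierr);     /* r = b - A*x */
ierr = VecNorm(r, NORM_2, &rnorm); CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD, "||b - A*x|| = %g\n", (double)rnorm); CHKERRQ(ierr);
ierr = VecDestroy(&r); CHKERRQ(ierr);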

Thank you for your help.

Karthik.



MPI initialized with 1 MPI processes
MPI initialized with thread support level 1
AMReX (22.10-20-g3082028e4287) initialized
Mat Object: 1 MPI process
  type: mpiaij
row 0: (0, 1.)  (1, 0.)  (5, 0.)  (6, 0.) 
row 1: (0, 0.)  (1, 1.)  (2, 0.)  (5, 0.)  (6, 0.)  (7, 0.) 
row 2: (1, 0.)  (2, 1.)  (3, 0.)  (6, 0.)  (7, 0.)  (8, 0.) 
row 3: (2, 0.)  (3, 1.)  (4, 0.)  (7, 0.)  (8, 0.)  (9, 0.) 
row 4: (3, 0.)  (4, 1.)  (8, 0.)  (9, 0.) 
row 5: (0, 0.)  (1, 0.)  (5, 1.)  (6, 0.)  (10, 0.)  (11, 0.) 
row 6: (0, 0.)  (1, 0.)  (2, 0.)  (5, 0.)  (6, 10.6667)  (7, -1.3)  (10, 0.)  (11, -1.3)  (12, -1.3)
row 7: (1, 0.)  (2, 0.)  (3, 0.)  (6, -1.3)  (7, 10.6667)  (8, -1.3)  (11, -1.3)  (12, -1.3)  (13, -1.3)
row 8: (2, 0.)  (3, 0.)  (4, 0.)  (7, -1.3)  (8, 10.6667)  (9, 0.)  (12, -1.3)  (13, -1.3)  (14, 0.)
row 9: (3, 0.)  (4, 0.)  (8, 0.)  (9, 1.)  (13, 0.)  (14, 0.)
row 10: (5, 0.)  (6, 0.)  (10, 1.)  (11, 0.)  (15, 0.)  (16, 0.)
row 11: (5, 0.)  (6, -1.3)  (7, -1.3)  (10, 0.)  (11, 10.6667)  (12, -1.3)  (15, 0.)  (16, -1.3)  (17, -1.3)
row 12: (6, -1.3)  (7, -1.3)  (8, -1.3)  (11, -1.3)  (12, 10.6667)  (13, -1.3)  (16, -1.3)  (17, -1.3)  (18, -1.3)
row 13: (7, -1.3)  (8, -1.3)  (9, 0.)  (12, -1.3)  (13, 10.6667)  (14, 0.)  (17, -1.3)  (18, -1.3)  (19, 0.)
row 14: (8, 0.)  (9, 0.)  (13, 0.)  (14, 1.)  (18, 0.)  (19, 0.)
row 15: (10, 0.)  (11, 0.)  (15, 1.)  (16, 0.)  (20, 0.)  (21, 0.)
row 16: (10, 0.)  (11, -1.3)  (12, -1.3)  (15, 0.)  (16, 10.6667)  (17, -1.3)  (20, 0.)  (21, 0.)  (22, 0.)
row 17: (11, -1.3)  (12, -1.3)  (13, -1.3)  (16, -1.3)  (17, 10.6667)  (18, -1.3)  (21, 0.)  (22, 0.)  (23, 0.)
row 18: (12, -1.3)  (13, -1.3)  (14, 0.)  (17, -1.3)  (18, 10.6667)  (19, 0.)  (22, 0.)  (23, 0.)  (24, 0.)
row 19: (13, 0.)  (14, 0.)  (18, 0.)  (19, 1.)  (23, 0.)  (24, 0.) 
row 20: (15, 0.)  (16, 0.)  (20, 1.)  (21, 0.) 
row 21: (15, 0.)  (16, 0.)  (17, 0.)  (20, 0.)  (21, 1.)  (22, 0.) 
row 22: (16, 0.)  (17, 0.)  (18, 0.)  (21, 0.)  (22, 1.)  (23, 0.) 
row 23: (17, 0.)  (18, 0.)  (19, 0.)  (22, 0.)  (23, 1.)  (24, 0.) 
row 24: (18, 0.)  (19, 0.)  (23, 0.)  (24, 1.) 
Vec Object: 1 MPI process
  type: seq
0.
0.
0.
0.
0.
0.
4.
4.
4.
0.
0.
4.
4.
4.
0.
0.
4.
4.
4.
0.
0.
0.
0.
0.
0.
Vec Object: 1 MPI process
  type: seq
0.
0.
0.
0.
0.
0.
0.771429
0.964286
0.771429
0.
0.
0.964286
1.24286
0.964286
0.
0.
0.771429
0.964286
0.771429
0.
0.
0.
0.
0.
0.
Unused ParmParse Variables:
  [TOP]::model.type(nvals = 1)  :: [3]

AMReX (22.10-20-g3082028e4287) finalized





MPI initialized with 2 MPI processes
MPI initialized with thread support level 1
AMReX (22.10-20-g3082028e4287) initialized

Mat Object: 2 MPI processes
  type: mpiaij
row 0: (0, 1.)  (1, 0.)  (5, 0.)  (6, 0.) 
row 1: (0, 0.)  (1, 1.)  (2, 0.)  (5, 0.)  (6, 0.)  (7, 0.) 
row 2: (1, 0.)  (2, 1.)  (3, 0.)  (6, 0.)  (7, 0.)  (8, 0.) 
row 3: (2, 0.)  (3, 1.)  (4, 0.)  (7, 0.)  (8, 0.)  (9, 0.) 
row 4: (3, 0.)  

Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

2022-11-16 Thread Mark Adams
Junchao, I tried with the Kokkos dev branch and get this with 8 processes
(and the .petscrc file that I sent/appended):

 srun -n8 -N1 --gpus-per-task=1 --gpu-bind=closest ../ex13
-dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2
-dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1
-dm_refine 4 -dm_view -log_tracexxx -log_view -dm_mat_type aijkokkos
-dm_vec_type kokkos

DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 4913 4913 4913 4913 4913 4913 4913 4913
  Number of 1-cells per rank: 13872 13872 13872 13872 13872 13872 13872
13872
  Number of 2-cells per rank: 13056 13056 13056 13056 13056 13056 13056
13056
  Number of 3-cells per rank: 4096 4096 4096 4096 4096 4096 4096 4096
Labels:
  celltype: 4 strata with value/size (0 (4913), 1 (13872), 4 (13056), 7
(4096))
  depth: 4 strata with value/size (0 (4913), 1 (13872), 2 (13056), 3 (4096))
  marker: 1 strata with value/size (1 (3169))
  Face Sets: 3 strata with value/size (1 (961), 3 (961), 6 (961))
Number equations N = 250047
[5]PETSC ERROR:

[5]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal
memory access
[5]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[5]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and
https://petsc.org/release/faq/
[5]PETSC ERROR: -  Stack Frames

[5]PETSC ERROR: The line numbers in the error traceback are not always
exact.
[5]PETSC ERROR: #1 MPI function
[5]PETSC ERROR: #2 PetscSFLinkWaitRequests_MPI() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/impls/basic/sfmpi.c:53
[5]PETSC ERROR: #3 PetscSFLinkFinishCommunication() at
/gpfs/alpine/csc314/scratch/adams/petsc/include/../src/vec/is/sf/impls/basic/sfpack.h:277
[5]PETSC ERROR: #4 PetscSFBcastEnd_Basic() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:205
[5]PETSC ERROR: #5 PetscSFBcastEnd() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/interface/sf.c:1477
[5]PETSC ERROR: #6 DMGlobalToLocalEnd() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:2849
[5]PETSC ERROR: #7 SNESComputeFunction_DMLocal() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/utils/dmlocalsnes.c:65
[5]PETSC ERROR: #8 SNES callback function
[5]PETSC ERROR: #9 SNESComputeFunction() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:2436
[5]PETSC ERROR: #10 SNESSolve_KSPONLY() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:27
[5]PETSC ERROR: #11 SNESSolve() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4690
[5]PETSC ERROR: #12 main() at ex13.c:178
MPICH ERROR [Rank 5] [job id 213404.0] [Wed Nov 16 18:30:49 2022]
[crusher011] - Abort(59) (rank 5 in comm 0): application called
MPI_Abort(MPI_COMM_WORLD, 59) - process 5

On Wed, Nov 16, 2022 at 2:51 PM Mark Adams  wrote:

> I am able to reproduce this on Crusher with 8 processors.
>
> Junchao, did you want me to use --download-kokkos-commit=origin/develop ?
>
> On Wed, Nov 16, 2022 at 8:05 AM Mark Adams  wrote:
>
>> I cannot build right now on Crusher or Perlmutter, but I saw this on both.
>>
>> Here is an example output from src/snes/tests/ex13.c using the appended
>> .petscrc.
>> This run uses 64 processors; the 8-processor case worked. This has been
>> semi-nondeterministic for me.
>>
>> (and I have attached my current Perlmutter problem)
>>
>> Hope this helps,
>> Mark
>>
>> -dm_plex_simplex 0
>> -dm_plex_dim 3
>> -dm_plex_box_lower 0,0,0
>> -dm_plex_box_upper 1,1,1
>> -petscpartitioner_simple_process_grid 2,2,2
>> -potential_petscspace_degree 2
>> -snes_max_it 1
>> -ksp_max_it 200
>> -ksp_type cg
>> -ksp_rtol 1.e-12
>> -ksp_norm_type unpreconditioned
>> -snes_rtol 1.e-8
>> #-pc_type gamg
>> #-pc_gamg_type agg
>> #-pc_gamg_agg_nsmooths 1
>> -pc_gamg_coarse_eq_limit 100
>> -pc_gamg_process_eq_limit 400
>> -pc_gamg_reuse_interpolation true
>> #-snes_monitor
>> #-ksp_monitor_short
>> -ksp_converged_reason
>> #-ksp_view
>> #-snes_converged_reason
>> #-mg_levels_ksp_max_it 2
>> -mg_levels_ksp_type chebyshev
>> #-mg_levels_ksp_type richardson
>> #-mg_levels_ksp_richardson_scale 0.8
>> -mg_levels_pc_type jacobi
>> -pc_gamg_esteig_ksp_type cg
>> -pc_gamg_esteig_ksp_max_it 10
>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>> -dm_distribute
>> -petscpartitioner_type simple
>> -pc_gamg_repartition false
>> -pc_gamg_coarse_grid_layout_type compact
>> -pc_gamg_threshold 0.01
>> #-pc_gamg_threshold_scale .5
>> -pc_gamg_aggressive_coarsening 1
>> #-check_pointer_intensity 0
>> -snes_type ksponly
>> #-mg_coarse_sub_pc_factor_mat_solver_type cusparse
>> #-info :pc
>> #-use_gpu_aware_mpi 1
>> -options_left
>> #-malloc_debug
>> -benchmark_it 10
>> #-pc_gamg_use_parallel_coarse_grid_solver
>> #-mg_coarse_pc_type jacobi
>> #-mg_coarse_ksp_type cg
>> #-mg_coarse_ksp_rtol 1.e-2
>> 

Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

2022-11-16 Thread Mark Adams
I am able to reproduce this on Crusher with 8 processors.

Junchao, did you want me to use --download-kokkos-commit=origin/develop ?

On Wed, Nov 16, 2022 at 8:05 AM Mark Adams  wrote:

> I cannot build right now on Crusher or Perlmutter, but I saw this on both.
>
> Here is an example output from src/snes/tests/ex13.c using the appended
> .petscrc.
> This run uses 64 processors; the 8-processor case worked. This has been
> semi-nondeterministic for me.
>
> (and I have attached my current Perlmutter problem)
>
> Hope this helps,
> Mark
>
> -dm_plex_simplex 0
> -dm_plex_dim 3
> -dm_plex_box_lower 0,0,0
> -dm_plex_box_upper 1,1,1
> -petscpartitioner_simple_process_grid 2,2,2
> -potential_petscspace_degree 2
> -snes_max_it 1
> -ksp_max_it 200
> -ksp_type cg
> -ksp_rtol 1.e-12
> -ksp_norm_type unpreconditioned
> -snes_rtol 1.e-8
> #-pc_type gamg
> #-pc_gamg_type agg
> #-pc_gamg_agg_nsmooths 1
> -pc_gamg_coarse_eq_limit 100
> -pc_gamg_process_eq_limit 400
> -pc_gamg_reuse_interpolation true
> #-snes_monitor
> #-ksp_monitor_short
> -ksp_converged_reason
> #-ksp_view
> #-snes_converged_reason
> #-mg_levels_ksp_max_it 2
> -mg_levels_ksp_type chebyshev
> #-mg_levels_ksp_type richardson
> #-mg_levels_ksp_richardson_scale 0.8
> -mg_levels_pc_type jacobi
> -pc_gamg_esteig_ksp_type cg
> -pc_gamg_esteig_ksp_max_it 10
> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
> -dm_distribute
> -petscpartitioner_type simple
> -pc_gamg_repartition false
> -pc_gamg_coarse_grid_layout_type compact
> -pc_gamg_threshold 0.01
> #-pc_gamg_threshold_scale .5
> -pc_gamg_aggressive_coarsening 1
> #-check_pointer_intensity 0
> -snes_type ksponly
> #-mg_coarse_sub_pc_factor_mat_solver_type cusparse
> #-info :pc
> #-use_gpu_aware_mpi 1
> -options_left
> #-malloc_debug
> -benchmark_it 10
> #-pc_gamg_use_parallel_coarse_grid_solver
> #-mg_coarse_pc_type jacobi
> #-mg_coarse_ksp_type cg
> #-mg_coarse_ksp_rtol 1.e-2
> #-mat_cusparse_transgen
> -snes_lag_jacobian -2
>
>
> On Tue, Nov 15, 2022 at 3:42 PM Junchao Zhang 
> wrote:
>
>> Mark,
>> Do you have a reproducer using petsc examples?
>>
>> On Tue, Nov 15, 2022, 12:49 PM Mark Adams  wrote:
>>
>>> Junchao, this is the same problem that I have been having right?
>>>
>>> On Tue, Nov 15, 2022 at 11:56 AM Fackler, Philip via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
 I built petsc with:

 $ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug
 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0
 --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices
 --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos
 --download-kokkos-kernels

 $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all

 $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install


 Then I build xolotl in a separate build directory (after checking out
 the "feature-petsc-kokkos" branch) with:

 $ cmake -DCMAKE_BUILD_TYPE=Debug
 -DKokkos_DIR=$HOME/build/petsc/debug/install
 -DPETSC_DIR=$HOME/build/petsc/debug/install 

 $ make -j4 SystemTester


 Then, from the xolotl build directory, run (for example):

 $ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v

 Note that this test case will use the parameter file
 '/benchmarks/params_system_NE_4.txt' which has the command-line
 arguments for petsc in its "petscArgs=..." line. If you look at
 '/test/system/SystemTester.cpp' all the system test cases
 follow the same naming convention with their corresponding parameter files
 under '/benchmarks'.

 The failure happens with the NE_4 case (which is 2D) and the PSI_3 case
 (which is 1D).

 Let me know if this is still unclear.

 Thanks,


 *Philip Fackler *
 Research Software Engineer, Application Engineering Group
 Advanced Computing Systems Research Section
 Computer Science and Mathematics Division
 *Oak Ridge National Laboratory*
 --
 *From:* Junchao Zhang 
 *Sent:* Tuesday, November 15, 2022 00:16
 *To:* Fackler, Philip 
 *Cc:* petsc-users@mcs.anl.gov ; Blondel,
 Sophie 
 *Subject:* [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with
 COO interface crashes in some cases

 Hi, Philip,
   Can you tell me instructions to build Xolotl to reproduce the error?
 --Junchao Zhang


 On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users <
 petsc-users@mcs.anl.gov> wrote:

 In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use
 the COO interface for preallocating and setting values in the Jacobian
 matrix. I have found that with some of our test cases, using more than one
 MPI rank results in a crash. Way down in the preconditioner code in petsc a
 Mat gets computed that has "null" for the "productsymbolic" member of its
 "ops". 

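For reference, the COO assembly pattern mentioned above is roughly the
following (a generic sketch, not Xolotl's actual code; J stands in for the
Jacobian Mat and ierr for the usual error code):

PetscInt    coo_i[] = {0, 0, 1, 1};
PetscInt    coo_j[] = {0, 1, 0, 1};
PetscScalar coo_v[] = {4.0, -1.0, -1.0, 4.0};

/* list the (i,j) pairs once up front, then push the values on each assembly */
ierr = MatSetPreallocationCOO(J, 4, coo_i, coo_j); CHKERRQ(ierr);
ierr = MatSetValuesCOO(J, coo_v, INSERT_VALUES); CHKERRQ(ierr);
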
Re: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.

2022-11-16 Thread Fackler, Philip via petsc-users
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Unknown Name on a  named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +

 Max   Max/Min Avg   Total
Time (sec):   6.023e+00 1.000   6.023e+00
Objects:  1.020e+02 1.000   1.020e+02
Flops:1.080e+09 1.000   1.080e+09  1.080e+09
Flops/sec:1.793e+08 1.000   1.793e+08  1.793e+08
MPI Msg Count:0.000e+00 0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00 0.000   0.000e+00  0.000e+00
MPI Reductions:   0.000e+00 0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:  Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%


See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event

Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total    GPU    - CpuToGpu -   - GpuToCpu - GPU
                       Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFSetGraph             3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFSetUp                3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFPack              4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
SFUnpack            4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecDot               190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecMDot              775 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
VecNorm             1728 1.0   nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
VecScale            1983 1.0   nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0

Re: [petsc-users] AMD vs Intel mobile CPU performance

2022-11-16 Thread Jed Brown
If you're using iterative solvers, compare memory bandwidth first, then cache. 
Flops aren't very important unless you use sparse direct solvers or have SNES 
residual/Jacobian evaluation that is expensive and has been written for 
vectorization.

If you can get the 6650U with LPDDR5-6400, it'll probably be faster. My laptop 
is the previous generation, 5900HS.
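
A crude way to compare candidate machines yourself is a STREAM-triad-style
loop; below is a minimal single-threaded sketch (it understates what all cores
together can sustain; PETSc's "make streams" target runs a proper MPI variant):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Rough triad bandwidth check: a[i] = b[i] + s*c[i].
 * Counts 3 doubles of traffic per element (2 reads + 1 write), ignoring
 * write-allocate; arrays are sized to overflow typical caches. */
int main(void)
{
  size_t  n = 50u * 1000u * 1000u;
  double *a = malloc(n * sizeof *a), *b = malloc(n * sizeof *b), *c = malloc(n * sizeof *c);
  double  s = 3.0;
  struct timespec t0, t1;

  for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (size_t i = 0; i < n; i++) a[i] = b[i] + s * c[i];
  clock_gettime(CLOCK_MONOTONIC, &t1);

  double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
  printf("triad: %.1f GB/s\n", 3.0 * n * sizeof(double) / sec / 1e9);
  free(a); free(b); free(c);
  return 0;
}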

"D.J. Nolte"  writes:

> Hi all,
> I'm looking for a small laptop which I'll be using (also) for small-scale
> PETSc (KSP & SNES) simulations. For this setting performance is not that
> important, but still, I wonder if the community has any experience with AMD
> Ryzen CPUs (specifically the Ryzen 5 Pro 6650U) compared to 12th-gen Intel
> i7 CPUs. Should I expect significant performance differences?
>
> Thanks!
>
> David