Re: [petsc-users] PetscSF Fortran interface
Hi, Nicholas,
  I will have a look and get back to you.
  Thanks.
--Junchao Zhang

On Wed, Nov 16, 2022 at 9:27 PM Nicholas Arnold-Medabalimi <narno...@umich.edu> wrote:

> Hi Petsc Users
>
> I'm in the process of adding some Petsc for mesh management into an
> existing Fortran Solver. It has been relatively straightforward so far but
> I am running into an issue with using PetscSF routines. Some like the
> PetscSFGetGraph work no problem but a few of my routines require the use of
> PetscSFGetLeafRanks and PetscSFGetRootRanks and those don't seem to be in
> the fortran interface and I just get a linking error. I also don't seem to
> see a PetscSF file in the finclude. Any clarification or assistance would
> be appreciated.
>
>
> Sincerely
> Nicholas
>
> --
> Nicholas Arnold-Medabalimi
>
> Ph.D. Candidate
> Computational Aeroscience Lab
> University of Michigan
>
[petsc-users] PetscSF Fortran interface
Hi PETSc Users,

I'm in the process of adding some PETSc mesh management to an existing Fortran solver. It has been relatively straightforward so far, but I am running into an issue with the PetscSF routines. Some, like PetscSFGetGraph, work without problems, but a few of my routines require PetscSFGetLeafRanks and PetscSFGetRootRanks, and those don't seem to be in the Fortran interface: I just get a linking error. I also don't see a PetscSF file in finclude. Any clarification or assistance would be appreciated.

Sincerely,
Nicholas

--
Nicholas Arnold-Medabalimi
Ph.D. Candidate
Computational Aeroscience Lab
University of Michigan
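Since PetscSFGetGraph is reported to work from Fortran, one possible stopgap is to derive the root-rank information from the graph itself: the remote array already names the rank that owns each leaf's root. The sketch below shows that bookkeeping in plain C with no PETSc dependency. The RemoteRoot struct is only a stand-in for PetscSFNode, and distinct_root_ranks is a hypothetical helper, not a PETSc routine; the ordering and contents that PetscSFGetRootRanks itself returns may differ. The same trick does not cover PetscSFGetLeafRanks, because finding out which ranks hold leaves of your roots requires communication.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for PetscSFNode: the (rank, index) of the root a leaf points to. */
typedef struct {
    int rank;
    int index;
} RemoteRoot;

/* qsort comparator for ascending ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not part of PETSc): collect the distinct ranks that
 * own the roots referenced by the local leaves.
 *
 * On return, ranks[0..n-1] holds those ranks in ascending order and
 * offsets[0..n] holds prefix sums of the number of leaves per rank.
 * The caller provides room for nleaves entries in ranks[] and
 * nleaves + 1 entries in offsets[].
 * Returns n, or -1 if the scratch allocation fails.
 */
int distinct_root_ranks(int nleaves, const RemoteRoot *remote,
                        int *ranks, int *offsets)
{
    assert(nleaves >= 0);
    offsets[0] = 0;
    if (nleaves == 0) return 0;

    int *sorted = malloc((size_t)nleaves * sizeof *sorted);
    if (!sorted) return -1;
    for (int i = 0; i < nleaves; i++) sorted[i] = remote[i].rank;
    qsort(sorted, (size_t)nleaves, sizeof *sorted, cmp_int);

    int n = 0;
    for (int i = 0; i < nleaves; i++) {
        if (i == 0 || sorted[i] != sorted[i - 1]) {
            ranks[n] = sorted[i];
            offsets[n] = i; /* position of this rank's first leaf in sorted order */
            n++;
        }
    }
    offsets[n] = nleaves;
    free(sorted);
    return n;
}
```

For leaves pointing at ranks 2, 0, 2, 1, 0, 2, the helper reports three ranks (0, 1, 2) with offsets 0, 2, 3, 6. The loop is short enough that it should port to Fortran directly on top of the arrays PetscSFGetGraph returns.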
Re: [petsc-users] Different solution while running in parallel
Karhik,
Can you find out the condition number of your matrix?
Hong

From: petsc-users on behalf of Karthikeyan Chockalingam - STFC UKRI via petsc-users
Sent: Wednesday, November 16, 2022 6:04 PM
To: petsc-users@mcs.anl.gov
Subject: [petsc-users] Different solution while running in parallel

Hello,

I tried to solve a (FE discretized) Poisson equation using PCLU. For some reason I am getting different solutions when running the problem on one and two cores. I have attached the output file (out.txt) from both runs.

I am printing A, b and x from both runs: A and b are the same, but the solution is different. I am not sure what I am doing wrong.

Below is my matrix, vector, and solve setup.

Mat A;
Vec b, x;
ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(A,d_nz, NULL, o_nz, NULL); CHKERRQ(ierr);
ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr);
ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);

KSP ksp;
PC pc;
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, A, A);
ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
ierr = PCFactorSetMatSolverType(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
KSPSolve(ksp, b, x);

Thank you for your help.

Karhik.

This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses.
[petsc-users] Different solution while running in parallel
Hello,

I tried to solve a (FE discretized) Poisson equation using PCLU. For some reason I am getting different solutions when running the problem on one and two cores. I have attached the output file (out.txt) from both runs.

I am printing A, b and x from both runs: A and b are the same, but the solution is different. I am not sure what I am doing wrong.

Below is my matrix, vector, and solve setup.

Mat A;
Vec b, x;
ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(A,d_nz, NULL, o_nz, NULL); CHKERRQ(ierr);
ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); CHKERRQ(ierr);
ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);

KSP ksp;
PC pc;
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, A, A);
ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
ierr = PCFactorSetMatSolverType(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
KSPSolve(ksp, b, x);

Thank you for your help.

Karhik.

[Attachment out.txt]

MPI initialized with 1 MPI processes
MPI initialized with thread support level 1
AMReX (22.10-20-g3082028e4287) initialized
Mat Object: 1 MPI process
  type: mpiaij
row 0: (0, 1.) (1, 0.) (5, 0.) (6, 0.)
row 1: (0, 0.) (1, 1.) (2, 0.) (5, 0.) (6, 0.) (7, 0.)
row 2: (1, 0.) (2, 1.) (3, 0.) (6, 0.) (7, 0.) (8, 0.)
row 3: (2, 0.) (3, 1.) (4, 0.) (7, 0.) (8, 0.) (9, 0.)
row 4: (3, 0.) (4, 1.) (8, 0.) (9, 0.)
row 5: (0, 0.) (1, 0.) (5, 1.) (6, 0.) (10, 0.) (11, 0.)
row 6: (0, 0.) (1, 0.) (2, 0.) (5, 0.) (6, 10.6667) (7, -1.3) (10, 0.) (11, -1.3) (12, -1.3)
row 7: (1, 0.) (2, 0.) (3, 0.) (6, -1.3) (7, 10.6667) (8, -1.3) (11, -1.3) (12, -1.3) (13, -1.3)
row 8: (2, 0.) (3, 0.) (4, 0.) (7, -1.3) (8, 10.6667) (9, 0.) (12, -1.3) (13, -1.3) (14, 0.)
row 9: (3, 0.) (4, 0.) (8, 0.) (9, 1.) (13, 0.) (14, 0.)
row 10: (5, 0.) (6, 0.) (10, 1.) (11, 0.) (15, 0.) (16, 0.)
row 11: (5, 0.) (6, -1.3) (7, -1.3) (10, 0.) (11, 10.6667) (12, -1.3) (15, 0.) (16, -1.3) (17, -1.3)
row 12: (6, -1.3) (7, -1.3) (8, -1.3) (11, -1.3) (12, 10.6667) (13, -1.3) (16, -1.3) (17, -1.3) (18, -1.3)
row 13: (7, -1.3) (8, -1.3) (9, 0.) (12, -1.3) (13, 10.6667) (14, 0.) (17, -1.3) (18, -1.3) (19, 0.)
row 14: (8, 0.) (9, 0.) (13, 0.) (14, 1.) (18, 0.) (19, 0.)
row 15: (10, 0.) (11, 0.) (15, 1.) (16, 0.) (20, 0.) (21, 0.)
row 16: (10, 0.) (11, -1.3) (12, -1.3) (15, 0.) (16, 10.6667) (17, -1.3) (20, 0.) (21, 0.) (22, 0.)
row 17: (11, -1.3) (12, -1.3) (13, -1.3) (16, -1.3) (17, 10.6667) (18, -1.3) (21, 0.) (22, 0.) (23, 0.)
row 18: (12, -1.3) (13, -1.3) (14, 0.) (17, -1.3) (18, 10.6667) (19, 0.) (22, 0.) (23, 0.) (24, 0.)
row 19: (13, 0.) (14, 0.) (18, 0.) (19, 1.) (23, 0.) (24, 0.)
row 20: (15, 0.) (16, 0.) (20, 1.) (21, 0.)
row 21: (15, 0.) (16, 0.) (17, 0.) (20, 0.) (21, 1.) (22, 0.)
row 22: (16, 0.) (17, 0.) (18, 0.) (21, 0.) (22, 1.) (23, 0.)
row 23: (17, 0.) (18, 0.) (19, 0.) (22, 0.) (23, 1.) (24, 0.)
row 24: (18, 0.) (19, 0.) (23, 0.) (24, 1.)
Vec Object: 1 MPI process
  type: seq
0. 0. 0. 0. 0. 0. 4. 4. 4. 0. 0. 4. 4. 4. 0. 0. 4. 4. 4. 0. 0. 0. 0. 0. 0.
Vec Object: 1 MPI process
  type: seq
0. 0. 0. 0. 0. 0. 0.771429 0.964286 0.771429 0. 0. 0.964286 1.24286 0.964286 0. 0. 0.771429 0.964286 0.771429 0. 0. 0. 0. 0. 0.
Unused ParmParse Variables:
  [TOP]::model.type(nvals = 1) :: [3]
AMReX (22.10-20-g3082028e4287) finalized

MPI initialized with 2 MPI processes
MPI initialized with thread support level 1
AMReX (22.10-20-g3082028e4287) initialized
Mat Object: 2 MPI processes
  type: mpiaij
row 0: (0, 1.) (1, 0.) (5, 0.) (6, 0.)
row 1: (0, 0.) (1, 1.) (2, 0.) (5, 0.) (6, 0.) (7, 0.)
row 2: (1, 0.) (2, 1.) (3, 0.) (6, 0.) (7, 0.) (8, 0.)
row 3: (2, 0.) (3, 1.) (4, 0.) (7, 0.) (8, 0.) (9, 0.)
row 4: (3, 0.)
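A quick way to tell which of two candidate solutions is correct is to compute the residual b - A x for each. The sketch below does this in plain C for the nine interior unknowns of the printed 5x5 problem. It rests on two assumptions that are not stated in the thread: that the printed off-diagonal "-1.3" is a shortened -4/3 (which would match the diagonal 10.6667 = 32/3), and that boundary rows are identity rows with zero right-hand side, so boundary values drop out. The function name stencil_residual_max is made up for this illustration.

```c
#include <assert.h>

#define NI 3 /* interior nodes per direction in the 5x5 grid above */

/*
 * Max-norm of rhs - A x on an NI x NI interior grid, where A couples each
 * node to itself with weight `diag` and to each of its (up to eight)
 * interior neighbours with weight `off`. Boundary values are taken as zero.
 */
double stencil_residual_max(double diag, double off,
                            double x[NI][NI], double rhs)
{
    double worst = 0.0;
    for (int i = 0; i < NI; i++) {
        for (int j = 0; j < NI; j++) {
            double ax = diag * x[i][j];
            for (int di = -1; di <= 1; di++) {
                for (int dj = -1; dj <= 1; dj++) {
                    int p = i + di, q = j + dj;
                    if ((di != 0 || dj != 0) &&
                        p >= 0 && p < NI && q >= 0 && q < NI) {
                        ax += off * x[p][q];
                    }
                }
            }
            double r = rhs - ax;
            if (r < 0.0) r = -r; /* absolute value without libm */
            if (r > worst) worst = r;
        }
    }
    assert(worst >= 0.0);
    return worst;
}
```

Under those assumptions, calling the function with diag = 32.0/3, off = -4.0/3, rhs = 4 and the interior values of the one-process solution printed above gives a residual at the level of the six printed digits, which suggests the serial answer is consistent with A and b. Running the same check on the two-process solution would show whether it actually solves the system.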
Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases
Junchao, I tried with the Kokkos dev branch and get this with 8 processes (and the .petscrc file that I sent/appended):

srun -n8 -N1 --gpus-per-task=1 --gpu-bind=closest ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 4 -dm_view -log_tracexxx -log_view -dm_mat_type aijkokkos -dm_vec_type kokkos
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 4913 4913 4913 4913 4913 4913 4913 4913
  Number of 1-cells per rank: 13872 13872 13872 13872 13872 13872 13872 13872
  Number of 2-cells per rank: 13056 13056 13056 13056 13056 13056 13056 13056
  Number of 3-cells per rank: 4096 4096 4096 4096 4096 4096 4096 4096
Labels:
  celltype: 4 strata with value/size (0 (4913), 1 (13872), 4 (13056), 7 (4096))
  depth: 4 strata with value/size (0 (4913), 1 (13872), 2 (13056), 3 (4096))
  marker: 1 strata with value/size (1 (3169))
  Face Sets: 3 strata with value/size (1 (961), 3 (961), 6 (961))
Number equations N = 250047
[5]PETSC ERROR:
[5]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access
[5]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[5]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[5]PETSC ERROR: --------------------- Stack Frames ---------------------
[5]PETSC ERROR: The line numbers in the error traceback are not always exact.
[5]PETSC ERROR: #1 MPI function
[5]PETSC ERROR: #2 PetscSFLinkWaitRequests_MPI() at /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/impls/basic/sfmpi.c:53
[5]PETSC ERROR: #3 PetscSFLinkFinishCommunication() at /gpfs/alpine/csc314/scratch/adams/petsc/include/../src/vec/is/sf/impls/basic/sfpack.h:277
[5]PETSC ERROR: #4 PetscSFBcastEnd_Basic() at /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:205
[5]PETSC ERROR: #5 PetscSFBcastEnd() at /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/interface/sf.c:1477
[5]PETSC ERROR: #6 DMGlobalToLocalEnd() at /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:2849
[5]PETSC ERROR: #7 SNESComputeFunction_DMLocal() at /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/utils/dmlocalsnes.c:65
[5]PETSC ERROR: #8 SNES callback function
[5]PETSC ERROR: #9 SNESComputeFunction() at /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:2436
[5]PETSC ERROR: #10 SNESSolve_KSPONLY() at /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:27
[5]PETSC ERROR: #11 SNESSolve() at /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4690
[5]PETSC ERROR: #12 main() at ex13.c:178
MPICH ERROR [Rank 5] [job id 213404.0] [Wed Nov 16 18:30:49 2022] [crusher011] - Abort(59) (rank 5 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 5

On Wed, Nov 16, 2022 at 2:51 PM Mark Adams wrote:

> I am able to reproduce this on Crusher with 8 processors.
>
> Junchao, did you want me to use --download-kokkos-commit=origin/develop ?
>
> On Wed, Nov 16, 2022 at 8:05 AM Mark Adams wrote:
>
>> I can not build right now on Crusher or Perlmutter but I saw this on both.
>>
>> Here is an example output using src/snes/tests/ex13.c using the appended
>> .petscrc
>> This uses 64 processors and the 8 processor case worked. This has been
>> semi-nondeterministic for me.
>>
>> (and I have attached my current Perlmutter problem)
>>
>> Hope this helps,
>> Mark
>>
>> -dm_plex_simplex 0
>> -dm_plex_dim 3
>> -dm_plex_box_lower 0,0,0
>> -dm_plex_box_upper 1,1,1
>> -petscpartitioner_simple_process_grid 2,2,2
>> -potential_petscspace_degree 2
>> -snes_max_it 1
>> -ksp_max_it 200
>> -ksp_type cg
>> -ksp_rtol 1.e-12
>> -ksp_norm_type unpreconditioned
>> -snes_rtol 1.e-8
>> #-pc_type gamg
>> #-pc_gamg_type agg
>> #-pc_gamg_agg_nsmooths 1
>> -pc_gamg_coarse_eq_limit 100
>> -pc_gamg_process_eq_limit 400
>> -pc_gamg_reuse_interpolation true
>> #-snes_monitor
>> #-ksp_monitor_short
>> -ksp_converged_reason
>> #-ksp_view
>> #-snes_converged_reason
>> #-mg_levels_ksp_max_it 2
>> -mg_levels_ksp_type chebyshev
>> #-mg_levels_ksp_type richardson
>> #-mg_levels_ksp_richardson_scale 0.8
>> -mg_levels_pc_type jacobi
>> -pc_gamg_esteig_ksp_type cg
>> -pc_gamg_esteig_ksp_max_it 10
>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>> -dm_distribute
>> -petscpartitioner_type simple
>> -pc_gamg_repartition false
>> -pc_gamg_coarse_grid_layout_type compact
>> -pc_gamg_threshold 0.01
>> #-pc_gamg_threshold_scale .5
>> -pc_gamg_aggressive_coarsening 1
>> #-check_pointer_intensity 0
>> -snes_type ksponly
>> #-mg_coarse_sub_pc_factor_mat_solver_type cusparse
>> #-info :pc
>> #-use_gpu_aware_mpi 1
>> -options_left
>> #-malloc_debug
>> -benchmark_it 10
>> #-pc_gamg_use_parallel_coarse_grid_solver
>> #-mg_coarse_pc_type jacobi
>> #-mg_coarse_ksp_type cg
>> #-mg_coarse_ksp_rtol 1.e-2
>>
Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases
I am able to reproduce this on Crusher with 8 processors.

Junchao, did you want me to use --download-kokkos-commit=origin/develop ?

On Wed, Nov 16, 2022 at 8:05 AM Mark Adams wrote:

> I can not build right now on Crusher or Perlmutter but I saw this on both.
>
> Here is an example output using src/snes/tests/ex13.c using the appended
> .petscrc
> This uses 64 processors and the 8 processor case worked. This has been
> semi-nondeterministic for me.
>
> (and I have attached my current Perlmutter problem)
>
> Hope this helps,
> Mark
>
> -dm_plex_simplex 0
> -dm_plex_dim 3
> -dm_plex_box_lower 0,0,0
> -dm_plex_box_upper 1,1,1
> -petscpartitioner_simple_process_grid 2,2,2
> -potential_petscspace_degree 2
> -snes_max_it 1
> -ksp_max_it 200
> -ksp_type cg
> -ksp_rtol 1.e-12
> -ksp_norm_type unpreconditioned
> -snes_rtol 1.e-8
> #-pc_type gamg
> #-pc_gamg_type agg
> #-pc_gamg_agg_nsmooths 1
> -pc_gamg_coarse_eq_limit 100
> -pc_gamg_process_eq_limit 400
> -pc_gamg_reuse_interpolation true
> #-snes_monitor
> #-ksp_monitor_short
> -ksp_converged_reason
> #-ksp_view
> #-snes_converged_reason
> #-mg_levels_ksp_max_it 2
> -mg_levels_ksp_type chebyshev
> #-mg_levels_ksp_type richardson
> #-mg_levels_ksp_richardson_scale 0.8
> -mg_levels_pc_type jacobi
> -pc_gamg_esteig_ksp_type cg
> -pc_gamg_esteig_ksp_max_it 10
> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
> -dm_distribute
> -petscpartitioner_type simple
> -pc_gamg_repartition false
> -pc_gamg_coarse_grid_layout_type compact
> -pc_gamg_threshold 0.01
> #-pc_gamg_threshold_scale .5
> -pc_gamg_aggressive_coarsening 1
> #-check_pointer_intensity 0
> -snes_type ksponly
> #-mg_coarse_sub_pc_factor_mat_solver_type cusparse
> #-info :pc
> #-use_gpu_aware_mpi 1
> -options_left
> #-malloc_debug
> -benchmark_it 10
> #-pc_gamg_use_parallel_coarse_grid_solver
> #-mg_coarse_pc_type jacobi
> #-mg_coarse_ksp_type cg
> #-mg_coarse_ksp_rtol 1.e-2
> #-mat_cusparse_transgen
> -snes_lag_jacobian -2
>
>
> On Tue, Nov 15, 2022 at 3:42 PM Junchao Zhang wrote:
>
>> Mark,
>> Do you have a reproducer using petsc examples?
>>
>> On Tue, Nov 15, 2022, 12:49 PM Mark Adams wrote:
>>
>>> Junchao, this is the same problem that I have been having right?
>>>
>>> On Tue, Nov 15, 2022 at 11:56 AM Fackler, Philip via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
>>>> I built petsc with:
>>>>
>>>> $ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0 --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos --download-kokkos-kernels
>>>>
>>>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all
>>>>
>>>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install
>>>>
>>>> Then I build xolotl in a separate build directory (after checking out
>>>> the "feature-petsc-kokkos" branch) with:
>>>>
>>>> $ cmake -DCMAKE_BUILD_TYPE=Debug -DKokkos_DIR=$HOME/build/petsc/debug/install -DPETSC_DIR=$HOME/build/petsc/debug/install
>>>>
>>>> $ make -j4 SystemTester
>>>>
>>>> Then, from the xolotl build directory, run (for example):
>>>>
>>>> $ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v
>>>>
>>>> Note that this test case will use the parameter file
>>>> '/benchmarks/params_system_NE_4.txt' which has the command-line
>>>> arguments for petsc in its "petscArgs=..." line. If you look at
>>>> '/test/system/SystemTester.cpp' all the system test cases follow the
>>>> same naming convention with their corresponding parameter files under
>>>> '/benchmarks'.
>>>>
>>>> The failure happens with the NE_4 case (which is 2D) and the PSI_3
>>>> case (which is 1D).
>>>>
>>>> Let me know if this is still unclear.
>>>>
>>>> Thanks,
>>>>
>>>> *Philip Fackler *
>>>> Research Software Engineer, Application Engineering Group
>>>> Advanced Computing Systems Research Section
>>>> Computer Science and Mathematics Division
>>>> *Oak Ridge National Laboratory*
>>>> --
>>>> *From:* Junchao Zhang
>>>> *Sent:* Tuesday, November 15, 2022 00:16
>>>> *To:* Fackler, Philip
>>>> *Cc:* petsc-users@mcs.anl.gov ; Blondel, Sophie
>>>> *Subject:* [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with
>>>> COO interface crashes in some cases
>>>>
>>>> Hi, Philip,
>>>> Can you tell me instructions to build Xolotl to reproduce the error?
>>>> --Junchao Zhang
>>>>
>>>> On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users <
>>>> petsc-users@mcs.anl.gov> wrote:
>>>>
>>>> In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to
>>>> use the COO interface for preallocating and setting values in the
>>>> Jacobian matrix. I have found that with some of our test cases, using
>>>> more than one MPI rank results in a crash. Way down in the
>>>> preconditioner code in petsc a Mat gets computed that has "null" for
>>>> the "productsymbolic" member of its "ops".
Re: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +

                         Max       Max/Min     Avg       Total
Time (sec):           6.023e+00     1.000   6.023e+00
Objects:              1.020e+02     1.000   1.020e+02
Flops:                1.080e+09     1.000   1.080e+09  1.080e+09
Flops/sec:            1.793e+08     1.000   1.793e+08  1.793e+08
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase
      %F - percent flop in this phase
      %M - percent messages in this phase
      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event

Event            Count      Time (sec)  Flop                                    --- Global ---  --- Stage ----  Total   GPU     - CpuToGpu -  - GpuToCpu -  GPU
                 Max  Ratio Max  Ratio  Max      Ratio Mess    AvgLen  Reduct   %T %F %M %L %R  %T %F %M %L %R  Mflop/s Mflop/s Count Size    Count Size    %F

--- Event Stage 0: Main Stage

BuildTwoSided       3 1.0   nan  nan    0.00e+00 0.0   0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00    0
DMCreateMat         1 1.0   nan  nan    0.00e+00 0.0   0.0e+00 0.0e+00 0.0e+00   1  0  0  0  0   1  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00    0
SFSetGraph          3 1.0   nan  nan    0.00e+00 0.0   0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00    0
SFSetUp             3 1.0   nan  nan    0.00e+00 0.0   0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00    0
SFPack           4647 1.0   nan  nan    0.00e+00 0.0   0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00    0
SFUnpack         4647 1.0   nan  nan    0.00e+00 0.0   0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00    0
VecDot            190 1.0   nan  nan    2.11e+06 1.0   0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00  100
VecMDot           775 1.0   nan  nan    0.00e+00 0.0   0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00    0
VecNorm          1728 1.0   nan  nan    1.92e+07 1.0   0.0e+00 0.0e+00 0.0e+00   0  2  0  0  0   0  2  0  0  0  -nan    -nan    0 0.00e+00    0 0.00e+00  100
VecScale         1983 1.0   nan  nan    6.24e+06 1.0   0.0e+00 0.0e+00 0.0e+00   0  1  0  0  0   0
Re: [petsc-users] AMD vs Intel mobile CPU performance
If you're using iterative solvers, compare memory bandwidth first, then cache. Flops aren't very important unless you use sparse direct solvers or have SNES residual/Jacobian evaluation that is expensive and has been written for vectorization.

If you can get the 6650U with LPDDR5-6400, it'll probably be faster. My laptop is the previous generation, 5900HS.

"D.J. Nolte" writes:

> Hi all,
> I'm looking for a small laptop which I'll be using (also) for small scale
> PETSc (KSP & SNES) simulations. For this setting performance is not that
> important, but still, I wonder if the community has any experience with AMD
> Ryzen CPUs (specifically 5 Pro 6650U) compared to Intel i7 12th gen.
> Do I have to expect significant performance differences?
>
> Thanks!
>
> David
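To make the bandwidth-first advice concrete, here is a rough back-of-the-envelope sketch in C of a memory-bound ceiling for a sparse matrix-vector product. The numbers fed into it are assumptions rather than anything stated in the thread: a 128-bit memory bus for the LPDDR5-6400 configuration, and about 12 bytes of traffic per stored nonzero (an 8-byte value plus a 4-byte column index, ignoring vector and row-pointer traffic). Both function names are made up for this illustration.

```c
#include <assert.h>

/* Peak DRAM bandwidth in bytes/s from the transfer rate (MT/s) and bus width (bits). */
double peak_bandwidth(double mega_transfers_per_s, int bus_bits)
{
    assert(mega_transfers_per_s > 0.0);
    assert(bus_bits > 0 && bus_bits % 8 == 0);
    return mega_transfers_per_s * 1e6 * (bus_bits / 8);
}

/*
 * Upper bound on the flop rate of a CSR matrix-vector product when memory
 * traffic is the bottleneck: each stored nonzero contributes 2 flops
 * (one multiply, one add) and at least bytes_per_nnz bytes of traffic.
 */
double spmv_flops_bound(double bandwidth_bytes_per_s, double bytes_per_nnz)
{
    assert(bandwidth_bytes_per_s > 0.0);
    assert(bytes_per_nnz > 0.0);
    return 2.0 * bandwidth_bytes_per_s / bytes_per_nnz;
}
```

With those assumed inputs, peak_bandwidth(6400.0, 128) is 102.4 GB/s and spmv_flops_bound of that with 12 bytes per nonzero is roughly 17 Gflop/s. Sustained bandwidth is lower than peak in practice, so the real ceiling sits below this figure; the point is only that it scales with memory speed rather than with the CPU's peak flop rate.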