Hey Paul,
Thanks for providing background on this.
On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:
Dominic,
A few years ago, I was trying to minimize the amount of data transferred
to and from the GPU (for multi-GPU MatMult) by inspecting the indices
of the data that needed to be messaged to and from the device. Then, I
would call gather kernels on the GPU which pulled the scattered data
into contiguous buffers, which were then transferred to the host
asynchronously (while the MatMult was occurring).
VecScatterInitializeForGPU was added in order to build the necessary
buffers as needed; that was the motivation behind its existence.
An alternative approach is to message the smallest contiguous buffer
containing all the data with a single cudaMemcpyAsync. This is the
method currently implemented.
I never found a case where the former implementation (with a GPU
gather-kernel) performed better than the alternative approach which
messaged the smallest contiguous buffer. I looked at many, many matrices.
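To make the "smallest contiguous buffer" idea concrete, here is a minimal sketch in plain C. The helper name contiguous_span is hypothetical and is not part of PETSc; it only illustrates how the range handed to a single cudaMemcpyAsync could be derived from the scatter indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PETSc function): find the smallest contiguous
   index range [*lo, *lo + *len) that contains every entry of idx[0..n-1]. */
static void contiguous_span(const int *idx, size_t n, int *lo, size_t *len)
{
  size_t i;
  int    min, max;

  assert(idx != NULL || n == 0);
  if (n == 0) { *lo = 0; *len = 0; return; }

  min = max = idx[0];
  for (i = 1; i < n; i++) {
    if (idx[i] < min) min = idx[i];
    if (idx[i] > max) max = idx[i];
  }
  *lo  = min;
  *len = (size_t)(max - min) + 1;

  /* A single asynchronous copy of this range would then look roughly like
       cudaMemcpyAsync(host + *lo, dev + *lo, *len * sizeof(scalar),
                       cudaMemcpyDeviceToHost, stream);
     where host, dev, scalar and stream are assumed names. */
}
```

For the indices {12, 7, 30, 9} this gives a span of 24 entries starting at 7, so 24 entries are copied in order to move the 4 that are actually needed. The trade-off is extra bytes on the wire against not having to launch a gather kernel.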
Now, as far as I understand the VecScatter kernels, this method should
only get called if the transfer is MPI_General (i.e. PtoP parallel to
parallel). Other VecScatter methods are called in other circumstances
where the scatter is not MPI_General. That assumption could be
wrong though.
I see. I figured there was some logic in place to make sure that this
function only gets called in cases where the transfer type is
MPI_General. I'm getting segfaults in this function where the todata and
fromdata are of a different type. This could easily be user error but
I'm not sure. Here is an example valgrind error:
==27781== Invalid read of size 8
==27781== at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
==27781== by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)
==27781== by 0xA20CC3: MatMult (matrix.c:2242)
==27781== by 0x4645E4: main (ex7.c:93)
==27781== Address 0x286305e0 is 1,616 bytes inside a block of size 1,620 alloc'd
==27781== at 0x4C26548: memalign (vg_replace_malloc.c:727)
==27781== by 0x4654F9: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:27)
==27781== by 0xCAEECC: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:186)
==27781== by 0x5A5296: VecScatterCreate (vscat.c:1168)
==27781== by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
==27781== by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*, MatAssemblyType) (mpiaij.c:706)
==27781== by 0xA45358: MatAssemblyEnd (matrix.c:4959)
==27781== by 0x464301: main (ex7.c:78)
This was produced by src/ksp/ksp/examples/tutorials/ex7.c. The command line
options are
./ex7 -mat_type mpiaijcusparse -vec_type cusp
In this particular case the todata is of type VecScatter_Seq_Stride and
fromdata is of type VecScatter_Seq_General. The complete valgrind log
(including configure options for petsc) is attached.
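The valgrind report fits this picture: an 8-byte read at offset 1,616 of a 1,620-byte block runs 4 bytes past the end, which is what I would expect if a field of a larger struct is read from a smaller allocation. Below is a minimal sketch of the kind of check I have in mind. All type and function names here are hypothetical stand-ins, not the PETSc definitions; the only assumption carried over is that every scatter context starts with a type tag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scatter context types. These are NOT the
   PETSc structs; they only share the property that each variant begins with
   a type tag and that the variants differ in size. */
typedef enum { SCATTER_SEQ_GENERAL, SCATTER_SEQ_STRIDE, SCATTER_MPI_GENERAL } ScatterKind;

typedef struct { ScatterKind kind; int n, first, step; } ScatterSeqStride;
typedef struct { ScatterKind kind; int n; int *indices; int nrecvs; int *starts; } ScatterMPIGeneral;

/* Return ctx as an MPI-general scatter, or NULL if the tag says otherwise.
   A pointer to a struct may be converted to a pointer to its first member,
   so reading the tag this way is valid for every variant. */
static ScatterMPIGeneral *as_mpi_general(void *ctx)
{
  if (!ctx || *(ScatterKind *)ctx != SCATTER_MPI_GENERAL) return NULL;
  return (ScatterMPIGeneral *)ctx;
}

/* Example caller: skip the GPU setup instead of touching fields that the
   smaller struct does not have. Returns 1 if setup would proceed. */
static int init_for_gpu(void *todata, void *fromdata)
{
  ScatterMPIGeneral *to   = as_mpi_general(todata);
  ScatterMPIGeneral *from = as_mpi_general(fromdata);

  if (!to || !from) return 0; /* not an MPI-general scatter: nothing to build */
  assert(to->n >= 0 && from->n >= 0);
  return 1;
}
```

Whether returning early is the right behaviour for the sequential scatter types, or whether the caller should avoid the call altogether, is something I can't judge yet.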
Any comments or suggestions are appreciated.
Cheers,
Dominic
-Paul
On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser wrote:
Hi,
I'm trying to understand VecScatterInitializeForGPU in
src/vec/vec/utils/veccusp/vscatcusp.c. I don't understand why
this function can get away with casting the fromdata and todata in
the inctx to VecScatter_MPI_General. Don't we need to inspect the
VecScatterType fields of the todata and fromdata?
Cheers,
Dominic
--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com
==27786== Memcheck, a memory error detector
==27786== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==27786== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==27786== Command: ./ex7 -mat_type mpiaijcusparse -vec_type cusp
==27786==
==27786== Syscall param writev(vector[...]) points to uninitialised byte(s)
==27786== at 0x157E433B: writev (in /lib64/libc-2.12.so)
==27786== by 0x1490D806: mca_oob_tcp_msg_send_handler (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x1490E82C: mca_oob_tcp_peer_send (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x14910BEC: mca_oob_tcp_send_nb (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x1492A315: orte_rml_oob_send (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x1492A55F: orte_rml_oob_send_buffer (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x148F8327: modex (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x147E7ACA: ompi_mpi_init (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x147FE57F: PMPI_Init_thread (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x4929D9: PetscInitialize (pinit.c:777)
==27786== by 0x463B11: main (ex7.c:49)
==27786== Address 0x16c8efc1 is 161 bytes inside a block of size 256 alloc'd
==27786== at 0x4C27BE0: realloc (vg_replace_malloc.c:662)
==27786== by 0x14943AC2: opal_dss_buffer_extend (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x14943C84: opal_dss_copy_payload (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x148F11D6: orte_grpcomm_base_pack_modex_entries (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x148F82DC: modex (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x147E7ACA: ompi_mpi_init (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x147FE57F: PMPI_Init_thread (in
/scr_ivy/dmeiser/PTSOLVE/openmpi-1.6.1-nodl/lib/libmpi.so.1.0.3)
==27786== by 0x4929D9: PetscInitialize (pinit.c:777)
==27786== by 0x463B11: main (ex7.c:49)
==27786==
==27786== Warning: set address range perms: large range [0x800000000,
0x1100000000) (noaccess)
==27786== Warning: set address range perms: large range [0x3aeed000,
0x33aeed000) (noaccess)
==27786== Warning: set address range perms: large range [0x1100000000,
0x1400000000) (noaccess)
==27786== Invalid read of size 8
==27786== at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
==27786== by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)
==27786== by 0xA20CC3: MatMult (matrix.c:2242)
==27786== by 0x4645E4: main (ex7.c:93)
==27786== Address 0x28634560 is 1,616 bytes inside a block of size 1,620 alloc'd
==27786== at 0x4C26548: memalign (vg_replace_malloc.c:727)
==27786== by 0x4654F9: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:27)
==27786== by 0xCAEECC: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:186)
==27786== by 0x5A5296: VecScatterCreate (vscat.c:1168)
==27786== by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
==27786== by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*, MatAssemblyType) (mpiaij.c:706)
==27786== by 0xA45358: MatAssemblyEnd (matrix.c:4959)
==27786== by 0x464301: main (ex7.c:78)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF97DB: PetscAbsScalar(double) (petscmath.h:206)
==27786== by 0xCF9878: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x542479: VecValidValues (rvector.c:32)
==27786== by 0x1105581: PCApply (precon.c:434)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF9880: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x542479: VecValidValues (rvector.c:32)
==27786== by 0x1105581: PCApply (precon.c:434)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC4E8DF: KSPSolve_GMRES(_p_KSP*) (gmres.c:234)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF97DB: PetscAbsScalar(double) (petscmath.h:206)
==27786== by 0xCF988B: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x542479: VecValidValues (rvector.c:32)
==27786== by 0x1105581: PCApply (precon.c:434)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF9893: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x542479: VecValidValues (rvector.c:32)
==27786== by 0x1105581: PCApply (precon.c:434)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC4E8DF: KSPSolve_GMRES(_p_KSP*) (gmres.c:234)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF97DB: PetscAbsScalar(double) (petscmath.h:206)
==27786== by 0xCF9878: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x5424F1: VecValidValues (rvector.c:34)
==27786== by 0x1105972: PCApply (precon.c:442)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF9880: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x5424F1: VecValidValues (rvector.c:34)
==27786== by 0x1105972: PCApply (precon.c:442)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC4E8DF: KSPSolve_GMRES(_p_KSP*) (gmres.c:234)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF97DB: PetscAbsScalar(double) (petscmath.h:206)
==27786== by 0xCF988B: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x5424F1: VecValidValues (rvector.c:34)
==27786== by 0x1105972: PCApply (precon.c:442)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786==
==27786== Conditional jump or move depends on uninitialised value(s)
==27786== at 0xCF9893: PetscIsInfOrNanScalar (mathinf.c:67)
==27786== by 0x5424F1: VecValidValues (rvector.c:34)
==27786== by 0x1105972: PCApply (precon.c:442)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC33E36: KSPSolve_BCGS (bcgs.c:50)
==27786== by 0xBAB71C: KSPSolve (itfunc.c:432)
==27786== by 0xA9C0EF: PCApply_BJacobi_Multiblock(_p_PC*, _p_Vec*, _p_Vec*)
(bjacobi.c:945)
==27786== by 0x11057E6: PCApply (precon.c:440)
==27786== by 0x1131693: KSP_PCApply(_p_KSP*, _p_Vec*, _p_Vec*)
(kspimpl.h:227)
==27786== by 0x1132479: KSPInitialResidual (itres.c:64)
==27786== by 0xC4E8DF: KSPSolve_GMRES(_p_KSP*) (gmres.c:234)
==27786==
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Error in external library!
[0]PETSC ERROR: CUSP error 61!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.3-2332-g54f71ec GIT Date: 2014-01-20 14:12:11 -0700
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ./ex7 on a pargpudbg named ivy.txcorp.com by dmeiser Wed Jan 22 10:23:36 2014
[0]PETSC ERROR: Libraries linked from /scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg/lib
[0]PETSC ERROR: Configure run at Tue Jan 21 16:53:42 2014
[0]PETSC ERROR: Configure options
--with-cmake=/scr_ivy/dmeiser/PTSOLVE/cmake/bin/cmake
--prefix=/scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg --with-precision=double
--with-scalar-type=real --with-fortran-kernels=1 --with-x=no --with-mpi=yes
--with-mpi-dir=/scr_ivy/dmeiser/PTSOLVE/openmpi/ --with-openmp=yes
--with-valgrind=1 --with-shared-libraries=0 --with-c-support=yes
--with-debugging=yes --with-cuda=1 --with-cuda-dir=/usr/local/cuda
--with-cuda-arch=sm_35 --download-txpetscgpu --with-thrust=yes
--with-thrust-dir=/usr/local/cuda/include --with-umfpack=yes --download-umfpack
--with-mumps=yes --with-superlu=yes --download-superlu=yes --download-mumps=yes
--download-scalapack --download-parmetis --download-metis --with-cusp=yes
--with-cusp-dir=/scr_ivy/dmeiser/PTSOLVE/cusp/include --CUDAFLAGS="-O3
-I/usr/local/cuda/include --generate-code arch=compute_20,code=sm_20
--generate-code arch=compute_20,code=sm_21 --generate-code
arch=compute_30,code=sm_30 --generate-code arch=compute_35,code=sm_35"
--with-clanguage=C++ --CFLAGS="-pipe -fPIC" --CXXFLAGS="-pipe -fPIC"
--with-c2html=0 --with-gelus=1 --with-gelus-dir=/scr_ivy/dmeiser/software/gelus
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: VecCUSPAllocateCheck() line 72 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecCUSPCopyToGPU() line 96 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecCUSPGetArrayReadWrite() line 1946 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecAXPBYPCZ_SeqCUSP() line 1507 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecAXPBYPCZ() line 726 in /scr_ivy/dmeiser/petsc/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: KSPSolve_BCGS() line 120 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/impls/bcgs/bcgs.c
[0]PETSC ERROR: KSPSolve() line 432 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: PCApply_BJacobi_Multiblock() line 945 in /scr_ivy/dmeiser/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c
[0]PETSC ERROR: PCApply() line 440 in /scr_ivy/dmeiser/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: KSP_PCApply() line 227 in /scr_ivy/dmeiser/petsc/include/petsc-private/kspimpl.h
[0]PETSC ERROR: KSPInitialResidual() line 64 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: KSPSolve_GMRES() line 234 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/impls/gmres/gmres.c
[0]PETSC ERROR: KSPSolve() line 432 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: main() line 209 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/examples/tutorials/ex7.c
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 76.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
==27786==
==27786== HEAP SUMMARY:
==27786== in use at exit: 172,193,472 bytes in 211,364 blocks
==27786== total heap usage: 350,508 allocs, 139,144 frees, 200,031,576 bytes allocated
==27786==
==27786== LEAK SUMMARY:
==27786== definitely lost: 954 bytes in 28 blocks
==27786== indirectly lost: 61 bytes in 7 blocks
==27786== possibly lost: 2,154,848 bytes in 15,843 blocks
==27786== still reachable: 170,037,609 bytes in 195,486 blocks
==27786== suppressed: 0 bytes in 0 blocks
==27786== Rerun with --leak-check=full to see details of leaked memory
==27786==
==27786== For counts of detected and suppressed errors, rerun with: -v
==27786== Use --track-origins=yes to see where uninitialised values come from
==27786== ERROR SUMMARY: 82 errors from 10 contexts (suppressed: 6 from 6)