Hmmm. I may not have protected against the case where the mpiaijcusp(arse) classes are used without mpirun/mpiexec. I suppose it should have occurred to me that someone would do this.
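A minimal sketch of the kind of protection being discussed. It assumes, as Dominic's question about "the VecScatterType fields of the todata and fromdata" implies, that every scatter context begins with a type tag. The guard reads the tag on both sides and only treats them as MPI_General when both agree. All struct, enum, and function names below are hypothetical stand-ins, not PETSc's real definitions; in practice the check would belong inside VecScatterInitializeForGPU or at its call site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scatter contexts discussed in this thread.
   These are NOT PETSc's real definitions. */
typedef enum {
  SKETCH_SEQ_GENERAL,
  SKETCH_SEQ_STRIDE,
  SKETCH_MPI_GENERAL
} SketchScatterType;

/* Each context stores its type tag as the FIRST member, so a pointer to any
   of them can be converted to a pointer to that tag (C guarantees that a
   pointer to a struct, suitably converted, points to its first member). */
typedef struct {
  SketchScatterType type;
  int n, first, step;
} SketchSeqStride;

typedef struct {
  SketchScatterType type;
  int n;
  int *slots;
} SketchSeqGeneral;

typedef struct {
  SketchScatterType type;
  int n;        /* number of messages (hypothetical bookkeeping) */
  int *starts;  /* fields that exist only in the larger MPI struct */
  int *indices;
  int *procs;
} SketchMPIGeneral;

static SketchScatterType sketch_scatter_type(const void *ctx) {
  assert(ctx != NULL);
  return *(const SketchScatterType *)ctx;
}

/* Guarded initialization: returns -1 (nothing to do) unless BOTH contexts
   are MPI_General. Only after this check is the cast safe; casting a smaller
   Seq struct to SketchMPIGeneral and reading starts/indices/procs would read
   outside the object, which is undefined behavior. */
static int sketch_initialize_for_gpu(const void *todata, const void *fromdata) {
  if (sketch_scatter_type(todata) != SKETCH_MPI_GENERAL ||
      sketch_scatter_type(fromdata) != SKETCH_MPI_GENERAL) {
    return -1;
  }
  const SketchMPIGeneral *to = todata;
  const SketchMPIGeneral *from = fromdata;
  /* Placeholder for building GPU staging buffers: report how many
     messages would need them. */
  return to->n + from->n;
}
```

In the case reported in this thread (todata of type VecScatter_Seq_Stride, fromdata of type VecScatter_Seq_General), a check of this shape would take the early-return path instead of reading MPI-only fields.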

Try:

mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp

In this scenario, the sequential-to-sequential VecScatters should be called. Then try:

mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp

In this scenario, the MPI_General VecScatters should be called, and they should work correctly if you have a system with multiple GPUs.

-Paul

On Wed, Jan 22, 2014 at 10:32 AM, Dominic Meiser wrote:
> Hey Paul,
>
> Thanks for providing background on this.
>
> On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:
>
>> Dominic,
>>
>> A few years ago, I was trying to minimize the amount of data transferred
>> to and from the GPU (for multi-GPU MatMult) by inspecting the indices
>> of the data that needed to be messaged to and from the device. Then, I
>> would call gather kernels on the GPU which pulled the scattered data
>> into contiguous buffers, which were then transferred to the host
>> asynchronously (while the MatMult was occurring). VecScatterInitializeForGPU
>> was added to build those buffers as needed; that was the motivation
>> behind its existence.
>>
>> An alternative approach is to message the smallest contiguous buffer
>> containing all the data with a single cudaMemcpyAsync. This is the
>> method currently implemented.
>>
>> I never found a case where the former implementation (with a GPU
>> gather kernel) performed better than the alternative approach, which
>> messages the smallest contiguous buffer. I looked at many, many matrices.
>>
>> Now, as far as I understand the VecScatter kernels, this method should
>> only get called if the transfer is MPI_General (i.e. PtoP, parallel to
>> parallel). Other VecScatter methods are called in other circumstances
>> where the scatter is not MPI_General. That assumption could be
>> wrong though.
>
> I see.
> I figured there was some logic in place to make sure that this
> function only gets called in cases where the transfer type is MPI_General.
> I'm getting segfaults in this function where the todata and fromdata are of
> a different type. This could easily be user error but I'm not sure. Here is
> an example valgrind error:
>
> ==27781== Invalid read of size 8
> ==27781==    at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
> ==27781==    by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)
> ==27781==    by 0xA20CC3: MatMult (matrix.c:2242)
> ==27781==    by 0x4645E4: main (ex7.c:93)
> ==27781==  Address 0x286305e0 is 1,616 bytes inside a block of size 1,620 alloc'd
> ==27781==    at 0x4C26548: memalign (vg_replace_malloc.c:727)
> ==27781==    by 0x4654F9: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:27)
> ==27781==    by 0xCAEECC: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:186)
> ==27781==    by 0x5A5296: VecScatterCreate (vscat.c:1168)
> ==27781==    by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==27781==    by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*, MatAssemblyType) (mpiaij.c:706)
> ==27781==    by 0xA45358: MatAssemblyEnd (matrix.c:4959)
> ==27781==    by 0x464301: main (ex7.c:78)
>
> This was produced by src/ksp/ksp/tutorials/ex7.c. The command line options
> are
>
> ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>
> In this particular case the todata is of type VecScatter_Seq_Stride and
> fromdata is of type VecScatter_Seq_General. The complete valgrind log
> (including configure options for PETSc) is attached.
>
> Any comments or suggestions are appreciated.
> Cheers,
> Dominic
>
>> -Paul
>>
>> On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser wrote:
>>
>> Hi,
>>
>> I'm trying to understand VecScatterInitializeForGPU in
>> src/vec/vec/utils/veccusp/vscatcusp.c.
>> I don't understand why
>> this function can get away with casting the fromdata and todata in
>> the inctx to VecScatter_MPI_General. Don't we need to inspect the
>> VecScatterType fields of the todata and fromdata?
>>
>> Cheers,
>> Dominic
>>
>> --
>> Dominic Meiser
>> Tech-X Corporation
>> 5621 Arapahoe Avenue
>> Boulder, CO 80303
>> USA
>> Telephone: 303-996-2036
>> Fax: 303-448-7756
>> www.txcorp.com
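Paul's comparison of the two transfer strategies can be illustrated with a host-only sketch. Here memcpy stands in for the single cudaMemcpyAsync of the bounding range, and a plain loop stands in for the GPU gather kernel; both function names are hypothetical, and nothing here is measured. The trade-off it shows: the gather moves exactly n values but needs an extra packing pass, while the bounding-range copy moves hi - lo + 1 values (possibly many more than n) in one contiguous transfer with no packing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side sketch; memcpy stands in for a device-to-host
   transfer such as cudaMemcpyAsync. */

/* Strategy 1 (the earlier approach described above): gather the scattered
   entries into a contiguous buffer (a GPU kernel in the thread), then
   transfer exactly n values. Returns the number of values transferred. */
static size_t gather_then_copy(const double *src, const int *idx, size_t n,
                               double *packed) {
  for (size_t i = 0; i < n; i++) packed[i] = src[idx[i]];
  return n;
}

/* Strategy 2 (the approach described as currently implemented): find the
   smallest contiguous range [lo, hi] that contains every needed index and
   copy that whole range in a single transfer. Returns the number of values
   transferred and stores the range offset in *lo_out; the value for idx[i]
   then lives at dst[idx[i] - lo]. */
static size_t bounding_range_copy(const double *src, const int *idx, size_t n,
                                  double *dst, int *lo_out) {
  assert(n > 0);
  int lo = idx[0], hi = idx[0];
  for (size_t i = 1; i < n; i++) {
    if (idx[i] < lo) lo = idx[i];
    if (idx[i] > hi) hi = idx[i];
  }
  size_t len = (size_t)(hi - lo + 1);
  memcpy(dst, src + lo, len * sizeof *src); /* one contiguous transfer */
  *lo_out = lo;
  return len;
}
```

For indices {7, 2, 4}, the gather moves 3 values while the bounding-range copy moves the 6 values at positions 2 through 7. Which is faster in practice depends on kernel-launch and transfer overheads; Paul reports that the single contiguous copy was never beaten on the matrices he tried.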
