Re: [petsc-dev] VecScatterInitializeForGPU

Paul Mullowney Wed, 22 Jan 2014 09:54:53 -0800

Oh. You're opening a can of worms but maybe that's your intent ;) I see the
block Jacobi preconditioner in the valgrind logs.


Do,
mpirun -n 1 (or 2) ./ex7 -mat_type mpiaijcusparse -vec_type mpicusp
-pc_type none

From here, we can try to sort out the VecScatterInitializeForGPU problem
when mpirun/mpiexec is not used.
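
As a rough sketch of the check raised in the quoted messages below
(inspecting the type of todata and fromdata before casting), a guard along
these lines could skip the GPU setup for non-MPI_General scatters. The
struct and enum names are hypothetical stand-ins, not PETSc's internal
definitions, and the sketch assumes each scatter variant stores its type
tag as its first member.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the internal scatter structs. Assumption:
   every variant begins with a type tag, as the thread's mention of
   "VecScatterType fields" suggests. */
typedef enum {
  SCATTER_SEQ_GENERAL,
  SCATTER_SEQ_STRIDE,
  SCATTER_MPI_GENERAL
} ScatterType;

typedef struct { ScatterType type; int n; int first, step; } ScatterSeqStride;
typedef struct { ScatterType type; int n; int *starts, *indices; } ScatterMPIGeneral;

/* Mirrors the two opaque pointers held by the scatter context. */
typedef struct { void *todata; void *fromdata; } ScatterCtx;

/* Read the tag through a pointer to the first member of the struct. */
ScatterType scatter_type_of(const void *data)
{
  return *(const ScatterType *)data;
}

/* Returns 1 if the GPU buffers would be built, 0 if the call is skipped. */
int init_for_gpu(const ScatterCtx *ctx)
{
  if (scatter_type_of(ctx->todata) != SCATTER_MPI_GENERAL ||
      scatter_type_of(ctx->fromdata) != SCATTER_MPI_GENERAL) {
    return 0; /* not parallel-to-parallel: casting would read past the block */
  }
  /* Only now is the cast to the MPI_General layout safe. */
  const ScatterMPIGeneral *to = (const ScatterMPIGeneral *)ctx->todata;
  const ScatterMPIGeneral *from = (const ScatterMPIGeneral *)ctx->fromdata;
  (void)to;
  (void)from; /* buffer construction would go here */
  return 1;
}
```

With a Seq_Stride todata and a Seq_General fromdata, as in the reported
valgrind run, a guard of this kind would return early instead of reading
fields that only exist in the MPI_General layout.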

If you want to implement a block Jacobi preconditioner on multiple GPUs,
that's a larger problem to solve. I had some code that sort of worked. We'd
have to sit down and discuss.
-Paul


On Wed, Jan 22, 2014 at 10:48 AM, Dominic Meiser <[email protected]> wrote:

>  Attached are the logs with 1 rank and 2 ranks. As far as I can tell
> these are different errors.
>
> For the log attached to the previous email I chose to run ex7 without
> mpirun so that valgrind checks ex7 and not mpirun. Is there a way to have
> valgrind check the mpi processes rather than mpirun?
>
> Cheers,
> Dominic
>
>
>
> On 01/22/2014 10:37 AM, Paul Mullowney wrote:
>
>  Hmmm. I may not have protected against the case where the
> mpiaijcusp(arse) classes are called but without mpirun/mpiexec. I suppose it
> should have occurred to me that someone would do this.
>
> try :
> mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>
> In this scenario, the sequential to sequential vecscatters should be
> called.
>
> Then,
> mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>
> In this scenario, MPI_General vecscatters should be called ... and work
> correctly if you have a system with multiple GPUs.
>
> I
>
> -Paul
>
>
> On Wed, Jan 22, 2014 at 10:32 AM, Dominic Meiser <[email protected]> wrote:
>
>> Hey Paul,
>>
>> Thanks for providing background on this.
>>
>>
>> On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:
>>
>>>
>>> Dominic,
>>> A few years ago, I was trying to minimize the amount of data transfer
>>> to and from the GPU (for multi-GPU MatMult) by inspecting the indices
>>> of the data that needed to be messaged to and from the device. Then, I
>>> would call gather kernels on the GPU which pulled the scattered data
>>> into contiguous buffers, which were then transferred to the host
>>> asynchronously (while the MatMult was occurring).
>>> VecScatterInitializeForGPU was added to build the necessary
>>> buffers as needed.
>>> An alternative approach is to message the smallest contiguous buffer
>>> containing all the data with a single cudaMemcpyAsync. This is the
>>> method currently implemented.
>>> I never found a case where the former implementation (with a GPU
>>> gather-kernel) performed better than the alternative approach which
>>> messaged the smallest contiguous buffer. I looked at many, many matrices.
>>> Now, as far as I understand the VecScatter kernels, this method should
>>> only get called if the transfer is MPI_General (i.e. PtoP parallel to
>>> parallel). Other VecScatter methods are called in other circumstances
>>> where the scatter is not MPI_General. That assumption could be
>>> wrong though.
>>>
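
The "smallest contiguous buffer" approach described above can be sketched
as follows. This is only an illustration under stated assumptions: the
function and struct names are hypothetical, the indices are assumed
non-negative, and the actual implementation may compute the range
differently.

```c
#include <stddef.h>

/* Hypothetical helper type: a contiguous window [offset, offset + count). */
typedef struct { size_t offset; size_t count; } CopyRange;

/* Find the smallest contiguous window that covers every index to be
   messaged. Assumes all indices are non-negative; n == 0 yields an
   empty range. */
CopyRange covering_range(const int *indices, size_t n)
{
  CopyRange r = { 0, 0 };
  if (n == 0) return r;

  int lo = indices[0], hi = indices[0];
  for (size_t i = 1; i < n; i++) {
    if (indices[i] < lo) lo = indices[i];
    if (indices[i] > hi) hi = indices[i];
  }
  r.offset = (size_t)lo;
  r.count = (size_t)(hi - lo) + 1;
  return r;
}
```

The resulting offset and count would then describe the one region handed
to a single cudaMemcpyAsync, trading some extra bytes on the wire for the
absence of a gather kernel.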
>>
>>
>>  I see. I figured there was some logic in place to make sure that this
>> function only gets called in cases where the transfer type is MPI_General.
>> I'm getting segfaults in this function where the todata and fromdata are of
>> a different type. This could easily be user error but I'm not sure. Here is
>> an example valgrind error:
>>
>> ==27781== Invalid read of size 8
>> ==27781== at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
>> ==27781== by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) (
>> mpiaijcusparse.cu:108)
>> ==27781== by 0xA20CC3: MatMult (matrix.c:2242)
>> ==27781== by 0x4645E4: main (ex7.c:93)
>> ==27781== Address 0x286305e0 is 1,616 bytes inside a block of size 1,620
>> alloc'd
>> ==27781== at 0x4C26548: memalign (vg_replace_malloc.c:727)
>> ==27781== by 0x4654F9: PetscMallocAlign(unsigned long, int, char const*,
>> char const*, void**) (mal.c:27)
>> ==27781== by 0xCAEECC: PetscTrMallocDefault(unsigned long, int, char
>> const*, char const*, void**) (mtr.c:186)
>> ==27781== by 0x5A5296: VecScatterCreate (vscat.c:1168)
>> ==27781== by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
>> ==27781== by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*, MatAssemblyType)
>> (mpiaij.c:706)
>> ==27781== by 0xA45358: MatAssemblyEnd (matrix.c:4959)
>> ==27781== by 0x464301: main (ex7.c:78)
>>
>> This was produced by src/ksp/ksp/tutorials/ex7.c. The command line
>> options are
>>
>> ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>>
>> In this particular case the todata is of type VecScatter_Seq_Stride and
>> fromdata is of type VecScatter_Seq_General. The complete valgrind log
>> (including configure options for petsc) is attached.
>>
>> Any comments or suggestions are appreciated.
>> Cheers,
>> Dominic
>>
>>
>>> -Paul
>>>
>>>
>>> On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to understand VecScatterInitializeForGPU in
>>> src/vec/vec/utils/veccusp/vscatcusp.c. I don't understand why
>>> this function can get away with casting the fromdata and todata in
>>> the inctx to VecScatter_MPI_General. Don't we need to inspect the
>>> VecScatterType fields of the todata and fromdata?
>>>
>>> Cheers,
>>> Dominic
>>>
>>> --
>>> Dominic Meiser
>>> Tech-X Corporation
>>> 5621 Arapahoe Avenue
>>> Boulder, CO 80303
>>> USA
>>> Telephone: 303-996-2036
>>> Fax: 303-448-7756
>>> www.txcorp.com
>>>
>>>
>>>
>>
>>
>> --
>> Dominic Meiser
>> Tech-X Corporation
>> 5621 Arapahoe Avenue
>> Boulder, CO 80303
>> USA
>> Telephone: 303-996-2036
>> Fax: 303-448-7756
>> www.txcorp.com
>>
>>
>
>
> --
> Dominic Meiser
> Tech-X Corporation
> 5621 Arapahoe Avenue
> Boulder, CO 80303
> USA
> Telephone: 303-996-2036
> Fax: 303-448-7756
> www.txcorp.com
>
>
