Re: [petsc-dev] VecScatterInitializeForGPU

Dominic Meiser Wed, 22 Jan 2014 10:50:59 -0800

On Wed 22 Jan 2014 10:54:28 AM MST, Paul Mullowney wrote:

Oh. You're opening a can of worms but maybe that's your intent ;) I
see the block Jacobi preconditioner in the valgrind logs.


Didn't mean to open a can of worms.

Do,
mpirun -n 1 (or 2) ./ex7 -mat_type mpiaijcusparse -vec_type mpicusp
-pc_type none


This works.

From here, we can try to sort out the VecScatterInitializeForGPU
problem when mpirun/mpiexec is not used.
If you want to implement a block Jacobi preconditioner on multiple GPUs,
that's a larger problem to solve. I had some code that sort of worked.
We'd have to sit down and discuss.


I'd be really interested in learning more about this.

Cheers,
Dominic

-Paul


On Wed, Jan 22, 2014 at 10:48 AM, Dominic Meiser wrote:

    Attached are the logs with 1 rank and 2 ranks. As far as I can
    tell these are different errors.

    For the log attached to the previous email I chose to run ex7
    without mpirun so that valgrind checks ex7 and not mpirun. Is
    there a way to have valgrind check the mpi processes rather than
    mpirun?

    Cheers,
    Dominic



    On 01/22/2014 10:37 AM, Paul Mullowney wrote:

    Hmmm. I may not have protected against the case where the
    mpiaijcusp(arse) classes are used without mpirun/mpiexec. I
    suppose it should have occurred to me that someone would do this.
    Try:
    mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
    In this scenario, the sequential to sequential vecscatters should
    be called.
    Then,
    mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
    In this scenario, MPI_General vecscatters should be called ...
    and work correctly if you have a system with multiple GPUs.
    -Paul


    On Wed, Jan 22, 2014 at 10:32 AM, Dominic Meiser wrote:

        Hey Paul,

        Thanks for providing background on this.


        On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:


            Dominic,
            A few years ago, I was trying to minimize the amount of data
            transferred to and from the GPU (for multi-GPU MatMult) by
            inspecting the indices of the data that needed to be messaged
            to and from the device. I would then call gather kernels on
            the GPU, which pulled the scattered data into contiguous
            buffers that were then transferred to the host asynchronously
            (while the MatMult was occurring).
            VecScatterInitializeForGPU was added to build the necessary
            buffers as needed.
            An alternative approach is to message the smallest contiguous
            buffer containing all the data with a single cudaMemcpyAsync.
            This is the method currently implemented.
            I never found a case where the former implementation (with a
            GPU gather kernel) performed better than the alternative
            approach, which messaged the smallest contiguous buffer. I
            looked at many, many matrices.
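
The two packing strategies described above can be sketched on the host in plain C. This is only an illustration under stated assumptions: `gather_pack` and `contiguous_span` are hypothetical helper names, not PETSc functions, and the real code runs the gather on the GPU and moves data with cudaMemcpyAsync rather than with a CPU loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not PETSc API: copy the entries x[idx[0..n-1]]
   into a dense buffer, as a gather kernel would do on the device. */
static void gather_pack(const double *x, const int *idx, size_t n, double *buf)
{
  for (size_t i = 0; i < n; i++) buf[i] = x[idx[i]];
}

/* Hypothetical helper: find the smallest contiguous range [*lo, *hi] of x
   that covers every index in idx. Copying that range needs one transfer
   but may include entries that are not needed. Returns the range length. */
static size_t contiguous_span(const int *idx, size_t n, int *lo, int *hi)
{
  assert(n > 0);
  *lo = *hi = idx[0];
  for (size_t i = 1; i < n; i++) {
    if (idx[i] < *lo) *lo = idx[i];
    if (idx[i] > *hi) *hi = idx[i];
  }
  return (size_t)(*hi - *lo) + 1;
}
```

For the index set {7, 2, 5}, the gather moves exactly 3 values, while the contiguous span covers x[2] through x[7], i.e. 6 values in a single copy. The trade-off is an extra packing pass versus some unneeded data in the transfer.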
            Now, as far as I understand the VecScatter kernels, this
            method should only get called if the transfer is MPI_General
            (i.e. PtoP, parallel to parallel). Other VecScatter methods
            are called in other circumstances where the scatter is not
            MPI_General. That assumption could be wrong though.



        I see. I figured there was some logic in place to make sure
        that this function only gets called in cases where the
        transfer type is MPI_General. I'm getting segfaults in this
        function where the todata and fromdata are of a different
        type. This could easily be user error but I'm not sure. Here
        is an example valgrind error:

        ==27781== Invalid read of size 8
        ==27781== at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
        ==27781== by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)
        ==27781== by 0xA20CC3: MatMult (matrix.c:2242)
        ==27781== by 0x4645E4: main (ex7.c:93)
        ==27781== Address 0x286305e0 is 1,616 bytes inside a block of size 1,620 alloc'd
        ==27781== at 0x4C26548: memalign (vg_replace_malloc.c:727)
        ==27781== by 0x4654F9: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:27)
        ==27781== by 0xCAEECC: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:186)
        ==27781== by 0x5A5296: VecScatterCreate (vscat.c:1168)
        ==27781== by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
        ==27781== by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*, MatAssemblyType) (mpiaij.c:706)
        ==27781== by 0xA45358: MatAssemblyEnd (matrix.c:4959)
        ==27781== by 0x464301: main (ex7.c:78)

        This was produced by src/ksp/ksp/tutorials/ex7.c. The command
        line options are

        ./ex7 -mat_type mpiaijcusparse -vec_type cusp

        In this particular case the todata is of type
        VecScatter_Seq_Stride and fromdata is of type
        VecScatter_Seq_General. The complete valgrind log (including
        configure options for petsc) is attached.
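
The numbers in the valgrind report are enough to work out how far the read ran past the allocation: an 8-byte read starting 1,616 bytes into a 1,620-byte block ends at byte 1,624, which is 4 bytes beyond the block. The helper below is a hypothetical illustration of that arithmetic only. The result would be consistent with code reading a field of a larger struct than the one that was allocated, though the log alone does not establish the cause.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes by which a read of `size` bytes, starting `offset` bytes
   into a block of `block` bytes, runs past the end of the block.
   Returns 0 if the read fits entirely inside the block. */
static size_t read_overrun(size_t offset, size_t size, size_t block)
{
  size_t end = offset + size;
  return end > block ? end - block : 0;
}
```

With the values from the log, `read_overrun(1616, 8, 1620)` evaluates to 4.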

        Any comments or suggestions are appreciated.
        Cheers,
        Dominic


            -Paul


            On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser wrote:

            Hi,

            I'm trying to understand VecScatterInitializeForGPU in
            src/vec/vec/utils/veccusp/vscatcusp.c. I don't understand why
            this function can get away with casting the fromdata and
            todata in the inctx to VecScatter_MPI_General. Don't we need
            to inspect the VecScatterType fields of the todata and
            fromdata?
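
The kind of check being asked about can be sketched as a tagged cast: inspect a type tag before treating a context as the MPI-general variant. Everything in this sketch is a hypothetical stand-in (`ScatterKind`, `ScatterHeader`, `ScatterSeqStride`, `ScatterMPIGeneral`, `as_mpi_general`); the real PETSc structs have different names and fields, and this is not how PETSc itself implements the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tag; not the PETSc VecScatterType enum. */
typedef enum { SCATTER_SEQ_GENERAL, SCATTER_SEQ_STRIDE, SCATTER_MPI_GENERAL } ScatterKind;

/* Every context starts with the same header, so the tag can be read
   through a pointer to the header regardless of the concrete type. */
typedef struct { ScatterKind kind; } ScatterHeader;

typedef struct { ScatterHeader hdr; int n; int first; int step; } ScatterSeqStride;
typedef struct { ScatterHeader hdr; int n; int *indices; int nprocs; } ScatterMPIGeneral;

/* Return ctx as an MPI-general scatter, or NULL if it is another kind,
   so callers never touch fields that the allocated struct does not have. */
static ScatterMPIGeneral *as_mpi_general(void *ctx)
{
  assert(ctx != NULL);
  if (((ScatterHeader *)ctx)->kind != SCATTER_MPI_GENERAL) return NULL;
  return (ScatterMPIGeneral *)ctx;
}
```

A caller would test the result and skip the GPU buffer setup (or take a different path) when it is NULL, instead of casting unconditionally.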

            Cheers,
            Dominic

            --
            Dominic Meiser
            Tech-X Corporation
            5621 Arapahoe Avenue
            Boulder, CO 80303
            USA
            Telephone: 303-996-2036
            Fax: 303-448-7756
            www.txcorp.com












--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com
