The previous code that performed the multiplies had low performance and didn't require any "setup"; the new code requires a bit more setup, so it is reasonable that the requirements have changed.
Unfortunately we didn't have a test case in PETSc, so the new requirements passed the tests without it being noticed that they broke a supported case.

On Sep 22, 2019, at 1:16 PM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:

> It's always been a valid use case to pass an already allocated DENSE matrix with MAT_REUSE_MATRIX.
> Right above MatMatMultNumeric_MPIDense there's a quite explicit comment that appears to be no longer valid:
>
> /*
>   This is a "dummy function" that handles the case where matrix C was created as a dense matrix
>   directly by the user and passed to MatMatMult() with the MAT_REUSE_MATRIX option
>
>   It is the same as MatMatMultSymbolic_MPIAIJ_MPIDense() except does not create C
> */
>
> On Sun, Sep 22, 2019 at 20:11 Smith, Barry F. via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>
>> Jose,
>>
>> Thanks for the pointer.
>>
>> Will this change dramatically affect the organization of SLEPc? As noted in my previous email, eventually we need to switch to a new API where REUSE with a different matrix is even more problematic.
>>
>> If you folks have use cases that fundamentally require reusing a previous matrix instead of destroying it and getting a new one created, we will need to think about additional features in the API that would allow this reusing of an array. But it seems to me that destroying the old matrix and using the initial call to create the matrix should be ok and just require relatively minor changes to your codes?
>>
>> Barry
>>
>> On Sep 22, 2019, at 11:55 AM, Jose E. Roman <jro...@dsic.upv.es> wrote:
>>
>>> The man page of MatMatMult says: "In the special case where matrix B (and hence C) are dense you can create the correctly sized matrix C yourself and then call this routine with MAT_REUSE_MATRIX, rather than first having MatMatMult() create it for you."
>>>
>>> If you are going to change the usage, don't forget to remove this sentence. This use case is what we use in SLEPc and is now causing trouble.
>>>
>>> Jose
>>>
>>> On 22 Sep 2019, at 18:49, Pierre Jolivet via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>>>
>>>> On 22 Sep 2019, at 6:33 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>
>>>>> Ok. So we definitely need better error checking and to clean up the code, comments and docs.
>>>>>
>>>>> As the approaches for these computations of products get more complicated, it becomes a bit harder to support the use of a raw product matrix, so I don't think we want to add all the code needed to call the symbolic part (after the fact) when the matrix is raw.
>>>>
>>>> To the best of my knowledge, there is only a single method (not taking MR 2069 into account) that uses an MPIDense B and for which these approaches are necessary, so it's not like there are a hundred code paths to fix, but I understand your point.
>>>>
>>>>> Would that make things terribly difficult for you, not being able to use a raw matrix?
>>>>
>>>> Definitely not, but that would require some more memory + one copy after the MatMatMult (depending on the size of your block Krylov space, that can be quite large, and that defeats the purpose of MR 2032 of being more memory efficient).
>>>> (BTW, I now remember that I've been using this "feature" since our SC16 paper on block Krylov methods.)
>>>>
>>>>> I suspect that the dense case was just lucky that using a raw matrix worked.
>>>> I don't think so, this is clearly the intent of MatMatMultNumeric_MPIDense (vs. MatMatMultNumeric_MPIAIJ_MPIDense).
>>>>
>>>>> The removal of the de facto support for REUSE on the raw matrix should be added to the changes document.
>>>>>
>>>>> Sorry for the difficulties. We have trouble testing all the combinations of possible usage; even a coverage tool would not have indicated a problem with the lack of lda support.
>>>>
>>>> No problem!
>>>>
>>>> Thank you,
>>>> Pierre
>>>>
>>>>> Hong,
>>>>>
>>>>> Can you take a look at these things on Monday and maybe get a cleanup into an MR so it gets into the release?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Barry
>>>>>
>>>>> On Sep 22, 2019, at 11:12 AM, Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>>>>>
>>>>>> On 22 Sep 2019, at 6:03 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>>>
>>>>>>> On Sep 22, 2019, at 10:14 AM, Pierre Jolivet via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> FWIW, I've fixed MatMatMult and MatTransposeMatMult here https://gitlab.com/petsc/petsc/commit/93d7d1d6d29b0d66b5629a261178b832a925de80 (with MAT_INITIAL_MATRIX).
>>>>>>>> I believe there is something not right in your MR (2032) with MAT_REUSE_MATRIX (without having called MAT_INITIAL_MATRIX first), cf. https://gitlab.com/petsc/petsc/merge_requests/2069#note_220269898.
>>>>>>>> Of course, I'd love to be proved wrong!
>>>>>>>
>>>>>>> I don't understand the context.
>>>>>>>
>>>>>>> MAT_REUSE_MATRIX requires that the C matrix has come from a previous call with MAT_INITIAL_MATRIX; you cannot just put any matrix in the C location.
>>>>>>
>>>>>> 1) It was not the case before the MR; I've used that "feature" (which may be specific to MatMatMult_MPIAIJ_MPIDense) for as long as I can remember.
>>>>>> 2) If it is not the case anymore, I think it should be mentioned somewhere (and not only in the git log, because I don't think all users will go through that).
>>>>>> 3) This comment should be removed from the code as well: https://www.mcs.anl.gov/petsc/petsc-dev/src/mat/impls/aij/mpi/mpimatmatmult.c.html#line398
>>>>>>
>>>>>>> This is documented in the manual page. We should have better error checking that this is the case, so the code doesn't crash at a memory access but instead produces a very useful error message if the matrix was not obtained with MAT_INITIAL_MATRIX.
>>>>>>>
>>>>>>> Is this the issue or do I not understand?
>>>>>>
>>>>>> This is exactly the issue.
>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>> BTW: yes, MAT_REUSE_MATRIX has different meanings for different matrix operations in terms of where the matrix came from; this is supposed to be all documented in each method's manual page, but some may be missing or incomplete, and error checking is probably not complete for all cases. Perhaps the code should be changed to have multiple different names for each reuse case for clarity to the user?
>>>>>>
>>>>>> Definitely, cf. above.
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>> On 22 Sep 2019, at 5:04 PM, Zhang, Hong <hzh...@mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>> I'll check it tomorrow.
>>>>>>>>> Hong
>>>>>>>>>
>>>>>>>>> On Sun, Sep 22, 2019 at 1:04 AM Pierre Jolivet via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> Jed,
>>>>>>>>>> I'm not sure how easy it is to put more than a few lines of code on GitLab, so I'll just send the (tiny) source here, as a follow-up of our discussion https://gitlab.com/petsc/petsc/merge_requests/2069#note_220229648.
>>>>>>>>>> Please find attached a .cpp showing the brokenness of C=A*B with A of type MPIAIJ and B of type MPIDense when the LDA of B is not equal to its number of local rows.
>>>>>>>>>> It does [[1,1];[1,1]] * [[0,1,2,3];[0,1,2,3]].
>>>>>>>>>> C should be equal to 2*B, but it's not, unless lda = m (= 1).
>>>>>>>>>>
>>>>>>>>>> Mat Object: 2 MPI processes
>>>>>>>>>>   type: mpidense
>>>>>>>>>> 0.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+00 3.0000000000000000e+00
>>>>>>>>>> 0.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+00 3.0000000000000000e+00
>>>>>>>>>>
>>>>>>>>>> If you change Bm here https://www.mcs.anl.gov/petsc/petsc-dev/src/mat/impls/aij/mpi/mpimatmatmult.c.html#line549 to the LDA of B, you'll get the correct result.
>>>>>>>>>>
>>>>>>>>>> Mat Object: 2 MPI processes
>>>>>>>>>>   type: mpidense
>>>>>>>>>> 0.0000000000000000e+00 2.0000000000000000e+00 4.0000000000000000e+00 6.0000000000000000e+00
>>>>>>>>>> 0.0000000000000000e+00 2.0000000000000000e+00 4.0000000000000000e+00 6.0000000000000000e+00
>>>>>>>>>>
>>>>>>>>>> Unfortunately, w.r.t. MR 2069, I still don't get the same results with a plain view LDA > m (KO) and a view + duplicate LDA = m (OK). So there might be something else to fix (or this might not even be a correct fix), but the only reproducer I have right now is the full solver.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Pierre
>
> --
> Stefano
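
For readers following the thread, here is a minimal sketch, not taken from any of the messages above, of the usage pattern under discussion, i.e. the MatMatMult() man page wording quoted by Jose: the user creates the correctly sized dense C up front and passes it back with MAT_REUSE_MATRIX, rather than obtaining C from a first call with MAT_INITIAL_MATRIX. The matrix sizes and values are made up for illustration, and whether this still runs depends on the PETSc version, since the thread is precisely about this support having been unintentionally dropped.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A, B, C;
  PetscInt       i, rstart, rend, M = 4, N = 3;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* A: 4 x 4 sparse (MPIAIJ), here simply 2*I so that C = 2*B is easy to check */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, M, M, 1, NULL, 0, NULL, &A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* B: 4 x 3 dense (MPIDense) with arbitrary entries */
  ierr = MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, M, N, NULL, &B);CHKERRQ(ierr);
  ierr = MatSetRandom(B, NULL);CHKERRQ(ierr);

  /* The pattern in question: the dense C is created by the user with the right
     sizes and passed with MAT_REUSE_MATRIX; it never came from MAT_INITIAL_MATRIX. */
  ierr = MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, M, N, NULL, &C);CHKERRQ(ierr);
  ierr = MatMatMult(A, B, MAT_REUSE_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr);
  ierr = MatView(C, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = MatDestroy(&C);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}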
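
Pierre's attached .cpp is not part of the archived text. The following is a rough reconstruction, under stated assumptions, of what such a reproducer could look like: it builds the 2 x 2 MPIAIJ matrix of ones and the 2 x 4 MPIDense B from his description, and declares a leading dimension of B larger than its local row count via MatDenseGetLocalMatrix() plus MatSeqDenseSetLDA(), which is only one plausible way of doing so and is not stated in the thread. Run on exactly 2 MPI processes; a correct MatMatMult() prints 2*B.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A, B, Bloc, C;
  PetscInt       j, row, cols[2] = {0, 1};
  PetscScalar    vals[2] = {1.0, 1.0};
  PetscScalar    Barray[2 * 4]; /* local block of B: m = 1 row, N = 4 columns, lda = 2 */
  PetscMPIInt    rank;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);

  /* A = [[1,1],[1,1]], one row per rank (run with mpiexec -n 2) */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, 1, 1, 2, 2, 1, NULL, 1, NULL, &A);CHKERRQ(ierr);
  row  = rank;
  ierr = MatSetValues(A, 1, &row, 2, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* B = [[0,1,2,3],[0,1,2,3]], backed by a user array whose leading dimension (2)
     is larger than the local row count (1): column j lives at Barray[2*j] */
  for (j = 0; j < 4; j++) { Barray[2*j] = (PetscScalar)j; Barray[2*j+1] = 0.0; }
  ierr = MatCreateDense(PETSC_COMM_WORLD, 1, PETSC_DECIDE, 2, 4, Barray, &B);CHKERRQ(ierr);
  ierr = MatDenseGetLocalMatrix(B, &Bloc);CHKERRQ(ierr);
  ierr = MatSeqDenseSetLDA(Bloc, 2);CHKERRQ(ierr); /* assumed way of declaring lda > m */
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* C should equal 2*B; with the lda of B ignored it came out equal to B */
  ierr = MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr);
  ierr = MatView(C, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = MatDestroy(&C);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}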