Re: [petsc-users] PETSc with modern C++

Filippo Leonardi Tue, 04 Apr 2017 08:25:54 -0700

I really appreciate the feedback. Thanks.

That of deadlock, when the order of destruction is not preserved, is a
point I hadn't thought of. Maybe it can be cleverly addressed.


PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for a
single processor, I obtain:

Example for MAXPY, with expression templates:

BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805

BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335

BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718

BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954

BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788

BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N

BM_Vector_petscxx_MAXPY_RMS 7 % 7 %

Direct call to MAXPY:

BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674

BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878

BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340

BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414

BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312

BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N

BM_Vector_PETSc_MAXPY_RMS 0 % 0 %


And 3x speedup on 2 MPI ranks (not much communication here, anyway). I am
now convinced that this warrants some further investigation/testing.


On Tue, 4 Apr 2017 at 01:08 Jed Brown <j...@jedbrown.org> wrote:

> Matthew Knepley <knep...@gmail.com> writes:
>
> >> BLAS. (Here a interesting point opens: I assume an efficient BLAS
> >>
> >> implementation, but I am not so sure about how the different BLAS do
> >> things
> >>
> >> internally. I work from the assumption that we have a very well tuned
> BLAS
> >>
> >> implementation at our disposal).
> >>
> >
> > The speed improvement comes from pulling vectors through memory fewer
> > times by merging operations (kernel fusion).
>
> Typical examples are VecMAXPY and VecMDot, but note that these are not
> xGEMV because the vectors are independent arrays rather than single
> arrays with a constant leading dimension.
>
> >> call VecGetArray. However I will inevitably foget to return the array to
> >>
> >> PETSc. I could have my new VecArray returning an object that restores
> the
> >>
> >> array
> >>
> >> when it goes out of scope. I can also flag the function with
> [[nodiscard]]
> >> to
> >>
> >> prevent the user to destroy the returned object from the start.
> >>
> >
> > Jed claims that this pattern is no longer preferred, but I have forgotten
> > his argument.
> > Jed?
>
> Destruction order matters and needs to be collective.  If an error
> condition causes destruction to occur in a different order on different
> processes, you can get deadlock.  I would much rather have an error
> leave some resources (for the OS to collect) than escalate into
> deadlock.
>
> > We have had this discussion for years on this list. Having separate names
> > for each type
> > is really ugly and does not achieve what we want. We want smooth
> > interoperability between
> > objects with different backing types, but it is still not clear how to do
> > this.
>
> Hide it internally and implicitly promote.  Only the *GetArray functions
> need to be parametrized on numeric type.  But it's a lot of work on the
> backend.
>

Re: [petsc-users] PETSc with modern C++

Reply via email to