Awesome, I appreciate all of your feedback.
@Karli: Just to be clear: my original idea is to wrap up and make PETSc C++
bindings in a clean C++-ish way. Then an added value would be expression
templates ("easy" to implement once you have a clear infrastructure), with
all the benefits and
Hi Filippo,
did you compile PETSc with the same level of optimization as your
template code? In particular, did you turn debugging off for the timings?
Either way, let me share some of the mental stages I've gone through
with ViennaCL. It started out with the same premises as you provide:
@jed: Your assembly is what I would've expected. Let me simplify my code and
see if I can provide a useful test example. (Also: I assume your assembly
is for Xeon, so I should definitely use AVX-512.)
Let me get back to you in a few days (work permitting) with something you
can use.
From your
Matthew Knepley writes:
> On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown wrote:
>
>> Matthew Knepley writes:
>>
>> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi
>> > wrote:
>> >
>> >> I had weird issues
On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown wrote:
> Matthew Knepley writes:
>
> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi
> > wrote:
> >
> >> I had weird issues where gcc (that I am using for my tests right now)
> >>
Matthew Knepley writes:
> On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi
> wrote:
>
>> I had weird issues where gcc (that I am using for my tests right now)
>> wasn't vectorising properly (even enabling all flags, from tree-vectorize,
>> to mavx).
> On Apr 2, 2017, at 2:15 PM, Filippo Leonardi wrote:
>
>
> Hello,
>
> I have a project in mind and seek feedback.
>
> Disclaimer: I hope I am not abusing this mailing list with this idea. If
> so, please ignore.
>
> As a thought experiment, and to have a bit of
On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi
wrote:
> I had weird issues where gcc (that I am using for my tests right now)
> wasn't vectorising properly (even enabling all flags, from tree-vectorize,
> to mavx). According to my tests, I know the Intel compiler was a
I had weird issues where gcc (that I am using for my tests right now)
wasn't vectorising properly (even enabling all flags, from -ftree-vectorize
to -mavx). According to my tests, I know the Intel compiler was a bit better
at that.
I actually did not know PETSc was doing some unrolling itself. On
On Tue, Apr 4, 2017 at 1:19 PM, Filippo Leonardi
wrote:
> You are in fact right, it is the same speedup of approximately 2.5x
> (with 2 ranks), my brain rounded up to 3. (This was just a test done in 10
> min on my workstation, so no pretence of being definitive, I just
You are in fact right, it is the same speedup of approximately 2.5x
(with 2 ranks), my brain rounded up to 3. (This was just a test done in 10
min on my workstation, so no pretence of being definitive, I just wanted to
have an indication).
As you say, I am using OpenBLAS, so I wouldn't be surprised
MAXPY isn't really a BLAS 1 operation, since it can reuse data from some of
the vectors.
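
To illustrate the data-reuse point, here is a minimal C++ sketch, not PETSc's actual implementation. It compares a MAXPY built from repeated AXPY calls with a fused loop; the names axpy, maxpy_naive and maxpy_fused are hypothetical and exist only for this example. The fused version loads and stores each entry of y once, regardless of how many vectors are added.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: y += a * x. One full pass over y per call (BLAS 1 AXPY).
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += a * x[i];
    }
}

// Hypothetical naive MAXPY: y += sum_j a[j] * xs[j], done as repeated AXPY.
// Every entry of y is loaded and stored once per input vector.
void maxpy_naive(const std::vector<double>& a,
                 const std::vector<std::vector<double>>& xs,
                 std::vector<double>& y) {
    for (std::size_t j = 0; j < xs.size(); ++j) {
        axpy(a[j], xs[j], y);
    }
}

// Hypothetical fused MAXPY: y[i] is loaded once, accumulated in a local
// variable across all input vectors, and stored once.
void maxpy_fused(const std::vector<double>& a,
                 const std::vector<std::vector<double>>& xs,
                 std::vector<double>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        double acc = y[i];
        for (std::size_t j = 0; j < xs.size(); ++j) {
            acc += a[j] * xs[j][i];
        }
        y[i] = acc;
    }
}
```

Both variants perform the same additions in the same order, so they should give identical results; only the memory traffic on y differs.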
> On Apr 4, 2017, at 10:25 AM, Filippo Leonardi wrote:
>
> I really appreciate the feedback. Thanks.
>
> The risk of deadlock, when the order of destruction is not preserved, is a point
> I
On Tue, Apr 4, 2017 at 10:25 AM, Filippo Leonardi
wrote:
> I really appreciate the feedback. Thanks.
>
> The risk of deadlock, when the order of destruction is not preserved, is a
> point I hadn't thought of. Maybe it can be cleverly addressed.
>
> PS: If you are interested,
I really appreciate the feedback. Thanks.
The risk of deadlock, when the order of destruction is not preserved, is a
point I hadn't thought of. Maybe it can be cleverly addressed.
PS: If you are interested, I ran some benchmarks on BLAS1 operations and, for
a single processor, I obtain:
Example for
Matthew Knepley writes:
>> BLAS. (Here an interesting point opens: I assume an efficient BLAS
>> implementation, but I am not so sure about how the different BLAS do things
>> internally. I work from the assumption that we have a very well tuned BLAS
On Mon, Apr 3, 2017 at 11:45 AM, Filippo Leonardi
wrote:
> On Monday, 3 April 2017 02:00:53 CEST you wrote:
> > On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi
> > wrote:
> > > Hello,
> > >
> > > I have a project in mind and
Hi Filippo,
recompiling PETSc twice is easy.
The difficulty is that both libraries will contain the same symbols
for the double and double complex functions.
If they were part of a C++ namespace, then it would be easier.
Michael.
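
A rough C++ sketch of the namespace idea, under stated assumptions: PETSc is a C library and does not provide these namespaces, so wrap_real, wrap_complex, Scalar and dot are hypothetical names used only for illustration. It shows how two builds with different scalar types could expose the same function name without a symbol clash.

```cpp
#include <complex>

// Hypothetical wrapper for a real-scalar build.
namespace wrap_real {
using Scalar = double;

// Plain dot product of two arrays of length n.
inline Scalar dot(const Scalar* x, const Scalar* y, int n) {
    Scalar sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
}  // namespace wrap_real

// Hypothetical wrapper for a complex-scalar build.
// Same function name as above, but a distinct mangled symbol.
namespace wrap_complex {
using Scalar = std::complex<double>;

// Dot product that conjugates the second argument (a choice made for
// this sketch).
inline Scalar dot(const Scalar* x, const Scalar* y, int n) {
    Scalar sum(0.0, 0.0);
    for (int i = 0; i < n; ++i) {
        sum += x[i] * std::conj(y[i]);
    }
    return sum;
}
}  // namespace wrap_complex
```

Calling code can then use wrap_real::dot and wrap_complex::dot in the same translation unit. With plain C symbols, as described above, this separation is not available and the symbols have to be hidden by other means.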
On 04/03/2017 12:45 PM, Filippo Leonardi wrote:
On Monday,
On Monday, 3 April 2017 02:00:53 CEST you wrote:
> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi
>
> wrote:
> > Hello,
> >
> > I have a project in mind and seek feedback.
> >
> > Disclaimer: I hope I am not abusing this mailing list with this idea.
> > If
Hello Filippo,
we had to write a wrapper around PETSc to use both double and double
complex functions in the same code.
We achieved it by creating two shared object libraries and hiding the PETSc
symbols.
Once we had to achieve it for a statically linked executable, this was
really painful, we
Hello,
I have a project in mind and seek feedback.
Disclaimer: I hope I am not abusing this mailing list with this idea. If
so, please ignore.
As a thought experiment, and to have a bit of fun, I am currently
writing (or thinking about writing) a small (modern) C++ wrapper around PETSc.
Premise: