@jed: Your assembly is what I would've expected. Let me simplify my code and
see if I can provide a useful test example. (Also: I assume your assembly
is for Xeon, so I should definitely use AVX-512.)
Let me get back to you in a few days (work permitting) with something you
can use.
From your
On Tue, Apr 4, 2017 at 9:10 PM, Jed Brown wrote:
> Barry Smith writes:
>
> >These results seem reasonable to me.
> >
> >What makes you think that KNL should be doing better than it does in
> comparison to Haswell?
> >
> >The entire reason for
> On Apr 4, 2017, at 11:10 PM, Jed Brown wrote:
>
> Barry Smith writes:
>
>> These results seem reasonable to me.
>>
>> What makes you think that KNL should be doing better than it does in
>> comparison to Haswell?
>>
>> The entire reason for
Barry Smith writes:
>These results seem reasonable to me.
>
>What makes you think that KNL should be doing better than it does in
> comparison to Haswell?
>
>The entire reason for the existence of KNL is that it is a way for
>Intel to be able to "compete"
Justin Chang writes:
> So I tried the following options:
>
> -M 40
> -N 40
> -P 5
> -da_refine 1/2/3/4
> -log_view
> -mg_coarse_pc_type gamg
> -mg_levels_0_pc_type gamg
> -mg_levels_1_sub_pc_type cholesky
> -pc_type mg
> -thi_mat_type baij
>
> Performance improved
These results seem reasonable to me.
What makes you think that KNL should be doing better than it does in
comparison to Haswell?
The entire reason for the existence of KNL is that it is a way for Intel to
be able to "compete" with Nvidia GPUs for numerics and data processing, for
Matthew Knepley writes:
> On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown wrote:
>
>> Matthew Knepley writes:
>>
>> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi > >
>> > wrote:
>> >
>> >> I had weird issues
On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown wrote:
> Matthew Knepley writes:
>
> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi >
> > wrote:
> >
> >> I had weird issues where gcc (that I am using for my tests right now)
> >>
Matthew Knepley writes:
> On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi
> wrote:
>
>> I had weird issues where gcc (that I am using for my tests right now)
>> wasn't vectorising properly (even enabling all flags, from tree-vectorize,
>> to mavx).
> On Apr 2, 2017, at 2:15 PM, Filippo Leonardi wrote:
>
>
> Hello,
>
> I have a project in mind and seek feedback.
>
> Disclaimer: I hope I am not abusing this mailing list with this idea. If
> so, please ignore.
>
> As a thought experiment, and to have a bit of
> On Apr 3, 2017, at 10:05 AM, Jed Brown wrote:
>
> Barry Smith writes:
>
>>
>> SNESGetUsingInternalMatMFFD(snes,); Then you can get rid of the
>> horrible
>>
>> PetscBool flg;
>> ierr =
>>
On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi
wrote:
> I had weird issues where gcc (that I am using for my tests right now)
> wasn't vectorising properly (even enabling all flags, from tree-vectorize,
> to mavx). According to my tests, I know the Intel compiler was a
Justin Chang writes:
> Attached are the job output files (which include -log_view) for SNES ex48
> run on a single haswell and knl node (32 and 64 cores respectively).
> Started off with a coarse grid of size 40x40x5 and ran three different
> tests with -da_refine 1/2/3 and
I had weird issues where gcc (which I am using for my tests right now)
wasn't vectorising properly (even with all the flags enabled, from -ftree-vectorize
to -mavx). According to my tests, the Intel compiler was a bit better
at that.
I actually did not know PETSc was doing some unrolling itself. On
Hey,
here's some data on what you should see with STREAM when comparing
against conventional Xeons:
https://www.karlrupp.net/2016/07/knights-landing-vs-knights-corner-haswell-ivy-bridge-and-sandy-bridge-stream-benchmark-results/
Note that MCDRAM only pays off if you can keep enough cores
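For reading STREAM numbers like these, a minimal sketch of how a triad result (a[i] = b[i] + q*c[i]) is usually turned into a bandwidth figure. It assumes 8-byte doubles and STREAM's convention of counting two loads and one store per element (write-allocate traffic is not counted); the function name and the sample sizes are hypothetical.

```python
def triad_bandwidth_gbs(n_elements: int, seconds: float, bytes_per_value: int = 8) -> float:
    """Sustained bandwidth in GB/s (1e9 bytes/s) for one triad pass a = b + q*c.

    Counts two reads (b, c) and one write (a) per element, following STREAM's
    convention; extra write-allocate traffic is ignored.
    """
    if n_elements <= 0 or seconds <= 0:
        raise ValueError("n_elements and seconds must be positive")
    bytes_moved = 3 * bytes_per_value * n_elements
    return bytes_moved / seconds / 1e9


# Hypothetical measurement: arrays of 10**8 doubles, one triad pass in 0.06 s.
print(f"{triad_bandwidth_gbs(10**8, 0.06):.1f} GB/s")  # 40.0 GB/s
```

Because the same byte count is used on every machine, the ratio of two such figures is a direct comparison of sustained memory bandwidth.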
Ingo Gaertner writes:
> We have never talked about Riemann solvers in our CFD course, and I don't
> understand what's going on in ex11.
> However, if you could answer a few of my questions, you'll give me a good
> start with PETSc. For the simple Poisson problem that
Justin Chang writes:
> Thanks everyone for the helpful advice. So I tried all the suggestions
> including using libsci. The performance did not improve for my particular
> runs, which I think suggests the problem parameters chosen for my tests
> (SNES ex48) are not optimal
Attached are the job output files (which include -log_view) for SNES ex48
run on a single haswell and knl node (32 and 64 cores respectively).
Started off with a coarse grid of size 40x40x5 and ran three different
tests with -da_refine 1/2/3 and -pc_type mg
What's interesting/strange is that if I
I did some quick tests (with a different example) on a single KNL node and a
single Haswell node, both using 4 processes. See below for the MatMult
results. The total running time on KNL is a bit more than twice that
on Haswell. So I think the results Justin got with SNES ex48
On Tue, Apr 4, 2017 at 1:19 PM, Filippo Leonardi
wrote:
> You are in fact right, it is the same speedup of approximately 2.5x
> (with 2 ranks); my brain rounded it up to 3. (This was just a test done in 10
> min on my workstation, so no pretence of being definitive, I just
> On Apr 4, 2017, at 1:24 AM, Wenbo Zhao wrote:
>
> Barry,
>
> Thanks.
>
> It is my fault. I should not mix the VecScatter and MatSetValues.
>
> 1. Matrix assembly
> There are only two options for the matrix in the case with a rotation boundary.
> The first is using
There shouldn't be any additional issue with the petsc4py wrapper. We do
this all the time. In fact, it's generally best to use petsc4py to do
the initialization of PETSc at the very top of your highest-level Python
script. You'll need to do this anyway if you want to use command line
Hello all,
Another question in a fairly long line of questions from me. Thank you to
this community for all the help I've gotten.
I have a Fortran/PETSc-based code that, with the help of f2py and some of
you, I have compiled into a Python module (we'll call it pc_fort_mod). So I
can now
You are in fact right, it is the same speedup of approximately 2.5x
(with 2 ranks); my brain rounded it up to 3. (This was just a test done in 10
min on my workstation, so no pretence of being definitive; I just wanted to get
an indication.)
As you say, I am using OpenBLAS, so I wouldn't be surprised
2017-04-03 23:58 GMT+02:00 Matthew Knepley :
> There are no tutorials, and almost no documentation.
>
Uhh, I'm not sure whether it makes sense for me to use PETSc then.
> The best thing to look at is TS ex11. This solves a bunch of different
> equations
> (advection, shallow
MAXPY isn't really a BLAS 1 since it can reuse some data in certain vectors.
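A minimal pure-Python sketch of the data-reuse point, not PETSc's implementation: it computes y <- y + sum_k alpha_k x_k (what VecMAXPY computes) once as a single fused pass and once as k separate AXPY passes, and counts vector elements read or written under a simplified model that ignores caches. The function names and the sample vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def maxpy_fused(y: Sequence[float], alphas: Sequence[float],
                xs: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """y <- y + sum_k alphas[k] * xs[k] in one pass; returns (result, elements moved)."""
    traffic = 0
    out = list(y)
    for i in range(len(out)):
        acc = out[i]
        traffic += 1  # read y[i] once
        for a, x in zip(alphas, xs):
            acc += a * x[i]
            traffic += 1  # read x_k[i]
        out[i] = acc
        traffic += 1  # write y[i] once
    return out, traffic


def maxpy_as_axpys(y: Sequence[float], alphas: Sequence[float],
                   xs: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Same update as k separate AXPY passes; y is re-read and re-written each pass."""
    traffic = 0
    out = list(y)
    for a, x in zip(alphas, xs):
        for i in range(len(out)):
            out[i] += a * x[i]
            traffic += 3  # read y[i], read x[i], write y[i]
    return out, traffic


y = [1.0] * 4
alphas = [0.5, 0.25, 1.0]
xs = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
print(maxpy_fused(y, alphas, xs))     # ([5.0, 5.0, 5.0, 5.0], 20)
print(maxpy_as_axpys(y, alphas, xs))  # ([5.0, 5.0, 5.0, 5.0], 36)
```

With n entries and k vectors the fused pass moves n(k + 2) elements versus 3nk for the AXPY sequence, which is where MAXPY's advantage over a plain BLAS 1 operation comes from.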
> On Apr 4, 2017, at 10:25 AM, Filippo Leonardi wrote:
>
> I really appreciate the feedback. Thanks.
>
> That of deadlock, when the order of destruction is not preserved, is a point
> I
> Does this mean that GAMG works for the symmetrical matrix only?
No, it means that for a nonsymmetric nonzero structure you need the extra
flag. So use the extra flag. The reason we don't always use the flag is that
it adds extra cost and isn't needed if the matrix already has a symmetric
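A small sketch of what "symmetric nonzero structure" means, assuming a sparsity pattern stored as a set of (row, col) pairs: a pattern is structurally symmetric when every stored (i, j) has a stored (j, i), and symmetrizing takes the union of the pattern with its transpose (the pattern of A + A^T). This only illustrates the concept; it is not how GAMG builds its graph internally, and the helper names and sample pattern are hypothetical.

```python
from typing import Set, Tuple

Pattern = Set[Tuple[int, int]]


def is_structurally_symmetric(pattern: Pattern) -> bool:
    """True if every stored entry (i, j) has a matching stored entry (j, i)."""
    return all((j, i) in pattern for (i, j) in pattern)


def symmetrize(pattern: Pattern) -> Pattern:
    """Union of the pattern with its transpose, i.e. the nonzero pattern of A + A^T."""
    return pattern | {(j, i) for (i, j) in pattern}


# Hypothetical 3x3 pattern: row 0 couples to column 2 and row 2 to column 1,
# but the reverse couplings are not stored.
A = {(0, 0), (0, 2), (1, 1), (2, 1), (2, 2)}
print(is_structurally_symmetric(A))  # False
S = symmetrize(A)
print(sorted(S - A))                 # [(1, 2), (2, 0)]
print(is_structurally_symmetric(S))  # True
```

The symmetrized pattern has more entries than the original, which is one way to see why it is not done unconditionally when the structure is already symmetric.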
Hi All,
I am using GAMG to solve a group of coupled diffusion equations, but the
resulting matrix is not symmetrical. I got the following error messages:
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: Have un-symmetric graph (apparently). Use
On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang wrote:
> Thanks everyone for the helpful advice. So I tried all the suggestions
> including using libsci. The performance did not improve for my particular
> runs, which I think suggests the problem parameters chosen for my tests
Thanks everyone for the helpful advice. So I tried all the suggestions
including using libsci. The performance did not improve for my particular
runs, which I think suggests the problem parameters chosen for my tests
(SNES ex48) are not optimal for KNL. Does anyone have example test runs I
could
On Tue, Apr 4, 2017 at 10:25 AM, Filippo Leonardi
wrote:
> I really appreciate the feedback. Thanks.
>
> That of deadlock, when the order of destruction is not preserved, is a
> point I hadn't thought of. Maybe it can be cleverly addressed.
>
> PS: If you are interested,
I really appreciate the feedback. Thanks.
The deadlock issue, when the order of destruction is not preserved, is a
point I hadn't thought of. Maybe it can be cleverly addressed.
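Why the destruction order matters can be seen in a toy model, assuming each object's destroy is a collective operation on a shared communicator (PETSc's destroy routines are documented as collective), so every rank must enter the destroys in the same sequence. The sketch is not MPI code: it only reports the first step at which ranks would be waiting in different collectives, which in a real run can show up as a hang. The object names and the function are hypothetical.

```python
from typing import Optional, Sequence


def first_mismatch(destroy_orders: Sequence[Sequence[str]]) -> Optional[int]:
    """Return the first step at which ranks would enter different collective destroys.

    destroy_orders[r] lists the objects rank r destroys, in order. Returns None
    when every rank destroys the same objects in the same order.
    """
    if not destroy_orders:
        return None
    steps = max(len(order) for order in destroy_orders)
    for step in range(steps):
        # None marks a rank that has already finished destroying everything.
        calls = {order[step] if step < len(order) else None for order in destroy_orders}
        if len(calls) > 1:
            return step
    return None


# Hypothetical object names; rank 1 destroys the Mat before the KSP,
# for example because of rank-dependent control flow.
ok = [["ksp", "mat", "vec"], ["ksp", "mat", "vec"]]
bad = [["ksp", "mat", "vec"], ["mat", "ksp", "vec"]]
print(first_mismatch(ok))   # None
print(first_mismatch(bad))  # 0
```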
PS: If you are interested, I ran some benchmarks on BLAS1 stuff and, for a
single processor, I obtain:
Example for
Ah ok. When I find the time I will have a look into mapping processes to
cores. I guess it is possible using the Torque scheduler.
Thank you!
On Tue, Apr 4, 2017 at 2:00 PM Matthew Knepley wrote:
> On Tue, Apr 4, 2017 at 6:58 AM, Toon Weyens wrote:
>
On Tue, Apr 4, 2017 at 6:58 AM, Toon Weyens wrote:
> Dear Matthew,
>
> Thanks for your answer, but this is something I do not really know much
> about... The node I used has 12 cores and about 24GB of RAM.
>
> But for these test cases, isn't the distribution of memory over
Dear Matthew,
Thanks for your answer, but this is something I do not really know much
about... The node I used has 12 cores and about 24 GB of RAM.
But for these test cases, isn't the distribution of memory over cores
handled automatically by SLEPc?
Regards
On Tue, Apr 4, 2017 at 1:40 PM
On Tue, Apr 4, 2017 at 2:20 AM, Toon Weyens wrote:
> Dear Jose and Matthew,
>
> Thank you so much for the effort!
>
> I still don't manage to converge using the range interval technique to
> filter out the positive eigenvalues, but using shift-invert combined with a
>
Dear Jose and Matthew,
Thank you so much for the effort!
I still don't manage to converge using the range interval technique to
filter out the positive eigenvalues, but using shift-invert combined with a
target eigenvalue works true miracles. I get extremely fast convergence.
The truth of the
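A short numerical illustration of why shift-invert with a target can converge so quickly, using a made-up spectrum: the transformed operator (A - sigma*I)^{-1} has eigenvalues theta = 1/(lambda - sigma), so the lambda closest to the target sigma becomes the dominant theta, and the ratio |theta_2|/|theta_1| (a rough proxy for the convergence rate) becomes small. The eigenvalues, the target, and the helper names are hypothetical; the price of the transformation is a linear solve with A - sigma*I per application.

```python
from typing import List, Sequence, Tuple


def shift_invert_spectrum(eigs: Sequence[float], sigma: float) -> List[Tuple[float, float]]:
    """Pairs (lambda, theta) with theta = 1/(lambda - sigma), sorted by decreasing |theta|."""
    pairs = [(lam, 1.0 / (lam - sigma)) for lam in eigs if lam != sigma]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)


def convergence_factor(values: Sequence[float]) -> float:
    """Ratio of the second-largest to the largest magnitude.

    For a power-type iteration this is the asymptotic error reduction per
    step; Krylov solvers do better, but also benefit from a small ratio.
    """
    mags = sorted((abs(v) for v in values), reverse=True)
    return mags[1] / mags[0]


eigs = [-4.0, -3.9, -1.0, 0.5, 2.0]  # hypothetical spectrum
sigma = -1.05                        # hypothetical target near -1.0

print(convergence_factor(eigs))  # 0.975: the dominant eigenvalues are nearly tied
spec = shift_invert_spectrum(eigs, sigma)
print(spec[0][0])                # -1.0: the eigenvalue nearest the target now dominates
print(f"{convergence_factor([theta for _, theta in spec]):.3f}")  # 0.032
```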
Barry,
Thanks.
It is my fault. I should not mix the VecScatter and MatSetValues.
1. Matrix assembly
There are only two options for the matrix in the case with a rotation boundary.
The first is using
"MatSetOption(A,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE)".
The second is to create the matrix by hand.
Is it