Thank you
Yours sincerely,
TAY wee-beng
On 3/11/2015 12:45 PM, Barry Smith wrote:
On Nov 2, 2015, at 10:37 PM, TAY wee-beng <[email protected]> wrote:
Hi,
I tried:
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason.
Does your Poisson problem have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't). There may be something wrong with your Poisson discretization that was also messing up hypre.
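One quick way to check the diagonal from the code is a small helper along these lines (a sketch only; the helper name CheckDiagonal is made up, and it assumes the assembled Poisson matrix is available as a Mat):

  #include <petscksp.h>

  /* Sketch: report the smallest |diagonal entry| of the Poisson matrix.
     A zero (or near-zero) value here would explain the trouble. */
  static PetscErrorCode CheckDiagonal(Mat A)
  {
    Vec            d;
    PetscReal      minabs;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatCreateVecs(A, &d, NULL);CHKERRQ(ierr);  /* vector compatible with A */
    ierr = MatGetDiagonal(A, d);CHKERRQ(ierr);        /* extract the diagonal */
    ierr = VecAbs(d);CHKERRQ(ierr);                   /* take |d_i| entrywise */
    ierr = VecMin(d, NULL, &minabs);CHKERRQ(ierr);    /* smallest magnitude over all rows */
    ierr = PetscPrintf(PETSC_COMM_WORLD, "smallest |diagonal entry| = %g\n", (double)minabs);CHKERRQ(ierr);
    ierr = VecDestroy(&d);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }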
Both options give:

   1  0.00150000  0.00000000  0.00000000  1.00000000  NaN  NaN  NaN
   M Diverged but why?, time = 2
   reason = -9

How can I check what's wrong?
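For what it's worth, reason = -9 is KSP_DIVERGED_NANORINF in PETSc, i.e. a NaN or Inf appeared during the solve, which matches the NaN values printed above. A minimal way to query and print the reason from the code (the helper name ReportReason is made up; it assumes the Poisson KSP object is at hand) would be:

  #include <petscksp.h>

  /* Sketch: print the KSPConvergedReason after KSPSolve.
     Negative values mean divergence; -9 is KSP_DIVERGED_NANORINF. */
  static PetscErrorCode ReportReason(KSP ksp)
  {
    KSPConvergedReason reason;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "KSP converged reason = %d\n", (int)reason);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The same information is printed automatically by the -poisson_ksp_converged_reason option mentioned above.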
Thank you
Yours sincerely,
TAY wee-beng
On 3/11/2015 3:18 AM, Barry Smith wrote:
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way for us to determine the reason for the poor scaling.

If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about which routines scale well or poorly.
Barry
On Nov 2, 2015, at 3:17 AM, TAY wee-beng <[email protected]> wrote:
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 2:55 PM, Barry Smith wrote:
Run (158/2)x(266/2)x(150/2) grid on 8
processes and then (158)x(266)x(150) on 64
processors and send the two -log_summary results
Barry
On Nov 2, 2015, at 12:19 AM, TAY wee-beng <[email protected]> wrote:
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 12:27 PM, Barry Smith wrote:
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results.

You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.
Barry
Something makes no sense with the output: it gives

KSPSolve             199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90 100 66 100 24  90 100 66 100 24   165

90% of the time is in the solve but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.
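If the unaccounted time really is the application's own I/O, one way to make it show up explicitly in -log_summary is to wrap the output calls in a PETSc logging stage, roughly like this (a sketch; the function and stage names are placeholders, not from the actual code):

  #include <petscsys.h>

  /* Sketch: put the application's own file output in its own logging stage so
     -log_summary attributes that wall time instead of leaving it unaccounted. */
  static PetscErrorCode WriteOutputLogged(void)
  {
    static PetscLogStage io_stage = -1;
    PetscErrorCode       ierr;

    PetscFunctionBeginUser;
    if (io_stage < 0) { ierr = PetscLogStageRegister("User IO", &io_stage);CHKERRQ(ierr); }
    ierr = PetscLogStagePush(io_stage);CHKERRQ(ierr);
    /* ... the application's own I/O calls would go here ... */
    ierr = PetscLogStagePop();CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }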
On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
Hi,

I have attached the new run with 100 time steps for 48 and 96 cores.

Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
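Roughly speaking (a sketch, not taken from the actual code; the routine name and the assumption that the KSP, matrix, and vectors already exist are mine): if the matrix passed to KSPSetOperators is never re-assembled, PETSc keeps the preconditioner from the first solve, and KSPSetReusePreconditioner can be used to force this even if the matrix is touched:

  #include <petscksp.h>

  /* Sketch, assuming ksp, the assembled Poisson matrix A, and vectors b, x exist:
     set the operators once, then only the RHS changes inside the time loop, so the
     BoomerAMG/GAMG setup from the first solve is kept for all later solves. */
  static PetscErrorCode SolvePoissonEachStep(KSP ksp, Mat A, Vec b, Vec x, PetscInt nsteps)
  {
    PetscErrorCode ierr;
    PetscInt       step;

    PetscFunctionBeginUser;
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);                 /* once, before the loop */
    ierr = KSPSetReusePreconditioner(ksp, PETSC_TRUE);CHKERRQ(ierr); /* keep the PC across solves */
    for (step = 0; step < nsteps; step++) {
      /* ... update only the entries of b here; do NOT re-assemble A ... */
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);                      /* PCSetUp runs only on the first pass */
    }
    PetscFunctionReturn(0);
  }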
Why does the number of processes
increase so much? Is there
something wrong with my coding?
Seems to be so too for my new run.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 9:49 AM, Barry Smith wrote:
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
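The arithmetic behind this, in generic notation: with a one-time AMG setup cost and a per-step solve cost, the measured total over N steps is

  T_{\mathrm{total}} = T_{\mathrm{setup}} + N\,T_{\mathrm{step}}, \qquad \frac{T_{\mathrm{setup}}}{T_{\mathrm{total}}} \to 0 \ \text{as}\ N \to \infty,

so with only 2 or 10 steps the setup dominates and distorts the scaling comparison, while with many steps it is amortized away.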
Barry
On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
Hi,

Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96 cores log (log96).

Why does the number of processes increase so much? Is there something wrong with my coding?

Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?

Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?

Also, what about the momentum eqn? Is it working well?

I will try the gamg later too.

Thank you

Yours sincerely,

TAY wee-beng
On 2/11/2015 12:30 AM, Barry Smith wrote:
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.

Anyways, all the time is being spent in the BoomerAMG algebraic multigrid setup and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.

PCSetUp                3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62  8  0  0  4  62  8  0  0  5    11
PCSetUp                3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18  0  0  6  85 18  0  0  6     2

Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it is too large.

You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.

Barry
On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:

On 1/11/2015 10:00 AM, Barry Smith wrote:
On Oct 31,
2015, at
8:43 PM,
TAY
wee-beng<[email protected]
<mailto:[email protected]>>
wrote:
On
1/11/2015
12:47 AM,
Matthew
Knepley wrote:
On
Sat,
Oct
31,
2015
at
11:34
AM,
TAY
wee-beng<[email protected]
<mailto:[email protected]>>
wrote:
Hi,

I understand that, as mentioned in the FAQ, due to the limitations in memory, the scaling is not linear. So I am trying to write a proposal to use a supercomputer. Its specs are:

Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node), 8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) interconnect
Each cabinet contains 96 computing nodes.

One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.

There are 2 ways to give performance:

1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem.
2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.

I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.

Cluster specs:
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node, so 48 cores / node
Not sure about the memory / node.
The parallel efficiency 'En' for a given degree of parallelism 'n' indicates how much the program is efficiently accelerated by parallel processing. 'En' is given by the following formulae. Although their derivation processes are different depending on strong and weak scaling, the derived formulae are the same.
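The proposal's formulae themselves did not come through in this message. For orientation only, the standard quantities such discussions rest on, with T_n the elapsed time on n processes and s the serial fraction (notation introduced here, not from the proposal), are

  E_n^{\mathrm{strong}} = \frac{T_1}{n\,T_n}, \qquad E_n^{\mathrm{weak}} = \frac{T_1}{T_n},

and Amdahl's law gives

  T_n = T_1\left(s + \frac{1-s}{n}\right) \;\Rightarrow\; E_n^{\mathrm{strong}} = \frac{1}{s\,n + (1-s)},

which falls off roughly like 1/(s n) once s n is large, so any nonzero serial fraction collapses the extrapolated efficiency at thousands of nodes.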
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. So are my results acceptable?

For the large data set, if using 2205 nodes (2205X8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function of problem size, so you cannot take the strong scaling from one problem and apply it to another without a model of this dependence.

Weak scaling does model changes with problem size, so I would measure weak scaling on your current cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific applications, but neither does requiring a certain parallel efficiency.
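To make that concrete with the numbers quoted earlier (140 min on 48 cores, 90 min on 96 cores), treating the 48-core run as the baseline and assuming Amdahl's law holds exactly at this problem size (a back-of-the-envelope fit, not from the thread):

  \frac{T_{48}}{T_{96}} = \frac{140}{90} \approx 1.56, \qquad \frac{2}{1+s} \approx 1.56 \;\Rightarrow\; s \approx \frac{2}{1.56} - 1 \approx 0.29,

i.e. roughly 29% of the 48-core runtime behaves as serial under this fit, and that fraction is tied to this particular problem size, so it says little about the much larger problem.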
Ok, I checked the results for my weak scaling: the expected parallel efficiency is even worse. From the formula used, it's obvious the extrapolation gives some sort of exponential decrease. So unless I can achieve nearly >90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.

However, it's mentioned in the FAQ that due to memory requirements, it's impossible to get >90% speed up when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem, and then on the problem doubled in size and number of processors?

Barry
Hi,

I have attached the output:

48 cores: log48
96 cores: log96

There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.

Problem size doubled from 158x266x150 to 158x266x300.
So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?

Thanks.
Thanks,
Matt
Is it possible to get this type of scaling (>50%) in PETSc when using 17640 (2205X8) cores?

Btw, I do not have access to the system.
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener