Thank you
Yours sincerely,
TAY wee-beng
On 3/11/2015 12:45 PM, Barry Smith wrote:
On Nov 2, 2015, at 10:37 PM, TAY wee-beng <[email protected]> wrote:
Hi,
I tried:
1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
2. -poisson_pc_type gamg
Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason.

Does your Poisson solve have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't)?

There may be something wrong with your Poisson discretization that was also messing up hypre.
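A minimal sketch, assuming a C/PETSc code in which A is the assembled Poisson matrix (variable names are illustrative only): for a pure-Neumann Poisson problem the constant vector lies in the null space, and attaching it to the matrix lets the Krylov method and the AMG preconditioner handle the singular operator; the second fragment is one way to check the diagonal for zeros.

    MatNullSpace nullsp;
    MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp); /* constant null space */
    MatSetNullSpace(A, nullsp);
    MatNullSpaceDestroy(&nullsp);

    /* quick check for zeros on the diagonal */
    Vec       d;
    PetscReal dmin;
    MatCreateVecs(A, &d, NULL);
    MatGetDiagonal(A, d);
    VecAbs(d);
    VecMin(d, NULL, &dmin);   /* dmin == 0.0 means some diagonal entry is zero */
    VecDestroy(&d);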
Both options give:
    1   0.00150000   0.00000000   0.00000000   1.00000000   NaN   NaN   NaN
    M Diverged but why?, time = 2
    reason = -9
How can I check what's wrong?
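A side note: in the PETSc releases of this period a converged reason of -9 corresponds to KSP_DIVERGED_NANORINF, i.e. a NaN or Inf appeared during the solve. A minimal sketch of querying it in code, assuming the Poisson KSP object is called ksp:

    KSPConvergedReason reason;
    KSPGetConvergedReason(ksp, &reason);   /* -9 == KSP_DIVERGED_NANORINF */
    if (reason < 0) PetscPrintf(PETSC_COMM_WORLD, "Diverged, reason = %d\n", (int)reason);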
Thank you
Yours sincerely,
TAY wee-beng
On 3/11/2015 3:18 AM, Barry Smith wrote:
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us there is no way to determine why the scaling is poor.

If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about which routines are scaling well or poorly.
Barry
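A minimal sketch of what those two runs might look like, assuming the executable is ./a.out, the Poisson solve uses the poisson_ options prefix as elsewhere in this thread, and the grid size is set inside the code:

    mpiexec -n 8  ./a.out -poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1 -log_summary
    mpiexec -n 64 ./a.out -poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1 -log_summary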
On Nov 2, 2015, at 3:17 AM, TAY wee-beng <[email protected]> wrote:
Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 2:55 PM, Barry Smith wrote:
Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results.
Barry
On Nov 2, 2015, at 12:19 AM, TAY wee-beng <[email protected]> wrote:
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 12:27 PM, Barry Smith wrote:
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results.

You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.

Barry

Something makes no sense with the output: it gives

    KSPSolve  199 1.0  2.3298e+03 1.0  5.20e+09 1.8  3.8e+04 9.9e+05 5.0e+02  90 100 66 100 24  90 100 66 100 24   165

90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.
On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
Hi,

I have attached the new run with 100 time steps for 48 and 96 cores.

Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
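A minimal sketch of one way to request this in PETSc, assuming the Poisson KSP object is called ksp and the matrix passed to KSPSetOperators really does not change between time steps; the equivalent command-line option with the prefix used here would be -poisson_ksp_reuse_preconditioner:

    KSPSetReusePreconditioner(ksp, PETSC_TRUE);  /* keep the preconditioner built at the first solve */

This only makes sense while the matrix is unchanged; if the matrix were to change, the reused preconditioner would gradually become a poor one.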
Why does the number of processes increase so much? Is there something wrong with my coding? It seems to be so for my new run too.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 9:49 AM, Barry Smith wrote:
If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
Barry
On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
Hi,

Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96 cores log (log96).

Why does the number of processes increase so much? Is there something wrong with my coding?

Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?

Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?

Also, what about the momentum eqn? Is it working well?

I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 12:30 AM, Barry Smith wrote:
You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.

Anyways, all the time is being spent in the BoomerAMG algebraic multigrid setup, and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.

    PCSetUp    3 1.0  3.2445e+01 1.0  9.58e+06 2.0  0.0e+00 0.0e+00 4.0e+00  62  8  0  0  4  62  8  0  0  5    11
    PCSetUp    3 1.0  4.3599e+02 1.0  9.58e+06 2.0  0.0e+00 0.0e+00 4.0e+00  85 18  0  0  6  85 18  0  0  6     2

Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it is too large.

You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.

Barry
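A minimal sketch of the one-time setup being described, assuming a C/PETSc code in which the Poisson matrix A is constant and only the right-hand side b changes each step (names are illustrative only):

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOptionsPrefix(ksp, "poisson_");
    KSPSetOperators(ksp, A, A);      /* A is set once, outside the time loop           */
    KSPSetFromOptions(ksp);          /* e.g. -poisson_pc_type gamg or hypre            */
    for (step = 0; step < nsteps; step++) {
      /* update only the right-hand side b here */
      KSPSolve(ksp, b, x);           /* the AMG setup happens once, at the first solve */
    }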
On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
On 1/11/2015 10:00 AM, Barry Smith wrote:

On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:

On 1/11/2015 12:47 AM, Matthew Knepley wrote:

On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
Hi,

I understand that, as mentioned in the FAQ, due to the limitations in memory the scaling is not linear. So I am trying to write a proposal to use a supercomputer. Its specs are:

Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16 GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus)
Each cabinet contains 96 computing nodes.
One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.
There are 2 ways to give performance:

1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem.

2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.
I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
Cluster specs:

CPU: AMD 6234, 2.4 GHz
8 cores / processor (CPU)
6 CPU / node
So 48 cores / node
Not sure about the memory / node
The parallel efficiency 'En' for a given degree of parallelism 'n' indicates how much the program is efficiently accelerated by parallel processing. 'En' is given by the following formulae. Although their derivation processes are different depending on strong and weak scaling, the derived formulae are the same.
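The formulae themselves did not come through in the plain-text email; as a rough sketch only, assuming the proposal uses the standard Amdahl form with serial fraction \alpha estimated from the measured runs,

    S(n) = \frac{1}{\alpha + (1-\alpha)/n}, \qquad E_n = \frac{S(n)}{n} = \frac{1}{\alpha n + (1-\alpha)}

so for large n the efficiency decays roughly like 1/(\alpha n), which is why an extrapolation to thousands of cores gives such a small number.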
From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. So are my results acceptable?

For the large data set, if using 2205 nodes (2205 x 8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function of problem size, so you cannot take the strong scaling from one problem and apply it to another without a model of this dependence.

Weak scaling does model changes with problem size, so I would measure weak scaling on your current cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific applications, but neither does requiring a certain parallel efficiency.
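A minimal sketch of the measurement being described, in notation that is mine rather than the proposal's: keep the problem size per process fixed, so the run on N p processes uses a problem N times larger than the baseline run on p processes, and define

    E_{weak}(N) = \frac{T(p)}{T(N\,p)}

where T is the elapsed time over many time steps. Perfect weak scaling gives E_{weak} = 1, and the measured decay of E_{weak} from 48 to 96 (and, if possible, more) cores is what would be extrapolated toward the 2205 x 8 core case.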
OK, I checked the results for my weak scaling, and the expected parallel efficiency is even worse. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near >90% speedup when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.

However, it's mentioned in the FAQ that due to memory requirements, it's impossible to get >90% speedup when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speedup when I double the cores and problem size for my current 48/96 cores setup. Is that so?
What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?

Barry
Hi,

I have attached the output:

48 cores: log48
96 cores: log96

There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.

Problem size doubled from 158x266x150 to 158x266x300.
So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?

Thanks.

Thanks,

   Matt
Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205 x 8) cores?

Btw, I do not have access to the system.

Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
<log64_100.txt><log8_100.txt>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to
which their experiments lead.
-- Norbert Wiener