Re: [petsc-users] downloading hypre

2018-06-04 Thread Lukas van de Wiel
Hi Matt,

Whoa, thanks for the invite, but that is a bit too short notice given
the circumstances.
I will gladly come to the next one. And if it is far away, I have a good
excuse to prepend a holiday to it. :-)

Will it be in the beginning of June in 2019 as well?

Cheers
Lukas

On Fri, Jun 1, 2018 at 5:16 PM, Matthew Knepley  wrote:

> On Fri, Jun 1, 2018 at 11:11 AM, Lukas van de Wiel <
> lukas.drinkt.t...@gmail.com> wrote:
>
>> Thanks Matt. That was fast! :-)
>>
>> The university server has:
>>
>> [17:04 gtecton@pbsserv petsc-3.8.1] > openssl version
>> OpenSSL 0.9.8zh 3 Dec 2015
>>
>> And a friend with
>>
>> OpenSSL 1.0.2g  1 Mar 2016
>>
>> Can execute the git clone command without trouble.
>>
>> Cheers and have a great weekend!
>>
>
> Great! Also if you are bored next week, we are having the PETSc Meeting in
> London.
> Only a short train ride by Eurostar :)
>
>Matt
>
>
>> Lukas
>>
>>
>>
>>
>> On Fri, Jun 1, 2018 at 5:08 PM, Matthew Knepley 
>> wrote:
>>
>>> On Fri, Jun 1, 2018 at 11:01 AM, Lukas van de Wiel <
>>> lukas.drinkt.t...@gmail.com> wrote:
>>>
 Hi all,

 for years I have been installing PETSc on machines, almost always
 without any issue to speak of. Compliments for the solid configure script.
 However, now I see an issue I cannot easily solve.

 When getting HYPRE in the configuration options, the output gives


 
 ***
  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log
 for details):
 
 ---
 Error during download/extract/detection of HYPRE:
 Unable to download hypre
 Could not execute "git clone https://github.com/LLNL/hypre
 /net/home/gtecton/flops/petsc-3.8.1/linux-gnu-x86_64/externa
 lpackages/git.hypre":
 Cloning into '/net/home/gtecton/flops/petsc
 -3.8.1/linux-gnu-x86_64/externalpackages/git.hypre'...
 fatal: unable to access 'https://github.com/LLNL/hypre/':
 error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert
 protocol version
 Unable to download package HYPRE from: git://https://github.com/LLNL/
 hypre

 Has anybody else seen this? It seems to get stuck on SSL, but web
 security is sadly not my forte...

>>>
>>> I have definitely had issues similar to this. It came up with Firedrake.
>>> I built my own Python, and had
>>> an early version of libssl/libcrypto. I think you need at least 1.0.1 to
>>> get the SSL version that github
>>> is now requiring. You can use ldd (or otool -L) to check the version for
>>> your Python.
>>>
>>>   Thanks,
>>>
>>> Matt
>>>
>>>
 Thanks a lot,

 Lukas


>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/ 
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 
>


Re: [petsc-users] MatPtAP

2018-06-04 Thread Samuel Lanthaler
Ok, I now realize that I had implemented the boundary conditions in an 
unnecessarily complicated way... As you pointed out, I can just 
manipulate individual matrix rows to enforce the BCs. That way, I 
never have to call MatPtAP or do any expensive operations. That is 
probably what is commonly done. I've changed my code, and it seems to work 
fine and is much faster.
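
(For readers who find this thread later: a minimal C sketch of that row-wise
approach for a generalized eigenproblem, under the assumption that one zeroes
the BC rows of both matrices and places a spurious eigenvalue there; the names
bcRows, nbc and sigma are placeholders, not from the original code.)

   PetscErrorCode ierr;
   /* Zero the BC rows of A and B; put sigma on the diagonal of A and 1.0 on the
      diagonal of B, so each BC row contributes only the spurious eigenvalue sigma,
      chosen far away from the part of the spectrum of interest. */
   ierr = MatZeroRows(petsc_matA, nbc, bcRows, sigma, NULL, NULL);CHKERRQ(ierr);
   ierr = MatZeroRows(petsc_matB, nbc, bcRows, 1.0,   NULL, NULL);CHKERRQ(ierr);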


Thanks a lot for your help, everyone! This has been very educational for me.

Samuel


On 06/01/2018 09:04 PM, Hong wrote:

Samuel,
I have the following questions:
1) Why is solving \lambda*Pt*B*P*y = Pt*A*P*y better than solving the 
original \lambda*B*x = A*x?


2) Does your eigensolver require a matrix factorization? If not, i.e., it 
only uses mat-vec multiplication, then you may implement
z = Pt*(A*(P*y)) in your eigensolver instead of using mat-mat-mat 
multiplication (see the sketch after point 3).


3) PETSc's MatPtAP() was implemented for multigrid applications, in which 
the product C = Pt*A*P is a denser but much smaller matrix.
I have not seen a use case like yours, where P is square with the same size 
as A. If C is much denser than A, it is expected that PtAP consumes a large 
portion of the time.
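
(A rough C sketch of the mat-vec approach from point 2), wrapping z = Pt*(A*(P*y))
in a shell matrix so the eigensolver only ever needs matrix-vector products; the
context struct and work vectors are illustrative, not from the original code.)

   typedef struct { Mat A, P; Vec w1, w2; } PtAPCtx;

   PetscErrorCode MatMult_PtAP(Mat S, Vec y, Vec z)
   {
     PtAPCtx        *ctx;
     PetscErrorCode ierr;
     ierr = MatShellGetContext(S, &ctx);CHKERRQ(ierr);
     ierr = MatMult(ctx->P, y, ctx->w1);CHKERRQ(ierr);          /* w1 = P*y   */
     ierr = MatMult(ctx->A, ctx->w1, ctx->w2);CHKERRQ(ierr);    /* w2 = A*w1  */
     ierr = MatMultTranspose(ctx->P, ctx->w2, z);CHKERRQ(ierr); /* z  = Pt*w2 */
     return 0;
   }

   /* setup: MatCreateShell(comm, m, m, M, M, &ctx, &S);
             MatShellSetOperation(S, MATOP_MULT, (void (*)(void))MatMult_PtAP); */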


Hong

On Fri, Jun 1, 2018 at 12:35 PM, Smith, Barry F. > wrote:



  Could you save the P matrix with MatView() using a binary viewer,
and the A matrix with MatView() and the binary viewer, and email
them to petsc-ma...@mcs.anl.gov ?
Then we can run the code in the profiler with your matrices and
see if there is any way to speed up the computation.
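
(For reference, a rough C sketch of saving a matrix in PETSc binary format as
requested above; the file names are placeholders.)

   PetscViewer viewer;
   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.bin", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
   ierr = MatView(petsc_matA, viewer);CHKERRQ(ierr);
   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
   /* repeat with "P.bin" for petsc_matP */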

   Barry


> On Jun 1, 2018, at 11:07 AM, Samuel Lanthaler
 wrote:
>
> On 06/01/2018 03:42 PM, Matthew Knepley wrote:
>> On Fri, Jun 1, 2018 at 9:21 AM, Samuel Lanthaler
 wrote:
>> Hi,
>>
>> I was wondering what the most efficient way to use MatPtAP
would be in the following situation: I am discretizing a PDE
system. The discretization yields a matrix A that has a band
structure (with k upper and lower bands, say). In order to
implement the boundary conditions, I use a transformation matrix P
which is essentially the unit matrix, except for the entries
P_{ij} in the first and last k rows and columns:
>>
>> P = [ B, 0, 0, 0, ..., 0, 0 ]
>>     [ 0, 1, 0, 0, ..., 0, 0 ]
>>     [          ...          ]
>>     [          ...          ]
>>     [      0, 0, ..., 1, 0  ]
>>     [ 0, 0, 0, 0, ..., 0, C ]
>>
>> where B and C are (k-by-k) matrices.
>> Right now, I'm simply constructing A, P and calling
>>
>> CALL

MatPtAP(petsc_matA,petsc_matP,MAT_INITIAL_MATRIX,PETSC_DEFAULT_REAL,petsc_matPtAP,ierr)
>>
>> where I haven't done anything to petsc_matPtAP prior to this
call. Is this the way to do it?
>>
>> I'm asking because, currently, setting up the matrices A and P
takes very little time, whereas the operation MatPtAP is taking
quite long, which seems very odd... The matrices are of type
MPIAIJ. In my problem, the total matrix dimension is around 10'000
and the matrix blocks (B,C) are of size ~100.
>>
>> Are you sure this is what you want to do? Usually BCs are local,
since by definition PDEs are local, and
>> are applied pointwise. What kind of BCs do you have here?
>>
>
> The boundary conditions are a mixture of Dirichlet and Neumann;
in my case, the PDE is a system involving 8 variables on a disk,
where the periodic direction is discretized using a Fourier series
expansion, and the radial direction uses B-splines.
>
> In reality, I have two matrices A,B, and want to solve the
eigenvalue problem \lambda*B*x = A*x.
> I found it quite convenient to use a transformation P to a
different set of variables y, such that x=P*y and x satisfies the
BC iff certain components of y are 0. The latter is enforced by
inserting spurious eigenvalues at the relevant components of y in
the transformed eigenvalue problem \lambda*Pt*B*P*y=Pt*A*P*y.
After solving the EVP in terms of y, I get back x=P*y.
> Is this an inherently bad/inefficient way of enforcing BC's? Thanks.
>
>
>
>
>>   Thanks,
>>
>> Matt
>>
>> Thanks in advance for any ideas.
>>
>> Cheers,
>> Samuel
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin
their experiments is infinitely more interesting than any results
to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/

>






Re: [petsc-users] MAT_NEW_NONZERO_LOCATIONS working?

2018-06-04 Thread Jean-Yves L'Excellent


Thanks for the details of your needs.

For the first application, the sparse RHS feature with distributed
solution should effectively be fine.

For the second one, a future distributed RHS feature (not currently
available in MUMPS) might help if the centralized sparse RHS is too
memory consuming, depending on the size of B1.

Regards,
Jean-Yves, for the MUMPS developers


 I want to invert a rather large sparse matrix; for this, using a 
sparse rhs with centralized input would be ok as long as the solution 
is distributed.
The second application I have in mind is solving a system of the 
form AX=B, where A and B are sparse and B is given by a block matrix of 
the form B=[B1 0, 0 0], where B1 is dense but its dimension is (much) 
smaller than that of the whole matrix B.

Marius:
The current PETSc interface supports sequential sparse multiple right-hand 
sides, but not distributed ones.
It turns out that MUMPS does not support distributed sparse multiple 
right-hand sides at
the moment (see attached email).
Jean-Yves invites you to communicate with him directly.
Let me know how we can help on this matter,
e.g., should we add support for a parallel implementation of sparse multiple 
right-hand sides with
centralized rhs input?
Hong
--

Jean-Yves L'Excellent 

5:14 AM (3 hours ago)

to Hong, mumps-dev

Hello,

We do not support distributed sparse multiple right-hand sides at
the moment. From the feedback we have from applications, the
right-hand sides are often very sparse, and having them distributed
did not seem critical.

Since we are specifying a distributed right-hand sides feature at the
moment, could you let us know in more detail the need regarding
distributed sparse right-hand side (e.g., do all columns have the same
nonzero structure in that case) or put us in contact with the user who
needs this?

Thanks,
Jean-Yves and Patrick

Thanks a lot guys, very helpful.




I see MUMPS (http://mumps.enseeiht.fr/) supports:

Sparse multiple right-hand sides, distributed solution;
Exploitation of sparsity in the right-hand sides.

The PETSc interface computes the MUMPS distributed solution by default
(this is not new) (ICNTL(21) = 1).

I will add support for Sparse multiple right-hand side.

Hong

On Thu, May 31, 2018 at 11:25 AM, Smith, Barry F.
 wrote:
  Hong,

    Can you see about adding support for distributed right hand side?

    Thanks

      Barry

> On May 31, 2018, at 2:37 AM, Marius Buerkle  wrote:
>
> The fix for MAT_NEW_NONZERO_LOCATIONS, thanks again.
>
> I have yet another question, sorry. The recent version of MUMPS
supports distributed and sparse RHS. Is there any chance that this
will be supported in PETSc in the near future?
>
>
>
>
>> On May 30, 2018, at 6:55 PM, Marius Buerkle  wrote:
>>
>> Thanks for the quick fix, I will test it and report back.
>> I have another, maybe related, question: if
MAT_NEW_NONZERO_LOCATIONS is true and, let's say, 1 new nonzero
position is created, it does not allocate 1 but several new
nonzeros, and only uses 1.
>
> Correct
>
>> I think that is normal, right?
>
> Yes
>
>> But, at least as far as I understand the manual, a subsequent
call of mat assemble with
>> MAT_FINAL_ASSEMBLY should compress out the unused allocations
and release the memory, is this correct?
>
> It "compresses it out" (by shifting all the nonzero entries to
the beginning of the internal i, j, and a arrays), but does NOT
release any memory. Since the values are stored in one big
contiguous array (obtained with a single malloc) it cannot just
free part of the array, so the extra locations just sit harmlessly
at the end if the array unused.
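
(One way to see this directly, if useful: query the assembled matrix with
MatGetInfo(); a small C sketch using the standard MatInfo fields.)

   MatInfo info;
   ierr = MatGetInfo(A, MAT_LOCAL, &info);CHKERRQ(ierr);
   ierr = PetscPrintf(PETSC_COMM_SELF, "nz allocated %g, used %g, unneeded %g\n",
                      (double)info.nz_allocated, (double)info.nz_used,
                      (double)info.nz_unneeded);CHKERRQ(ierr);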
>
>> If so, this did not work for me, even after doing
>> MAT_FINAL_ASSEMBLY the unused nonzero allocations remain. Is
this normal?
>
> Yes,
>
> Barry
>
>>
>>>
>>> Fixed in the branch barry/fix-mat-new-nonzero-locations/maint
>>>
>>> Once this passes testing it will go into the maint branch and
then the next patch release but you can use it now in the branch
barry/fix-mat-new-nonzero-locations/maint
>>>
>>> Thanks for the report and reproducible example
>>>
>>> Barry
>>>
>>>
 On May 29, 2018, at 7:51 PM, Marius Buerkle  wrote:

 Sure, I made a small reproducer, it is Fortran though I hope
that is ok. If MAT_NEW_NONZERO_LOCATIONS is set to false I get an
error, if it is set to true the new 

Re: [petsc-users] downloading hypre

2018-06-04 Thread Matthew Knepley
On Mon, Jun 4, 2018 at 3:50 AM, Lukas van de Wiel <
lukas.drinkt.t...@gmail.com> wrote:

> Hi Matt,
>
> Whoa, thanks for the invite, but that is a bit too short notice given
> the circumstances.
> I will gladly come to the next one. And if it is far away, I have a good
> excuse to prepend a holiday to it. :-)
>
> Will it be in the beginning of June in 2019 as well?
>

It usually is. We try to work around other conferences. Generally we get it
set 6 months in advance.

  Thanks,

Matt


> Cheers
> Lukas
>
> On Fri, Jun 1, 2018 at 5:16 PM, Matthew Knepley  wrote:
>
>> On Fri, Jun 1, 2018 at 11:11 AM, Lukas van de Wiel <
>> lukas.drinkt.t...@gmail.com> wrote:
>>
>>> Thanks Matt. That was fast! :-)
>>>
>>> The university server has:
>>>
>>> [17:04 gtecton@pbsserv petsc-3.8.1] > openssl version
>>> OpenSSL 0.9.8zh 3 Dec 2015
>>>
>>> And a friend with
>>>
>>> OpenSSL 1.0.2g  1 Mar 2016
>>>
>>> Can execute the git clone command without trouble.
>>>
>>> Cheers and have a great weekend!
>>>
>>
>> Great! Also if you are bored next week, we are having the PETSc Meeting
>> in London.
>> Only a short train ride by Eurostar :)
>>
>>Matt
>>
>>
>>> Lukas
>>>
>>>
>>>
>>>
>>> On Fri, Jun 1, 2018 at 5:08 PM, Matthew Knepley 
>>> wrote:
>>>
 On Fri, Jun 1, 2018 at 11:01 AM, Lukas van de Wiel <
 lukas.drinkt.t...@gmail.com> wrote:

> Hi all,
>
> for years I have been installing PETSc on machines, almost always
> without any issue to speak of. Compliments for the solid configure
> script.
> However, now I see an issue I cannot easily solve.
>
> When getting HYPRE in the configuration options, the output gives
>
>
> 
> ***
>  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log
> for details):
> 
> ---
> Error during download/extract/detection of HYPRE:
> Unable to download hypre
> Could not execute "git clone https://github.com/LLNL/hypre
> /net/home/gtecton/flops/petsc-3.8.1/linux-gnu-x86_64/externa
> lpackages/git.hypre":
> Cloning into '/net/home/gtecton/flops/petsc
> -3.8.1/linux-gnu-x86_64/externalpackages/git.hypre'...
> fatal: unable to access 'https://github.com/LLNL/hypre/':
> error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert
> protocol version
> Unable to download package HYPRE from: git://https://github.com/LLNL/
> hypre
>
> Has anybody else seen this? It seems to get stuck on SSL, but web
> security is sadly not my forte...
>

 I have definitely had issues similar to this. It came up with
 Firedrake. I built my own Python, and had
 an early version of libssl/libcrypto. I think you need at least 1.0.1
 to get the SSL version that github
 is now requiring. You can use ldd (or otool -L) to check the version
 for your Python.

   Thanks,

 Matt


> Thanks a lot,
>
> Lukas
>
>


 --
 What most experimenters take for granted before they begin their
 experiments is infinitely more interesting than any results to which their
 experiments lead.
 -- Norbert Wiener

 https://www.cse.buffalo.edu/~knepley/ 

>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/ 
>>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


[petsc-users] "snes_type test" is gone?

2018-06-04 Thread Kong, Fande
Hi PETSc Team,

I was wondering if "snes_type test" is gone? Quite a few MOOSE users
use this option to test their Jacobian matrices.

If it is gone, any reason?

Fande,


Re: [petsc-users] Poor weak scaling when solving successive linearsystems

2018-06-04 Thread Junchao Zhang
Michael, I can compile and run your test.  I am now profiling it. Thanks.

--Junchao Zhang

On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
michael.bec...@physik.uni-giessen.de> wrote:

> Hello again,
> this took me longer than I anticipated, but here we go.
> I did reruns of the cases where only half the processes per node were used
> (without -log_sync):
>
>                     125 procs, 1st      125 procs, 2nd      1000 procs, 1st     1000 procs, 2nd
>                     Max       Ratio     Max       Ratio     Max       Ratio     Max       Ratio
> KSPSolve            1.203E+02   1.0     1.210E+02   1.0     1.399E+02   1.1     1.365E+02   1.0
> VecTDot             6.376E+00   3.7     6.551E+00   4.0     7.885E+00   2.9     7.175E+00   3.4
> VecNorm             4.579E+00   7.1     5.803E+00  10.2     8.534E+00   6.9     6.026E+00   4.9
> VecScale            1.070E-01   2.1     1.129E-01   2.2     1.301E-01   2.5     1.270E-01   2.4
> VecCopy             1.123E-01   1.3     1.149E-01   1.3     1.301E-01   1.6     1.359E-01   1.6
> VecSet              7.063E-01   1.7     6.968E-01   1.7     7.432E-01   1.8     7.425E-01   1.8
> VecAXPY             1.166E+00   1.4     1.167E+00   1.4     1.221E+00   1.5     1.279E+00   1.6
> VecAYPX             1.317E+00   1.6     1.290E+00   1.6     1.536E+00   1.9     1.499E+00   2.0
> VecScatterBegin     6.142E+00   3.2     5.974E+00   2.8     6.448E+00   3.0     6.472E+00   2.9
> VecScatterEnd       3.606E+01   4.2     3.551E+01   4.0     5.244E+01   2.7     4.995E+01   2.7
> MatMult             3.561E+01   1.6     3.403E+01   1.5     3.435E+01   1.4     3.332E+01   1.4
> MatMultAdd          1.124E+01   2.0     1.130E+01   2.1     2.093E+01   2.9     1.995E+01   2.7
> MatMultTranspose    1.372E+01   2.5     1.388E+01   2.6     1.477E+01   2.2     1.381E+01   2.1
> MatSolve            1.949E-02   0.0     1.653E-02   0.0     4.789E-02   0.0     4.466E-02   0.0
> MatSOR              6.610E+01   1.3     6.673E+01   1.3     7.111E+01   1.3     7.105E+01   1.3
> MatResidual         2.647E+01   1.7     2.667E+01   1.7     2.446E+01   1.4     2.467E+01   1.5
> PCSetUpOnBlocks     5.266E-03   1.4     5.295E-03   1.4     5.427E-03   1.5     5.289E-03   1.4
> PCApply             1.031E+02   1.0     1.035E+02   1.0     1.180E+02   1.0     1.164E+02   1.0
>
> I also slimmed down my code and basically wrote a simple weak scaling test
> (source files attached) so you can profile it yourself. I appreciate the
> offer Junchao, thank you.
> You can adjust the system size per processor at runtime via
> "-nodes_per_proc 30" and the number of repeated calls to the function
> containing KSPsolve() via "-iterations 1000". The physical problem is
> simply calculating the electric potential from a homogeneous charge
> distribution, done multiple times to accumulate time in KSPsolve().
> A job would be started using something like
>
> mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4
> -iterations 1000 \\
>  -ksp_rtol 1E-6 \
>  -log_view -log_sync\
>  -pc_type gamg -pc_gamg_type classical\
>  -ksp_type cg \
>  -ksp_norm_type unpreconditioned \
>  -mg_levels_ksp_type richardson \
>  -mg_levels_ksp_norm_type none \
>  -mg_levels_pc_type sor \
>  -mg_levels_ksp_max_it 1 \
>  -mg_levels_pc_sor_its 1 \
>  -mg_levels_esteig_ksp_type cg \
>  -mg_levels_esteig_ksp_max_it 10 \
>  -gamg_est_ksp_type cg
>
> , ideally started on a cube number of processes for a cubical process grid.
> Using 125 processes and 10,000 iterations I get the output in
> "log_view_125_new.txt", which shows the same imbalance for me.
>
> Michael
>
>
> Am 02.06.2018 um 13:40 schrieb Mark Adams:
>
>
>
> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang 
> wrote:
>
>> Hi,Michael,
>>   You can add -log_sync besides -log_view, which adds barriers to certain
>> events but measures barrier time separately from the events. I find this
>> option makes it easier to interpret log_view output.
>>
>
> That is great (good to know).
>
> This should give us a better idea if your large VecScatter costs are from
> slow communication or if it catching some sort of load imbalance.
>
>
>>
>> --Junchao Zhang
>>
>> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
>> michael.bec...@physik.uni-giessen.de> wrote:
>>
>>> Barry: On its way. Could take a couple days again.
>>>
>>> Junchao: I unfortunately don't have access to a cluster with a faster
>>> network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking fat-tree
>>> network, which I realize causes parallel slowdown if the nodes are not
>>> connected to the same switch. Each node has 24 processors (2x12/socket) and
>>> four NUMA domains (two for each socket).
>>> The ranks are usually not distributed perfectly evenly, i.e. for 125
>>> processes, of the six required nodes, five would use 21 cores and one 20.
>>> Would using another CPU 

Re: [petsc-users] "snes_type test" is gone?

2018-06-04 Thread Alexander Lindsay
Looks like `-snes_test_jacobian` and `-snes_test_jacobian_view` are the
options to use...
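
(For example, an illustrative invocation with a placeholder executable name:
./your_app -snes_test_jacobian -snes_test_jacobian_view)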

On Mon, Jun 4, 2018 at 2:27 PM, Kong, Fande  wrote:

> Hi PETSc Team,
>
> I was wondering if "snes_type test" is gone? Quite a few MOOSE users
> use this option to test their Jacobian matrices.
>
> If it is gone, any reason?
>
> Fande,
>


Re: [petsc-users] "snes_type test" is gone?

2018-06-04 Thread Zhang, Hong



> On Jun 4, 2018, at 4:59 PM, Zhang, Hong  wrote:
> 
> -snes_type has been removed. We can just use -snes_test_jacobian instead. 
> Note that the test is done every time the Jacobian is computed.

It was meant to be "-snes_type test".

> Hong (Mr.)
> 
>> On Jun 4, 2018, at 3:27 PM, Kong, Fande  wrote:
>> 
>> Hi PETSc Team,
>> 
>> I was wondering if "snes_type test" is gone? Quite a few MOOSE users
>> use this option to test their Jacobian matrices. 
>> 
>> If it is gone, any reason?
>> 
>> Fande,
> 



[petsc-users] MKL Pardiso Solver Execution Step control

2018-06-04 Thread Matthew Overholt
Hello,

I am using KSP in KSPPREONLY mode to do a direct solve on an A*x = b
system, with solver algorithms MUMPS, CPardiso and Pardiso.  For Pardiso,
is it possible to control the solver execution step (denoted "phase" in
Intel's docs)?  I would like to be able to control when it refactors as one
can when calling it directly.

https://software.intel.com/en-us/mkl-developer-reference-c-pardiso

If so, please give me the details, since I can't see how to do it from the
MATSOLVERMKL_PARDISO docs page (which clearly lists the iparm[] flags).
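
(For context, a rough C sketch of the kind of direct-solve setup described
above; this does not answer the phase-control question, and
PCFactorSetMatSolverPackage is the function name from the PETSc 3.8/3.9 era.)

   KSP ksp;
   PC  pc;
   ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
   ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
   ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
   ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
   ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
   ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMKL_PARDISO);CHKERRQ(ierr);
   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);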

Thank you.

Matt Overholt
CapeSym, Inc.
(508) 653-7100 x204
overh...@capesim.com


Re: [petsc-users] "snes_type test" is gone?

2018-06-04 Thread Zhang, Hong
-snes_type has been removed. We can just use -snes_test_jacobian instead. Note 
that the test is done every time the Jacobian is computed.

Hong (Mr.)

> On Jun 4, 2018, at 3:27 PM, Kong, Fande  wrote:
> 
> Hi PETSc Team,
> 
> I was wondering if "snes_type test" is gone? Quite a few MOOSE users
> use this option to test their Jacobian matrices. 
> 
> If it is gone, any reason?
> 
> Fande,



Re: [petsc-users] MAT_NEW_NONZERO_LOCATIONS working?

2018-06-04 Thread Hong
On Mon, Jun 4, 2018 at 1:03 PM, Jean-Yves L'Excellent <
jean-yves.l.excell...@ens-lyon.fr> wrote:

>
> Thanks for the details of your needs.
>
> For the first application, the sparse RHS feature with distributed
> solution should effectively be fine.
>
I'll add parallel support for this feature in PETSc after I'm back at ANL
after June 14th.
Hong
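
(In the meantime, a rough C sketch of the existing centralized multiple-RHS
path through a MUMPS factorization via MatMatSolve; the variable names are
illustrative, B is a dense right-hand-side matrix, e.g. the identity for an
explicit inverse, and the sparse/distributed-RHS variants are exactly what is
being discussed above.)

   Mat           F, B, X;   /* B: dense right-hand sides, X: solutions */
   IS            rperm, cperm;
   MatFactorInfo info;
   ierr = MatGetOrdering(A, MATORDERINGNATURAL, &rperm, &cperm);CHKERRQ(ierr);
   ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
   ierr = MatGetFactor(A, MATSOLVERMUMPS, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
   ierr = MatLUFactorSymbolic(F, A, rperm, cperm, &info);CHKERRQ(ierr);
   ierr = MatLUFactorNumeric(F, A, &info);CHKERRQ(ierr);
   ierr = MatMatSolve(F, B, X);CHKERRQ(ierr);   /* X = A^{-1} * B */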

>
> For the second one, a future distributed RHS feature (not currently
> available in MUMPS) might help if the centralized sparse RHS is too
> memory consuming, depending on the size of B1.
>
> Regards,
> Jean-Yves, for the MUMPS developers
>
>
> I want to invert a rather large sparse matrix; for this, using a sparse
> rhs with centralized input would be ok as long as the solution is
> distributed.
>
> The second application I have in mind is solving a system of the form
> AX=B, where A and B are sparse and B is given by a block matrix of the form
> B=[B1 0, 0 0], where B1 is dense but its dimension is (much) smaller than
> that of the whole matrix B.
>
> Marius:
> The current PETSc interface supports sequential sparse multiple right-hand
> sides, but not distributed ones.
> It turns out that MUMPS does not support distributed sparse multiple
> right-hand sides at
> the moment (see attached email).
>
> Jean-Yves invites you to communicate with him directly.
> Let me know how we can help on this matter,
> e.g., should we add support for a parallel implementation of sparse multiple
> right-hand sides with
> centralized rhs input?
>
> Hong
>
> --
> Jean-Yves L'Excellent 
> 5:14 AM (3 hours ago)
>
> to Hong, mumps-dev
>
>
>
>
> Hello,
>
> We do not support distributed sparse multiple right-hand sides at
> the moment. From the feedback we have from applications, the
> right-hand sides are often very sparse, and having them distributed
> did not seem critical.
>
> Since we are specifying a distributed right-hand sides feature at the
> moment, could you let us know in more detail the need regarding
> distributed sparse right-hand side (e.g., do all columns have the same
> nonzero structure in that case) or put us in contact with the user who
> needs this?
>
> Thanks,
> Jean-Yves and Patrick
>
>>
>> Thanks a lot guys, very helpful.
>>
>>
>>
>>
>> I see MUMPS http://mumps.enseeiht.fr/
>>
>> Sparse multiple right-hand sides, distributed solution; Exploitation of
>> sparsity in the right-hand sides. The PETSc interface computes the MUMPS
>> distributed solution by default (this is not new) (ICNTL(21) = 1).
>>
>> I will add support for Sparse multiple right-hand side.
>>
>> Hong
>>
>> On Thu, May 31, 2018 at 11:25 AM, Smith, Barry F.  wrote:
>>   Hong,
>>
>> Can you see about adding support for distributed right hand side?
>>
>> Thanks
>>
>>   Barry
>>
>> > On May 31, 2018, at 2:37 AM, Marius Buerkle  wrote:
>> >
>> > The fix for MAT_NEW_NONZERO_LOCATIONS, thanks again.
>> >
>> > I have yet another question, sorry. The recent version of MUMPS
>> supports distributed and sparse RHS. Is there any chance that this will be
>> supported in PETSc in the near future?
>> >
>> >
>> >
>> >
>> >> On May 30, 2018, at 6:55 PM, Marius Buerkle  wrote:
>> >>
>> >> Thanks for the quick fix, I will test it and report back.
>> >> I have another, maybe related, question: if MAT_NEW_NONZERO_LOCATIONS is
>> true and, let's say, 1 new nonzero position is created, it does not allocate
>> 1 but several new nonzeros, and only uses 1.
>> >
>> > Correct
>> >
>> >> I think that is normal, right?
>> >
>> > Yes
>> >
>> >> But, at least as far as I understand the manual, a subsequent call of
>> mat assemble with
>> >> MAT_FINAL_ASSEMBLY should compress out the unused allocations and
>> release the memory, is this correct?
>> >
>> > It "compresses it out" (by shifting all the nonzero entries to the
>> beginning of the internal i, j, and a arrays), but does NOT release any
>> memory. Since the values are stored in one big contiguous array (obtained
>> with a single malloc) it cannot just free part of the array, so the extra
>> locations just sit harmlessly at the end if the array unused.
>> >
>> >> If so, this did not work for me, even after doing
>> >> MAT_FINAL_ASSEMBLY the unused nonzero allocations remain. Is this
>> normal?
>> >
>> > Yes,
>> >
>> > Barry
>> >
>> >>
>> >>>
>> >>> Fixed in the branch barry/fix-mat-new-nonzero-locations/maint
>> >>>
>> >>> Once this passes testing it will go into the maint branch and then
>> the next patch release but you can use it now in the branch
>> barry/fix-mat-new-nonzero-locations/maint
>> >>>
>> >>> Thanks for the report and reproducible example
>> >>>
>> >>> Barry
>> >>>
>> >>>
>>  On May 29, 2018, at 7:51 PM, Marius Buerkle  wrote:
>> 
>>  Sure, I made a small reproducer, it is Fortran though I hope that is
>> ok. If MAT_NEW_NONZERO_LOCATIONS is set to false I get an error, if it is
>> set to true the new nonzero element is inserted, if
>> MAT_NEW_NONZERO_LOCATIONS is 

Re: [petsc-users] "snes_type test" is gone?

2018-06-04 Thread Kong, Fande
Thanks, Hong,

I see. It is better if "-snes_type test" had not existed in the first place.


Fande,

On Mon, Jun 4, 2018 at 4:01 PM, Zhang, Hong  wrote:

>
>
> > On Jun 4, 2018, at 4:59 PM, Zhang, Hong  wrote:
> >
> > -snes_type has been removed. We can just use -snes_test_jacobian
> instead. Note that the test is done every time the Jacobian is computed.
>
> It was meant to be "-snes_type test".
>
> > Hong (Mr.)
> >
> >> On Jun 4, 2018, at 3:27 PM, Kong, Fande  wrote:
> >>
> >> Hi PETSc Team,
> >>
> >> I was wondering if "snes_type test" is gone? Quite a few MOOSE
> users use this option to test their Jacobian matrices.
> >>
> >> If it is gone, any reason?
> >>
> >> Fande,
> >
>
>


Re: [petsc-users] Poor weak scaling when solving successive linearsystems

2018-06-04 Thread Michael Becker

Hello again,

this took me longer than I anticipated, but here we go.
I did reruns of the cases where only half the processes per node were 
used (without -log_sync):


                    125 procs, 1st      125 procs, 2nd      1000 procs, 1st     1000 procs, 2nd
                    Max       Ratio     Max       Ratio     Max       Ratio     Max       Ratio
KSPSolve            1.203E+02   1.0     1.210E+02   1.0     1.399E+02   1.1     1.365E+02   1.0
VecTDot             6.376E+00   3.7     6.551E+00   4.0     7.885E+00   2.9     7.175E+00   3.4
VecNorm             4.579E+00   7.1     5.803E+00  10.2     8.534E+00   6.9     6.026E+00   4.9
VecScale            1.070E-01   2.1     1.129E-01   2.2     1.301E-01   2.5     1.270E-01   2.4
VecCopy             1.123E-01   1.3     1.149E-01   1.3     1.301E-01   1.6     1.359E-01   1.6
VecSet              7.063E-01   1.7     6.968E-01   1.7     7.432E-01   1.8     7.425E-01   1.8
VecAXPY             1.166E+00   1.4     1.167E+00   1.4     1.221E+00   1.5     1.279E+00   1.6
VecAYPX             1.317E+00   1.6     1.290E+00   1.6     1.536E+00   1.9     1.499E+00   2.0
VecScatterBegin     6.142E+00   3.2     5.974E+00   2.8     6.448E+00   3.0     6.472E+00   2.9
VecScatterEnd       3.606E+01   4.2     3.551E+01   4.0     5.244E+01   2.7     4.995E+01   2.7
MatMult             3.561E+01   1.6     3.403E+01   1.5     3.435E+01   1.4     3.332E+01   1.4
MatMultAdd          1.124E+01   2.0     1.130E+01   2.1     2.093E+01   2.9     1.995E+01   2.7
MatMultTranspose    1.372E+01   2.5     1.388E+01   2.6     1.477E+01   2.2     1.381E+01   2.1
MatSolve            1.949E-02   0.0     1.653E-02   0.0     4.789E-02   0.0     4.466E-02   0.0
MatSOR              6.610E+01   1.3     6.673E+01   1.3     7.111E+01   1.3     7.105E+01   1.3
MatResidual         2.647E+01   1.7     2.667E+01   1.7     2.446E+01   1.4     2.467E+01   1.5
PCSetUpOnBlocks     5.266E-03   1.4     5.295E-03   1.4     5.427E-03   1.5     5.289E-03   1.4
PCApply             1.031E+02   1.0     1.035E+02   1.0     1.180E+02   1.0     1.164E+02   1.0


I also slimmed down my code and basically wrote a simple weak scaling 
test (source files attached) so you can profile it yourself. I 
appreciate the offer Junchao, thank you.
You can adjust the system size per processor at runtime via 
"-nodes_per_proc 30" and the number of repeated calls to the function 
containing KSPsolve() via "-iterations 1000". The physical problem is 
simply calculating the electric potential from a homogeneous charge 
distribution, done multiple times to accumulate time in KSPsolve().

A job would be started using something like

   mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4
   -iterations 1000 \\
 -ksp_rtol 1E-6 \
 -log_view -log_sync\
 -pc_type gamg -pc_gamg_type classical\
 -ksp_type cg \
 -ksp_norm_type unpreconditioned \
 -mg_levels_ksp_type richardson \
 -mg_levels_ksp_norm_type none \
 -mg_levels_pc_type sor \
 -mg_levels_ksp_max_it 1 \
 -mg_levels_pc_sor_its 1 \
 -mg_levels_esteig_ksp_type cg \
 -mg_levels_esteig_ksp_max_it 10 \
 -gamg_est_ksp_type cg

, ideally started on a cube number of processes for a cubical process grid.
Using 125 processes and 10,000 iterations I get the output in 
"log_view_125_new.txt", which shows the same imbalance for me.


Michael



Am 02.06.2018 um 13:40 schrieb Mark Adams:



On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang > wrote:


Hi,Michael,
  You can add -log_sync besides -log_view, which adds barriers to
certain events but measures barrier time separately from the
events. I find this option makes it easier to interpret log_view
output.


That is great (good to know).

This should give us a better idea if your large VecScatter costs are 
from slow communication or if it catching some sort of load imbalance.



--Junchao Zhang

On Wed, May 30, 2018 at 3:27 AM, Michael Becker
 wrote:

Barry: On its way. Could take a couple days again.

Junchao: I unfortunately don't have access to a cluster with a
faster network. This one has a mixed 4X QDR-FDR InfiniBand 2:1
blocking fat-tree network, which I realize causes parallel
slowdown if the nodes are not connected to the same switch.
Each node has 24 processors (2x12/socket) and four NUMA
domains (two for each socket).
The ranks are usually not distributed perfectly evenly, i.e. for
125 processes, of the six required nodes, five would use 21
cores and one 20.
Would using another CPU type make a difference
communication-wise? I could switch to faster ones (on the same
network), but I always assumed this would only improve
performance of the stuff that is unrelated to