Re: [petsc-users] Question on MatMatmult

2024-05-29 Thread Frank Bramkamp
Ah ok,

Then I will have a look at MatConvert,
and then maybe later switch to AIJ as well.
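
For reference, a minimal sketch of what I have in mind (untested; it assumes
Mat_A and Mat_B are already assembled BAIJ matrices, as in my other mail):

  Mat :: Mat_A_aij, Mat_B_aij, Mat_AB

  ! convert the BAIJ operators to AIJ copies, then multiply those
  CALL MatConvert( Mat_A, MATAIJ, MAT_INITIAL_MATRIX, Mat_A_aij, IERROR )
  CALL MatConvert( Mat_B, MATAIJ, MAT_INITIAL_MATRIX, Mat_B_aij, IERROR )
  CALL MatMatMult( Mat_A_aij, Mat_B_aij, MAT_INITIAL_MATRIX, &
                   PETSC_DEFAULT_REAL, Mat_AB, IERROR )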


Thanks for the help, Frank


> On 29 May 2024, at 16:57, Barry Smith  wrote:
> 
> 
>   You can use MatConvert()
> 
> 
>> On May 29, 2024, at 10:53 AM, Frank Bramkamp  wrote:
>> 
>> Hello Hong,
>> 
>> Thank you for the clarification.
>> If I already have a matrix in BAIJ format, can I then convert it into AIJ
>> format as well?
>> In that case I would have two matrices, but that would be ok for testing.
>> I think that you sometimes convert different matrix formats into each other?
>> 
>> 
>> Since I typically have BAIJ format, I also use a blocked ILU, which would
>> turn into a point-wise ILU for an AIJ matrix. That is why I typically have
>> the BAIJ format.
>> 
>> Otherwise, I have to change it into an AIJ format from the beginning.
>> 
>> 
>> Thanks for the quick help,
>> 
>> Frank
>> 
>> 
>> 
>> 
> 



Re: [petsc-users] Question on MatMatmult

2024-05-29 Thread Frank Bramkamp

Hello Hong,

Thank you for the clarification.
If I already have a matrix in BAIJ format, can I then convert it into AIJ format as well?
In that case I would have two matrices, but that would be ok for testing.
I think that you sometimes convert different matrix formats into each other?


Since I typically have BAIJ format, I also use a blocked ILU, which would turn into a point-wise ILU
for an AIJ matrix. That is why I typically have the BAIJ format.

Otherwise, I have to change it into an AIJ format from the beginning.


Thanks for the quick help,

Frank






[petsc-users] Question on MatMatmult

2024-05-29 Thread Frank Bramkamp

Dear PETSc Team,

I would like to make a matrix-matrix product of two matrices.

I try to use
 CALL MatMatMult(Mat_A,MAT_B,MAT_INITIAL_MATRIX,PETSC_DEFAULT_REAL,MAT_AB,IERROR)   ! calling from Fortran

When I try to use this function I get the following error message:

"Unspecified symbolic phase for product AB with A seqbaij, B seqbaij. The product is not supported”

I am using the seqbaij matrix format. Are MatMatMult and MatProductSymbolic
only defined for the standard point-wise matrix format and not for a blocked format?

In the documentation, I could not see a hint on supported matrix formats or any limitations.
The examples also just use a point-wise format (AIJ), as far as I can see.


Greetings, Frank Bramkamp









Re: [petsc-users] Problem with NVIDIA compiler and OpenACC

2024-04-05 Thread Frank Bramkamp

Dear Barry,

That looks very good. The -lnvc is gone now.

I also tested my small Fortran program. There I can see that libnvc is automatically added as well, but this time it comes after the
libaccdevice.so library for OpenACC. And then my OpenACC commands also work again.


I also mentioned some issues with the CUDA nvJitLink library. I just found out that a path in our CUDA compiler module was not set correctly.
I will try to compile with CUDA again as well.

We are just starting with PETSc on GPUs using the CUDA backend, and I am starting with OpenACC for our Fortran code to get first experience with GPU
porting.


Good that you could fix the issue. 

Thanks for the great help. Have a nice weekend, Frank Bramkamp









Re: [petsc-users] Problem with NVIDIA compiler and OpenACC

2024-04-05 Thread Frank Bramkamp

Thanks for the effort, Barry.


I will get it and give it another try.

Thanks a lot, Frank





> On 5 Apr 2024, at 15:56, Barry Smith  wrote:
> 
> 
>   There was a bug in my attempted fix so it actually did not skip the option.
> 
>   Try git pull and then run configure again.
> 
> 
>> On Apr 5, 2024, at 6:30 AM, Frank Bramkamp  wrote:
>> 
>> Dear Barry,
>> 
>> I tried your fix for -lnvc.  Unfortunately it did not work so far.
>> Here I send you the configure.log file again.
>> 
>> One can see that you try to skip something, but later it still always includes -lnvc for the linker.
>> In the file petscvariables it also appears as before.
>> 
>> As I see it, it lists the linker options including -lnvc also before you try to skip it.
>> Maybe it is already in the linker options before the skipping.
>> 
>> 
>> Greetings, Frank 
>> 
>> 
>> 
> 




Re: [petsc-users] Problem with NVIDIA compiler and OpenACC

2024-04-05 Thread Frank Bramkamp

Thanks for the response,

My code is in Fortran. I will try to explicitly set LIBS=.. as you suggested.
At the moment I skip CUDA, but later I want to use it as well.

Barry also tried to skip the "-lnvc", but that did not work yet.


Thanks a lot for the suggestions, Frank







Re: [petsc-users] Problem with NVIDIA compiler and OpenACC

2024-04-04 Thread Frank Bramkamp

Ok, I will have a look.

It is already evening here in Sweden, so it might take until tomorrow.

Thanks Frank



Re: [petsc-users] Problem with NVIDIA compiler and OpenACC

2024-04-04 Thread Frank Bramkamp

Ok, I will look for the config.log file.

Frank




Re: [petsc-users] Problem with NVIDIA compiler and OpenACC

2024-04-04 Thread Frank Bramkamp
Thanks for the reply,

Do you know if you actively include the libnvc library,
or is it somehow included automatically?

Greetings, Frank




> On 4 Apr 2024, at 15:56, Satish Balay  wrote:
> 
> 
> On Thu, 4 Apr 2024, Frank Bramkamp wrote:
> 
>> Dear PETSC Team,
>> 
>> I found the following problem:
>> I compile petsc 3.20.5 with the NVIDIA compiler 23.7.
>> 
>> 
>> I use a pretty standard configuration, including
>> 
>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" 
>> CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1 
>> --download-fblaslapack --with-cuda=0
>> 
>> I excluded CUDA, since I was not sure if the problem was CUDA-related.
> 
> Can you try using (to exclude cuda): --with-cudac=0
> 
>> 
>> 
>> The problem is now: if I have a simple Fortran program where I link the
>> petsc library, but I actually do not use petsc in that program
>> (just for testing), I want to use OpenACC directives in my program, e.g.
>> !$acc parallel loop.
>> As soon as I link with the petsc library, the OpenACC
>> commands do not work anymore.
>> It seems that OpenACC is not initialised and hence it cannot find a GPU.
>> 
>> The problem seems to be that you link with -lnvc.
>> In "petscvariables" => PETSC_WITH_EXTERNAL_LIB you include "-lnvc".
>> If I take this out, then OpenACC works. With "-lnvc" something gets messed
>> up.
>> 
>> The problem is also discussed here:
>> https://urldefense.us/v3/__https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1__;!!G_uCfscf7eWS!dlXNyKBzSbximQ13OXxwO506OF71yRM_H5KEnarqXE75D6Vg-ePZr2u6SJ5V3YpRETatvb9pMOUVmpyN0-19SFlbug$
>> 
>> My understanding is that libnvc is more a runtime library that does not need
>> to be included by the linker.
>> Not sure if there is a specific reason to include libnvc (I am not so
>> familiar with what this library does).
>> 
>> If I take out -lnvc from “petscvariables”, then my program with openacc 
>> works as expected. I did not try any more realistic program that includes 
>> petsc.
>> 
>> 
>> 
>> 2)
>> When compiling petsc with CUDA support, I also found that the library
>> libnvJitLink.so.12 is not found. On my system this library is in
>> $CUDA_ROOT/lib64. I am not sure where this library is on your system.
> 
> Hm - good if you can send configure.log for this. configure attempts '$CC -v' 
> to determine the link libraries to get c/c++/fortran compatibility libraries. 
> But it can grab other libraries that the compilers are using internally here.
> 
> To avoid this - you can explicitly list these libraries to configure. For ex: 
> for gcc/g++/gfortran
> 
> ./configure CC=gcc CXX=g++ FC=gfortran LIBS="-lgfortran -lstdc++"
> 
> Satish
> 
>> 
>> 
>> Thanks a lot, Frank Bramkamp



[petsc-users] Problem with NVIDIA compiler and OpenACC

2024-04-04 Thread Frank Bramkamp
Dear PETSC Team,

I found the following problem:
I compile petsc 3.20.5 with the NVIDIA compiler 23.7.


I use a pretty standard configuration, including

--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" 
CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g"  --with-debugging=0 --with-log=1 
--download-fblaslapack --with-cuda=0

I excluded CUDA, since I was not sure if the problem was CUDA-related.


The problem is now: if I have a simple Fortran program where I link the petsc
library, but I actually do not use petsc in that program
(just for testing), I want to use OpenACC directives in my program, e.g. !$acc
parallel loop.
As soon as I link with the petsc library, the OpenACC commands
do not work anymore.
It seems that OpenACC is not initialised and hence it cannot find a GPU.
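
For completeness, this is the kind of minimal test program I use (a sketch;
it contains no PETSc calls at all and is only linked against the PETSc library):

  PROGRAM acc_test
    USE openacc
    IMPLICIT NONE
    INTEGER :: ndev
    ! with -lnvc on the link line this reports 0 devices; without it, the GPU is found
    ndev = acc_get_num_devices(acc_device_nvidia)
    PRINT *, 'OpenACC devices found: ', ndev
  END PROGRAM acc_test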

The problem seems to be that you link with -lnvc.
In "petscvariables" => PETSC_WITH_EXTERNAL_LIB you include "-lnvc".
If I take this out, then OpenACC works. With "-lnvc" something gets messed up.

The problem is also discussed here:
https://urldefense.us/v3/__https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1__;!!G_uCfscf7eWS!dlXNyKBzSbximQ13OXxwO506OF71yRM_H5KEnarqXE75D6Vg-ePZr2u6SJ5V3YpRETatvb9pMOUVmpyN0-19SFlbug$

My understanding is that libnvc is more a runtime library that does not need to
be included by the linker.
Not sure if there is a specific reason to include libnvc (I am not so familiar
with what this library does).

If I take out -lnvc from “petscvariables”, then my program with openacc works 
as expected. I did not try any more realistic program that includes petsc.



2)
When compiling petsc with CUDA support, I also found that the library
libnvJitLink.so.12 is not found. On my system this library is in $CUDA_ROOT/lib64.
I am not sure where this library is on your system.


Thanks a lot, Frank Bramkamp













[petsc-users] MATSETVALUES: Fortran problem

2024-03-15 Thread Frank Bramkamp

Dear PETSc Team,

I am using the latest petsc version 3.20.5.


I would like to create a matrix using
MatCreateSeqAIJ

To insert values, I use MatSetValues.
It seems that the Fortran interface/stubs are missing for MatSetValues, as the linker does not find any subroutine with that name.
MatSetValueLocal seems to be fine.


Typically I am using a blocked matrix format (BAIJ), which works fine in Fortran.
Soon we want to try PETSc on GPUs, using the format MATAIJCUSPARSE, since there seems to be no blocked format available in PETSc for GPUs so far.
Therefore I first want to try the point-wise MatCreateSeqAIJ format on a CPU, before using the GPU format.
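
For reference, the usage I am after is the standard one (a sketch, assuming the
Fortran stub for MatSetValues exists; all names and sizes are just illustrative):

  Mat :: A
  PetscInt :: m, nz, ione, itwo, row(1), cols(2)
  PetscScalar :: vals(2)
  PetscErrorCode :: ierr

  m = 4; nz = 3; ione = 1; itwo = 2   ! small 4x4 matrix, at most 3 nonzeros per row
  CALL MatCreateSeqAIJ( PETSC_COMM_SELF, m, m, nz, PETSC_NULL_INTEGER, A, ierr )
  row(1) = 0; cols(1) = 0; cols(2) = 1
  vals(1) = 2.0; vals(2) = -1.0
  CALL MatSetValues( A, ione, row, itwo, cols, vals, INSERT_VALUES, ierr )
  CALL MatAssemblyBegin( A, MAT_FINAL_ASSEMBLY, ierr )
  CALL MatAssemblyEnd( A, MAT_FINAL_ASSEMBLY, ierr )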

I think that CUDA also supports a blocked format now? Maybe that would also be useful to have one day.


Greetings, Frank Bramkamp









[petsc-users] Fortran problem MatGetValuesLocal

2023-11-28 Thread Frank Bramkamp
Dear PETSc team,


We are using the latest petsc version 3.20.1, intel compiler 2023,
and we found the following problem:

We want to call the function MatGetValuesLocal to extract a block sub-matrix
from an assembled matrix (e.g. a 5x5 blocked sub matrix). We use the matrix 
format MatCreateBAIJ in parallel.
In particular we try to call MatGetValuesLocal in Fortran.

It seems that the linker does not find the subroutine MatGetValuesLocal.
The subroutine MatGetValues seems to be fine.
I guess that the fortran stubs/fortran interface is missing for this routine.
On the documentation side, you also write a note for developers that the
Fortran stubs and interface are not automatically generated for MatGetValuesLocal.
So maybe that was forgotten.
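
In the meantime I fall back on MatGetValues with global indices. A sketch of
that (illustrative names; only locally owned rows can be queried, and the
values come back row-oriented):

  PetscInt :: nb, i, rstart, rend, rows(5), cols(5)
  PetscScalar :: block(25)
  PetscErrorCode :: ierr

  nb = 5
  CALL MatGetOwnershipRange( A, rstart, rend, ierr )
  DO i = 1, 5
     rows(i) = rstart + i - 1   ! first 5 locally owned rows (global numbering)
     cols(i) = rstart + i - 1
  END DO
  CALL MatGetValues( A, nb, rows, nb, cols, block, ierr )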


Unfortunately I do not have any small test example, since we just incorporated
the function call into our own software.
Otherwise I would first have to set up a small test example for the parallel case.

I think there is also an include file where one can check the Fortran interfaces?
I forgot where to look this up.


Greetings, Frank Bramkamp












[petsc-users] KSPAGMRES Question

2022-08-02 Thread Frank Bramkamp
Dear PETSc team,

I have seen that there is the KSP method: KSPAGMRES,
https://petsc.org/release/docs/manualpages/KSP/KSPAGMRES.html

I wanted to test this method, as it also seems to reduce the amount of MPI 
communication, compared
to the standard GMRES. 


I supposed that the type is called "KSPAGMRES".
But in the include files petscksp.h and petsc/finclude/petscksp.h 
there is no definition for KSPAGMRES, just KSPDGMRES.

I wonder if the definition of KSPAGMRES is simply missing, or do I have to call
DGMRES and set another option for AGMRES?
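
If only the Fortran definition is missing, one workaround I could imagine
(untested; it assumes the solver is actually compiled in) is to pass the type
name as a plain string, since KSP types are character strings:

  CALL KSPSetType( ksp, 'agmres', ierr )

or equivalently -ksp_type agmres on the command line.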

The standard GMRES has the problem that MPI_Allreduce gets expensive at 2048 cores.
Therefore I wanted to see if AGMRES has a bit less communication, as is
mentioned in the description of the method.

Greetings, Frank Bramkamp





[petsc-users] Fortran interface of MatNullSpaceCreate

2018-02-26 Thread frank

Hello,

I have a question of the Fortran interface of subroutine MatNullSpaceCreate.

I tried to call the subroutine in the following form:

Vec :: dummyVec, dummyVecs(1)
MatNullSpace :: nullspace
INTEGER :: ierr

(a) call MatNullSpaceCreate( PETSC_COMM_WORLD, PETSC_TRUE, 
PETSC_NULL_INTEGER, dummyVec, nullspace, ierr)


(b) call MatNullSpaceCreate( PETSC_COMM_WORLD, PETSC_TRUE, 
PETSC_NULL_INTEGER, dummyVecs, nullspace, ierr)


(a) and (b) gave me the same error during compilation: no specific 
subroutine for the generic MatNullSpaceCreate.


I am using the latest version of PETSc. I just did a "git pull" and
rebuilt it.

How can I call the subroutine?
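
For the record, my best guess at a working form (untested; the assumption is
that the vector count must be an actual PetscInt rather than PETSC_NULL_INTEGER,
and that ierr should be a PetscErrorCode):

  Vec :: dummyVecs(1)
  MatNullSpace :: nullspace
  PetscInt :: nvecs
  PetscErrorCode :: ierr

  nvecs = 0   ! constant null space only; the vector array is then ignored
  CALL MatNullSpaceCreate( PETSC_COMM_WORLD, PETSC_TRUE, nvecs, dummyVecs, &
                           nullspace, ierr )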

In addition, I found two 'petscmat.h90' : 
petsc/include/petsc/finclude/ftn-auto/petscmat.h90 and 
petsc/src/mat/f90-mod/petscmat.h90.
The former defines a subroutine MatNullSpaceCreate in the above form
(b). The latter provides a generic interface for both (a) and (b).

I am not sure if this relates to the error I get.

Thank you.

Frank


Re: [petsc-users] Question about Set-up of Full MG and its Output

2016-12-07 Thread frank

Hello,

Thank you. Now I am able to see the trace of MG.
I still have a question about the interpolation. I want to get the matrix
of the default interpolation method and print it to the terminal.

The code is as follows (the KSP is already set by petsc options):
-


CALL KSPGetPC( ksp, pc, ierr )
CALL MATCreate( PETSC_COMM_WORLD, interpMat, ierr )
CALL MATSetType( interpMat, MATSEQAIJ, ierr )
CALL MATSetSizes( interpMat, i5, i5, i5, i5, ierr )
CALL MATSetUp( interpMat, ierr )
CALL PCMGGetInterpolation( pc, i1, interpMat, ierr )
CALL MatAssemblyBegin( interpMat, MAT_FINAL_ASSEMBLY, ierr )
CALL MatAssemblyEnd( interpMat, MAT_FINAL_ASSEMBLY, ierr )
CALL MatView( interpMat, PETSC_VIEWER_STDOUT_SELF, ierr )
-

The error message is:
---
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Must call PCMGSetInterpolation() or PCMGSetRestriction()
---

Do I have to set the interpolation first? How can I just print the 
default interpolation matrix?
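
My current guess (a sketch, untested): the solver has to be set up first so
that MG creates its interpolation operators, and PCMGGetInterpolation then
just returns a reference, so no MatCreate/MatSetUp of my own should be needed:

CALL KSPSetUp( ksp, ierr )   ! builds the MG hierarchy, including interpolations
CALL KSPGetPC( ksp, pc, ierr )
CALL PCMGGetInterpolation( pc, i1, interpMat, ierr )
CALL MatView( interpMat, PETSC_VIEWER_STDOUT_SELF, ierr )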

I attached the option file.

Thank you.
Frank



On 12/06/2016 02:31 PM, Jed Brown wrote:

frank <hengj...@uci.edu> writes:


Dear all,

I am trying to use full MG to solve a 2D Poisson equation.

I want to set full MG as the solver and SOR as the smoother. Is the
following setup the proper way to do it?
   -ksp_type             richardson
   -pc_type              mg
   -pc_mg_type           full
   -mg_levels_ksp_type   richardson
   -mg_levels_pc_type    sor

The ksp_view shows the levels from the coarsest mesh to finest mesh in a
linear order.

It is showing the solver configuration, not a trace of the cycle.


I was expecting something like:  coarsest -> level1 -> coarsest -> level1 ->
level2 -> level1 -> coarsest -> ...
Is there a way to show exactly how the full MG proceeds?

You could get a trace like this from

-mg_coarse_ksp_converged_reason -mg_levels_ksp_converged_reason

If you want to delimit the iterations, you could add -ksp_monitor.


Also in the above example, I want to know what interpolation or
prolongation method is used from level1 to level2.
Can I get that info by adding some options? (not using PCMGGetInterpolation)

I attached the ksp_view info and my petsc options file.
Thank you.

Frank
Linear solve converged due to CONVERGED_RTOL iterations 3
KSP Object: 1 MPI processes
   type: richardson
 Richardson: damping factor=1.
   maximum iterations=1
   tolerances:  relative=1e-07, absolute=1e-50, divergence=1.
   left preconditioning
   using nonzero initial guess
   using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
   type: mg
 MG: type is FULL, levels=6 cycles=v
   Using Galerkin computed coarse grid matrices
   Coarse grid solver -- level ---
 KSP Object: (mg_coarse_) 1 MPI processes
   type: preonly
   maximum iterations=1, initial guess is zero
   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
   left preconditioning
   using NONE norm type for convergence test
 PC Object: (mg_coarse_) 1 MPI processes
   type: lu
 out-of-place factorization
 tolerance for zero pivot 2.22045e-14
 using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
 matrix ordering: nd
 factor fill ratio given 0., needed 0.
   Factored matrix follows:
 Mat Object: 1 MPI processes
   type: superlu_dist
   rows=64, cols=64
   package used to perform factorization: superlu_dist
   total: nonzeros=0, allocated nonzeros=0
   total number of mallocs used during MatSetValues calls =0
 SuperLU_DIST run parameters:
   Process grid nprow 1 x npcol 1
   Equilibrate matrix TRUE
   Matrix input mode 0
   Replace tiny pivots FALSE
   Use iterative refinement FALSE
   Processors in row 1 col partition 1
   Row permutation LargeDiag
   Column permutation METIS_AT_PLUS_A
   Parallel symbolic factorization FALSE
   Repeated factorization SamePattern
   linear system matrix = precond matrix:
   Mat Object: 1 MPI processes
 type: seqaij
 rows=64, cols=64
 total: nonzeros=576, allocated nonzeros=576
 total number of mallocs used during MatSetValues calls =0
   not using I-node routines
   Down solver (pre-smoother) on level 1 

[petsc-users] Question about Set-up of Full MG and its Output

2016-12-06 Thread frank

Dear all,

I am trying to use full MG to solve a 2D Poisson equation.

I want to set full MG as the solver and SOR as the smoother. Is the 
following setup the proper way to do it?

 -ksp_type             richardson
 -pc_type              mg
 -pc_mg_type           full
 -mg_levels_ksp_type   richardson
 -mg_levels_pc_type    sor

The ksp_view shows the levels from the coarsest mesh to finest mesh in a 
linear order.
I was expecting something like:  coarsest -> level1 -> coarsest -> level1 ->
level2 -> level1 -> coarsest -> ...

Is there a way to show exactly how the full MG proceeds?

Also in the above example, I want to know what interpolation or 
prolongation method is used from level1 to level2.

Can I get that info by adding some options? (not using PCMGGetInterpolation)

I attached the ksp_view info and my petsc options file.
Thank you.

Frank
Linear solve converged due to CONVERGED_RTOL iterations 3
KSP Object: 1 MPI processes
  type: richardson
Richardson: damping factor=1.
  maximum iterations=1
  tolerances:  relative=1e-07, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: mg
MG: type is FULL, levels=6 cycles=v
  Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level ---
KSP Object: (mg_coarse_) 1 MPI processes
  type: preonly
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object: (mg_coarse_) 1 MPI processes
  type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 0., needed 0.
  Factored matrix follows:
Mat Object: 1 MPI processes
  type: superlu_dist
  rows=64, cols=64
  package used to perform factorization: superlu_dist
  total: nonzeros=0, allocated nonzeros=0
  total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
  Process grid nprow 1 x npcol 1 
  Equilibrate matrix TRUE 
  Matrix input mode 0 
  Replace tiny pivots FALSE 
  Use iterative refinement FALSE 
  Processors in row 1 col partition 1 
  Row permutation LargeDiag 
  Column permutation METIS_AT_PLUS_A
  Parallel symbolic factorization FALSE 
  Repeated factorization SamePattern
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaij
rows=64, cols=64
total: nonzeros=576, allocated nonzeros=576
total number of mallocs used during MatSetValues calls =0
  not using I-node routines
  Down solver (pre-smoother) on level 1 ---
KSP Object: (mg_levels_1_) 1 MPI processes
  type: richardson
Richardson: damping factor=1.
  maximum iterations=1
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
  type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaij
rows=256, cols=256
total: nonzeros=2304, allocated nonzeros=2304
total number of mallocs used during MatSetValues calls =0
  not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 ---
KSP Object: (mg_levels_2_) 1 MPI processes
  type: richardson
Richardson: damping factor=1.
  maximum iterations=1
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI processes
  type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaij
rows=1024, cols=1024
total: nonzeros=9216, allocated nonzeros=9216
total number of mallocs used during MatSetValues calls =0
  not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 ---
KSP Object: (mg_levels_3_) 1 MPI

[petsc-users] Time cost by Vec Assembly

2016-10-07 Thread frank

Hello,


Another thing: the vector assembly and scatter take more time as I increase
the number of cores:

  cores#                        4096      8192      16384     32768     65536
VecAssemblyBegin (298 calls)    2.91E+00  2.87E+00  8.59E+00  2.75E+01  2.21E+03
VecAssemblyEnd   (298 calls)    3.37E-03  1.78E-03  1.78E-03  5.13E-03  1.99E-03
VecScatterBegin  (76303 calls)  3.82E+00  3.01E+00  2.54E+00  4.40E+00  1.32E+00
VecScatterEnd    (76303 calls)  3.09E+01  1.47E+01  2.23E+01  2.96E+01  2.10E+01

The above data is produced by solving a constant-coefficient Poisson equation
with a different rhs for 100 steps.
As you can see, the time of VecAssemblyBegin increases dramatically from 32K
cores to 65K.

Something is very very wrong here. It is likely not the VecAssemblyBegin()
itself that is taking the huge amount of time. VecAssemblyBegin() is a barrier,
that is, all processes have to reach it before any process can continue beyond
it. Something in the code on some processes is taking a huge amount of time
before reaching that point. Perhaps it is in starting up all the processes?
Or are you generating the entire rhs on one process? You can't do that.

Barry
(I created a new subject since this is a separate problem from my
previous question.)


Each process computes its part of the rhs.
The above results are from 100 steps' computation. It is not a
starting-up issue.


I also have the results from a simple code to show this problem:

  cores#                     4096      8192      16384     32768     65536
VecAssemblyBegin (1 call)    4.56E-02  3.27E-02  3.63E-02  6.26E-02  2.80E+02
VecAssemblyEnd   (1 call)    3.54E-04  3.43E-04  3.47E-04  3.44E-04  4.53E-04


Again, the time cost increases dramatically after 30K cores.
The max/min ratio of VecAssemblyBegin is 1.2 for both 30K and 65K cases. 
If there is a huge delay on some process, should this value be large?


The part of code that calls the assembly subroutines looks like:

  CALL DMCreateGlobalVector( ... )
  CALL DMDAVecGetArrayF90( ... )
 ... each process computes its part of rhs...
  CALL DMDAVecRestoreArrayF90(...)

  CALL VecAssemblyBegin( ... )
  CALL VecAssemblyEnd( ... )

Thank you

Regards,
Frank


On 10/04/2016 12:56 PM, Dave May wrote:


On Tuesday, 4 October 2016, frank <hengj...@uci.edu> wrote:
Hi,

This question is follow-up of the thread "Question about memory usage in Multigrid 
preconditioner".
I used to have the "Out of Memory(OOM)" problem when using the CG+Telescope MG solver 
with 32768 cores. Adding the "-matrap 0; -matptap_scalable" option did solve that problem.

Then I test the scalability by solving a 3d poisson eqn for 1 step. I used one 
sub-communicator in all the tests. The difference between the petsc options in those 
tests are: 1 the pc_telescope_reduction_factor; 2 the number of multigrid levels in the 
up/down solver. The function "ksp_solve" is timed. It is kind of slow and 
doesn't scale at all.

Test1: 512^3 grid points
Core#   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
512     8                            4 / 3                 6.2466
4096    64                           5 / 3                 0.9361
32768   64                           4 / 3                 4.8914

Test2: 1024^3 grid points
Core#   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
4096    64                           5 / 4                 3.4139
8192    128                          5 / 4                 2.4196
16384   32                           5 / 3                 5.4150
32768   64                           5 / 3                 5.6067
65536   128                          5 / 3                 6.5219

You have to be very careful how you interpret these numbers. Your solver 
contains nested calls to KSPSolve, and unfortunately as a result the numbers 
you report include setup time. This will remain true even if you call KSPSetUp 
on the outermost KSP.

Your email concerns scalability of the solver application, so let's focus on
that issue.

The only way to clearly separate setup from solve time is to perform two 
identical solves. The second solve will not require any setup. You should 
monitor the second solve via a new PetscStage.
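
A minimal sketch of what I mean (assuming ksp, b and x are your outermost
solver objects):

  PetscLogStage :: stage
  PetscErrorCode :: ierr

  CALL PetscLogStageRegister( 'Second Solve', stage, ierr )
  CALL KSPSolve( ksp, b, x, ierr )   ! first solve: includes all setup
  CALL PetscLogStagePush( stage, ierr )
  CALL KSPSolve( ksp, b, x, ierr )   ! second solve: setup-free, logged in its own stage
  CALL PetscLogStagePop( ierr )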

This was what I did in the telescope paper. It was the only way to understand 
the setup cost (and sca

[petsc-users] create global vector in latest version of petsc

2016-10-05 Thread frank

Hi,

I updated petsc to the latest version by pulling from the repo. Then I
found that one of my old codes, which worked before, outputs errors now.

After debugging, I find that the error is caused by "DMCreateGlobalVector".
I attach a short program which can reproduce the error. This program
works well with an older version of petsc.

I also attach the script I used to configure petsc.

The error message is below. Did I miss something in the installation ? 
Thank you.


1 [0]PETSC ERROR: - Error Message 
--

  2 [0]PETSC ERROR: Null argument, when expecting valid pointer
  3 [0]PETSC ERROR: Null Object: Parameter # 2
  4 [0]PETSC ERROR: See 
http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
  5 [0]PETSC ERROR: Petsc Development GIT revision: 
v3.7.4-1571-g7fc5cb5  GIT Date: 2016-10-05 10:56:19 -0500
  6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx 
named kolmog1 by frank Wed Oct  5 17:40:07 2016
  7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " 
--known-memcmp-ok  --with-debugging="1 " --with-shared-libraries=0 
--with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " 
--download-parmetis="1 " --download-superlu_dist="1 " 
--download-hypre=1 PETSC_ARCH=gnu-dbg-32idx
  8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in 
/home/frank/petsc/src/vec/vec/interface/vector.c
  9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in 
/home/frank/petsc/src/dm/impls/da/dadist.c
 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in 
/home/frank/petsc/src/dm/interface/dm.c



Regards,
Frank



PROGRAM test_ksp

#include <petsc/finclude/petscdmda.h>
#include <petsc/finclude/petscsys.h>

  USE petscdmda
  USE petscsys

  IMPLICIT NONE

  DM  :: decomp
  INTEGER :: N = 32, px = 2, py = 2, pz = 2, ierr
  Vec :: b
  

  CALL PetscInitialize( PETSC_NULL_CHARACTER, ierr ) 

  CALL DMDACreate3d( PETSC_COMM_WORLD, &
   & DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC, &
   & DMDA_STENCIL_STAR, N, N, N, px, py, pz, 1, 1, &
   & PETSC_NULL_INTEGER,  PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, &
   & decomp, ierr ) 

  CALL DMCreateGlobalVector( decomp, b, ierr ) ! cause error

  CALL VecDestroy( b, ierr ) 

  CALL DMDestroy( decomp, ierr )

  CALL PetscFinalize( ierr )

END PROGRAM test_ksp
#!/usr/bin/python

# Do the following before running this configure script [hopp2.nersc.gov]
#
# setenv XTPE_INFO_MESSAGE_OFF yes
# module add acml
# Order of the download and installation of libraries is crucial!!!

if __name__ == '__main__':
  import sys
  import os
  sys.path.insert(0, os.path.abspath('config'))
  import configure
  configure_options = [
'--known-mpi-shared=0 ',
'--known-memcmp-ok ',
'--with-debugging=1 ',
'--with-shared-libraries=0',
'--with-mpi-compilers=1 ',
#'--with-64-bit-indices',
'--download-blacs=1 ',
'--download-metis=1 ',
'--download-parmetis=1 ',
'--download-superlu_dist=1 ', 
'--download-hypre=1',
#'--with-hdf5-include=/usr/local/petsc/gnu-dbg-32idx/include',
#'--with-hdf5-lib=/usr/local/petsc/gnu-dbg-32idx/lib',
#'--download-netcdf=1',
#'--download-ml=1',
  ]
  configure.petsc_configure(configure_options)


Re: [petsc-users] Performance of the Telescope Multigrid Preconditioner

2016-10-04 Thread frank

Hi Dave,

Thank you for the reply.
What do you mean by the "nested calls to KSPSolve"?
I tried to call KSPSolve twice, but the second solve converged in 0
iterations. KSPSolve seems to remember the solution. How can I force both
solves to start from the same initial guess?
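
Is it enough to simply reset the solution vector in between, e.g. (a sketch,
with x as my solution vector):

  CALL KSPSolve( ksp, b, x, ierr )
  CALL VecZeroEntries( x, ierr )   ! back to a zero initial guess
  CALL KSPSolve( ksp, b, x, ierr )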


Thank you.

Frank


On 10/04/2016 12:56 PM, Dave May wrote:



On Tuesday, 4 October 2016, frank <hengj...@uci.edu> wrote:


Hi,

This question is follow-up of the thread "Question about memory
usage in Multigrid preconditioner".
I used to have the "Out of Memory(OOM)" problem when using the
CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0;
-matptap_scalable" option did solve that problem.

Then I test the scalability by solving a 3d poisson eqn for 1
step. I used one sub-communicator in all the tests. The difference
between the petsc options in those tests are: 1 the
pc_telescope_reduction_factor; 2 the number of multigrid levels in
the up/down solver. The function "ksp_solve" is timed. It is kind
of slow and doesn't scale at all.

Test1: 512^3 grid points
Core#   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
512     8                            4 / 3                 6.2466
4096    64                           5 / 3                 0.9361
32768   64                           4 / 3                 4.8914

Test2: 1024^3 grid points
Core#   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
4096    64                           5 / 4                 3.4139
8192    128                          5 / 4                 2.4196
16384   32                           5 / 3                 5.4150
32768   64                           5 / 3                 5.6067
65536   128                          5 / 3                 6.5219


You have to be very careful how you interpret these numbers. Your 
solver contains nested calls to KSPSolve, and unfortunately as a 
result the numbers you report include setup time. This will remain 
true even if you call KSPSetUp on the outermost KSP.


Your email concerns scalability of the solver application, so let's
focus on that issue.


The only way to clearly separate setup from solve time is to perform 
two identical solves. The second solve will not require any setup. You 
should monitor the second solve via a new PetscStage.


This was what I did in the telescope paper. It was the only way to 
understand the setup cost (and scaling) cf the solve time (and scaling).


Thanks
  Dave

I guess I didn't set the MG levels properly. What would be the
efficient way to arrange the MG levels?
Also which preconditioner at the coarse mesh of the 2nd
communicator should I use to improve the performance?

I attached the test code and the petsc options file for the 1024^3
cube with 32768 cores.

Thank you.

Regards,
Frank






On 09/15/2016 03:35 AM, Dave May wrote:

Hi all,

The only unexpected memory usage I can see is associated with
the call to MatPtAP().
Here is something you can try immediately.
Run your code with the additional options
  -matrap 0 -matptap_scalable

I didn't realize this before, but the default behaviour of
MatPtAP in parallel is actually to explicitly form the
transpose of P (e.g. assemble R = P^T) and then compute R.A.P.
You don't want to do this. The option -matrap 0 resolves this issue.

The implementation of P^T.A.P has two variants.
The scalable implementation (with respect to memory usage) is
selected via the second option -matptap_scalable.

Try it out - I see a significant memory reduction using these
options for particular mesh sizes / partitions.

I've attached a cleaned up version of the code you sent me.
There were a number of memory leaks and other issues.
The main points being
  * You should call DMDAVecGetArrayF90() before
VecAssembly{Begin,End}
  * You should call PetscFinalize(), otherwise the option
-log_summary (-log_view) will not display anything once the
program has completed.


Thanks,
  Dave


On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:

Hi Dave,

Sorry, I should have put more comments in to explain the code.
The number of processes in each dimension is the same: Px = Py = Pz = P.
So is the domain size.
So if you want to run the code for 512^3 grid points
on 16^3 cores, you need to set "-N 512 -P 16" on the command line.
I added more comments and also fixed an error in the attached
code. (The error only affects the accuracy of the solution,
not the memory usage.)

Thank you.
Frank


On 9/14/2016 9:05 PM, Dave May wrote:



On Thursday, 15 September 2016, Dave May
&l

[petsc-users] Performance of the Telescope Multigrid Preconditioner

2016-10-04 Thread frank

Hi,

This question is follow-up of the thread "Question about memory usage in 
Multigrid preconditioner".
I used to have the "Out of Memory(OOM)" problem when using the 
CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; 
-matptap_scalable" option did solve that problem.


Then I test the scalability by solving a 3d poisson eqn for 1 step. I
used one sub-communicator in all the tests. The differences between the
petsc options in those tests are: (1) the pc_telescope_reduction_factor; (2)
the number of multigrid levels in the up/down solver. The function
"ksp_solve" is timed. It is kind of slow and doesn't scale at all.


Test1: 512^3 grid points
Core#   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
512     8                            4 / 3                 6.2466
4096    64                           5 / 3                 0.9361
32768   64                           4 / 3                 4.8914

Test2: 1024^3 grid points
Core#   telescope_reduction_factor   MG levels (up/down)   Time for KSPSolve (s)
4096    64                           5 / 4                 3.4139
8192    128                          5 / 4                 2.4196
16384   32                           5 / 3                 5.4150
32768   64                           5 / 3                 5.6067
65536   128                          5 / 3                 6.5219


I guess I didn't set the MG levels properly. What would be the efficient 
way to arrange the MG levels?
Also which preconditioner at the coarse mesh of the 2nd communicator
should I use to improve the performance?


I attached the test code and the petsc options file for the 1024^3 cube 
with 32768 cores.


Thank you.

Regards,
Frank






On 09/15/2016 03:35 AM, Dave May wrote:

Hi all,

The only unexpected memory usage I can see is associated with the
call to MatPtAP().

Here is something you can try immediately.
Run your code with the additional options
  -matrap 0 -matptap_scalable

I didn't realize this before, but the default behaviour of MatPtAP in 
parallel is actually to explicitly form the transpose of P (e.g.
assemble R = P^T) and then compute R.A.P.

You don't want to do this. The option -matrap 0 resolves this issue.

The implementation of P^T.A.P has two variants.
The scalable implementation (with respect to memory usage) is selected 
via the second option -matptap_scalable.


Try it out - I see a significant memory reduction using these options 
for particular mesh sizes / partitions.


I've attached a cleaned up version of the code you sent me.
There were a number of memory leaks and other issues.
The main points being
  * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
  * You should call PetscFinalize(), otherwise the option -log_summary 
(-log_view) will not display anything once the program has completed.



Thanks,
  Dave


On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:


Hi Dave,

Sorry, I should have put more comments in to explain the code.
The number of processes in each dimension is the same: Px = Py = Pz = P.
So is the domain size.
So if you want to run the code for 512^3 grid points on
16^3 cores, you need to set "-N 512 -P 16" on the command line.
I added more comments and also fixed an error in the attached code.
(The error only affects the accuracy of the solution, not the memory usage.)

Thank you.
Frank


On 9/14/2016 9:05 PM, Dave May wrote:



On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:



On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:

Hi,

I write a simple code to reproduce the error. I hope
this can help to diagnose the problem.
The code just solves a 3d poisson equation.


Why is the stencil width a runtime parameter? And why is the
default value 2? For 7-point FD Laplace, you only need
a stencil width of 1.

Was this choice made to mimic something in the
real application code?


Please ignore - I misunderstood your usage of the param set by -P


I run the code on a 1024^3 mesh. The process partition is
32 * 32 * 32. That's when I reproduce the OOM error.
Each core has about 2G memory.
I al

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread frank

Hi,

I write a simple code to reproduce the error. I hope this can help to
diagnose the problem.

The code just solves a 3d poisson equation.
I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. 
That's when I reproduce the OOM error. Each core has about 2G memory.
I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp 
solver works fine.

I attached the code, ksp_view_pre's output and my petsc option file.

Thank you.
Frank

On 09/09/2016 06:38 PM, Hengjie Wang wrote:

Hi Barry,

I checked. On the supercomputer, I had the option "-ksp_view_pre" but 
it is not in the file I sent you. I am sorry for the confusion.


Regards,
Frank

On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:



> On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>
> Hi Barry,
>
> I think the first KSP view output is from -ksp_view_pre. Before
I submitted the test, I was not sure whether there would be an OOM
error or not. So I added both -ksp_view_pre and -ksp_view.

  But the options file you sent specifically does NOT list the
-ksp_view_pre so how could it be from that?

   Sorry to be pedantic but I've spent too much time in the past
trying to debug from incorrect information and want to make sure
that the information I have is correct before thinking. Please
recheck exactly what happened. Rerun with the exact input file you
emailed if that is needed.

   Barry

>
> Frank
>
>
> On 09/09/2016 12:38 PM, Barry Smith wrote:
>>   Why does ksp_view2.txt have two KSP views in it while
ksp_view1.txt has only one KSPView in it? Did you run two
different solves in the second case but not the first?
>>
    >>   Barry
>>
>>
>>
>>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>>>
>>> Hi,
>>>
>>> I want to continue digging into the memory problem here.
>>> I did find a workaround in the past, which is to use fewer
cores per node so that each core has 8G memory. However this is
inefficient and expensive. I hope to locate the place that uses the
most memory.
>>>
>>> Here is a brief summary of the tests I did in the past:
>>>
>>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>>> Maximum (over computational time) process memory:        total 7.0727e+08
>>> Current process memory:                                  total 7.0727e+08
>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>> Current space PetscMalloc()ed:                           total 1.8275e+09
>>>
>>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>>> Maximum (over computational time) process memory:        total 5.9431e+09
>>> Current process memory:                                  total 5.9431e+09
>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>> Current space PetscMalloc()ed:                           total 5.4844e+09
>>>
>>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>>> OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".
>>>
>>> I attached the output of ksp_view( the third test's output is
from ksp_view_pre ), memory_view and also the petsc options.
>>>
>>> In all the tests, each core can access about 2G memory. In
test3, there are 4223139840 non-zeros in the matrix. This will
consume about 1.74 MB per process, using double precision. Considering
some extra memory used to store integer indices, 2G memory should
still be more than enough.
>>>
>>> Is there a way to find out which part of KSPSolve uses the
most memory?
>>> Thank you so much.
>>>
>>> BTW, there are 4 options that remain unused and I don't understand
why they are omitted:
>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>>
>>>
>>> Regards,
>>> Frank
>>>
>>> On 07/13/2016 05:47 PM, Dave May wrote:
>>>>
>>>> On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote:
>>>> Hi Dave,
>>&

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread frank

Hi Barry,

I think the first KSP view output is from -ksp_view_pre. Before I
submitted the test, I was not sure whether there would be an OOM error or
not. So I added both -ksp_view_pre and -ksp_view.


Frank


On 09/09/2016 12:38 PM, Barry Smith wrote:

   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only
one KSPView in it? Did you run two different solves in the second case but not
the first?

   Barry




On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:

Hi,

I want to continue digging into the memory problem here.
I did find a workaround in the past, which is to use fewer cores per node so
that each core has 8G memory. However this is inefficient and expensive. I hope
to locate the place that uses the most memory.

Here is a brief summary of the tests I did in the past:

Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
Maximum (over computational time) process memory:        total 7.0727e+08
Current process memory:                                  total 7.0727e+08
Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
Current space PetscMalloc()ed:                           total 1.8275e+09

Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
Maximum (over computational time) process memory:        total 5.9431e+09
Current process memory:                                  total 5.9431e+09
Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
Current space PetscMalloc()ed:                           total 5.4844e+09

Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".

I attached the output of ksp_view (the third test's output is from ksp_view_pre),
memory_view and also the petsc options.

In all the tests, each core can access about 2G memory. In test3, there are
4223139840 non-zeros in the matrix. This will consume about 1.74 MB per process,
using double precision. Considering some extra memory used to store integer
indices, 2G memory should still be more than enough.

Is there a way to find out which part of KSPSolve uses the most memory?
Thank you so much.

BTW, there are 4 options that remain unused and I don't understand why they are
omitted:
-mg_coarse_telescope_mg_coarse_ksp_type value: preonly
-mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
-mg_coarse_telescope_mg_levels_ksp_max_it value: 1
-mg_coarse_telescope_mg_levels_ksp_type value: richardson


Regards,
Frank

On 07/13/2016 05:47 PM, Dave May wrote:


On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote:
Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There are 
4223139840 allocated non-zeros and 18432 MPI processes. Double precision is 
used. So the memory per process is:
   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.

No - I totally f***ed it up. You are correct. That'll teach me for fumbling 
around with my iphone calculator and not using my brain. (Note that to convert 
to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert 
between units correctly)

From the PETSc objects associated with the solver, it looks like it _should_
run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere 
in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge 
over allocation (e.g. as per our discussion of MatPtAP); or in your application 
code there are other objects you have forgotten to log the memory for.



I am running this job on Bluewater
I am using the 7 points FD stencil in 3D.

I thought so on both counts.
  


I apologize that I made a stupid mistake in computing the memory per core. My settings
meant each core could access only 2G memory on average instead of the 8G I mentioned in
a previous email. I re-ran the job with 8G memory per core on average and there is no
"Out Of Memory" error. I will do more tests to see if there is still some
memory issue.

Ok. I'd still like to know where the memory was being used since my estimates 
were off.


Thanks,
   Dave
  


Regards,
Frank



On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:
Hi Dave,

I re-run the test using bjacobi as the preconditioner on the coarse mesh of 
telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc 
option file is attached.
I still got the "Out Of Memory" error. The error occurred before the linear 
solver finished one step. So I don't have the full info from ksp_view. The info from 
ksp_view_pre is attached.

Okay - that is essentially useless (sorry)
  


It seems to me that the error occurred when the decomposition was going to be 
changed.

Based on what informat

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread frank

Hi,

I want to continue digging into the memory problem here.
I did find a workaround in the past, which is to use fewer cores per
node so that each core has 8G memory. However this is inefficient and
expensive. I hope to locate the place that uses the most memory.


Here is a brief summary of the tests I did in the past:

Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
Maximum (over computational time) process memory:        total 7.0727e+08
Current process memory:                                  total 7.0727e+08
Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
Current space PetscMalloc()ed:                           total 1.8275e+09

Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
Maximum (over computational time) process memory:        total 5.9431e+09
Current process memory:                                  total 5.9431e+09
Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
Current space PetscMalloc()ed:                           total 5.4844e+09

Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".


I attached the output of ksp_view (the third test's output is from
ksp_view_pre), memory_view and also the petsc options.


In all the tests, each core can access about 2G memory. In test3, there
are 4223139840 non-zeros in the matrix. This will consume about 1.74 MB
per process, using double precision. Considering some extra memory used to
store integer indices, 2G memory should still be more than enough.


Is there a way to find out which part of KSPSolve uses the most memory?
Thank you so much.

BTW, there are 4 options that remain unused and I don't understand why they
are omitted:

-mg_coarse_telescope_mg_coarse_ksp_type value: preonly
-mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
-mg_coarse_telescope_mg_levels_ksp_max_it value: 1
-mg_coarse_telescope_mg_levels_ksp_type value: richardson


Regards,
Frank

On 07/13/2016 05:47 PM, Dave May wrote:



On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote:


Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There
are 4223139840 allocated non-zeros and 18432 MPI processes. Double
precision is used. So the memory per process is:
  4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.


No - I totally f***ed it up. You are correct. That'll teach me for 
fumbling around with my iphone calculator and not using my brain. 
(Note that to convert to MB just divide by 1e6, not 1024^2 - although 
I apparently cannot convert between units correctly)
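
(Spelled out: 4223139840 nonzeros x 8 bytes is roughly 33.8 GB of matrix values
in total, and 33.8 GB / 18432 ranks is roughly 1.8 MB per rank for the values
alone.)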


From the PETSc objects associated with the solver, it looks like it
_should_ run with 2GB per MPI rank. Sorry for my mistake. 
Possibilities are: somewhere in your usage of PETSc you've introduced 
a memory leak; PETSc is doing a huge over allocation (e.g. as per our 
discussion of MatPtAP); or in your application code there are other 
objects you have forgotten to log the memory for.




I am running this job on Bluewater
<https://bluewaters.ncsa.illinois.edu/user-guide>

I am using the 7 points FD stencil in 3D.


I thought so on both counts.


I apologize that I made a stupid mistake in computing the memory
per core. My settings meant each core could access only 2G memory
on average instead of the 8G I mentioned in a previous email. I
re-ran the job with 8G memory per core on average and there is no
"Out Of Memory" error. I will do more tests to see if there is
still some memory issue.


Ok. I'd still like to know where the memory was being used since my 
estimates were off.



Thanks,
  Dave


Regards,
Frank



On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:

Hi Dave,

I re-run the test using bjacobi as the preconditioner on the
coarse mesh of telescope. The Grid is 3072*256*768 and
process mesh is 96*8*24. The petsc option file is attached.
I still got the "Out Of Memory" error. The error occurred
before the linear solver finished one step. So I don't have
the full info from ksp_view. The info from ksp_view_pre is
attached.


Okay - that is essentially useless (sorry)


It seems to me that the error occurred when the decomposition
was going to be changed.


Based on what information?
Running with -info would give us more clues, but will create a
ton of output.
Please try running the case which failed with -info

I had another test with a grid of 1536*128*384 and the same
process mesh as above. There was no error. The ksp_view info
is attached for comparison.
Thank you.



[3] Here is my crude estimate of y

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-13 Thread frank

Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There are 
4223139840 allocated non-zeros and 18432 MPI processes. Double precision 
is used. So the memory per process is:

  4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.

I am running this job on Bluewater 
<https://bluewaters.ncsa.illinois.edu/user-guide>

I am using the 7 points FD stencil in 3D.

I apologize that I made a stupid mistake in computing the memory per 
core. My settings meant each core could access only 2G of memory on average, 
instead of the 8G I mentioned in my previous email. I re-ran the job with 
8G of memory per core on average and there is no "Out Of Memory" error. I 
will do more tests to see if there is still a memory issue.


Regards,
Frank


On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:


Hi Dave,

I re-ran the test using bjacobi as the preconditioner on the
coarse mesh of telescope. The grid is 3072*256*768 and the process
mesh is 96*8*24. The petsc option file is attached.
I still got the "Out Of Memory" error. The error occurred before
the linear solver finished one step, so I don't have the full info
from ksp_view. The info from ksp_view_pre is attached.


Okay - that is essentially useless (sorry)


It seems to me that the error occurred when the decomposition was
going to be changed.


Based on what information?
Running with -info would give us more clues, but will create a ton of 
output.

Please try running the case which failed with -info

I had another test with a grid of 1536*128*384 and the same
process mesh as above. There was no error. The ksp_view info is
attached for comparison.
Thank you.



[3] Here is my crude estimate of your memory usage.
I'll target only the biggest memory hogs, to get an order-of-magnitude 
estimate.


* The fine grid operator contains 4223139840 non-zeros --> 1.8 GB per 
MPI rank assuming double precision.
The indices for the AIJ could amount to another 0.3 GB (assuming 32-bit 
integers).


* You use 5 levels of coarsening, so the other operators should 
represent (collectively)
2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the 
communicator with 18432 ranks.
The coarse grid should consume ~ 0.5 MB per MPI rank on the 
communicator with 18432 ranks.


* You use a reduction factor of 64, making a new communicator with 
288 MPI ranks.
PCTelescope will first gather a temporary matrix associated with your 
coarse level operator assuming a comm size of 288 living on the comm 
with size 18432.
This matrix will require approximately 0.5 * 64 = 32 MB per core on 
the 288 ranks.
This matrix is then used to form a new MPIAIJ matrix on the subcomm, 
thus requiring another 32 MB per rank.

The temporary matrix is now destroyed.

* Because a DMDA is detected, a permutation matrix is assembled.
This requires 2 doubles per point in the DMDA.
Your coarse DMDA contains 92 x 16 x 48 points.
Thus the permutation matrix will require < 1 MB per MPI rank on the 
sub-comm.


* Lastly, the matrix is permuted. This uses MatPtAP(), but the 
resulting operator will have the same memory footprint as the 
unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 
operators of size 32 MB are held in memory when the DMDA is provided.


From my rough estimates, the worst-case memory footprint for any 
given core, given your options, is approximately

2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB.
This is way below 8 GB.
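Tallying those contributions as a quick cross-check (a sketch that just
re-adds the numbers above; the program name is illustrative):

program telescope_tally
  implicit none
  ! Per-rank worst-case estimates from above, in MB.
  real(kind=8) :: fine_op, hierarchy, gathered, permuted, perm_mat
  fine_op   = 2100.0d0  ! fine grid operator values + AIJ indices
  hierarchy =  300.0d0  ! the coarser Galerkin levels, collectively
  gathered  =   32.0d0  ! gathered coarse operator (0.5 MB * factor 64)
  permuted  =   32.0d0  ! permuted MPIAIJ copy on the 18432/64 = 288 ranks
  perm_mat  =    1.0d0  ! DMDA permutation matrix
  print '(a,f6.0,a)', 'worst-case footprint: ', &
        fine_op + hierarchy + gathered + permuted + perm_mat, ' MB'
end program telescope_tally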

Note this estimate completely ignores:
(1) the memory required for the restriction operator,
(2) the potential growth in the number of non-zeros per row due to 
Galerkin coarsening (I wish -ksp_view_pre reported the output from 
MatView so we could see the number of non-zeros required by the coarse 
level operators)
(3) all temporary vectors required by the CG solver, and those 
required by the smoothers.

(4) internal memory allocated by MatPtAP
(5) memory associated with IS's used within PCTelescope

So either I am completely off in my estimates, or you have not 
carefully estimated the memory usage of your application code. 
Hopefully others might examine/correct my rough estimates.


Since I don't have your code I cannot assess the latter.
Since I don't have access to the same machine you are running on, I 
think we need to take a step back.


[1] What machine are you running on? Send me a URL if it's available.

[2] What discretization are you using? (I am guessing a scalar 7 point 
FD stencil)
If it's a 7 point FD stencil, we should be able to examine the memory 
usage of your solver configuration using a standard, light weight 
existing PETSc example, run on your machine at the same scale.
This would hopefully enable us to correctly evaluate the

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-08 Thread frank

Hi Barry and Dave,

Thank both of you for the advice.

@Barry
I made a mistake in the file names in the last email. I attached the correct 
files this time.

For all three tests, 'Telescope' is used as the coarse preconditioner.

== Test1: Grid: 1536*128*384,   Process Mesh: 48*4*12
Part of the memory usage:  Vector   125   124   3971904   0.
                           Matrix   101   101   9462372   0.

== Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
Part of the memory usage:  Vector   125   124    681672   0.
                           Matrix   101   101   1462180   0.


In theory, the memory usage in Test1 should be 8 times that of Test2. In my 
case, it is about 6 times.


== Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per 
process: 32*32*32

Here I get the out of memory error.

I tried to use -mg_coarse jacobi. In this way, I don't need to set 
-mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?

The linear solver didn't work in this case. PETSc output some errors.

@Dave
In test3, I use only one instance of 'Telescope'. On the coarse mesh of 
'Telescope', I used LU as the preconditioner instead of SVD.
If I set the levels correctly, then on the last coarse mesh of MG where 
it calls 'Telescope', the sub-domain per process is 2*2*2.
On the last coarse mesh of 'Telescope', there is only one grid point per 
process.

I still got the OOM error. The detailed petsc option file is attached.


Thank you so much.

Frank



On 07/06/2016 02:51 PM, Barry Smith wrote:

On Jul 6, 2016, at 4:19 PM, frank <hengj...@uci.edu> wrote:

Hi Barry,

Thank you for your advice.
I tried three tests. In the 1st test, the grid is 3072*256*768 and the process 
mesh is 96*8*24.
The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is used as 
the preconditioner at the coarse mesh.
The system gives me the "Out of Memory" error before the linear system is 
completely solved.
The info from '-ksp_view_pre' is attached. It seems to me that the error occurs 
when it reaches the coarse mesh.

The 2nd test uses a grid of 1536*128*384 and a process mesh of 96*8*24. The 3rd 
test uses the same grid but a different process mesh, 48*4*12.

Are you sure this is right? The total matrix and vector memory usage goes 
from 2nd test
   Vector   384   383    8,193,712   0.
   Matrix   103   103   11,508,688   0.
to 3rd test
   Vector   384   383    1,590,520   0.
   Matrix   103   103    3,508,664   0.
that is, the memory usage got smaller, but if you have only 1/8th the processes 
and the same grid it should have gotten about 8 times bigger. Did you maybe cut 
the grid by a factor of 8 also? If so, that still doesn't explain it, because the 
memory usage changed by a factor of 5 something for the vectors and 3 something 
for the matrices.



The linear solver and petsc options in the 2nd and 3rd tests are the same as in the 1st 
test. The linear solver works fine in both tests.
I attached the memory usage of the 2nd and 3rd tests. The memory info is from 
the option '-log_summary'. I tried to use '-memory_info' as you suggested, but 
in my case petsc treated it as an unused option. It output nothing about the 
memory. Do I need to add something to my code so I can use '-memory_info'?

Sorry, my mistake: the option is -memory_view

   Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 
1 (just so it doesn't iterate forever) to see how much memory is used without 
the telescope? Also run case 2 the same way.

   Barry




In both tests the memory usage is not large.

It seems to me that it might be the 'telescope' preconditioner that allocated 
a lot of memory and caused the error in the 1st test.
Is there a way to show how much memory it allocated?

Frank

On 07/05/2016 03:37 PM, Barry Smith wrote:

   Frank,

 You can run with -ksp_view_pre to have it "view" the KSP before the solve 
so hopefully it gets that far.

  Please run the problem that does fit with -memory_info; when the problem completes 
it will show the "high water mark" for PETSc allocated memory and total memory 
used. We first want to look at these numbers to see if it is using more memory than you 
expect. You could also run with, say, half the grid spacing to see how the memory usage 
scales with the increase in grid points. Make the runs also with -log_view and send all 
the output from these options.

Barry


On Jul 5, 2016, at 5:23 PM, frank <hengj...@uci.edu> wrote:

Hi,

I am using the CG ksp solver and Multigrid preconditioner  to solve a linear 
system in parallel.
I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its 
good performance.
The petsc options file is attached.

The domain is a 3d box.
It works well when the grid is  1536*128*384 and the process mesh is 96*8*24. When I 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-06 Thread frank

Hi Barry,

Thank you for your advice.
I tried three tests. In the 1st test, the grid is 3072*256*768 and the 
process mesh is 96*8*24.
The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is 
used as the preconditioner at the coarse mesh.
The system gives me the "Out of Memory" error before the linear system 
is completely solved.
The info from '-ksp_view_pre' is attached. It seems to me that the error 
occurs when it reaches the coarse mesh.


The 2nd test uses a grid of 1536*128*384 and a process mesh of 96*8*24. 
The 3rd test uses the same grid but a different process mesh, 48*4*12.
The linear solver and petsc options in the 2nd and 3rd tests are the same as in 
the 1st test. The linear solver works fine in both tests.
I attached the memory usage of the 2nd and 3rd tests. The memory info is 
from the option '-log_summary'. I tried to use '-memory_info' as you 
suggested, but in my case petsc treated it as an unused option. It 
output nothing about the memory. Do I need to add something to my code so I 
can use '-memory_info'?

In both tests the memory usage is not large.

It seems to me that it might be the 'telescope' preconditioner that 
allocated a lot of memory and caused the error in the 1st test.

Is there a way to show how much memory it allocated?

Frank

On 07/05/2016 03:37 PM, Barry Smith wrote:

   Frank,

 You can run with -ksp_view_pre to have it "view" the KSP before the solve 
so hopefully it gets that far.

  Please run the problem that does fit with -memory_info; when the problem completes 
it will show the "high water mark" for PETSc allocated memory and total memory 
used. We first want to look at these numbers to see if it is using more memory than you 
expect. You could also run with, say, half the grid spacing to see how the memory usage 
scales with the increase in grid points. Make the runs also with -log_view and send all 
the output from these options.

Barry


On Jul 5, 2016, at 5:23 PM, frank <hengj...@uci.edu> wrote:

Hi,

I am using the CG ksp solver and Multigrid preconditioner  to solve a linear 
system in parallel.
I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its 
good performance.
The petsc options file is attached.

The domain is a 3d box.
It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I 
double the size of the grid and keep the same process mesh and petsc options, I get an 
"out of memory" error from the super-cluster I am using.
Each process has access to at least 8G memory, which should be more than enough 
for my application. I am sure that all the other parts of my code (except the 
linear solver) do not use much memory. So I suspect that something is wrong 
with the linear solver.
The error occurs before the linear system is completely solved, so I don't have 
the info from ksp_view. I am not able to reproduce the error with a smaller 
problem either.
In addition, I tried to use block jacobi as the preconditioner with the 
same grid and same decomposition. The linear solver runs extremely slowly, but 
there is no memory error.

How can I diagnose what exactly causes the error?
Thank you so much.

Frank



KSP Object: 18432 MPI processes
  type: cg
  maximum iterations=1
  tolerances:  relative=1e-07, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
  type: mg
  PC has not been set up so information may be incomplete
MG: type is MULTIPLICATIVE, levels=4 cycles=v
  Cycles per PCApply=1
  Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level ---
KSP Object:(mg_coarse_) 18432 MPI processes
  type: preonly
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using DEFAULT norm type for convergence test
PC Object:(mg_coarse_) 18432 MPI processes
  type: redundant
  PC has not been set up so information may be incomplete
Redundant preconditioner: Not yet setup
  Down solver (pre-smoother) on level 1 ---
KSP Object:(mg_levels_1_) 18432 MPI processes
  type: chebyshev
Chebyshev: eigenvalue estimates:  min = 0., max = 0.
  maximum iterations=2, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object:(mg_levels_1_) 18432 MPI processes
  type: sor
  PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 ---
KSP Obj

[petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-05 Thread frank

Hi,

I am using the CG ksp solver and Multigrid preconditioner  to solve a 
linear system in parallel.
I chose to use the 'Telescope' as the preconditioner on the coarse mesh 
for its good performance.

The petsc options file is attached.

The domain is a 3d box.
It works well when the grid is 1536*128*384 and the process mesh is 
96*8*24. When I double the size of the grid and keep the same process mesh 
and petsc options, I get an "out of memory" error from the super-cluster 
I am using.
Each process has access to at least 8G memory, which should be more than 
enough for my application. I am sure that all the other parts of my 
code (except the linear solver) do not use much memory. So I suspect 
that something is wrong with the linear solver.
The error occurs before the linear system is completely solved, so I 
don't have the info from ksp_view. I am not able to reproduce the error 
with a smaller problem either.
In addition, I tried to use block jacobi as the preconditioner with 
the same grid and same decomposition. The linear solver runs extremely 
slowly, but there is no memory error.


How can I diagnose what exactly causes the error?
Thank you so much.

Frank
-ksp_type    cg
-ksp_norm_type   unpreconditioned
-ksp_lag_norm
-ksp_rtol    1e-7
-ksp_initial_guess_nonzero  yes
-ksp_converged_reason 
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 4
-mg_levels_ksp_type richardson 
-mg_levels_ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left
-log_summary

# Setting dmdarepart on subcomm
-repart_da_processors_x 24
-repart_da_processors_y 2
-repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
#-mg_coarse_telescope_ksp_constant_null_space
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type svd
#-mg_coarse_telescope_mg_coarse_pc_type telescope
#-mg_coarse_telescope_mg_coarse_pc_telescope_reduction_factor 64

# Second subcomm
#-mg_coarse_telescope_mg_coarse_telescope_ksp_type preonly
#-mg_coarse_telescope_mg_coarse_telescope_pc_type mg
#-mg_coarse_telescope_mg_coarse_telescope_pc_mg_galerkin
#-mg_coarse_telescope_mg_coarse_telescope_pc_mg_levels 3
#-mg_coarse_telescope_mg_coarse_telescope_mg_levels_ksp_type richardson
#-mg_coarse_telescope_mg_coarse_telescope_mg_levels_ksp_max_it 1
#-mg_coarse_telescope_mg_coarse_telescope_mg_coarse_ksp_type richardson
#-mg_coarse_telescope_mg_coarse_telescope_mg_coarse_pc_type svd


Re: [petsc-users] Question about using Hypre with OpenMP under Petsc

2015-05-26 Thread frank

Hi Barry,

Thank you for your prompt reply.
Which executable or library should I run ldd on?

Thank you,
Frank.

On 05/26/2015 02:41 PM, Barry Smith wrote:

On May 26, 2015, at 4:18 PM, frank <hengj...@uci.edu> wrote:

Hi

I am trying to use multigrid to solve a large sparse linear system. I use Hypre 
boomeramg as the preconditioner. The code calling KSPSolve is parallelized with MPI.

I want to set Hypre to use OpenMP. Here is what I did:
* I downloaded and compiled Hypre through Petsc
* I recompiled Hypre with --with-openmp.

Ok, you need to make sure that PETSc is linking against the OpenMP compiled 
version of hypre libraries. Use ldd on linux or otool -L on Mac.


* I set -pc_type hypre and -pc_hypre_type boomeramg for Petsc.

My question:
* In this way, would Hypre use OpenMP to parallelize the execution when KSPSolve 
is called?
* If this does not work, is there another way I can set Hypre to use OpenMP 
under Petsc?
* Is there a way I can know explicitly whether Hypre is using OpenMP under 
Petsc or not?

Your question really has little to do with PETSc and more to do with hypre. 
You need to look through the hypre documentation and find out how you control 
the number of OpenMP threads that hypre uses (likely it is some environment 
variables).  Then run varying this number of threads and see what happens: if 
you use more threads, does it go faster?  It is best to make this test with just 
a single MPI process and 1, 2, 4, 8 OpenMP threads.
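hypre built with OpenMP honors the standard OpenMP runtime controls
(e.g. OMP_NUM_THREADS), so a first sanity check of the thread environment
could be a tiny program like this (a sketch, independent of PETSc and
hypre; compile with the compiler's OpenMP flag, e.g. -fopenmp):

program check_omp
  use omp_lib
  implicit none
  ! Reports the thread count the OpenMP runtime will hand out; run it
  ! with the same environment you give your PETSc job.
  print *, 'max OpenMP threads: ', omp_get_max_threads()
end program check_omp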

   Barry

Thank you so much
Frank

[petsc-users] Question about using Hypre with OpenMP under Petsc

2015-05-26 Thread frank

Hi

I am trying to use multigrid to solve a large sparse linear system. I 
use Hypre boomeramg as the preconditioner. The code calling KSPSolve is 
parallelized with MPI.


I want to set Hypre to use OpenMP. Here is what I did:
* I downloaded and compiled Hypre through Petsc
* I recompiled Hypre with --with-openmp.
* I set -pc_type hypre and -pc_hypre_type boomeramg for Petsc.

My question:
* In this way, would Hypre use OpenMP to parallelize the execution when 
KSPSolve is called?
* If this does not work, is there another way I can set Hypre to use 
OpenMP under Petsc?
* Is there a way I can know explicitly whether Hypre is using OpenMP 
under Petsc or not?


Thank you so much
Frank

[petsc-users] Dense matrix solver

2013-09-19 Thread Frank
Hi,

I am thinking of solving a linear system whose coefficient matrix
has 27 nonzero diagonal bands (diagonally dominant). Does anybody have any
idea how this will perform? Do you have any recommendation about
which solver to choose?
I have solved an 11-band matrix equation with
boomeramg, which was 50% slower than the 7-band case.
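As a back-of-envelope scaling check (an assumption, not a measurement:
it only counts one multiply-add per stored band per row of a matvec):

program band_cost
  implicit none
  ! Rough relative matvec work, proportional to the number of bands.
  integer, parameter :: b7 = 7, b11 = 11, b27 = 27
  print '(a,f5.2)', '11-band vs 7-band work ratio: ', real(b11)/real(b7)
  print '(a,f5.2)', '27-band vs 7-band work ratio: ', real(b27)/real(b7)
end program band_cost

The ~1.6x ratio for 11 vs 7 bands is at least in the right ballpark for
the 50% slowdown observed above; by the same yardstick, 27 bands would
cost roughly 4x the 7-band case per matvec.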

Thanks.


[petsc-users] Will matrix-free faster than solving linearized equation?

2013-09-13 Thread Frank

Hi,

Currently, I am solving a nonlinear equation with a linearization 
method. I am thinking of switching to a nonlinear solver. With the PETSc 
library, I am confident I can do it. I just want to ask those who have 
experience with nonlinear solvers whether a matrix-free method will be faster.


Thank you very much.



[petsc-users] How to reuse matrix A

2013-09-09 Thread Frank

Hi,

I am solving Ax=b repeatedly, and A never changes. I did 
things like this:


! 1. Set up entries for matrix A
CALL MATASSEMBLYBEGIN(A,MAT_FINAL_ASSEMBLY,IERR)
CALL MATASSEMBLYEND(A,MAT_FINAL_ASSEMBLY,IERR)
CALL MATSETOPTION(A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr)
CALL KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN,ierr)

Call KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN,ierr)

   DO i = 1, 100
      ! Set up entries for vector b
      CALL VECASSEMBLYBEGIN(b,IERR)
      CALL VECASSEMBLYEND(b,IERR)
      CALL KSPSolve(ksp,b,x,ierr)
   ENDDO

However, it does not work. I don't know why. Could anybody help me with 
this? Thank you very much.


Sincerely
Xingjun Fang
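For reference, the canonical sequence for repeated solves with a fixed A
looks like the following (a sketch only, using the PETSc 3.x-era Fortran
interface with SAME_NONZERO_PATTERN; declarations and the assembly of
A's entries are elided):

! Assemble A once, attach it to the KSP once, then reuse the setup
! (e.g. a factorization) across solves with new right-hand sides.
CALL MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
CALL MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)
CALL KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN, ierr)
CALL KSPSetFromOptions(ksp, ierr)
DO i = 1, 100
   ! ... set the entries of b for this step ...
   CALL VecAssemblyBegin(b, ierr)
   CALL VecAssemblyEnd(b, ierr)
   CALL KSPSolve(ksp, b, x, ierr)
ENDDO

Structurally this matches the code above, so the calling sequence itself
looks plausible; "does not work" would need to be narrowed down (wrong
answer, no convergence, or an error message).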


[petsc-users] Weird memory leakage

2013-08-25 Thread Frank

Hi,
I have a very weird problem here.
I am using FORTRAN to call PETSc to solve a Poisson equation.
When I run my code with 8 cores, it works fine, and the memory consumption 
does not increase. However, when it is run with 64 cores, it first 
gives lots of errors like this:


[n310:18951] [[62652,0],2] - [[62652,0],10] (node: n219) oob-tcp:
Number of attempts to create TCP connection has been exceeded. Can not
communicate with peer
[n310:18951] [[62652,0],2] - [[62652,0],18] (node: n128) oob-tcp:
Number of attempts to create TCP connection has been exceeded. Can not
communicate with peer
[n310:18951] [[62652,0],2] - [[62652,0],34] (node: n089) oob-tcp:
Number of attempts to create TCP connection has been exceeded. Can not
communicate with peer
[n310:18951] [[62652,0],2] ORTED_CMD_PROCESSOR: STUCK IN INFINITE LOOP -
ABORTING
[n310:18951] *** Process received signal ***
[n310:18951] Signal: Aborted (6)
[n310:18951] Signal code: (-6)
[n310:18951] [ 0] /lib64/libpthread.so.0() [0x35b120f500]
[n310:18951] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x35b0e328a5]
[n310:18951] [ 2] /lib64/libc.so.6(abort+0x175) [0x35b0e34085]
[n310:18951] [ 3]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_daemon_cmd_processor+0x243)
[0x2ae5e02f0813]
[n310:18951] [ 4]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_base_loop+0x31a)
[0x2ae5e032f56a]
[n310:18951] [ 5]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_loop+0x12)
[0x2ae5e032f242]
[n310:18951] [ 6]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_progress+0x5c)
[0x2ae5e031845c]
[n310:18951] [ 7]
/global/software/openmpi-1.6.1-intel1/lib/openmpi/mca_grpcomm_bad.so(+0x1bd7)
[0x2ae5e28debd7]
[n310:18951] [ 8]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_ess_base_orted_finalize+0x1e)
[0x2ae5e02f431e]
[n310:18951] [ 9]
/global/software/openmpi-1.6.1-intel1/lib/openmpi/mca_ess_tm.so(+0x1294)
[0x2ae5e1ab1294]
[n310:18951] [10]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_finalize+0x4e)
[0x2ae5e02d0fbe]
[n310:18951] [11]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(+0x4840b)
[0x2ae5e02f040b]
[n310:18951] [12]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_base_loop+0x31a)
[0x2ae5e032f56a]
[n310:18951] [13]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_loop+0x12)
[0x2ae5e032f242]
[n310:18951] [14]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_progress+0x5c)
[0x2ae5e031845c]
[n310:18951] [15]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_trigger_event+0x50)
[0x2ae5e02dc930]
[n310:18951] [16]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(+0x4916f)
[0x2ae5e02f116f]
[n310:18951] [17]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_daemon_cmd_processor+0x149)
[0x2ae5e02f0719]
[n310:18951] [18]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_base_loop+0x31a)
[0x2ae5e032f56a]
[n310:18951] [19]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_loop+0x12)
[0x2ae5e032f242]
[n310:18951] [20]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(opal_event_dispatch+0x8)
[0x2ae5e032f228]
[n310:18951] [21]
/global/software/openmpi-1.6.1-intel1/lib/libopen-rte.so.4(orte_daemon+0x9f0)
[0x2ae5e02ef8a0]
[n310:18951] [22] orted(main+0x88) [0x4024d8]
[n310:18951] [23] /lib64/libc.so.6(__libc_start_main+0xfd) [0x35b0e1ecdd]
[n310:18951] [24] orted() [0x402389]
[n310:18951] *** End of error message ***

but the program still gives the right result for a short period. After that, it 
suddenly stops because the memory exceeds some limit. I don't understand this. If 
there is a memory leak in my code, how come it can work with 8 cores? Please 
help me. Thank you so much!

Sincerely
Xingjun




[petsc-users] FORTRAN 90 with PETSc

2013-08-16 Thread Frank

Hi,

I am using PETSc to iterate a problem, that is to say, I call KSPSolve 
repeatedly.
First, I wrote all the PETSc components in one subroutine, including 
MatCreate, VecCreateMPI, etc. Everything works fine.
Then, I want to initialize the ksp only once, outside the loop, while the 
matrix and rhs are changed within the loop repeatedly. Here are my problems:


1. I tried to use COMMON to share the following variables; I include 
petsc.var in the solver subroutine. It does not compile.

petsc.var
Vec x,b
Mat A
KSP ksp
PC  pc
COMMON /MYPETSC/x, b, A,ksp,pc
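(A pattern that avoids COMMON for these opaque handles is a Fortran
module; a sketch, where the *def.h include names are an assumption based
on PETSc releases of that era and may differ by version:)

module petsc_objs
  ! Holds the opaque PETSc handles; each program unit then just does
  ! "use petsc_objs" instead of COMMON or long argument lists.
#include "finclude/petscsysdef.h"
#include "finclude/petscvecdef.h"
#include "finclude/petscmatdef.h"
#include "finclude/petsckspdef.h"
#include "finclude/petscpcdef.h"
  Vec x, b
  Mat A
  KSP ksp
  PC  pc
end module petsc_objs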

2. I defined the following in the main program:
PROGRAM MAIN
#include "finclude/petscsys.h"
#include "finclude/petscvec.h"
#include "finclude/petscmat.h"
#include "finclude/petscpc.h"
#include "finclude/petscksp.h"
Vec x,b
Mat A
KSP ksp
PC  pc
..
CALL INIT_PETSC(ksp,pc,A,x,b)
..
CALL LOOP(ksp,pc,A,x,b)

END PROGRAM
!---
SUBROUTINE LOOP(ksp,pc,A,x,b)
Vec x,b
Mat A
KSP ksp
PC  pc
..
CALL SOLVE(ksp,pc,A,x,b)
...
END SUBROUTINE
!---
SUBROUTINE SOLVE(ksp,pc,A,x,b)
Vec x,b
Mat A
KSP ksp
PC  pc

..
CALL KSPSolve(ksp,b,x,ierr)
END SUBROUTINE

It can be compiled, but ksp does not iterate.

Could you please explain the reason for this problem and how to fix it?

Thank you very much.