Re: [petsc-users] prevent linking to multithreaded BLAS?

2022-12-07 Thread Jed Brown
It isn't always wrong to link threaded BLAS. For example, a user might need to 
call threaded BLAS on the side (but the application can only link one) or a 
sparse direct solver might want threading for the supernode. We could test at 
runtime whether child threads exist/are created when calling BLAS and deliver a 
warning. 
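A minimal sketch of such a runtime check (assuming Linux, where /proc/self/task
has one entry per thread, and a Fortran-convention dgemm_ symbol with 32-bit
BLAS integers; the warning behavior is illustrative, not something PETSc
currently does):

/* Count this process's threads before and after a BLAS call; a threaded
 * BLAS typically spawns its worker pool on first use. Link with -lblas. */
#include <dirent.h>
#include <stdio.h>

extern void dgemm_(const char *, const char *, const int *, const int *,
                   const int *, const double *, const double *, const int *,
                   const double *, const int *, const double *, double *,
                   const int *);

static int CountThreads(void)
{
  DIR *dir = opendir("/proc/self/task"); /* one directory entry per thread */
  struct dirent *e;
  int n = 0;
  if (!dir) return -1;
  while ((e = readdir(dir)))
    if (e->d_name[0] != '.') ++n;
  closedir(dir);
  return n;
}

int main(void)
{
  static double A[256 * 256], B[256 * 256], C[256 * 256]; /* zero-initialized */
  int n = 256, before, after;
  double one = 1.0, zero = 0.0;

  before = CountThreads();
  dgemm_("N", "N", &n, &n, &n, &one, A, &n, B, &n, &zero, C, &n);
  after = CountThreads();
  if (after > before)
    fprintf(stderr, "warning: BLAS appears to be threaded (%d -> %d threads)\n",
            before, after);
  return 0;
}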

Barry Smith  writes:

> There would need to be, for example, some symbol in all the threaded BLAS
> libraries that is not in the unthreaded libraries. Or at least in some of the
> threaded libraries but never in the unthreaded.
>
> BlasLapack.py could check for the special symbol(s) to determine this.
>
>   Barry
>
>
>> On Dec 7, 2022, at 4:47 PM, Mark Lohry  wrote:
>> 
>> Thanks, yes, I figured out the OMP_NUM_THREADS=1 way while triaging it, and 
>> the --download-fblaslapack way occurred to me. 
>> 
>> I was hoping for something that "just worked" (refuse to build in this case) 
>> but I don't know if it's programmatically possible for petsc to tell whether 
>> or not it's linking to a threaded BLAS?
>> 
>> On Wed, Dec 7, 2022 at 4:35 PM Satish Balay wrote:
>>> If you don't specify a blas to use - petsc configure will guess and use 
>>> what it can find.
>>> 
>>> So the only way to force it to use a particular BLAS is to specify one [one
>>> way is --download-fblaslapack]
>>> 
>>> Wrt multi-threaded OpenBLAS - you can force it to run single-threaded [by
>>> one of these 2 env variables]
>>> 
>>> # Use single thread openblas
>>> export OPENBLAS_NUM_THREADS=1
>>> export OMP_NUM_THREADS=1
>>> 
>>> Satish
>>> 
>>> 
>>> On Wed, 7 Dec 2022, Mark Lohry wrote:
>>> 
>>> > I ran into an unexpected issue -- on an NP-core machine, each MPI rank of
>>> > my application was launching NP threads, such that when running with
>>> > multiple ranks the machine was quickly oversubscribed and performance
>>> > tanked.
>>> > 
>>> > The root cause of this was petsc linking against the system-provided
>>> > library (libopenblas0-pthread in this case) set by the update-alternatives
>>> > in ubuntu. At some point this machine got updated to using the threaded
>>> > blas implementation instead of serial; not sure how, and I wouldn't have
>>> > noticed if I weren't running interactively.
>>> > 
>>> > Is there any mechanism in petsc or its build system to prevent linking
>>> > against an inappropriate BLAS, or do I need to be diligent about manually
>>> > setting the BLAS library in the configuration stage?
>>> > 
>>> > Thanks,
>>> > Mark
>>> > 
>>> 


Re: [petsc-users] Petsc Section in DMPlex

2022-12-07 Thread Matthew Knepley
On Wed, Dec 7, 2022 at 9:21 PM Nicholas Arnold-Medabalimi <
narno...@umich.edu> wrote:

> Thank you for the help.
>
> I think the last piece of the puzzle is how do I create the "expanded IS"
> from the subpoint IS using the section?
>

Loop over the points in the IS. For each point, get the dof and offset from
the Section. Make a new IS that has all the
dofs, namely each run [offset, offset+dof).
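A minimal sketch of that loop in C (assuming `section` is the local Section of
the original DM, and with a hypothetical helper name):

/* Sketch: expand a subpoint IS into an IS of dofs using the Section. */
#include <petsc.h>

static PetscErrorCode BuildExpandedIS(PetscSection section, IS subpointsIS, IS *expandedIS)
{
  const PetscInt *subpoints;
  PetscInt       *expIdx, n, nExp = 0, q = 0;

  PetscFunctionBeginUser;
  PetscCall(ISGetLocalSize(subpointsIS, &n));
  PetscCall(ISGetIndices(subpointsIS, &subpoints));
  for (PetscInt i = 0; i < n; ++i) { /* first pass: count the dofs */
    PetscInt dof;
    PetscCall(PetscSectionGetDof(section, subpoints[i], &dof));
    nExp += dof;
  }
  PetscCall(PetscMalloc1(nExp, &expIdx));
  for (PetscInt i = 0; i < n; ++i) { /* second pass: each run [offset, offset+dof) */
    PetscInt dof, off;
    PetscCall(PetscSectionGetDof(section, subpoints[i], &dof));
    PetscCall(PetscSectionGetOffset(section, subpoints[i], &off));
    for (PetscInt d = 0; d < dof; ++d) expIdx[q++] = off + d;
  }
  PetscCall(ISRestoreIndices(subpointsIS, &subpoints));
  PetscCall(ISCreateGeneral(PETSC_COMM_SELF, nExp, expIdx, PETSC_OWN_POINTER, expandedIS));
  PetscFunctionReturn(0);
}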

  Thanks,

 Matt


> Sincerely
> Nicholas
>
>
> On Wed, Dec 7, 2022 at 7:06 AM Matthew Knepley  wrote:
>
>> On Wed, Dec 7, 2022 at 6:51 AM Nicholas Arnold-Medabalimi <
>> narno...@umich.edu> wrote:
>>
>>> Hi
>>>
>>> Thank you so much for your patience. One thing to note: I don't have any
>>> need to go from the filtered distributed mapping back to the full one, but
>>> it is good to know.
>>>
>>> One aside question:
>>> 1) Are the natural and global orderings the same in this context?
>>>
>>
>> No.
>>
>>
>>> As far as implementing what you have described.
>>>
>>> When I call ISView on the generated SubpointIS, I get an unusual error
>>> which I'm not sure how to interpret. (This case is running on 2 ranks, and
>>> the filter label has points located on both ranks of the original DM.)
>>> However, if I manually get the indices (the commented lines), it seems to
>>> not have any issues.
>>> call DMPlexFilter(dmplex_full, iBlankLabel, 1, dmplex_filtered,ierr)
>>> call DMPlexGetSubpointIS(dmplex_filtered, subpointsIS,ierr)
>>> !call ISGetIndicesF90(subpointsIS, subPointKey,ierr)
>>> !write(*,*) subPointKey
>>> !call ISRestoreIndicesF90(subpointsIS, subPointKey,ierr)
>>> call ISView(subpointsIS,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>
>>> [1]PETSC ERROR: - Error Message
>>> --
>>> [1]PETSC ERROR: Arguments must have same communicators
>>> [1]PETSC ERROR: Different communicators in the two objects: Argument # 1
>>> and 2 flag 3
>>> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> [1]PETSC ERROR: Petsc Development GIT revision: v3.18.1-320-g7810d690132
>>>  GIT Date: 2022-11-20 20:25:41 -0600
>>> [1]PETSC ERROR: Configure options with-fc=mpiifort with-mpi-f90=mpiifort
>>> --download-triangle --download-parmetis --download-metis --with-debugging=1
>>> --download-hdf5 --prefix=/home/narnoldm/packages/petsc_install
>>> [1]PETSC ERROR: #1 ISView() at
>>> /home/narnoldm/packages/petsc/src/vec/is/is/interface/index.c:1629
>>>
>>
>> The problem here is that subpointsIS is a _serial_ object, and you are
>> using a parallel viewer. You can use PETSC_VIEWER_STDOUT_SELF,
>> or you can pull out the singleton viewer from STDOUT_WORLD if you want
>> them all to print in order.
>>
>>
>>> As far as the overall process you have described, my question at first
>>> glance is: do I have to allocate/create the vector that is output by
>>> VecISCopy before calling it, or does it create the vector automatically?
>>>
>>
>> You create both vectors. I would do it using DMCreateGlobalVector() from
>> both DMs.
>>
>>
>>> I think I would need to create it first using a section and setting the
>>> Vec in the filtered DM?
>>>
>>
>> Setting the Section in the filtered DM.
>>
>>
>>> And I presume in this case I would be using the scatter reverse option
>>> to go from the full set to the reduced set?
>>>
>>
>> Yes
>>
>>   Thanks
>>
>>  Matt
>>
>>
>>> Sincerely
>>> Nicholas
>>>
>>>
>>>
>>>
>>>
>>>
>>> Sincerely
>>> Nick
>>>
>>> On Wed, Dec 7, 2022 at 6:00 AM Matthew Knepley 
>>> wrote:
>>>
 On Wed, Dec 7, 2022 at 3:35 AM Nicholas Arnold-Medabalimi <
 narno...@umich.edu> wrote:

> Hi Matthew
>
> Thank you for the help. This clarified a great deal.
>
> I have a follow-up question related to DMPlexFilter. It may be better
> to describe what I'm trying to achieve.
>
> I have a general mesh I am solving which has a section with cell-centered
> finite volume states, as described in my initial email. After
> calculating some metrics, I tag a bunch of cells with an identifying Label
> and use DMPlexFilter to generate a new DM which is only that subset of cells.
> Generally, this leads to a pretty unbalanced DM, so I then plan to use
> DMPlexDistribute to balance that DM across the processors. The coordinates
> pass along fine, but the state (or I should say the Section) does not, at
> least as far as I can tell.
>
> Assuming I can get a filtered DM, I then distribute the DM and state
> using the method you described above and it seems to be working ok.
>
> The last connection I have to make is the transfer of information from
> the full mesh to the "sampled" filtered mesh. From what I can gather I
> would need to get the mapping of points using DMPlexGetSubpointIS and then
> manually copy the values from the full DM section to the filtered DM? I
> have the process from full->filtered->distributed all working for the
> coordinates so it's just a matter of transferring the section correctly.

Re: [petsc-users] Petsc Section in DMPlex

2022-12-07 Thread Nicholas Arnold-Medabalimi
Thank you for the help.

I think the last piece of the puzzle is how do I create the "expanded IS"
from the subpoint IS using the section?


Sincerely
Nicholas


On Wed, Dec 7, 2022 at 7:06 AM Matthew Knepley  wrote:

> On Wed, Dec 7, 2022 at 6:51 AM Nicholas Arnold-Medabalimi <
> narno...@umich.edu> wrote:
>
>> Hi
>>
>> Thank you so much for your patience. One thing to note: I don't have any
>> need to go from the filtered distributed mapping back to the full one, but
>> it is good to know.
>>
>> One aside question:
>> 1) Are the natural and global orderings the same in this context?
>>
>
> No.
>
>
>> As far as implementing what you have described.
>>
>> When I call ISView on the generated SubpointIS, I get an unusual error
>> which I'm not sure how to interpret. (This case is running on 2 ranks, and
>> the filter label has points located on both ranks of the original DM.)
>> However, if I manually get the indices (the commented lines), it seems to
>> not have any issues.
>> call DMPlexFilter(dmplex_full, iBlankLabel, 1, dmplex_filtered,ierr)
>> call DMPlexGetSubpointIS(dmplex_filtered, subpointsIS,ierr)
>> !call ISGetIndicesF90(subpointsIS, subPointKey,ierr)
>> !write(*,*) subPointKey
>> !call ISRestoreIndicesF90(subpointsIS, subPointKey,ierr)
>> call ISView(subpointsIS,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>> [1]PETSC ERROR: - Error Message
>> --
>> [1]PETSC ERROR: Arguments must have same communicators
>> [1]PETSC ERROR: Different communicators in the two objects: Argument # 1
>> and 2 flag 3
>> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [1]PETSC ERROR: Petsc Development GIT revision: v3.18.1-320-g7810d690132
>>  GIT Date: 2022-11-20 20:25:41 -0600
>> [1]PETSC ERROR: Configure options with-fc=mpiifort with-mpi-f90=mpiifort
>> --download-triangle --download-parmetis --download-metis --with-debugging=1
>> --download-hdf5 --prefix=/home/narnoldm/packages/petsc_install
>> [1]PETSC ERROR: #1 ISView() at
>> /home/narnoldm/packages/petsc/src/vec/is/is/interface/index.c:1629
>>
>
> The problem here is that subpointsIS is a _serial_ object, and you are
> using a parallel viewer. You can use PETSC_VIEWER_STDOUT_SELF,
> or you can pull out the singleton viewer from STDOUT_WORLD if you want
> them all to print in order.
>
>
>> As far as the overall process you have described, my question at first
>> glance is: do I have to allocate/create the vector that is output by
>> VecISCopy before calling it, or does it create the vector automatically?
>>
>
> You create both vectors. I would do it using DMCreateGlobalVector() from
> both DMs.
>
>
>> I think I would need to create it first using a section and setting the
>> Vec in the filtered DM?
>>
>
> Setting the Section in the filtered DM.
>
>
>> And I presume in this case I would be using the scatter reverse option to
>> go from the full set to the reduced set?
>>
>
> Yes
>
>   Thanks
>
>  Matt
>
>
>> Sincerely
>> Nicholas
>>
>>
>>
>>
>>
>>
>> Sincerely
>> Nick
>>
>> On Wed, Dec 7, 2022 at 6:00 AM Matthew Knepley  wrote:
>>
>>> On Wed, Dec 7, 2022 at 3:35 AM Nicholas Arnold-Medabalimi <
>>> narno...@umich.edu> wrote:
>>>
 Hi Matthew

 Thank you for the help. This clarified a great deal.

 I have a follow-up question related to DMPlexFilter. It may be better
 to describe what I'm trying to achieve.

 I have a general mesh I am solving which has a section with cell-centered
 finite volume states, as described in my initial email. After calculating
 some metrics, I tag a bunch of cells with an identifying Label and use
 DMPlexFilter to generate a new DM which is only that subset of cells.
 Generally, this leads to a pretty unbalanced DM, so I then plan to use
 DMPlexDistribute to balance that DM across the processors. The coordinates
 pass along fine, but the state (or I should say the Section) does not, at
 least as far as I can tell.

 Assuming I can get a filtered DM, I then distribute the DM and state
 using the method you described above and it seems to be working ok.

 The last connection I have to make is the transfer of information from
 the full mesh to the "sampled" filtered mesh. From what I can gather I
 would need to get the mapping of points using DMPlexGetSubpointIS and then
 manually copy the values from the full DM section to the filtered DM? I
 have the process from full->filtered->distributed all working for the
 coordinates so it's just a matter of transferring the section correctly.

 I appreciate all the help you have provided.

>>>
>>> Let's do this in two steps, which makes it easier to debug. First, do
>>> not redistribute the submesh. Just use DMPlexGetSubpointIS()
>>> to get the mapping of filtered points to points in the original mesh.
>>> Then create an expanded IS using the Section, which maps
>>> dofs in the filtered mesh to dofs in the original mesh.

Re: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.

2022-12-07 Thread Junchao Zhang
Hi, Philip,
 I could reproduce the error. I need to find a way to debug it. Thanks.

/home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in
"System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds
1e-10
*** 1 failure is detected in the test module "Regression"


--Junchao Zhang


On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip  wrote:

> I think it would be simpler to use the develop branch for this issue. But
> you can still just build the SystemTester. Then (if you changed the PSI_1
> case) run:
>
>  ./test/system/SystemTester -t System/PSI_1 -- -v
>
> (No need for multiple MPI ranks)
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Junchao Zhang 
> *Sent:* Monday, December 5, 2022 15:40
> *To:* Fackler, Philip 
> *Cc:* xolotl-psi-developm...@lists.sourceforge.net <
> xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov <
> petsc-users@mcs.anl.gov>; Blondel, Sophie ; Roth,
> Philip 
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> I configured with xolotl branch feature-petsc-kokkos, and typed `make`
> under ~/xolotl-build/. Though there were errors, a lot of the *Tester
> executables were built.
>
> [ 62%] Built target xolotlViz
> [ 63%] Linking CXX executable TemperatureProfileHandlerTester
> [ 64%] Linking CXX executable TemperatureGradientHandlerTester
> [ 64%] Built target TemperatureProfileHandlerTester
> [ 64%] Built target TemperatureConstantHandlerTester
> [ 64%] Built target TemperatureGradientHandlerTester
> [ 65%] Linking CXX executable HeatEquationHandlerTester
> [ 65%] Built target HeatEquationHandlerTester
> [ 66%] Linking CXX executable FeFitFluxHandlerTester
> [ 66%] Linking CXX executable W111FitFluxHandlerTester
> [ 67%] Linking CXX executable FuelFitFluxHandlerTester
> [ 67%] Linking CXX executable W211FitFluxHandlerTester
>
> Which Tester should I use to run with the parameter file
> benchmarks/params_system_PSI_2.txt? And how many ranks should I use?
> Could you give an example command line?
> Thanks.
>
> --Junchao Zhang
>
>
> On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang 
> wrote:
>
> Hello, Philip,
>Do I still need to use the feature-petsc-kokkos branch?
> --Junchao Zhang
>
>
> On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip 
> wrote:
>
> Junchao,
>
> Thank you for working on this. If you open the parameter file for, say,
> the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add
> `-dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the
> corresponding cusparse/cuda option).
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Junchao Zhang 
> *Sent:* Thursday, December 1, 2022 17:05
> *To:* Fackler, Philip 
> *Cc:* xolotl-psi-developm...@lists.sourceforge.net <
> xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov <
> petsc-users@mcs.anl.gov>; Blondel, Sophie ; Roth,
> Philip 
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> Hi, Philip,
>   Sorry for the long delay.  I could not get something useful from the
> -log_view output.  Since I have already built xolotl, could you give me
> instructions on how to do a xolotl test to reproduce the divergence with
> petsc GPU backends (but fine on CPU)?
>   Thank you.
> --Junchao Zhang
>
>
> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip 
> wrote:
>
> ---------------- PETSc Performance Summary: ----------------
>
> Unknown Name on a  named PC0115427 with 1 processor, by 4pf Wed Nov 16
> 14:36:46 2022
> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date:
> 2022-10-28 14:39:41 +
>
>  Max   Max/Min Avg   Total
> Time (sec):   6.023e+00 1.000   6.023e+00
> Objects:  1.020e+02 1.000   1.020e+02
> Flops:1.080e+09 1.000   1.080e+09  1.080e+09
> Flops/sec:1.793e+08 1.000   1.793e+08  1.793e+08
> MPI Msg Count:0.000e+00 0.000   0.000e+00  0.000e+00
> MPI Msg Len (bytes):  0.000e+00 0.000   0.000e+00  0.000e+00
> MPI Reductions:   0.000e+00 0.000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> 

Re: [petsc-users] prevent linking to multithreaded BLAS?

2022-12-07 Thread Barry Smith

   There would need to be, for example, some symbol in all the threaded BLAS 
libraries that is not in the unthreaded libraries. Or at least in some of the
threaded libraries but never in the unthreaded. 

   BlasLapack.py could check for the special symbol(s) to determine this.
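As a sketch, a probe could dlopen the executable itself and look for
threading-related entry points; the symbol names below are examples from
OpenBLAS, GotoBLAS, MKL, and BLIS, and some may also appear in sequential
builds, so a hit is only a hint:

/* Compile with -ldl; a found symbol suggests (not proves) a threaded BLAS. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
  const char *suspects[] = {"openblas_set_num_threads", "goto_set_num_threads",
                            "mkl_set_num_threads", "bli_thread_set_num_threads"};
  void *self = dlopen(NULL, RTLD_NOW); /* handle to the whole link image */

  if (!self) return 1;
  for (unsigned i = 0; i < sizeof(suspects) / sizeof(suspects[0]); ++i)
    if (dlsym(self, suspects[i]))
      printf("found %s: BLAS may be threaded\n", suspects[i]);
  dlclose(self);
  return 0;
}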

  Barry


> On Dec 7, 2022, at 4:47 PM, Mark Lohry  wrote:
> 
> Thanks, yes, I figured out the OMP_NUM_THREADS=1 way while triaging it, and 
> the --download-fblaslapack way occurred to me. 
> 
> I was hoping for something that "just worked" (refuse to build in this case) 
> but I don't know if it's programmatically possible for petsc to tell whether 
> or not it's linking to a threaded BLAS?
> 
> On Wed, Dec 7, 2022 at 4:35 PM Satish Balay wrote:
>> If you don't specify a blas to use - petsc configure will guess and use what 
>> it can find.
>> 
>> So the only way to force it to use a particular BLAS is to specify one [one
>> way is --download-fblaslapack]
>> 
>> Wrt multi-threaded OpenBLAS - you can force it to run single-threaded [by
>> one of these 2 env variables]
>> 
>> # Use single thread openblas
>> export OPENBLAS_NUM_THREADS=1
>> export OMP_NUM_THREADS=1
>> 
>> Satish
>> 
>> 
>> On Wed, 7 Dec 2022, Mark Lohry wrote:
>> 
>> > I ran into an unexpected issue -- on an NP-core machine, each MPI rank of
>> > my application was launching NP threads, such that when running with
>> > multiple ranks the machine was quickly oversubscribed and performance
>> > tanked.
>> > 
>> > The root cause of this was petsc linking against the system-provided
>> > library (libopenblas0-pthread in this case) set by the update-alternatives
>> > in ubuntu. At some point this machine got updated to using the threaded
>> > blas implementation instead of serial; not sure how, and I wouldn't have
>> > noticed if I weren't running interactively.
>> > 
>> > Is there any mechanism in petsc or its build system to prevent linking
>> > against an inappropriate BLAS, or do I need to be diligent about manually
>> > setting the BLAS library in the configuration stage?
>> > 
>> > Thanks,
>> > Mark
>> > 
>> 



Re: [petsc-users] prevent linking to multithreaded BLAS?

2022-12-07 Thread Mark Lohry
Thanks, yes, I figured out the OMP_NUM_THREADS=1 way while triaging it, and
the --download-fblaslapack way occurred to me.

I was hoping for something that "just worked" (refuse to build in this
case) but I don't know if it's programmatically possible for petsc to tell
whether or not it's linking to a threaded BLAS?

On Wed, Dec 7, 2022 at 4:35 PM Satish Balay  wrote:

> If you don't specify a blas to use - petsc configure will guess and use
> what it can find.
>
> So the only way to force it to use a particular BLAS is to specify one [one
> way is --download-fblaslapack]
>
> Wrt multi-threaded OpenBLAS - you can force it to run single-threaded [by
> one of these 2 env variables]
>
> # Use single thread openblas
> export OPENBLAS_NUM_THREADS=1
> export OMP_NUM_THREADS=1
>
> Satish
>
>
> On Wed, 7 Dec 2022, Mark Lohry wrote:
>
> > I ran into an unexpected issue -- on an NP-core machine, each MPI rank of
> > my application was launching NP threads, such that when running with
> > multiple ranks the machine was quickly oversubscribed and performance
> > tanked.
> >
> > The root cause of this was petsc linking against the system-provided
> > library (libopenblas0-pthread in this case) set by the
> update-alternatives
> > in ubuntu. At some point this machine got updated to using the threaded
> > blas implementation instead of serial; not sure how, and I wouldn't have
> > noticed if I weren't running interactively.
> >
> > Is there any mechanism in petsc or its build system to prevent linking
> > against an inappropriate BLAS, or do I need to be diligent about manually
> > setting the BLAS library in the configuration stage?
> >
> > Thanks,
> > Mark
> >
>
>


Re: [petsc-users] prevent linking to multithreaded BLAS?

2022-12-07 Thread Satish Balay via petsc-users
If you don't specify a blas to use - petsc configure will guess and use what it 
can find.

So the only way to force it to use a particular BLAS is to specify one [one way
is --download-fblaslapack]

Wrt multi-threaded OpenBLAS - you can force it to run single-threaded [by one
of these 2 env variables]

# Use single thread openblas
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1

Satish


On Wed, 7 Dec 2022, Mark Lohry wrote:

> I ran into an unexpected issue -- on an NP-core machine, each MPI rank of
> my application was launching NP threads, such that when running with
> multiple ranks the machine was quickly oversubscribed and performance
> tanked.
> 
> The root cause of this was petsc linking against the system-provided
> library (libopenblas0-pthread in this case) set by the update-alternatives
> in ubuntu. At some point this machine got updated to using the threaded
> blas implementation instead of serial; not sure how, and I wouldn't have
> noticed if I weren't running interactively.
> 
> Is there any mechanism in petsc or its build system to prevent linking
> against an inappropriate BLAS, or do I need to be diligent about manually
> setting the BLAS library in the configuration stage?
> 
> Thanks,
> Mark
> 



Re: [petsc-users] prevent linking to multithreaded BLAS?

2022-12-07 Thread Barry Smith


  We don't have configure code to detect if the BLAS is thread-parallel, nor do
we have code to tell it not to use a thread-parallel version.

  Except if it is using MKL, in which case we do force it to not use the
threaded BLAS.

  A "cheat" would be for you to just set the environmental variable BLAS uses 
for number of threads to 1 always, then you would not need to worry about 
checking to avoid the "bad" library.
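For instance, a minimal sketch of that cheat from inside the application (POSIX
setenv; it must run before the first BLAS call, and the variable names cover
OpenBLAS, OpenMP builds, and MKL):

#include <stdlib.h>

/* Call at the very top of main(), before PetscInitialize() or any BLAS use,
 * so the library has not yet spawned its thread pool. */
static void ForceSerialBlas(void)
{
  setenv("OPENBLAS_NUM_THREADS", "1", 1); /* 1 = overwrite an existing value */
  setenv("OMP_NUM_THREADS", "1", 1);
  setenv("MKL_NUM_THREADS", "1", 1);
}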

  Barry




> On Dec 7, 2022, at 4:21 PM, Mark Lohry  wrote:
> 
> I ran into an unexpected issue -- on an NP-core machine, each MPI rank of my 
> application was launching NP threads, such that when running with multiple 
> ranks the machine was quickly oversubscribed and performance tanked.
> 
> The root cause of this was petsc linking against the system-provided library 
> (libopenblas0-pthread in this case) set by the update-alternatives in ubuntu. 
> At some point this machine got updated to using the threaded blas 
> implementation instead of serial; not sure how, and I wouldn't have noticed 
> if I weren't running interactively.
> 
> Is there any mechanism in petsc or its build system to prevent linking 
> against an inappropriate BLAS, or do I need to be diligent about manually 
> setting the BLAS library in the configuration stage?
> 
> Thanks,
> Mark



[petsc-users] prevent linking to multithreaded BLAS?

2022-12-07 Thread Mark Lohry
I ran into an unexpected issue -- on an NP-core machine, each MPI rank of
my application was launching NP threads, such that when running with
multiple ranks the machine was quickly oversubscribed and performance
tanked.

The root cause of this was petsc linking against the system-provided
library (libopenblas0-pthread in this case) set by the update-alternatives
in ubuntu. At some point this machine got updated to using the threaded
blas implementation instead of serial; not sure how, and I wouldn't have
noticed if I weren't running interactively.

Is there any mechanism in petsc or its build system to prevent linking
against an inappropriate BLAS, or do I need to be diligent about manually
setting the BLAS library in the configuration stage?

Thanks,
Mark


Re: [petsc-users] About Preconditioner and MUMPS

2022-12-07 Thread Matthew Knepley
On Wed, Dec 7, 2022 at 8:15 AM 김성익  wrote:

> Following your comments,
> I used the command below:
> mpirun -np 4 ./app -ksp_type preonly -pc_type mpi
> -mpi_linear_solver_server -mpi_pc_type lu -mpi_pc_factor_mat_solver_type
> mumps -mpi_mat_mumps_icntl_7 5 -mpi_ksp_view
>
>
> so the output is as below
>
> KSP Object: (mpi_) 1 MPI process
>   type: gmres
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> happy breakdown tolerance 1e-30
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: (mpi_) 1 MPI process
>   type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: external
> factor fill ratio given 0., needed 0.
>   Factored matrix follows:
> Mat Object: (mpi_) 1 MPI process
>   type: mumps
>   rows=192, cols=192
>   package used to perform factorization: mumps
>   total: nonzeros=17334, allocated nonzeros=17334
> MUMPS run parameters:
>   Use -mpi_ksp_view ::ascii_info_detail to display information
> for all processes
>   RINFOG(1) (global estimated flops for the elimination after
> analysis): 949441.
>   RINFOG(2) (global estimated flops for the assembly after
> factorization): 18774.
>   RINFOG(3) (global estimated flops for the elimination after
> factorization): 949441.
>   (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
> (0.,0.)*(2^0)
>   INFOG(3) (estimated real workspace for factors on all
> processors after analysis): 17334
>   INFOG(4) (estimated integer workspace for factors on all
> processors after analysis): 1724
>   INFOG(5) (estimated maximum front size in the complete
> tree): 96
>   INFOG(6) (number of nodes in the complete tree): 16
>   INFOG(7) (ordering option effectively used after analysis): 5
>   INFOG(8) (structural symmetry in percent of the permuted
> matrix after analysis): 100
>   INFOG(9) (total real/complex workspace to store the matrix
> factors after factorization): 17334
>   INFOG(10) (total integer space store the matrix factors
> after factorization): 1724
>   INFOG(11) (order of largest frontal matrix after
> factorization): 96
>   INFOG(12) (number of off-diagonal pivots): 0
>   INFOG(13) (number of delayed pivots after factorization): 0
>   INFOG(14) (number of memory compress after factorization): 0
>   INFOG(15) (number of steps of iterative refinement after
> solution): 0
>   INFOG(16) (estimated size (in MB) of all MUMPS internal data
> for factorization after analysis: value on the most memory consuming
> processor): 1
>   INFOG(17) (estimated size of all MUMPS internal data for
> factorization after analysis: sum over all processors): 1
>   INFOG(18) (size of all MUMPS internal data allocated during
> factorization: value on the most memory consuming processor): 1
>   INFOG(19) (size of all MUMPS internal data allocated during
> factorization: sum over all processors): 1
>   INFOG(20) (estimated number of entries in the factors): 17334
>   INFOG(21) (size in MB of memory effectively used during
> factorization - value on the most memory consuming processor): 1
>   INFOG(22) (size in MB of memory effectively used during
> factorization - sum over all processors): 1
>   INFOG(23) (after analysis: value of ICNTL(6) effectively
> used): 0
>   INFOG(24) (after analysis: value of ICNTL(12) effectively
> used): 1
>   INFOG(25) (after factorization: number of pivots modified by
> static pivoting): 0
>   INFOG(28) (after factorization: number of null pivots
> encountered): 0
>   INFOG(29) (after factorization: effective number of entries
> in the factors (sum over all processors)): 17334
>   INFOG(30, 31) (after solution: size in Mbytes of memory used
> during solution phase): 0, 0
>   INFOG(32) (after analysis: type of analysis done): 1
>   INFOG(33) (value used for ICNTL(8)): 7
>   INFOG(34) (exponent of the determinant if determinant is
> requested): 0
>   INFOG(35) (after factorization: number of entries taking
> into account BLR factor compression - sum over all processors): 17334
>   INFOG(36) (after analysis: estimated size of all MUMPS
> internal data for running BLR in-core - value on the most memory consuming
> processor): 0
>   INFOG(37) (after analysis: estimated size of all MUMPS
> internal data for running BLR in-core - sum over all processors): 0
>   INFOG(38) (after analysis: estimated size of all MUMPS
> internal data for running BLR out-of-core - value on the most memory
> consuming processor): 0

Re: [petsc-users] About Preconditioner and MUMPS

2022-12-07 Thread 김성익
Following your comments,
I used the command below:
mpirun -np 4 ./app -ksp_type preonly -pc_type mpi -mpi_linear_solver_server
-mpi_pc_type lu -mpi_pc_factor_mat_solver_type mumps -mpi_mat_mumps_icntl_7
5 -mpi_ksp_view


so the output is as below

KSP Object: (mpi_) 1 MPI process
  type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization
with no iterative refinement
happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: (mpi_) 1 MPI process
  type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: external
factor fill ratio given 0., needed 0.
  Factored matrix follows:
Mat Object: (mpi_) 1 MPI process
  type: mumps
  rows=192, cols=192
  package used to perform factorization: mumps
  total: nonzeros=17334, allocated nonzeros=17334
MUMPS run parameters:
  Use -mpi_ksp_view ::ascii_info_detail to display information
for all processes
  RINFOG(1) (global estimated flops for the elimination after
analysis): 949441.
  RINFOG(2) (global estimated flops for the assembly after
factorization): 18774.
  RINFOG(3) (global estimated flops for the elimination after
factorization): 949441.
  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
(0.,0.)*(2^0)
  INFOG(3) (estimated real workspace for factors on all
processors after analysis): 17334
  INFOG(4) (estimated integer workspace for factors on all
processors after analysis): 1724
  INFOG(5) (estimated maximum front size in the complete tree):
96
  INFOG(6) (number of nodes in the complete tree): 16
  INFOG(7) (ordering option effectively used after analysis): 5
  INFOG(8) (structural symmetry in percent of the permuted
matrix after analysis): 100
  INFOG(9) (total real/complex workspace to store the matrix
factors after factorization): 17334
  INFOG(10) (total integer space store the matrix factors after
factorization): 1724
  INFOG(11) (order of largest frontal matrix after
factorization): 96
  INFOG(12) (number of off-diagonal pivots): 0
  INFOG(13) (number of delayed pivots after factorization): 0
  INFOG(14) (number of memory compress after factorization): 0
  INFOG(15) (number of steps of iterative refinement after
solution): 0
  INFOG(16) (estimated size (in MB) of all MUMPS internal data
for factorization after analysis: value on the most memory consuming
processor): 1
  INFOG(17) (estimated size of all MUMPS internal data for
factorization after analysis: sum over all processors): 1
  INFOG(18) (size of all MUMPS internal data allocated during
factorization: value on the most memory consuming processor): 1
  INFOG(19) (size of all MUMPS internal data allocated during
factorization: sum over all processors): 1
  INFOG(20) (estimated number of entries in the factors): 17334
  INFOG(21) (size in MB of memory effectively used during
factorization - value on the most memory consuming processor): 1
  INFOG(22) (size in MB of memory effectively used during
factorization - sum over all processors): 1
  INFOG(23) (after analysis: value of ICNTL(6) effectively
used): 0
  INFOG(24) (after analysis: value of ICNTL(12) effectively
used): 1
  INFOG(25) (after factorization: number of pivots modified by
static pivoting): 0
  INFOG(28) (after factorization: number of null pivots
encountered): 0
  INFOG(29) (after factorization: effective number of entries
in the factors (sum over all processors)): 17334
  INFOG(30, 31) (after solution: size in Mbytes of memory used
during solution phase): 0, 0
  INFOG(32) (after analysis: type of analysis done): 1
  INFOG(33) (value used for ICNTL(8)): 7
  INFOG(34) (exponent of the determinant if determinant is
requested): 0
  INFOG(35) (after factorization: number of entries taking into
account BLR factor compression - sum over all processors): 17334
  INFOG(36) (after analysis: estimated size of all MUMPS
internal data for running BLR in-core - value on the most memory consuming
processor): 0
  INFOG(37) (after analysis: estimated size of all MUMPS
internal data for running BLR in-core - sum over all processors): 0
  INFOG(38) (after analysis: estimated size of all MUMPS
internal data for running BLR out-of-core - value on the most memory
consuming processor): 0
  INFOG(39) (after analysis: estimated size of all MUMPS
internal data for running BLR out-of-core - sum 

Re: [petsc-users] Petsc Section in DMPlex

2022-12-07 Thread Matthew Knepley
On Wed, Dec 7, 2022 at 6:51 AM Nicholas Arnold-Medabalimi <
narno...@umich.edu> wrote:

> Hi
>
> Thank you so much for your patience. One thing to note: I don't have any
> need to go from the filtered distributed mapping back to the full one, but
> it is good to know.
>
> One aside question:
> 1) Are the natural and global orderings the same in this context?
>

No.


> As far as implementing what you have described.
>
> When I call ISView on the generated SubpointIS, I get an unusual error
> which I'm not sure how to interpret. (This case is running on 2 ranks, and
> the filter label has points located on both ranks of the original DM.)
> However, if I manually get the indices (the commented lines), it seems to
> not have any issues.
> call DMPlexFilter(dmplex_full, iBlankLabel, 1, dmplex_filtered,ierr)
> call DMPlexGetSubpointIS(dmplex_filtered, subpointsIS,ierr)
> !call ISGetIndicesF90(subpointsIS, subPointKey,ierr)
> !write(*,*) subPointKey
> !call ISRestoreIndicesF90(subpointsIS, subPointKey,ierr)
> call ISView(subpointsIS,PETSC_VIEWER_STDOUT_WORLD,ierr)
>
> [1]PETSC ERROR: - Error Message
> --
> [1]PETSC ERROR: Arguments must have same communicators
> [1]PETSC ERROR: Different communicators in the two objects: Argument # 1
> and 2 flag 3
> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Petsc Development GIT revision: v3.18.1-320-g7810d690132
>  GIT Date: 2022-11-20 20:25:41 -0600
> [1]PETSC ERROR: Configure options with-fc=mpiifort with-mpi-f90=mpiifort
> --download-triangle --download-parmetis --download-metis --with-debugging=1
> --download-hdf5 --prefix=/home/narnoldm/packages/petsc_install
> [1]PETSC ERROR: #1 ISView() at
> /home/narnoldm/packages/petsc/src/vec/is/is/interface/index.c:1629
>

The problem here is that subpointsIS is a _serial_ object, and you are using
a parallel viewer. You can use PETSC_VIEWER_STDOUT_SELF,
or you can pull out the singleton viewer from STDOUT_WORLD if you want them
all to print in order.
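A minimal sketch of the second option, assuming the current name of the
singleton-viewer calls is PetscViewerGetSubViewer/RestoreSubViewer:

PetscViewer sub;
/* Pull the per-rank sub-viewer out of the parallel viewer; the ASCII output
 * is flushed in rank order when the sub-viewer is restored. */
PetscCall(PetscViewerGetSubViewer(PETSC_VIEWER_STDOUT_WORLD, PETSC_COMM_SELF, &sub));
PetscCall(ISView(subpointsIS, sub));
PetscCall(PetscViewerRestoreSubViewer(PETSC_VIEWER_STDOUT_WORLD, PETSC_COMM_SELF, &sub));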


> As far as the overall process you have described, my question at first
> glance is: do I have to allocate/create the vector that is output by
> VecISCopy before calling it, or does it create the vector automatically?
>

You create both vectors. I would do it using DMCreateGlobalVector() from
both DMs.


> I think I would need to create it first using a section and setting the
> Vec in the filtered DM?
>

Setting the Section in the filtered DM.


> And I presume in this case I would be using the scatter reverse option to
> go from the full set to the reduced set?
>

Yes

  Thanks

 Matt


> Sincerely
> Nicholas
>
>
>
>
>
>
> Sincerely
> Nick
>
> On Wed, Dec 7, 2022 at 6:00 AM Matthew Knepley  wrote:
>
>> On Wed, Dec 7, 2022 at 3:35 AM Nicholas Arnold-Medabalimi <
>> narno...@umich.edu> wrote:
>>
>>> Hi Matthew
>>>
>>> Thank you for the help. This clarified a great deal.
>>>
>>> I have a follow-up question related to DMPlexFilter. It may be better to
>>> describe what I'm trying to achieve.
>>>
>>> I have a general mesh I am solving which has a section with cell-centered
>>> finite volume states, as described in my initial email. After calculating
>>> some metrics, I tag a bunch of cells with an identifying Label and use
>>> DMPlexFilter to generate a new DM which is only that subset of cells.
>>> Generally, this leads to a pretty unbalanced DM, so I then plan to use
>>> DMPlexDistribute to balance that DM across the processors. The coordinates
>>> pass along fine, but the state (or I should say the Section) does not, at
>>> least as far as I can tell.
>>>
>>> Assuming I can get a filtered DM, I then distribute the DM and state
>>> using the method you described above and it seems to be working ok.
>>>
>>> The last connection I have to make is the transfer of information from
>>> the full mesh to the "sampled" filtered mesh. From what I can gather I
>>> would need to get the mapping of points using DMPlexGetSubpointIS and then
>>> manually copy the values from the full DM section to the filtered DM? I
>>> have the process from full->filtered->distributed all working for the
>>> coordinates so it's just a matter of transferring the section correctly.
>>>
>>> I appreciate all the help you have provided.
>>>
>>
>> Let's do this in two steps, which makes it easier to debug. First, do not
>> redistribute the submesh. Just use DMPlexGetSubpointIS()
>> to get the mapping of filtered points to points in the original mesh.
>> Then create an expanded IS using the Section, which maps
>> dofs in the filtered mesh to dofs in the original mesh. From this use
>>
>>   https://petsc.org/main/docs/manualpages/Vec/VecISCopy/
>>
>> to move values between the original vector and the filtered vector.
>>
>> Once that works, you can try redistributing the filtered mesh. Before
>> calling DMPlexDistribute() on the filtered mesh, you need to call
>>
>>   https://petsc.org/main/docs/manualpages/DM/DMSetUseNatural/

Re: [petsc-users] Petsc Section in DMPlex

2022-12-07 Thread Nicholas Arnold-Medabalimi
Hi

Thank you so much for your patience. One thing to note: I don't have any
need to go from the filtered distributed mapping back to the full one, but
it is good to know.

One aside question:
1) Are the natural and global orderings the same in this context?

As far as implementing what you have described.

When I call ISView on the generated SubpointIS, I get an unusual error
which I'm not sure how to interpret. (This case is running on 2 ranks, and
the filter label has points located on both ranks of the original DM.)
However, if I manually get the indices (the commented lines), it seems to
not have any issues.
call DMPlexFilter(dmplex_full, iBlankLabel, 1, dmplex_filtered,ierr)
call DMPlexGetSubpointIS(dmplex_filtered, subpointsIS,ierr)
!call ISGetIndicesF90(subpointsIS, subPointKey,ierr)
!write(*,*) subPointKey
!call ISRestoreIndicesF90(subpointsIS, subPointKey,ierr)
call ISView(subpointsIS,PETSC_VIEWER_STDOUT_WORLD,ierr)

[1]PETSC ERROR: - Error Message
--
[1]PETSC ERROR: Arguments must have same communicators
[1]PETSC ERROR: Different communicators in the two objects: Argument # 1
and 2 flag 3
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.18.1-320-g7810d690132
 GIT Date: 2022-11-20 20:25:41 -0600
[1]PETSC ERROR: Configure options with-fc=mpiifort with-mpi-f90=mpiifort
--download-triangle --download-parmetis --download-metis --with-debugging=1
--download-hdf5 --prefix=/home/narnoldm/packages/petsc_install
[1]PETSC ERROR: #1 ISView() at
/home/narnoldm/packages/petsc/src/vec/is/is/interface/index.c:1629


As far as the overall process you have described, my question at first
glance is: do I have to allocate/create the vector that is output by
VecISCopy before calling it, or does it create the vector automatically? I
think I would need to create it first using a section and setting the Vec
in the filtered DM? And I presume in this case I would be using the scatter
reverse option to go from the full set to the reduced set?


Sincerely
Nicholas






Sincerely
Nick

On Wed, Dec 7, 2022 at 6:00 AM Matthew Knepley  wrote:

> On Wed, Dec 7, 2022 at 3:35 AM Nicholas Arnold-Medabalimi <
> narno...@umich.edu> wrote:
>
>> Hi Matthew
>>
>> Thank you for the help. This clarified a great deal.
>>
>> I have a follow-up question related to DMPlexFilter. It may be better to
>> describe what I'm trying to achieve.
>>
>> I have a general mesh I am solving which has a section with cell-centered
>> finite volume states, as described in my initial email. After calculating
>> some metrics, I tag a bunch of cells with an identifying Label and use
>> DMPlexFilter to generate a new DM which is only that subset of cells.
>> Generally, this leads to a pretty unbalanced DM, so I then plan to use
>> DMPlexDistribute to balance that DM across the processors. The coordinates
>> pass along fine, but the state (or I should say the Section) does not, at
>> least as far as I can tell.
>>
>> Assuming I can get a filtered DM, I then distribute the DM and state using
>> the method you described above and it seems to be working ok.
>>
>> The last connection I have to make is the transfer of information from
>> the full mesh to the "sampled" filtered mesh. From what I can gather I
>> would need to get the mapping of points using DMPlexGetSubpointIS and then
>> manually copy the values from the full DM section to the filtered DM? I
>> have the process from full->filtered->distributed all working for the
>> coordinates so it's just a matter of transferring the section correctly.
>>
>> I appreciate all the help you have provided.
>>
>
> Let's do this in two steps, which makes it easier to debug. First, do not
> redistribute the submesh. Just use DMPlexGetSubpointIS()
> to get the mapping of filtered points to points in the original mesh. Then
> create an expanded IS using the Section, which maps
> dofs in the filtered mesh to dofs in the original mesh. From this use
>
>   https://petsc.org/main/docs/manualpages/Vec/VecISCopy/
>
> to move values between the original vector and the filtered vector.
>
> Once that works, you can try redistributing the filtered mesh. Before
> calling DMPlexDistribute() on the filtered mesh, you need to call
>
>   https://petsc.org/main/docs/manualpages/DM/DMSetUseNatural/
>
> When you redistribute, it will compute a mapping back to the original
> layout. Now when you want to transfer values, you
>
>   1) Create a natural vector with DMCreateNaturalVec()
>
>   2) Use DMGlobalToNaturalBegin/End() to move values from the filtered
> vector to the natural vector
>
>   3) Use VecISCopy() to move values from the natural vector to the
> original vector
>
> Let me know if you have any problems.
>
>   Thanks,
>
>  Matt
>
>
>> Sincerely
>> Nicholas
>>
>>
>>
>> On Mon, Nov 28, 2022 at 6:19 AM Matthew Knepley 
>> wrote:
>>
>>> On Sun, Nov 27, 2022 at 10:22 PM Nicholas Arnold-Medabalimi <
>>> 

Re: [petsc-users] About Preconditioner and MUMPS

2022-12-07 Thread Matthew Knepley
On Wed, Dec 7, 2022 at 6:15 AM 김성익  wrote:

> I think I don't understand the meaning of
> -pc_type mpi
>

This option says to use the PCMPI preconditioner. This allows you to
parallelize the solver in what is otherwise a serial code.


> -mpi_pc_type lu
>

This tells the underlying solver in PCMPI to use the LU preconditioner.


> What's the exact meaning of -pc_type mpi and -mpi_pc_type lu??
> Is this difference coming from 'mpi_linear_solver_server' option??
>

Please use -ksp_view as I asked to look at the entire solver. Send it
anytime you mail about solver questions.

  Thanks

 Matt


> Thanks,
> Hyung Kim
>
> On Wed, Dec 7, 2022 at 8:05 PM Matthew Knepley wrote:
>
>> On Wed, Dec 7, 2022 at 5:13 AM 김성익  wrote:
>>
>>> I want to use METIS for ordering.
>>> I heard that MUMPS has good performance with METIS ordering.
>>>
>>> However, there are some things I wonder about.
>>> 1. With option   "-mpi_linear_solver_server -ksp_type preonly -pc_type
>>> mpi -mpi_pc_type lu " the MUMPS solving is slower than with option
>>> "-mpi_linear_solver_server -pc_type mpi  -ksp_type preonly".
>>>Why does this result happen?
>>>
>>
>> You are probably not using MUMPS. Always always always use -ksp_view to
>> see exactly what solver you are using.
>>
>>
>>> 2. (MPIRUN case (actually, the mpi_linear_solver_server case)) In my
>>> code there is already "PetscCall(PCSetType(pc,PCLU))". However, to
>>> use METIS by using "-mpi_mat_mumps_icntl_7 5" I must append the option
>>> "-mpi_pc_type lu".
>>> If I don't apply "-mpi_pc_type lu", the METIS option
>>> ("-mpi_mat_mumps_icntl_7 5") has no effect. Can I get some information
>>> about this?
>>>
>>
>> Again, it seems like the solver configuration is not what you think it is.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Thanks,
>>> Hyung Kim
>>>
>>> On Wed, Dec 7, 2022 at 12:24 AM Barry Smith wrote:
>>>


 On Dec 6, 2022, at 5:15 AM, 김성익  wrote:

 Hello,


 I have some questions about pc and mumps_icntl.

 1. What’s the difference between adopting a preconditioner in code (for
 example, PetscCall(PCSetType(pc,PCLU))) and using the option -pc_type lu?
 And also, what’s the priority between PCSetType in code and the option
 -pc_type ?

 2. When I tried to use METIS in MUMPS, I adopted metis by option
 (for example, -mat_mumps_icntl_7 5). In this situation, it is impossible to
 use metis without pc_type lu. However, in my case pc type lu makes the
 performance poor. So I don’t want to use lu preconditioner. How can I do
 this?

The package MUMPS has an option to use metis in its ordering process
 which can be turned on as indicated while using MUMPS.  Most
 preconditioners that PETSc can use do not use metis for any purpose hence
 there is no option to turn on its use.  For what purpose do you wish to use
 metis? Partitioning, ordering, ?
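As a side note, a sketch of selecting that ordering programmatically through
PETSc's MUMPS interface (pc here is a hypothetical PC from the KSP; ICNTL(7)=5
selects METIS, matching -mat_mumps_icntl_7 5):

Mat F;
PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
PetscCall(PCFactorSetUpMatSolverType(pc)); /* create the MUMPS factor matrix */
PetscCall(PCFactorGetMatrix(pc, &F));
PetscCall(MatMumpsSetIcntl(F, 7, 5));      /* ICNTL(7) = 5: METIS ordering */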






 Thanks,

 Hyung Kim



>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] About Preconditioner and MUMPS

2022-12-07 Thread 김성익
I think I don't understand the meaning of
-pc_type mpi
-mpi_pc_type lu

What's the exact meaning of -pc_type mpi and -mpi_pc_type lu??
Is this difference coming from 'mpi_linear_solver_server' option??

Thanks,
Hyung Kim

On Wed, Dec 7, 2022 at 8:05 PM Matthew Knepley wrote:

> On Wed, Dec 7, 2022 at 5:13 AM 김성익  wrote:
>
>> I want to use METIS for ordering.
>> I heard that MUMPS has good performance with METIS ordering.
>>
>> However, there are some things I wonder about.
>> 1. With option   "-mpi_linear_solver_server -ksp_type preonly -pc_type
>> mpi -mpi_pc_type lu " the MUMPS solving is slower than with option
>> "-mpi_linear_solver_server -pc_type mpi  -ksp_type preonly".
>>Why does this result happen?
>>
>
> You are probably not using MUMPS. Always always always use -ksp_view to
> see exactly what solver you are using.
>
>
>> 2. (MPIRUN case (actually, the mpi_linear_solver_server case)) In my code
>> there is already "PetscCall(PCSetType(pc,PCLU))". However, to use
>> METIS by using "-mpi_mat_mumps_icntl_7 5" I must append the option
>> "-mpi_pc_type lu".
>> If I don't apply "-mpi_pc_type lu", the METIS option
>> ("-mpi_mat_mumps_icntl_7 5") has no effect. Can I get some information
>> about this?
>>
>
> Again, it seems like the solver configuration is not what you think it is.
>
>   Thanks,
>
>  Matt
>
>
>> Thanks,
>> Hyung Kim
>>
>> On Wed, Dec 7, 2022 at 12:24 AM Barry Smith wrote:
>>
>>>
>>>
>>> On Dec 6, 2022, at 5:15 AM, 김성익  wrote:
>>>
>>> Hello,
>>>
>>>
>>> I have some questions about pc and mumps_icntl.
>>>
>>> 1. What’s the difference between adopting a preconditioner in code (for
>>> example, PetscCall(PCSetType(pc,PCLU))) and using the option -pc_type lu?
>>> And also, what’s the priority between PCSetType in code and the option
>>> -pc_type ?
>>>
>>> 2. When I tried to use METIS in MUMPS, I adopted metis by option
>>> (for example, -mat_mumps_icntl_7 5). In this situation, it is impossible to
>>> use metis without pc_type lu. However, in my case pc type lu makes the
>>> performance poor. So I don’t want to use lu preconditioner. How can I do
>>> this?
>>>
>>>The package MUMPS has an option to use metis in its ordering process
>>> which can be turned on as indicated while using MUMPS.  Most
>>> preconditioners that PETSc can use do not use metis for any purpose hence
>>> there is no option to turn on its use.  For what purpose do you wish to use
>>> metis? Partitioning, ordering, ?
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Hyung Kim
>>>
>>>
>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


Re: [petsc-users] About Preconditioner and MUMPS

2022-12-07 Thread Matthew Knepley
On Wed, Dec 7, 2022 at 5:13 AM 김성익  wrote:

> I want to use METIS for ordering.
> I heard that MUMPS has good performance with METIS ordering.
>
> However, there are some things I wonder about.
> 1. With option   "-mpi_linear_solver_server -ksp_type preonly -pc_type mpi
> -mpi_pc_type lu " the MUMPS solving is slower than with option
> "-mpi_linear_solver_server -pc_type mpi  -ksp_type preonly".
>Why does this result happen?
>

You are probably not using MUMPS. Always always always use -ksp_view to see
exactly what solver you are using.


> 2. (MPIRUN case (actually, the mpi_linear_solver_server case)) In my code
> there is already "PetscCall(PCSetType(pc,PCLU))". However, to use
> METIS by using "-mpi_mat_mumps_icntl_7 5" I must append the option
> "-mpi_pc_type lu".
> If I don't apply "-mpi_pc_type lu", the METIS option
> ("-mpi_mat_mumps_icntl_7 5") has no effect. Can I get some information
> about this?
>

Again, it seems like the solver configuration is not what you think it is.

  Thanks,

 Matt


> Thanks,
> Hyung Kim
>
> On Wed, Dec 7, 2022 at 12:24 AM Barry Smith wrote:
>
>>
>>
>> On Dec 6, 2022, at 5:15 AM, 김성익  wrote:
>>
>> Hello,
>>
>>
>> I have some questions about pc and mumps_icntl.
>>
>> 1. What’s the difference between adopting a preconditioner in code (for
>> example, PetscCall(PCSetType(pc,PCLU))) and using the option -pc_type lu?
>> And also, what’s the priority between PCSetType in code and the option
>> -pc_type ?
>>
>> 2. When I tried to use METIS in MUMPS, I adopted metis by option
>> (for example, -mat_mumps_icntl_7 5). In this situation, it is impossible to
>> use metis without pc_type lu. However, in my case pc type lu makes the
>> performance poor. So I don’t want to use lu preconditioner. How can I do
>> this?
>>
>>The package MUMPS has an option to use metis in its ordering process
>> which can be turned on as indicated while using MUMPS.  Most
>> preconditioners that PETSc can use do not use metis for any purpose hence
>> there is no option to turn on its use.  For what purpose do you wish to use
>> metis? Partitioning, ordering, ?
>>
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Hyung Kim
>>
>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] Petsc Section in DMPlex

2022-12-07 Thread Matthew Knepley
On Wed, Dec 7, 2022 at 3:35 AM Nicholas Arnold-Medabalimi <
narno...@umich.edu> wrote:

> Hi Matthew
>
> Thank you for the help. This clarified a great deal.
>
> I have a follow-up question related to DMPlexFilter. It may be better to
> describe what I'm trying to achieve.
>
> I have a general mesh I am solving which has a section with cell-centered
> finite volume states, as described in my initial email. After calculating
> some metrics, I tag a bunch of cells with an identifying Label and use
> DMPlexFilter to generate a new DM which is only that subset of cells.
> Generally, this leads to a pretty unbalanced DM, so I then plan to use
> DMPlexDistribute to balance that DM across the processors. The coordinates
> pass along fine, but the state (or I should say the Section) does not, at
> least as far as I can tell.
>
> Assuming I can get a filtered DM, I then distribute the DM and state using
> the method you described above and it seems to be working ok.
>
> The last connection I have to make is the transfer of information from the
> full mesh to the "sampled" filtered mesh. From what I can gather I would
> need to get the mapping of points using DMPlexGetSubpointIS and then
> manually copy the values from the full DM section to the filtered DM? I
> have the process from full->filtered->distributed all working for the
> coordinates so it's just a matter of transferring the section correctly.
>
> I appreciate all the help you have provided.
>

Let's do this in two steps, which makes it easier to debug. First, do not
redistribute the submesh. Just use DMPlexGetSubpointIS()
to get the mapping of filtered points to points in the original mesh. Then
create an expanded IS using the Section, which maps
dofs in the filtered mesh to dofs in the original mesh. From this use

  https://petsc.org/main/docs/manualpages/Vec/VecISCopy/

to move values between the original vector and the filtered vector.

Once that works, you can try redistributing the filtered mesh. Before
calling DMPlexDistribute() on the filtered mesh, you need to call

  https://petsc.org/main/docs/manualpages/DM/DMSetUseNatural/

When you redistribute, it will compute a mapping back to the original
layout. Now when you want to transfer values, you

  1) Create a natural vector with DMCreateNaturalVec()

  2) Use DMGlobalToNaturalBegin/End() to move values from the filtered
vector to the natural vector

  3) Use VecISCopy() to move values from the natural vector to the original
vector

Let me know if you have any problems.
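Put together, a sketch of that path (variable names like dmFiltered and
expandedIS are hypothetical, the natural-vector creation call is named as
above, and the exact Plex spellings, e.g. DMPlexCreateNaturalVector and
DMPlexGlobalToNaturalBegin/End, should be checked against your PETSc
version's docs):

Vec filtered, natural, original; /* original made earlier from the full DM */
DM  dmDist;

PetscCall(DMSetUseNatural(dmFiltered, PETSC_TRUE)); /* before distributing */
PetscCall(DMPlexDistribute(dmFiltered, 0, NULL, &dmDist));
PetscCall(DMCreateGlobalVector(dmDist, &filtered));
/* 1) natural vector, laid out like the pre-distribution filtered mesh */
PetscCall(DMPlexCreateNaturalVector(dmDist, &natural));
/* 2) distributed filtered vector -> natural ordering */
PetscCall(DMPlexGlobalToNaturalBegin(dmDist, filtered, natural));
PetscCall(DMPlexGlobalToNaturalEnd(dmDist, filtered, natural));
/* 3) natural vector -> original vector through the expanded IS
 * (SCATTER_FORWARD writes the reduced vector into original at the IS) */
PetscCall(VecISCopy(original, expandedIS, SCATTER_FORWARD, natural));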

  Thanks,

 Matt


> Sincerely
> Nicholas
>
>
>
> On Mon, Nov 28, 2022 at 6:19 AM Matthew Knepley  wrote:
>
>> On Sun, Nov 27, 2022 at 10:22 PM Nicholas Arnold-Medabalimi <
>> narno...@umich.edu> wrote:
>>
>>> Hi Petsc Users
>>>
>>> I have a question about properly using PetscSection to assign state
>>> variables to a DM. I have an existing DMPlex mesh distributed on 2
>>> processors. My goal is to have state variables set to the cell centers. I
>>> then want to call DMPlexDistribute, which I hope will balance the mesh
>>> elements and hopefully transport the state variables to the hosting
>>> processors as the cells are distributed to a different processor count or
>>> simply just redistributing after doing mesh adaption.
>>>
>>> Looking at the DMPlex User guide, I should be able to achieve this with
>>> a single field section using SetDof and assigning the DOF to the points
>>> corresponding to cells.
>>>
>>
>> Note that if you want several different fields, you can clone the DM
>> first for this field
>>
>>   call DMClone(dm,dmState,ierr)
>>
>> and use dmState in your calls below.
>>
>>
>>>
>>> call DMPlexGetHeightStratum(dm,0,c0,c1,ierr)
>>> call DMPlexGetChart(dm,p0,p1,ierr)
>>> call PetscSectionCreate(PETSC_COMM_WORLD,section,ierr)
>>> call PetscSectionSetNumFields(section,1,ierr)
>>> call PetscSectionSetChart(section,p0,p1,ierr)
>>> do i = c0, (c1-1)
>>> call PetscSectionSetDof(section,i,nvar,ierr)
>>> end do
>>> call PetscSectionSetup(section,ierr)
>>> call DMSetLocalSection(dm,section,ierr)
>>>
>>
>> In the loop, I would add a call to
>>
>>   call PetscSectionSetFieldDof(section,i,0,nvar,ierr)
>>
>> This also puts in the field breakdown. It is not essential, but nicer.
>>
>>
>>> From here, it looks like I can access and set the state vars using
>>>
>>> call DMGetGlobalVector(dmplex,state,ierr)
>>> call DMGetGlobalSection(dmplex,section,ierr)
>>> call VecGetArrayF90(state,stateVec,ierr)
>>> do i = c0, (c1-1)
>>> call PetscSectionGetOffset(section,i,offset,ierr)
>>> stateVec(offset+1:offset+nvar)=state_i(:) !simplified assignment
>>> end do
>>> call VecRestoreArrayF90(state,stateVec,ierr)
>>> call DMRestoreGlobalVector(dmplex,state,ierr)
>>>
>>> To my understanding, I should be using Global vector since this is a
>>> pure assignment operation and I don't need the ghost cells.
>>>
>>
>> Yes.
>>
>> But the behavior I am seeing isn't exactly what I'd expect.
>>>
>>> To be honest, I'm somewhat unclear on a few things
>>>
>>> 1) 

Re: [petsc-users] About MPIRUN

2022-12-07 Thread Matthew Knepley
On Wed, Dec 7, 2022 at 5:03 AM 김성익  wrote:

> This error was caused by inconsistent indices passed to VecGetValues in the
> mpirun case.
>
> For example, for a problem where the global vector size is 4, when running
> mpirun -np 2, each process should ask VecGetValues for 2 values,
> but my code tried to get 4 values, so it became a problem.
>
> How can I solve this problem?
> I want every process to get a scalar array of the full global size,
> holding the same (global) values.
>

This is a fundamentally nonscalable operation. Are you sure you want to do
this?

If so, you can use

  https://petsc.org/main/docs/manualpages/PetscSF/VecScatterCreateToZero/
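
A sketch of its use (untested; names are placeholders). VecScatterCreateToAll()
is the analogous call if every rank truly needs the full copy:

  VecScatter         scat;
  Vec                seqVec; /* full-length sequential Vec, filled on rank 0 */
  const PetscScalar *a;

  PetscCall(VecScatterCreateToZero(globalVec, &scat, &seqVec));
  PetscCall(VecScatterBegin(scat, globalVec, seqVec, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(scat, globalVec, seqVec, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecGetArrayRead(seqVec, &a)); /* meaningful on rank 0 only */
  /* ... use a[0..N) ... */
  PetscCall(VecRestoreArrayRead(seqVec, &a));
  PetscCall(VecScatterDestroy(&scat));
  PetscCall(VecDestroy(&seqVec));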

  Thanks

 Matt


> Thanks,
> Hyung Kim
>
> On Wed, Dec 7, 2022 at 1:34 PM 김성익 wrote:
>
>> I have already done VecAssemblyBegin/End().
>>
>> However, these outputs appear only in the mpirun case.
>> There are more error outputs, as below.
>>
>> --
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>> with errorcode 73.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --
>> [ubuntu:02473] PMIX ERROR: UNREACHABLE in file
>> ../../../src/server/pmix_server.c at line 2193
>> [ubuntu:02473] PMIX ERROR: UNREACHABLE in file
>> ../../../src/server/pmix_server.c at line 2193
>> [ubuntu:02473] PMIX ERROR: UNREACHABLE in file
>> ../../../src/server/pmix_server.c at line 2193
>> [ubuntu:02473] 3 more processes have sent help message help-mpi-api.txt /
>> mpi-abort
>> [ubuntu:02473] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>>
>> Could this be the cause of the former petsc error??
>>
>>
>> Thanks,
>> Hyung Kim
>>
>> On Tue, Dec 6, 2022 at 10:58 PM Matthew Knepley wrote:
>>
>>> On Tue, Dec 6, 2022 at 6:45 AM 김성익  wrote:
>>>
 Hello,


 There is a code which runs without mpirun, and it also runs with
 mpi_linear_solver_server.
 However, it fails only in the plain mpirun case, such as mpirun -np
 ./program.
 The error output is as below.
 [0]PETSC ERROR: - Error Message
 --
 [0]PETSC ERROR: Object is in wrong state
 [0]PETSC ERROR: Not for unassembled vector
 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
 shooting.
 [0]PETSC ERROR: Petsc Release Version 3.18.1, unknown
 [0]PETSC ERROR: ./app on a arch-linux-c-debug named ubuntu by ksi2443
 Tue Dec  6 03:39:13 2022
 [0]PETSC ERROR: Configure options -download-mumps -download-scalapack
 -download-parmetis -download-metis
 [0]PETSC ERROR: #1 VecCopy() at
 /home/ksi2443/petsc/src/vec/vec/interface/vector.c:1625
 [0]PETSC ERROR: #2 KSPInitialResidual() at
 /home/ksi2443/petsc/src/ksp/ksp/interface/itres.c:60
 [0]PETSC ERROR: #3 KSPSolve_GMRES() at
 /home/ksi2443/petsc/src/ksp/ksp/impls/gmres/gmres.c:227
 [0]PETSC ERROR: #4 KSPSolve_Private() at
 /home/ksi2443/petsc/src/ksp/ksp/interface/itfunc.c:899
 [0]PETSC ERROR: #5 KSPSolve() at
 /home/ksi2443/petsc/src/ksp/ksp/interface/itfunc.c:1071
 [0]PETSC ERROR: #6 main() at /home/ksi2443/Downloads/coding/a1.c:450
 [0]PETSC ERROR: No PETSc Option Table entries
 [0]PETSC ERROR: End of Error Message ---send entire
 error message to petsc-ma...@mcs.anl.gov--

 How can I fix this??

>>>
>>> It looks like we do not check the assembled state in parallel, since it
>>> cannot cause a problem, but every time you
>>> update values with VecSetValues(), you should call
>>> VecAssemblyBegin/End().
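>>>
>>> For example (an untested sketch):
>>>
>>>   PetscCall(VecSetValues(x, n, idx, vals, INSERT_VALUES));
>>>   PetscCall(VecAssemblyBegin(x));
>>>   PetscCall(VecAssemblyEnd(x));
>>>   /* only now is x consistent for VecCopy(), KSPSolve(), ... */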
>>>
>>>   Thanks
>>>
>>>   Matt
>>>
>>>
 Thanks,
 Hyung Kim

>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> 
>>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] About Preconditioner and MUMPS

2022-12-07 Thread 김성익
I want to use METIS for ordering.
I heard that MUMPS has good performance with METIS ordering.

However, there are some things I am wondering about.
1. With the option "-mpi_linear_solver_server -ksp_type preonly -pc_type mpi
-mpi_pc_type lu" the MUMPS solve is slower than with the option
"-mpi_linear_solver_server -pc_type mpi -ksp_type preonly".
   Why does this happen?


2. (MPIRUN case (actually, the mpi_linear_solver_server case)) My code
already has "PetscCall(PCSetType(pc,PCLU))". However, to use METIS via
"-mpi_mat_mumps_icntl_7 5" I must also append the option
"-mpi_pc_type lu".
If I don't apply "-mpi_pc_type lu", the METIS option
("-mpi_mat_mumps_icntl_7 5") seems to have no effect. Can I get some
information about this?



Thanks,
Hyung Kim

On Wed, Dec 7, 2022 at 12:24 AM Barry Smith wrote:

>
>
> On Dec 6, 2022, at 5:15 AM, 김성익  wrote:
>
> Hello,
>
>
> I have some questions about pc and mumps_icntl.
>
> 1. What’s the difference between setting the preconditioner in code (for
> example, PetscCall(PCSetType(pc,PCLU))) and via the option -pc_type lu?
> And also, what is the priority between PCSetType in code and the option -pc_type?
>
> 2. When I tried to use METIS in MUMPS, I enabled METIS by an option (for
> example, -mat_mumps_icntl_7 5). In this situation, it is impossible to use
> METIS without -pc_type lu. However, in my case -pc_type lu makes the
> performance poor, so I don’t want to use the LU preconditioner. How can I do
> this?
>
>The package MUMPS has an option to use METIS in its ordering process,
> which can be turned on as indicated while using MUMPS. Most
> preconditioners that PETSc can use do not use METIS for any purpose, hence
> there is no option to turn on its use. For what purpose do you wish to use
> METIS? Partitioning, ordering, ?
>
>
>
>
>
>
> Thanks,
>
> Hyung Kim
>
>
>


Re: [petsc-users] About MPIRUN

2022-12-07 Thread 김성익
This error was caused by inconsistent indices passed to VecGetValues in the
mpirun case.

For example, for a problem where the global vector size is 4, when running
mpirun -np 2, each process should ask VecGetValues for 2 values,
but my code tried to get 4 values, so it became a problem.

How can I solve this problem?
I want every process to get a scalar array of the full global size,
holding the same (global) values.
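
For reference, the locally owned part can be fetched like this (an untested
sketch; v is the global Vec):

  PetscInt     rstart, rend, n, i;
  PetscInt    *ix;
  PetscScalar *y;

  PetscCall(VecGetOwnershipRange(v, &rstart, &rend));
  n = rend - rstart;
  PetscCall(PetscMalloc2(n, &ix, n, &y));
  for (i = 0; i < n; ++i) ix[i] = rstart + i; /* global indices owned here */
  PetscCall(VecGetValues(v, n, ix, y)); /* VecGetValues() only allows local rows */
  PetscCall(PetscFree2(ix, y));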

Thanks,
Hyung Kim

On Wed, Dec 7, 2022 at 1:34 PM 김성익 wrote:

> I have already done VecAssemblyBegin/End().
>
> However, these outputs appear only in the mpirun case.
> There are more error outputs, as below.
>
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> with errorcode 73.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
> [ubuntu:02473] PMIX ERROR: UNREACHABLE in file
> ../../../src/server/pmix_server.c at line 2193
> [ubuntu:02473] PMIX ERROR: UNREACHABLE in file
> ../../../src/server/pmix_server.c at line 2193
> [ubuntu:02473] PMIX ERROR: UNREACHABLE in file
> ../../../src/server/pmix_server.c at line 2193
> [ubuntu:02473] 3 more processes have sent help message help-mpi-api.txt /
> mpi-abort
> [ubuntu:02473] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
> Could this be the cause of the former petsc error??
>
>
> Thanks,
> Hyung Kim
>
> On Tue, Dec 6, 2022 at 10:58 PM Matthew Knepley wrote:
>
>> On Tue, Dec 6, 2022 at 6:45 AM 김성익  wrote:
>>
>>> Hello,
>>>
>>>
>>> There is a code which runs without mpirun, and it also runs with
>>> mpi_linear_solver_server.
>>> However, it fails only in the plain mpirun case, such as mpirun -np
>>> ./program.
>>> The error output is as below.
>>> [0]PETSC ERROR: - Error Message
>>> --
>>> [0]PETSC ERROR: Object is in wrong state
>>> [0]PETSC ERROR: Not for unassembled vector
>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> [0]PETSC ERROR: Petsc Release Version 3.18.1, unknown
>>> [0]PETSC ERROR: ./app on a arch-linux-c-debug named ubuntu by ksi2443
>>> Tue Dec  6 03:39:13 2022
>>> [0]PETSC ERROR: Configure options -download-mumps -download-scalapack
>>> -download-parmetis -download-metis
>>> [0]PETSC ERROR: #1 VecCopy() at
>>> /home/ksi2443/petsc/src/vec/vec/interface/vector.c:1625
>>> [0]PETSC ERROR: #2 KSPInitialResidual() at
>>> /home/ksi2443/petsc/src/ksp/ksp/interface/itres.c:60
>>> [0]PETSC ERROR: #3 KSPSolve_GMRES() at
>>> /home/ksi2443/petsc/src/ksp/ksp/impls/gmres/gmres.c:227
>>> [0]PETSC ERROR: #4 KSPSolve_Private() at
>>> /home/ksi2443/petsc/src/ksp/ksp/interface/itfunc.c:899
>>> [0]PETSC ERROR: #5 KSPSolve() at
>>> /home/ksi2443/petsc/src/ksp/ksp/interface/itfunc.c:1071
>>> [0]PETSC ERROR: #6 main() at /home/ksi2443/Downloads/coding/a1.c:450
>>> [0]PETSC ERROR: No PETSc Option Table entries
>>> [0]PETSC ERROR: End of Error Message ---send entire
>>> error message to petsc-ma...@mcs.anl.gov--
>>>
>>> How can I fix this??
>>>
>>
>> It looks like we do not check the assembled state in parallel, since it
>> cannot cause a problem, but every time you
>> update values with VecSetValues(), you should call VecAssemblyBegin/End().
>>
>>   Thanks
>>
>>   Matt
>>
>>
>>> Thanks,
>>> Hyung Kim
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>


Re: [petsc-users] Petsc Section in DMPlex

2022-12-07 Thread Nicholas Arnold-Medabalimi
Hi Matthew

Thank you for the help. This clarified a great deal.

I have a follow-up question related to DMPlexFilter. It may be better to
describe what I'm trying to achieve.

I have a general mesh I am solving which has a section with cell center
finite volume states, as described in my initial email. After calculating
some metrics, I tag a bunch of cells with an identifying Label and use
DMPlexFilter to generate a new DM which is only that subset of cells.
Generally, this leads to a pretty unbalanced DM, so I then plan to use
DMPlexDistribute to balance that DM across the processors. The coordinates
pass along fine, but the state (or, I should say, the Section) does not, at
least as far as I can tell.

Assuming I can get a filtered DM I then distribute the DM and state using
the method you described above and it seems to be working ok.

The last connection I have to make is the transfer of information from the
full mesh to the "sampled" filtered mesh. From what I can gather I would
need to get the mapping of points using DMPlexGetSubpointIS and then
manually copy the values from the full DM section to the filtered DM? I
have the process from full->filtered->distributed all working for the
coordinates, so it's just a matter of transferring the section correctly.

I appreciate all the help you have provided.

Sincerely
Nicholas



On Mon, Nov 28, 2022 at 6:19 AM Matthew Knepley  wrote:

> On Sun, Nov 27, 2022 at 10:22 PM Nicholas Arnold-Medabalimi <
> narno...@umich.edu> wrote:
>
>> Hi Petsc Users
>>
>> I have a question about properly using PetscSection to assign state
>> variables to a DM. I have an existing DMPlex mesh distributed on 2
>> processors. My goal is to have state variables set to the cell centers. I
>> then want to call DMPlexDistribute, which I hope will balance the mesh
>> elements and hopefully transport the state variables to the hosting
>> processors as the cells are distributed to a different processor count or
>> simply just redistributing after doing mesh adaption.
>>
>> Looking at the DMPlex User guide, I should be able to achieve this with a
>> single field section using SetDof and assigning the DOF to the points
>> corresponding to cells.
>>
>
> Note that if you want several different fields, you can clone the DM
> first for this field
>
>   call DMClone(dm,dmState,ierr)
>
> and use dmState in your calls below.
>
>
>>
>> call DMPlexGetHeightStratum(dm,0,c0,c1,ierr)
>> call DMPlexGetChart(dm,p0,p1,ierr)
>> call PetscSectionCreate(PETSC_COMM_WORLD,section,ierr)
>> call PetscSectionSetNumFields(section,1,ierr) call
>> PetscSectionSetChart(section,p0,p1,ierr)
>> do i = c0, (c1-1)
>> call PetscSectionSetDof(section,i,nvar,ierr)
>> end do
>> call PetscSectionSetup(section,ierr)
>> call DMSetLocalSection(dm,section,ierr)
>>
>
> In the loop, I would add a call to
>
>   call PetscSectionSetFieldDof(section,i,0,nvar,ierr)
>
> This also puts in the field breakdown. It is not essential, but nicer.
>
>
>> From here, it looks like I can access and set the state vars using
>>
>> call DMGetGlobalVector(dmplex,state,ierr)
>> call DMGetGlobalSection(dmplex,section,ierr)
>> call VecGetArrayF90(state,stateVec,ierr)
>> do i = c0, (c1-1)
>> call PetscSectionGetOffset(section,i,offset,ierr)
>> stateVec(offset+1:offset+nvar)=state_i(:) !simplified assignment
>> end do
>> call VecRestoreArrayF90(state,stateVec,ierr)
>> call DMRestoreGlobalVector(dmplex,state,ierr)
>>
>> To my understanding, I should be using Global vector since this is a pure
>> assignment operation and I don't need the ghost cells.
>>
>
> Yes.
>
> But the behavior I am seeing isn't exactly what I'd expect.
>>
>> To be honest, I'm somewhat unclear on a few things
>>
>> 1) Should I be using nvar fields with 1 DOF each, or 1 field with nvar
>> DOFs? And what is the distinction between the two methods?
>>
>
> We have two divisions in a Section. A field can have a number of
> components. This is intended to model a vector or tensor field.
> Then a Section can have a number of fields, such as velocity and pressure
> for a Stokes problem. The division is mainly to help the
> user, so I would use the most natural one.
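>
> A sketch of the one-field, nvar-component setup in C (untested; the
> Fortran calls mirror these):
>
>   PetscCall(PetscSectionSetNumFields(section, 1));
>   PetscCall(PetscSectionSetFieldComponents(section, 0, nvar));
>   for (PetscInt c = c0; c < c1; ++c) {
>     PetscCall(PetscSectionSetDof(section, c, nvar));
>     PetscCall(PetscSectionSetFieldDof(section, c, 0, nvar));
>   }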
>
>
>> 2) Adding a print statement after the offset assignment I get (on rank 0
>> of 2)
>> cell   1 offset   0
>>  cell   2 offset  18
>>  cell   3 offset  36
>> which is expected and works but on rank 1 I get
>>  cell   1 offset   9000
>>  cell   2 offset   9018
>>  cell   3 offset   9036
>>
>> which isn't exactly what I would expect. Shouldn't the offsets reset at 0
>> for the next rank?
>>
>
> The local and global sections hold different information. This is the
> source of the confusion. The local section does describe a local
> vector, and thus includes overlap or "ghost" dofs. The global section
> describes a global vector. However, it is intended to deliver
> global indices, and thus the offsets give back global indices. When you
> use VecGetArray*()