Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
It's interesting... I'm now thinking about MOOSE. MOOSE doesn't dup the comm you give it either. It simply takes it and uses it. It assumes you've already done the duping. I like it that way because the calling code can do whatever it needs to do to get that communicator (for instance, split

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Matthew Knepley writes: > On Tue, Apr 3, 2018 at 6:20 PM, Jed Brown wrote: > >> Derek Gaston writes: >> >> > On Tue, Apr 3, 2018 at 4:06 PM Jed Brown wrote: >> > >> >> Communicators should be cheap. One per library

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.
My mistake, xSDK only says you need to use the given communicators; it doesn't say you need to use a duped version. M3. Each xSDK-compatible package that utilizes MPI must restrict its MPI operations to MPI communicators that are provided to it and not use directly MPI_COMM_WORLD. The

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Matthew Knepley
On Tue, Apr 3, 2018 at 6:20 PM, Jed Brown wrote: > Derek Gaston writes: > > > On Tue, Apr 3, 2018 at 4:06 PM Jed Brown wrote: > > > >> Communicators should be cheap. One per library per "size" isn't a huge > >> number of communicators.

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston writes: > On Tue, Apr 3, 2018 at 4:06 PM Jed Brown wrote: > >> Communicators should be cheap. One per library per "size" isn't a huge >> number of communicators. >> > > I agree - but that's not what we're getting here. We're getting one per

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Matthew Knepley
On Tue, Apr 3, 2018 at 6:15 PM, Jed Brown wrote: > Matthew Knepley writes: > > > On Tue, Apr 3, 2018 at 6:06 PM, Jed Brown wrote: > > > >> Derek Gaston writes: > >> > >> > Sounds great to me - what library do I

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
On Tue, Apr 3, 2018 at 4:06 PM Jed Brown wrote: > Communicators should be cheap. One per library per "size" isn't a huge > number of communicators. > I agree - but that's not what we're getting here. We're getting one per "object" (Mat / Preconditioner, etc.) associated

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Matthew Knepley writes: > On Tue, Apr 3, 2018 at 6:06 PM, Jed Brown wrote: > >> Derek Gaston writes: >> >> > Sounds great to me - what library do I download that we're all going to >> use >> > for managing the memory pool? :-) >> > >>

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Matthew Knepley
On Tue, Apr 3, 2018 at 6:06 PM, Jed Brown wrote: > Derek Gaston writes: > > > Sounds great to me - what library do I download that we're all going to > use > > for managing the memory pool? :-) > > > > Seriously though: why doesn't MPI give us an ability

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston writes: > Sounds great to me - what library do I download that we're all going to use > for managing the memory pool? :-) > > Seriously though: why doesn't MPI give us an ability to get unique tag IDs > for a given communicator? It's called a dup'd

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.
I would only implement it for hypre since we don't know if it is safe for other packages. Best to ensure the comm is freed as soon as possible (not waiting for PetscFinalize()) > On Apr 3, 2018, at 3:04 PM, Stefano Zampini wrote: > > What about > >

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
Sounds great to me - what library do I download that we're all going to use for managing the memory pool? :-) Seriously though: why doesn't MPI give us an ability to get unique tag IDs for a given communicator? I like the way libMesh deals with this:

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston writes: > Do you think there is any possibility of getting Hypre to use disjoint tags > from PETSc so you can just use the same comm? Maybe a configure option to > Hypre to tell it what number to start at for its tags? Why have malloc when we could just

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
It looks nice to me. Fande, On Tue, Apr 3, 2018 at 3:04 PM, Stefano Zampini wrote: > What about > > PetscCommGetPkgComm(MPI_Comm comm ,const char* package, MPI_Comm* pkgcomm) > > with a key for each of the external packages PETSc can use? > > > On Apr 3, 2018, at

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Stefano Zampini
What about PetscCommGetPkgComm(MPI_Comm comm ,const char* package, MPI_Comm* pkgcomm) with a key for each of the external packages PETSc can use? > On Apr 3, 2018, at 10:56 PM, Kong, Fande wrote: > > I think we could add an inner comm for external package. If the same

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
I think we could add an inner comm for external package. If the same comm is passed in again, we just retrieve the same communicator, instead of MPI_Comm_dup(), for that external package (at least HYPRE team claimed this will be fine). I did not see any issue with this idea so far. I might be

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Smith, Barry F. wrote: > > > > On Apr 3, 2018, at 11:59 AM, Balay, Satish wrote: > > > > On Tue, 3 Apr 2018, Smith, Barry F. wrote: > > > >> Note that PETSc does one MPI_Comm_dup() for each hypre matrix. > >> Internally hypre does at least one

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
Do you think there is any possibility of getting Hypre to use disjoint tags from PETSc so you can just use the same comm? Maybe a configure option to Hypre to tell it what number to start at for its tags? Derek On Tue, Apr 3, 2018 at 11:59 AM Satish Balay wrote: > On Tue, 3

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.
> On Apr 3, 2018, at 11:59 AM, Balay, Satish wrote: > > On Tue, 3 Apr 2018, Smith, Barry F. wrote: > >> Note that PETSc does one MPI_Comm_dup() for each hypre matrix. Internally >> hypre does at least one MPI_Comm_create() per hypre boomerAMG solver. So >> even if PETSc

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Smith, Barry F. wrote: >Note that PETSc does one MPI_Comm_dup() for each hypre matrix. Internally > hypre does at least one MPI_Comm_create() per hypre boomerAMG solver. So even > if PETSc does not do the MPI_Comm_dup() you will still be limited due to > hypre's

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.
> On Apr 3, 2018, at 11:41 AM, Kong, Fande wrote: > > > > On Tue, Apr 3, 2018 at 11:29 AM, Smith, Barry F. wrote: > > Fande, > > The reason for MPI_Comm_dup() and the inner communicator is that this > communicator is used by hypre and so

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston writes: > Sorry, should read: "any one MPI process is not involved in more than ~2000 > *communicators*" Yes, as intended. Only the ranks in a communicator's group need to know about the existence of that communicator. > Derek > > On Tue, Apr 3, 2018 at 11:47

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
Sorry, should read: "any one MPI process is not involved in more than ~2000 *communicators*" Derek On Tue, Apr 3, 2018 at 11:47 AM Derek Gaston wrote: > On Tue, Apr 3, 2018 at 10:31 AM Satish Balay wrote: > >> On Tue, 3 Apr 2018, Derek Gaston wrote: >> >

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
On Tue, Apr 3, 2018 at 10:31 AM Satish Balay wrote: > On Tue, 3 Apr 2018, Derek Gaston wrote: > > Which does bring up a point: I have been able to do solves before with > > ~50,000 separate PETSc solves without issue. Is it because I was working > > with MVAPICH on a cluster?

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
On Tue, Apr 3, 2018 at 11:29 AM, Smith, Barry F. wrote: > > Fande, > > The reason for MPI_Comm_dup() and the inner communicator is that this > communicator is used by hypre and so cannot "just" be a PETSc communicator. > We cannot have PETSc and hypre using the same

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.
Fande, The reason for MPI_Comm_dup() and the inner communicator is that this communicator is used by hypre and so cannot "just" be a PETSc communicator. We cannot have PETSc and hypre using the same communicator since they may capture each others messages etc. See my pull

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Kong, Fande wrote: > Figured out: > > The reason is that in MatCreate_HYPRE(Mat B), we call MPI_Comm_dup > instead of PetscCommDuplicate. The PetscCommDuplicate is better, and it > does not actually create a communicator if the communicator is already > known to PETSc. > >

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.
Fande, Please try the branch https://bitbucket.org/petsc/petsc/pull-requests/921/boomeramg-unlike-2-other-hypre/diff It does not "solve" the problem, but it should get your current test, which now fails, running again. Barry > On Apr 3, 2018, at 10:14 AM, Kong, Fande

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
Figured out: The reason is that in MatCreate_HYPRE(Mat B), we call MPI_Comm_dup instead of PetscCommDuplicate. PetscCommDuplicate is better: it does not actually create a new communicator if the communicator is already known to PETSc. Furthermore, I do not think we should dup a comm in

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Satish Balay wrote: > On Tue, 3 Apr 2018, Derek Gaston wrote: > > > One thing I want to be clear of here: is that we're not trying to solve > > this particular problem (where we're creating 1000 instances of Hypre to > > precondition each variable independently)... this

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Derek Gaston wrote: > One thing I want to be clear of here: is that we're not trying to solve > this particular problem (where we're creating 1000 instances of Hypre to > precondition each variable independently)... this particular problem is > just a test (that we've had in

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
The first bad commit: commit 49a781f5cee36db85e8d5b951eec29f10ac13593, Author: Stefano Zampini, Date: Sat Nov 5 20:15:19 2016 +0300, "PCHYPRE: use internal Mat of type MatHYPRE; hpmat already stores two HYPRE vectors". Hypre version:

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Stefano Zampini
> On Apr 3, 2018, at 5:43 PM, Fande Kong wrote: > > > > On Tue, Apr 3, 2018 at 9:12 AM, Stefano Zampini > wrote: > >> On Apr 3, 2018, at 4:58 PM, Satish Balay >

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
One thing I want to be clear of here: is that we're not trying to solve this particular problem (where we're creating 1000 instances of Hypre to precondition each variable independently)... this particular problem is just a test (that we've had in our test suite for a long time) to stress test

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Fande Kong
On Tue, Apr 3, 2018 at 9:12 AM, Stefano Zampini wrote: > > On Apr 3, 2018, at 4:58 PM, Satish Balay wrote: > > On Tue, 3 Apr 2018, Kong, Fande wrote: > > On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. > wrote: > > > Each

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Kong, Fande wrote: > > I thought this trace comes up after applying your patch > > > > This trace comes from Mac > > Too many communicators (0/2048 free on this process; ignore_id=0) > This comes from a Linux (it is a test box), and I do not have access to it. Then its

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
On Tue, Apr 3, 2018 at 9:32 AM, Satish Balay wrote: > On Tue, 3 Apr 2018, Stefano Zampini wrote: > > > > > > On Apr 3, 2018, at 4:58 PM, Satish Balay wrote: > > > > > > On Tue, 3 Apr 2018, Kong, Fande wrote: > > > > > >> On Tue, Apr 3, 2018 at 1:17 AM,

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Stefano Zampini wrote: > > > On Apr 3, 2018, at 4:58 PM, Satish Balay wrote: > > > > On Tue, 3 Apr 2018, Kong, Fande wrote: > > > >> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. wrote: > >> > >>> > >>> Each external package

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Stefano Zampini
> On Apr 3, 2018, at 4:58 PM, Satish Balay wrote: > > On Tue, 3 Apr 2018, Kong, Fande wrote: > >> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. wrote: >> >>> >>> Each external package definitely needs its own duplicated communicator; >>> cannot

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Kong, Fande wrote: > On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. wrote: > > > > >Each external package definitely needs its own duplicated communicator; > > cannot share between packages. > > > >The only problem with the dups below is if they

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
The PETSc model is that the "outer" communicator (passed by the caller) is dup'd to create an "inner" communicator which is attached (using MPI attributes) to the outer communicator. In the future, PETSc will find the inner communicator and use that, instead of dup'ing again. Derek Gaston

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. wrote: > >Each external package definitely needs its own duplicated communicator; > cannot share between packages. > >The only problem with the dups below is if they are in a loop and get > called many times. > The

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
I like the idea that Hypre (as a package) would get _one_ comm (for all the solvers/matrices created) that was duped from the one given to PETSc in Vec/MatCreate(). Seems like the tricky part would be figuring out _which_ comm that is based on the incoming comm. For instance - we would

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
Fande claimed 49a781f5cee36db85e8d5b951eec29f10ac13593 made a difference. [so assuming same hypre version was used before and after this commit - for this bisection] So the extra MPI_Comm_dup() calls due to MATHYPRE must be pushing the total communicators over the limit. And wrt debugging -

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.
Each external package definitely needs its own duplicated communicator; cannot share between packages. The only problem with the dups below is if they are in a loop and get called many times. To debug the hypre/duplication issue in MOOSE I would run in the debugger with a break

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
We do a MPI_Comm_dup() for objects related to external packages. Looks like we added a new mat type MATHYPRE in 3.8 that PCHYPRE is using. Previously there was one MPI_Comm_dup() for PCHYPRE - now I think there is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
Sorry - I was going down the wrong path.. Sure MPI_COMM_WORLD vs PETSC_COMM_WORLD shouldn't make a difference [except for a couple of extra mpi_comm_dup() calls.] Satish On Tue, 3 Apr 2018, Derek Gaston wrote: > I’m working with Fande on this and I would like to add a bit more. There > are

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Smith, Barry F.
Are we sure this is a PETSc comm issue and not a hypre comm duplication issue? frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, participate=, new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt] Looks like hypre is needed to generate subcomms, perhaps

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Derek Gaston
I’m working with Fande on this and I would like to add a bit more. There are many circumstances where we aren’t working on COMM_WORLD at all (e.g. working on a sub-communicator) but PETSc was initialized using MPI_COMM_WORLD (think multi-level solves)… and we need to create arbitrarily many PETSc

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
Why do we not use user-level MPI communicators directly? What are the potential risks here? Fande, On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay wrote: > PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to > MPI_Comm_dup() - thus potentially avoiding such

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to MPI_Comm_dup() - thus potentially avoiding such errors http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html Satish On Mon, 2 Apr 2018, Kong, Fande wrote: > On Mon, Apr 2, 2018 at 4:23

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay wrote: > Does this 'standard test' use MPI_COMM_WORLD to create PETSc objects? > > If so - you could try changing to PETSC_COMM_WORLD > I do not think we are using PETSC_COMM_WORLD when creating PETSc objects. Why can we not use

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
Does this 'standard test' use MPI_COMM_WORLD to create PETSc objects? If so - you could try changing to PETSC_COMM_WORLD Satish On Mon, 2 Apr 2018, Kong, Fande wrote: > Hi All, > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its > applications. I have an error message for a

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
Nope. There is a back trace: thread #1: tid = 0x3b477b4, 0x7fffb306cd42 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT; frame #0: 0x7fffb306cd42 libsystem_kernel.dylib`__pthread_kill + 10

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Stefano Zampini
maybe this will fix ? diff --git a/src/ksp/pc/impls/hypre/hypre.c b/src/ksp/pc/impls/hypre/hypre.c index 28addcf533..6a756d4c57 100644 --- a/src/ksp/pc/impls/hypre/hypre.c +++ b/src/ksp/pc/impls/hypre/hypre.c @@ -142,8 +142,7 @@ static PetscErrorCode PCSetUp_HYPRE(PC pc) ierr =

[petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
Hi All, I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its applications. I have an error message for a standard test: preconditioners/pbp.lots_of_variables: MPI had an error preconditioners/pbp.lots_of_variables: