Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
It's interesting... I'm now thinking about MOOSE.  MOOSE doesn't dup the
comm you give it either.  It simply takes it and uses it.  It assumes
you've already done the duping.

I like it that way because the calling code can do whatever it needs to do
to get that communicator (for instance, split it).  Duping it on the way in
would be unnecessary in many circumstances where MOOSE is getting passed a
"clean" comm (for instance, from a split that is meant only for that
instance of MOOSE).  It's up to the caller to make sure to pass in a clean
comm for MOOSE to use.
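
A minimal sketch of this calling pattern (the function name moose_like_init is
hypothetical, not the actual MOOSE API): the caller produces the "clean"
communicator, here by splitting, and the library uses it directly without
duplicating it.

#include <mpi.h>

/* hypothetical stand-in for a library that, like MOOSE, uses the comm as-is */
static void moose_like_init(MPI_Comm comm)
{
  int size;
  MPI_Comm_size(comm, &size);   /* all of the library's traffic stays on 'comm' */
  (void)size;
}

int main(int argc, char **argv)
{
  MPI_Comm clean;
  int      rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  /* the caller makes the dedicated comm, e.g. by splitting off a sub-group */
  MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &clean);
  moose_like_init(clean);   /* no dup inside: the comm is assumed dedicated to this instance */
  MPI_Comm_free(&clean);
  MPI_Finalize();
  return 0;
}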

So: I don't fault Hypre... it seems to work under the same assumption.  I
can see both sides.

At the same time, I can see the PETSc model because it simplifies things
for the calling library - you don't have to worry about passing in a
"clean" comm... PETSc generates its own internally.

I think both models are valid - and there are tradeoffs both ways.  We just
need to do the "annoying" thing and keep track of a clean comm that we're
passing in to Hypre.  When MOOSE is running sub-solves of itself the
calling instance of MOOSE creates and keeps track of those clean comms for
each sub-solve...

Derek

On Tue, Apr 3, 2018 at 4:22 PM Matthew Knepley  wrote:

> On Tue, Apr 3, 2018 at 6:20 PM, Jed Brown  wrote:
>
>> Derek Gaston  writes:
>>
>> > On Tue, Apr 3, 2018 at 4:06 PM Jed Brown  wrote:
>> >
>> >> Communicators should be cheap.  One per library per "size" isn't a huge
>> >> number of communicators.
>> >>
>> >
>> > I agree - but that's not what we're getting here.  We're getting one per
>> > "object" (Mat / Preconditioner, etc.) associated with the library per
>> > "size".  If we can fix that I agree that there's no problem (we use a
>> lot
>> > of libraries... but not 2000 separate ones simultaneously!).
>>
>> So PETSc needs to dup and attach a hypre communicator because they
>> aren't interested in doing it themselves.  Not hard to implement, just
>> mildly annoying.
>>
>
> Can't someone tell them its an xSDK requirement?
>
>Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Matthew Knepley  writes:

> On Tue, Apr 3, 2018 at 6:20 PM, Jed Brown  wrote:
>
>> Derek Gaston  writes:
>>
>> > On Tue, Apr 3, 2018 at 4:06 PM Jed Brown  wrote:
>> >
>> >> Communicators should be cheap.  One per library per "size" isn't a huge
>> >> number of communicators.
>> >>
>> >
>> > I agree - but that's not what we're getting here.  We're getting one per
>> > "object" (Mat / Preconditioner, etc.) associated with the library per
>> > "size".  If we can fix that I agree that there's no problem (we use a lot
>> > of libraries... but not 2000 separate ones simultaneously!).
>>
>> So PETSc needs to dup and attach a hypre communicator because they
>> aren't interested in doing it themselves.  Not hard to implement, just
>> mildly annoying.
>>
>
> Can't someone tell them its an xSDK requirement?

xSDK is busy with threads... ;-)


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.

   My mistake, xSDK only says you need to use the given communicators; it 
doesn't say you need to use a duped version.

M3. Each xSDK-compatible package that utilizes MPI must restrict its MPI 
operations to MPI communicators that are provided to it and not use directly 
MPI_COMM_WORLD. The package should use configure tests or version tests to 
detect MPI 2 or MPI 3 features that may not be available; it should not be 
assumed that a full MPI 2 or MPI 3 implementation is available. The package can 
change the MPI error-handling mode by default but should have an option to 
prevent it from changing the MPI error handling (which may have been set by 
another package or the application). The package should also behave 
appropriately regardless of the MPI error handling being used. There is no 
requirement that the package provide a sequential (non-MPI) version, although 
this functionality is welcome, too. If the package provides a sequential 
version, there is no requirement that it be compatible or usable with other 
xSDK-compliant packages running without MPI.

  Barry



> On Apr 3, 2018, at 4:22 PM, Matthew Knepley  wrote:
> 
> On Tue, Apr 3, 2018 at 6:20 PM, Jed Brown  wrote:
> Derek Gaston  writes:
> 
> > On Tue, Apr 3, 2018 at 4:06 PM Jed Brown  wrote:
> >
> >> Communicators should be cheap.  One per library per "size" isn't a huge
> >> number of communicators.
> >>
> >
> > I agree - but that's not what we're getting here.  We're getting one per
> > "object" (Mat / Preconditioner, etc.) associated with the library per
> > "size".  If we can fix that I agree that there's no problem (we use a lot
> > of libraries... but not 2000 separate ones simultaneously!).
> 
> So PETSc needs to dup and attach a hypre communicator because they
> aren't interested in doing it themselves.  Not hard to implement, just
> mildly annoying.
> 
> Can't someone tell them its an xSDK requirement?
> 
>Matt
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Matthew Knepley
On Tue, Apr 3, 2018 at 6:20 PM, Jed Brown  wrote:

> Derek Gaston  writes:
>
> > On Tue, Apr 3, 2018 at 4:06 PM Jed Brown  wrote:
> >
> >> Communicators should be cheap.  One per library per "size" isn't a huge
> >> number of communicators.
> >>
> >
> > I agree - but that's not what we're getting here.  We're getting one per
> > "object" (Mat / Preconditioner, etc.) associated with the library per
> > "size".  If we can fix that I agree that there's no problem (we use a lot
> > of libraries... but not 2000 separate ones simultaneously!).
>
> So PETSc needs to dup and attach a hypre communicator because they
> aren't interested in doing it themselves.  Not hard to implement, just
> mildly annoying.
>

Can't someone tell them it's an xSDK requirement?

   Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston  writes:

> On Tue, Apr 3, 2018 at 4:06 PM Jed Brown  wrote:
>
>> Communicators should be cheap.  One per library per "size" isn't a huge
>> number of communicators.
>>
>
> I agree - but that's not what we're getting here.  We're getting one per
> "object" (Mat / Preconditioner, etc.) associated with the library per
> "size".  If we can fix that I agree that there's no problem (we use a lot
> of libraries... but not 2000 separate ones simultaneously!).

So PETSc needs to dup and attach a hypre communicator because they
aren't interested in doing it themselves.  Not hard to implement, just
mildly annoying.


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Matthew Knepley
On Tue, Apr 3, 2018 at 6:15 PM, Jed Brown  wrote:

> Matthew Knepley  writes:
>
> > On Tue, Apr 3, 2018 at 6:06 PM, Jed Brown  wrote:
> >
> >> Derek Gaston  writes:
> >>
> >> > Sounds great to me - what library do I download that we're all going
> to
> >> use
> >> > for managing the memory pool?  :-)
> >> >
> >> > Seriously though: why doesn't MPI give us an ability to get unique tag
> >> IDs
> >> > for a given communicator?
> >>
> >> It's called a dup'd communicator.
> >>
> >> > I like the way libMesh deals with this:
> >> > https://github.com/libMesh/libmesh/blob/master/include/
> >> parallel/parallel_implementation.h#L1343
> >>
> >> PETSc does something similar, but using attributes inside the MPI_Comm
> >> instead of as a wrapper that goes around the communicator.
> >>
> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/
> >> PetscCommGetNewTag.html
> >>
> >> > I would definitely sign on for all of us to use the same library for
> >> > getting unique tag IDs... and then we would need a lot less
> >> communicators...
> >>
> >> Communicators should be cheap.  One per library per "size" isn't a huge
> >> number of communicators.
> >>
> >
> > And this
> > https://experts.illinois.edu/en/publications/mpi-on-millions-of-cores
> still
> > got published? I guess
> > the reviewers never wanted any more than 2K communicators ;)
>
> These are unrelated concerns.  A million is only 2^20 and 20 is much
> less thann 2000.  The issue is that communicator etiquette between
> libraries isn't expressed in some visible statement of best practices,
> perhaps even a recommendation in the MPI standard.  If hypre dup'd
> communicators like PETSc, then we would all have less code and it would
> be nowhere near 2000 even in the massive MOOSE systems.
>

I understand. I was pointing out that even Bill went for the obvious target,
rather than the usability standard. And if they agreed with the last line, they
should have pointed it out.

   Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
On Tue, Apr 3, 2018 at 4:06 PM Jed Brown  wrote:

> Communicators should be cheap.  One per library per "size" isn't a huge
> number of communicators.
>

I agree - but that's not what we're getting here.  We're getting one per
"object" (Mat / Preconditioner, etc.) associated with the library per
"size".  If we can fix that I agree that there's no problem (we use a lot
of libraries... but not 2000 separate ones simultaneously!).

Derek


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Matthew Knepley  writes:

> On Tue, Apr 3, 2018 at 6:06 PM, Jed Brown  wrote:
>
>> Derek Gaston  writes:
>>
>> > Sounds great to me - what library do I download that we're all going to
>> use
>> > for managing the memory pool?  :-)
>> >
>> > Seriously though: why doesn't MPI give us an ability to get unique tag
>> IDs
>> > for a given communicator?
>>
>> It's called a dup'd communicator.
>>
>> > I like the way libMesh deals with this:
>> > https://github.com/libMesh/libmesh/blob/master/include/
>> parallel/parallel_implementation.h#L1343
>>
>> PETSc does something similar, but using attributes inside the MPI_Comm
>> instead of as a wrapper that goes around the communicator.
>>
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/
>> PetscCommGetNewTag.html
>>
>> > I would definitely sign on for all of us to use the same library for
>> > getting unique tag IDs... and then we would need a lot less
>> communicators...
>>
>> Communicators should be cheap.  One per library per "size" isn't a huge
>> number of communicators.
>>
>
> And this
> https://experts.illinois.edu/en/publications/mpi-on-millions-of-cores still
> got published? I guess
> the reviewers never wanted any more than 2K communicators ;)

These are unrelated concerns.  A million is only 2^20 and 20 is much
less than 2000.  The issue is that communicator etiquette between
libraries isn't expressed in some visible statement of best practices,
perhaps even a recommendation in the MPI standard.  If hypre dup'd
communicators like PETSc, then we would all have less code and it would
be nowhere near 2000 even in the massive MOOSE systems.


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Matthew Knepley
On Tue, Apr 3, 2018 at 6:06 PM, Jed Brown  wrote:

> Derek Gaston  writes:
>
> > Sounds great to me - what library do I download that we're all going to
> use
> > for managing the memory pool?  :-)
> >
> > Seriously though: why doesn't MPI give us an ability to get unique tag
> IDs
> > for a given communicator?
>
> It's called a dup'd communicator.
>
> > I like the way libMesh deals with this:
> > https://github.com/libMesh/libmesh/blob/master/include/
> parallel/parallel_implementation.h#L1343
>
> PETSc does something similar, but using attributes inside the MPI_Comm
> instead of as a wrapper that goes around the communicator.
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/
> PetscCommGetNewTag.html
>
> > I would definitely sign on for all of us to use the same library for
> > getting unique tag IDs... and then we would need a lot less
> communicators...
>
> Communicators should be cheap.  One per library per "size" isn't a huge
> number of communicators.
>

And this https://experts.illinois.edu/en/publications/mpi-on-millions-of-cores
still got published? I guess the reviewers never wanted any more than 2K
communicators ;)

   Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston  writes:

> Sounds great to me - what library do I download that we're all going to use
> for managing the memory pool?  :-)
>
> Seriously though: why doesn't MPI give us an ability to get unique tag IDs
> for a given communicator?  

It's called a dup'd communicator.

> I like the way libMesh deals with this:
> https://github.com/libMesh/libmesh/blob/master/include/parallel/parallel_implementation.h#L1343

PETSc does something similar, but using attributes inside the MPI_Comm
instead of as a wrapper that goes around the communicator.

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommGetNewTag.html

> I would definitely sign on for all of us to use the same library for
> getting unique tag IDs... and then we would need a lot less communicators...

Communicators should be cheap.  One per library per "size" isn't a huge
number of communicators.
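
A minimal sketch of the attribute idea (not the actual PETSc implementation):
the library dups the incoming communicator once, then caches a tag counter on
the dup via MPI attribute caching, so every object sharing that comm can ask
for a distinct tag instead of dup'ing again.  The counter here grows upward
from 0 and is never reclaimed; a real implementation would also install a
delete callback and respect MPI_TAG_UB.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int tag_keyval = MPI_KEYVAL_INVALID;

/* collective in practice: all ranks must request tags in the same order */
static int get_new_tag(MPI_Comm comm, int *tag)
{
  int *counter, flag;
  if (tag_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &tag_keyval, NULL);
  MPI_Comm_get_attr(comm, tag_keyval, &counter, &flag);
  if (!flag) {                       /* first request on this comm: attach a counter */
    counter  = (int *)malloc(sizeof(int));
    *counter = 0;
    MPI_Comm_set_attr(comm, tag_keyval, counter);
  }
  *tag = (*counter)++;               /* each caller on this comm gets a distinct tag */
  return 0;
}

int main(int argc, char **argv)
{
  MPI_Comm inner;
  int      t1, t2;

  MPI_Init(&argc, &argv);
  MPI_Comm_dup(MPI_COMM_WORLD, &inner);   /* one dup for the whole library */
  get_new_tag(inner, &t1);
  get_new_tag(inner, &t2);                /* t2 != t1 */
  printf("tags: %d %d\n", t1, t2);
  MPI_Comm_free(&inner);
  MPI_Finalize();
  return 0;
}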


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.

   I would only implement it for hypre since we don't know if it is safe for 
other packages.

Best to ensure the comm is freed as soon as possible (not waiting for
PetscFinalize())

> On Apr 3, 2018, at 3:04 PM, Stefano Zampini  wrote:
> 
> What about
> 
> PetscCommGetPkgComm(MPI_Comm comm ,const char* package, MPI_Comm* pkgcomm)
> 
> with a key for each of the external packages PETSc can use?
> 
> 
>> On Apr 3, 2018, at 10:56 PM, Kong, Fande  wrote:
>> 
>> I think we could add an inner comm for external package. If the same comm is 
>> passed in again, we just retrieve the same communicator, instead of 
>> MPI_Comm_dup(), for that external package (at least HYPRE team claimed this 
>> will be fine).   I did not see any issue with this idea so far. 
>> 
>> I might be missing something here 
>> 
>> 
>> Fande,
>> 
>> On Tue, Apr 3, 2018 at 1:45 PM, Satish Balay  wrote:
>> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>> 
>> >
>> >
>> > > On Apr 3, 2018, at 11:59 AM, Balay, Satish  wrote:
>> > >
>> > > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>> > >
>> > >>   Note that PETSc does one MPI_Comm_dup() for each hypre matrix. 
>> > >> Internally hypre does at least one MPI_Comm_create() per hypre 
>> > >> boomerAMG solver. So even if PETSc does not do the MPI_Comm_dup() you 
>> > >> will still be limited due to hypre's MPI_Comm_create.
>> > >>
>> > >>I will compose an email to hypre cc:ing everyone to get information 
>> > >> from them.
>> > >
>> > > Actually I don't see any calls to MPI_Comm_dup() in hypre sources [there 
>> > > are stubs for it for non-mpi build]
>> > >
>> > > There was that call to MPI_Comm_create() in the stack trace [via 
>> > > hypre_BoomerAMGSetup]
>> >
>> >This is what I said. The MPI_Comm_create() is called for each solver 
>> > and hence uses a slot for each solver.
>> 
>> Ops sorry - misread the text..
>> 
>> Satish
>> 
> 



Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
Sounds great to me - what library do I download that we're all going to use
for managing the memory pool?  :-)

Seriously though: why doesn't MPI give us an ability to get unique tag IDs
for a given communicator?  I like the way libMesh deals with this:
https://github.com/libMesh/libmesh/blob/master/include/parallel/parallel_implementation.h#L1343

I would definitely sign on for all of us to use the same library for
getting unique tag IDs... and then we would need a lot less communicators...

Derek



On Tue, Apr 3, 2018 at 3:20 PM Jed Brown  wrote:

> Derek Gaston  writes:
>
> > Do you think there is any possibility of getting Hypre to use disjoint
> tags
> > from PETSc so you can just use the same comm?  Maybe a configure option
> to
> > Hypre to tell it what number to start at for its tags?
>
> Why have malloc when we could just coordinate each of our libraries to
> use non-overlapping memory segments???
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston  writes:

> Do you think there is any possibility of getting Hypre to use disjoint tags
> from PETSc so you can just use the same comm?  Maybe a configure option to
> Hypre to tell it what number to start at for its tags?

Why have malloc when we could just coordinate each of our libraries to
use non-overlapping memory segments???


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
It looks nice to me.

Fande,

On Tue, Apr 3, 2018 at 3:04 PM, Stefano Zampini 
wrote:

> What about
>
> PetscCommGetPkgComm(MPI_Comm comm ,const char* package, MPI_Comm* pkgcomm)
>
> with a key for each of the external packages PETSc can use?
>
>
> On Apr 3, 2018, at 10:56 PM, Kong, Fande  wrote:
>
> I think we could add an inner comm for external package. If the same comm
> is passed in again, we just retrieve the same communicator, instead of
> MPI_Comm_dup(), for that external package (at least HYPRE team claimed
> this will be fine).   I did not see any issue with this idea so far.
>
> I might be missing something here
>
>
> Fande,
>
> On Tue, Apr 3, 2018 at 1:45 PM, Satish Balay  wrote:
>
>> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>>
>> >
>> >
>> > > On Apr 3, 2018, at 11:59 AM, Balay, Satish  wrote:
>> > >
>> > > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>> > >
>> > >>   Note that PETSc does one MPI_Comm_dup() for each hypre matrix.
>> Internally hypre does at least one MPI_Comm_create() per hypre boomerAMG
>> solver. So even if PETSc does not do the MPI_Comm_dup() you will still be
>> limited due to hypre's MPI_Comm_create.
>> > >>
>> > >>I will compose an email to hypre cc:ing everyone to get
>> information from them.
>> > >
>> > > Actually I don't see any calls to MPI_Comm_dup() in hypre sources
>> [there are stubs for it for non-mpi build]
>> > >
>> > > There was that call to MPI_Comm_create() in the stack trace [via
>> hypre_BoomerAMGSetup]
>> >
>> >This is what I said. The MPI_Comm_create() is called for each solver
>> and hence uses a slot for each solver.
>>
>> Ops sorry - misread the text..
>>
>> Satish
>>
>
>
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Stefano Zampini
What about

PetscCommGetPkgComm(MPI_Comm comm, const char* package, MPI_Comm* pkgcomm)

with a key for each of the external packages PETSc can use?
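
A hypothetical sketch of how such a routine could work (this function does not
exist in PETSc; the fixed-size table and names below are made up): one keyval
per package name, one MPI_Comm_dup() per (comm, package) pair, cached as an
attribute so later requests on the same comm reuse the same dup.  The calls are
collective, so all ranks must request packages in the same order; a real
version would free the cached comm through a delete callback.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define MAXPKG 8
static const char *pkg_names[MAXPKG];   /* assumes string literals are passed */
static int         pkg_keyvals[MAXPKG];
static int         npkg = 0;

static int pkg_keyval(const char *package)
{
  int i;
  for (i = 0; i < npkg; i++)
    if (!strcmp(pkg_names[i], package)) return pkg_keyvals[i];
  if (npkg == MAXPKG) return MPI_KEYVAL_INVALID;   /* toy table is full */
  pkg_names[npkg] = package;
  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                         &pkg_keyvals[npkg], NULL);
  return pkg_keyvals[npkg++];
}

int PetscCommGetPkgComm(MPI_Comm comm, const char *package, MPI_Comm *pkgcomm)
{
  int       keyval = pkg_keyval(package), flag;
  MPI_Comm *cached;

  if (keyval == MPI_KEYVAL_INVALID) return 1;
  MPI_Comm_get_attr(comm, keyval, &cached, &flag);
  if (!flag) {                                 /* first request: dup once and cache */
    cached = (MPI_Comm *)malloc(sizeof(MPI_Comm));
    MPI_Comm_dup(comm, cached);
    MPI_Comm_set_attr(comm, keyval, cached);
  }
  *pkgcomm = *cached;                          /* later requests reuse the same dup */
  return 0;
}

int main(int argc, char **argv)
{
  MPI_Comm h1, h2;
  MPI_Init(&argc, &argv);
  PetscCommGetPkgComm(MPI_COMM_WORLD, "hypre", &h1);
  PetscCommGetPkgComm(MPI_COMM_WORLD, "hypre", &h2);   /* h2 == h1: no second dup */
  MPI_Finalize();
  return 0;
}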


> On Apr 3, 2018, at 10:56 PM, Kong, Fande  wrote:
> 
> I think we could add an inner comm for external package. If the same comm is 
> passed in again, we just retrieve the same communicator, instead of 
> MPI_Comm_dup(), for that external package (at least HYPRE team claimed this 
> will be fine).   I did not see any issue with this idea so far. 
> 
> I might be missing something here 
> 
> 
> Fande,
> 
> On Tue, Apr 3, 2018 at 1:45 PM, Satish Balay  > wrote:
> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> 
> >
> >
> > > On Apr 3, 2018, at 11:59 AM, Balay, Satish  > > > wrote:
> > >
> > > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> > >
> > >>   Note that PETSc does one MPI_Comm_dup() for each hypre matrix. 
> > >> Internally hypre does at least one MPI_Comm_create() per hypre boomerAMG 
> > >> solver. So even if PETSc does not do the MPI_Comm_dup() you will still 
> > >> be limited due to hypre's MPI_Comm_create.
> > >>
> > >>I will compose an email to hypre cc:ing everyone to get information 
> > >> from them.
> > >
> > > Actually I don't see any calls to MPI_Comm_dup() in hypre sources [there 
> > > are stubs for it for non-mpi build]
> > >
> > > There was that call to MPI_Comm_create() in the stack trace [via 
> > > hypre_BoomerAMGSetup]
> >
> >This is what I said. The MPI_Comm_create() is called for each solver and 
> > hence uses a slot for each solver.
> 
> Ops sorry - misread the text..
> 
> Satish
> 



Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
I think we could add an inner comm for each external package. If the same comm
is passed in again, we just retrieve the same communicator, instead of calling
MPI_Comm_dup(), for that external package (at least the HYPRE team claimed this
would be fine).  I have not seen any issue with this idea so far.

I might be missing something here.


Fande,

On Tue, Apr 3, 2018 at 1:45 PM, Satish Balay  wrote:

> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>
> >
> >
> > > On Apr 3, 2018, at 11:59 AM, Balay, Satish  wrote:
> > >
> > > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> > >
> > >>   Note that PETSc does one MPI_Comm_dup() for each hypre matrix.
> Internally hypre does at least one MPI_Comm_create() per hypre boomerAMG
> solver. So even if PETSc does not do the MPI_Comm_dup() you will still be
> limited due to hypre's MPI_Comm_create.
> > >>
> > >>I will compose an email to hypre cc:ing everyone to get
> information from them.
> > >
> > > Actually I don't see any calls to MPI_Comm_dup() in hypre sources
> [there are stubs for it for non-mpi build]
> > >
> > > There was that call to MPI_Comm_create() in the stack trace [via
> hypre_BoomerAMGSetup]
> >
> >This is what I said. The MPI_Comm_create() is called for each solver
> and hence uses a slot for each solver.
>
> Ops sorry - misread the text..
>
> Satish
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Smith, Barry F. wrote:

> 
> 
> > On Apr 3, 2018, at 11:59 AM, Balay, Satish  wrote:
> > 
> > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> > 
> >>   Note that PETSc does one MPI_Comm_dup() for each hypre matrix. 
> >> Internally hypre does at least one MPI_Comm_create() per hypre boomerAMG 
> >> solver. So even if PETSc does not do the MPI_Comm_dup() you will still be 
> >> limited due to hypre's MPI_Comm_create.
> >> 
> >>I will compose an email to hypre cc:ing everyone to get information 
> >> from them.
> > 
> > Actually I don't see any calls to MPI_Comm_dup() in hypre sources [there 
> > are stubs for it for non-mpi build]
> > 
> > There was that call to MPI_Comm_create() in the stack trace [via 
> > hypre_BoomerAMGSetup]
> 
>This is what I said. The MPI_Comm_create() is called for each solver and 
> hence uses a slot for each solver.

Oops, sorry - misread the text.

Satish


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
Do you think there is any possibility of getting Hypre to use disjoint tags
from PETSc so you can just use the same comm?  Maybe a configure option to
Hypre to tell it what number to start at for its tags?

Derek

On Tue, Apr 3, 2018 at 11:59 AM Satish Balay  wrote:

> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>
> >Note that PETSc does one MPI_Comm_dup() for each hypre matrix.
> Internally hypre does at least one MPI_Comm_create() per hypre boomerAMG
> solver. So even if PETSc does not do the MPI_Comm_dup() you will still be
> limited due to hypre's MPI_Comm_create.
> >
> > I will compose an email to hypre cc:ing everyone to get information
> from them.
>
> Actually I don't see any calls to MPI_Comm_dup() in hypre sources [there
> are stubs for it for non-mpi build]
>
> There was that call to MPI_Comm_create() in the stack trace [via
> hypre_BoomerAMGSetup]
>
> Satish
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.


> On Apr 3, 2018, at 11:59 AM, Balay, Satish  wrote:
> 
> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> 
>>   Note that PETSc does one MPI_Comm_dup() for each hypre matrix. Internally 
>> hypre does at least one MPI_Comm_create() per hypre boomerAMG solver. So 
>> even if PETSc does not do the MPI_Comm_dup() you will still be limited due 
>> to hypre's MPI_Comm_create.
>> 
>>I will compose an email to hypre cc:ing everyone to get information from 
>> them.
> 
> Actually I don't see any calls to MPI_Comm_dup() in hypre sources [there are 
> stubs for it for non-mpi build]
> 
> There was that call to MPI_Comm_create() in the stack trace [via 
> hypre_BoomerAMGSetup]

   This is what I said. The MPI_Comm_create() is called for each solver and 
hence uses a slot for each solver.

   Barry

> 
> Satish



Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Smith, Barry F. wrote:

>Note that PETSc does one MPI_Comm_dup() for each hypre matrix. Internally 
> hypre does at least one MPI_Comm_create() per hypre boomerAMG solver. So even 
> if PETSc does not do the MPI_Comm_dup() you will still be limited due to 
> hypre's MPI_Comm_create.
> 
> I will compose an email to hypre cc:ing everyone to get information from 
> them.

Actually I don't see any calls to MPI_Comm_dup() in hypre sources [there are 
stubs for it for non-mpi build]

There was that call to MPI_Comm_create() in the stack trace [via 
hypre_BoomerAMGSetup]

Satish


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.


> On Apr 3, 2018, at 11:41 AM, Kong, Fande  wrote:
> 
> 
> 
> On Tue, Apr 3, 2018 at 11:29 AM, Smith, Barry F.  wrote:
> 
>   Fande,
> 
>  The reason for MPI_Comm_dup() and the inner communicator is that this 
> communicator is used by hypre and so cannot "just" be a PETSc communicator. 
> We cannot have PETSc and hypre using the same communicator since they may 
> capture each others messages etc.
> 
>   See my pull request that I think should resolve the issue in the short 
> term,
> 
> Yes, it helps as well. 

   Good, we'll get it into master after testing.
> 
> The question becomes we can not have more than 2000 AMG solvers in one 
> application because each Hypre owns its communicator.  There is no way to 
> have all AMG solvers share the same HYPRE-sided communicator? Just like what 
> we are dong for PETSc objects?

   I cannot answer this question. For this we need to consult with the hypre team.

   Note that PETSc does one MPI_Comm_dup() for each hypre matrix. Internally 
hypre does at least one MPI_Comm_create() per hypre boomerAMG solver. So even 
if PETSc does not do the MPI_Comm_dup() you will still be limited due to 
hypre's MPI_Comm_create.

I will compose an email to hypre cc:ing everyone to get information from 
them.

   Barry

Another note: If you are creating 2 thousand subcommunicators (independent of 
using hypre etc) the limit on communicators is still going to hit you, so if 
you dream of millions of cores and thousands of sub communicators you will need 
to convince the MPI developers to support a larger number of communicators.

> 
> 
> Fande,
> 
>  
> 
> Barry
> 
> 
> > On Apr 3, 2018, at 11:21 AM, Kong, Fande  wrote:
> >
> > Figured out:
> >
> > The reason is that  in  MatCreate_HYPRE(Mat B), we call MPI_Comm_dup 
> > instead of PetscCommDuplicate. The PetscCommDuplicate is better, and it 
> > does not actually create a communicator if the communicator is already 
> > known to PETSc.
> >
> > Furthermore, I do not think we should a comm in
> >
> > typedef struct {
> >   HYPRE_IJMatrix ij;
> >   HYPRE_IJVector x;
> >   HYPRE_IJVector b;
> >   MPI_Comm   comm;
> > } Mat_HYPRE;
> >
> > It is an inner data of Mat, and it should already the same comm as the Mat. 
> > I do not understand why the internal data has its own comm.
> >
> > The following patch fixed the issue (just deleted this extra comm).
> >
> > diff --git a/src/mat/impls/hypre/mhypre.c b/src/mat/impls/hypre/mhypre.c
> > index dc19892..d8cfe3d 100644
> > --- a/src/mat/impls/hypre/mhypre.c
> > +++ b/src/mat/impls/hypre/mhypre.c
> > @@ -74,7 +74,7 @@ static PetscErrorCode MatHYPRE_CreateFromMat(Mat A, 
> > Mat_HYPRE *hA)
> >rend   = A->rmap->rend;
> >cstart = A->cmap->rstart;
> >cend   = A->cmap->rend;
> > -  
> > PetscStackCallStandard(HYPRE_IJMatrixCreate,(hA->comm,rstart,rend-1,cstart,cend-1,&hA->ij));
> > +  
> > PetscStackCallStandard(HYPRE_IJMatrixCreate,(PetscObjectComm((PetscObject)A),rstart,rend-1,cstart,cend-1,&hA->ij));
> >
> > PetscStackCallStandard(HYPRE_IJMatrixSetObjectType,(hA->ij,HYPRE_PARCSR));
> >{
> >  PetscBool  same;
> > @@ -434,7 +434,7 @@ PetscErrorCode MatDestroy_HYPRE(Mat A)
> >if (hA->x) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->x));
> >if (hA->b) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->b));
> >if (hA->ij) PetscStackCallStandard(HYPRE_IJMatrixDestroy,(hA->ij));
> > -  if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}
> > +  /*if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}*/
> >ierr = 
> > PetscObjectComposeFunction((PetscObject)A,"MatConvert_hypre_aij_C",NULL);CHKERRQ(ierr);
> >ierr = PetscFree(A->data);CHKERRQ(ierr);
> >PetscFunctionReturn(0);
> > @@ -500,7 +500,8 @@ PETSC_EXTERN PetscErrorCode MatCreate_HYPRE(Mat B)
> >B->ops->destroy   = MatDestroy_HYPRE;
> >B->ops->assemblyend   = MatAssemblyEnd_HYPRE;
> >
> > -  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr);
> > +  /*ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr); */
> > +  /*ierr = 
> > PetscCommDuplicate(PetscObjectComm((PetscObject)B),&hB->comm,NULL);CHKERRQ(ierr);*/
> >ierr = PetscObjectChangeTypeName((PetscObject)B,MATHYPRE);CHKERRQ(ierr);
> >ierr = 
> > PetscObjectComposeFunction((PetscObject)B,"MatConvert_hypre_aij_C",MatConvert_HYPRE_AIJ);CHKERRQ(ierr);
> >PetscFunctionReturn(0);
> > diff --git a/src/mat/impls/hypre/mhypre.h b/src/mat/impls/hypre/mhypre.h
> > index 3d9ddd2..1189020 100644
> > --- a/src/mat/impls/hypre/mhypre.h
> > +++ b/src/mat/impls/hypre/mhypre.h
> > @@ -10,7 +10,7 @@ typedef struct {
> >HYPRE_IJMatrix ij;
> >HYPRE_IJVector x;
> >HYPRE_IJVector b;
> > -  MPI_Comm   comm;
> > +  /*MPI_Comm   comm;*/
> >  } Mat_HYPRE;
> >
> >
> >
> > Fande,
> >
> >
> >
> >
> > On Tue, Apr 3, 2018 at 10:35 AM, Satish Balay  wrote:
> > On Tue, 3 Apr 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
Derek Gaston  writes:

> Sorry, should read: "any one MPI process is not involved in more than ~2000
> *communicators*"

Yes, as intended.  Only the ranks in a communicator's group need to know
about the existence of that communicator.

> Derek
>
> On Tue, Apr 3, 2018 at 11:47 AM Derek Gaston  wrote:
>
>> On Tue, Apr 3, 2018 at 10:31 AM Satish Balay  wrote:
>>
>>> On Tue, 3 Apr 2018, Derek Gaston wrote:
>>> > Which does bring up a point: I have been able to do solves before with
>>> > ~50,000 separate PETSc solves without issue.  Is it because I was
>>> working
>>> > with MVAPICH on a cluster?  Does it just have a higher limit?
>>>
>>> Don't know - but thats easy to find out with a simple test code..
>>>
>>
>> I get 2044 using mvapich on my cluster too.
>>
>> The only thing I can think of as to why those massive problems work for me
>> is that any one MPI process is not involved in more than ~2000 processors
>> (because the communicators are split as you go down the hierarchy).  At
>> most, a single MPI process will see ~hundreds of PETSc solves but not
>> thousands.
>>
>> That said: it's just because of the current nature of the solves I'm doing
>> - it's definitely possible to have that not be the case with MOOSE.
>>
>> Derek
>>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
Sorry, should read: "any one MPI process is not involved in more than ~2000
*communicators*"

Derek

On Tue, Apr 3, 2018 at 11:47 AM Derek Gaston  wrote:

> On Tue, Apr 3, 2018 at 10:31 AM Satish Balay  wrote:
>
>> On Tue, 3 Apr 2018, Derek Gaston wrote:
>> > Which does bring up a point: I have been able to do solves before with
>> > ~50,000 separate PETSc solves without issue.  Is it because I was
>> working
>> > with MVAPICH on a cluster?  Does it just have a higher limit?
>>
>> Don't know - but thats easy to find out with a simple test code..
>>
>
> I get 2044 using mvapich on my cluster too.
>
> The only thing I can think of as to why those massive problems work for me
> is that any one MPI process is not involved in more than ~2000 processors
> (because the communicators are split as you go down the hierarchy).  At
> most, a single MPI process will see ~hundreds of PETSc solves but not
> thousands.
>
> That said: it's just because of the current nature of the solves I'm doing
> - it's definitely possible to have that not be the case with MOOSE.
>
> Derek
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
On Tue, Apr 3, 2018 at 10:31 AM Satish Balay  wrote:

> On Tue, 3 Apr 2018, Derek Gaston wrote:
> > Which does bring up a point: I have been able to do solves before with
> > ~50,000 separate PETSc solves without issue.  Is it because I was working
> > with MVAPICH on a cluster?  Does it just have a higher limit?
>
> Don't know - but thats easy to find out with a simple test code..
>

I get 2044 using mvapich on my cluster too.

The only thing I can think of as to why those massive problems work for me
is that any one MPI process is not involved in more than ~2000 processors
(because the communicators are split as you go down the hierarchy).  At
most, a single MPI process will see ~hundreds of PETSc solves but not
thousands.
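
A minimal sketch of that splitting hierarchy (toy example, not MOOSE code):
each level splits the previous level's communicator, so any single rank is a
member of only depth-many new communicators even though the total number of
sub-communicators across the job grows with the number of sub-solves.

#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Comm level[3];
  int      depth;

  MPI_Init(&argc, &argv);
  level[0] = MPI_COMM_WORLD;
  for (depth = 1; depth < 3; depth++) {
    int r;
    MPI_Comm_rank(level[depth - 1], &r);
    /* split the previous level in two: this rank joins exactly one child comm */
    MPI_Comm_split(level[depth - 1], r % 2, r, &level[depth]);
  }
  /* up to 4 distinct sub-comms exist across the job, but this rank holds only 2 */
  MPI_Comm_free(&level[2]);
  MPI_Comm_free(&level[1]);
  MPI_Finalize();
  return 0;
}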

That said: it's just because of the current nature of the solves I'm doing
- it's definitely possible to have that not be the case with MOOSE.

Derek


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
On Tue, Apr 3, 2018 at 11:29 AM, Smith, Barry F.  wrote:

>
>   Fande,
>
>  The reason for MPI_Comm_dup() and the inner communicator is that this
> communicator is used by hypre and so cannot "just" be a PETSc communicator.
> We cannot have PETSc and hypre using the same communicator since they may
> capture each others messages etc.
>
>   See my pull request that I think should resolve the issue in the
> short term,
>

Yes, it helps as well.

The question becomes: we cannot have more than 2000 AMG solvers in one
application because each Hypre owns its communicator.  Is there no way to
have all AMG solvers share the same HYPRE-side communicator, just like
what we are doing for PETSc objects?


Fande,



>
> Barry
>
>
> > On Apr 3, 2018, at 11:21 AM, Kong, Fande  wrote:
> >
> > Figured out:
> >
> > The reason is that  in  MatCreate_HYPRE(Mat B), we call MPI_Comm_dup
> instead of PetscCommDuplicate. The PetscCommDuplicate is better, and it
> does not actually create a communicator if the communicator is already
> known to PETSc.
> >
> > Furthermore, I do not think we should a comm in
> >
> > typedef struct {
> >   HYPRE_IJMatrix ij;
> >   HYPRE_IJVector x;
> >   HYPRE_IJVector b;
> >   MPI_Comm   comm;
> > } Mat_HYPRE;
> >
> > It is an inner data of Mat, and it should already the same comm as the
> Mat. I do not understand why the internal data has its own comm.
> >
> > The following patch fixed the issue (just deleted this extra comm).
> >
> > diff --git a/src/mat/impls/hypre/mhypre.c b/src/mat/impls/hypre/mhypre.c
> > index dc19892..d8cfe3d 100644
> > --- a/src/mat/impls/hypre/mhypre.c
> > +++ b/src/mat/impls/hypre/mhypre.c
> > @@ -74,7 +74,7 @@ static PetscErrorCode MatHYPRE_CreateFromMat(Mat A,
> Mat_HYPRE *hA)
> >rend   = A->rmap->rend;
> >cstart = A->cmap->rstart;
> >cend   = A->cmap->rend;
> > -  PetscStackCallStandard(HYPRE_IJMatrixCreate,(hA->comm,
> rstart,rend-1,cstart,cend-1,&hA->ij));
> > +  PetscStackCallStandard(HYPRE_IJMatrixCreate,(
> PetscObjectComm((PetscObject)A),rstart,rend-1,cstart,cend-1,&hA->ij));
> >PetscStackCallStandard(HYPRE_IJMatrixSetObjectType,(hA->ij,
> HYPRE_PARCSR));
> >{
> >  PetscBool  same;
> > @@ -434,7 +434,7 @@ PetscErrorCode MatDestroy_HYPRE(Mat A)
> >if (hA->x) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->x));
> >if (hA->b) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->b));
> >if (hA->ij) PetscStackCallStandard(HYPRE_IJMatrixDestroy,(hA->ij));
> > -  if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}
> > +  /*if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}*/
> >ierr = PetscObjectComposeFunction((PetscObject)A,"MatConvert_
> hypre_aij_C",NULL);CHKERRQ(ierr);
> >ierr = PetscFree(A->data);CHKERRQ(ierr);
> >PetscFunctionReturn(0);
> > @@ -500,7 +500,8 @@ PETSC_EXTERN PetscErrorCode MatCreate_HYPRE(Mat B)
> >B->ops->destroy   = MatDestroy_HYPRE;
> >B->ops->assemblyend   = MatAssemblyEnd_HYPRE;
> >
> > -  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);
> CHKERRQ(ierr);
> > +  /*ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr);
> */
> > +  /*ierr = PetscCommDuplicate(PetscObjectComm((PetscObject)
> B),&hB->comm,NULL);CHKERRQ(ierr);*/
> >ierr = PetscObjectChangeTypeName((PetscObject)B,MATHYPRE);
> CHKERRQ(ierr);
> >ierr = PetscObjectComposeFunction((PetscObject)B,"MatConvert_
> hypre_aij_C",MatConvert_HYPRE_AIJ);CHKERRQ(ierr);
> >PetscFunctionReturn(0);
> > diff --git a/src/mat/impls/hypre/mhypre.h b/src/mat/impls/hypre/mhypre.h
> > index 3d9ddd2..1189020 100644
> > --- a/src/mat/impls/hypre/mhypre.h
> > +++ b/src/mat/impls/hypre/mhypre.h
> > @@ -10,7 +10,7 @@ typedef struct {
> >HYPRE_IJMatrix ij;
> >HYPRE_IJVector x;
> >HYPRE_IJVector b;
> > -  MPI_Comm   comm;
> > +  /*MPI_Comm   comm;*/
> >  } Mat_HYPRE;
> >
> >
> >
> > Fande,
> >
> >
> >
> >
> > On Tue, Apr 3, 2018 at 10:35 AM, Satish Balay  wrote:
> > On Tue, 3 Apr 2018, Satish Balay wrote:
> >
> > > On Tue, 3 Apr 2018, Derek Gaston wrote:
> > >
> > > > One thing I want to be clear of here: is that we're not trying to
> solve
> > > > this particular problem (where we're creating 1000 instances of
> Hypre to
> > > > precondition each variable independently)... this particular problem
> is
> > > > just a test (that we've had in our test suite for a long time) to
> stress
> > > > test some of this capability.
> > > >
> > > > We really do have needs for thousands (tens of thousands) of
> simultaneous
> > > > solves (each with their own Hypre instances).  That's not what this
> > > > particular problem is doing - but it is representative of a class of
> our
> > > > problems we need to solve.
> > > >
> > > > Which does bring up a point: I have been able to do solves before
> with
> > > > ~50,000 separate PETSc solves without issue.  Is it because I was
> working
> > > > with MVAPICH on a cluster?  

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.

  Fande,

 The reason for MPI_Comm_dup() and the inner communicator is that this 
communicator is used by hypre and so cannot "just" be a PETSc communicator. We 
cannot have PETSc and hypre using the same communicator since they may capture 
each other's messages, etc.

  See my pull request that I think should resolve the issue in the short 
term,

Barry


> On Apr 3, 2018, at 11:21 AM, Kong, Fande  wrote:
> 
> Figured out:
> 
> The reason is that  in  MatCreate_HYPRE(Mat B), we call MPI_Comm_dup instead 
> of PetscCommDuplicate. The PetscCommDuplicate is better, and it does not 
> actually create a communicator if the communicator is already known to PETSc. 
> 
> Furthermore, I do not think we should a comm in 
> 
> typedef struct {
>   HYPRE_IJMatrix ij;
>   HYPRE_IJVector x;
>   HYPRE_IJVector b;
>   MPI_Comm   comm;
> } Mat_HYPRE;
> 
> It is an inner data of Mat, and it should already the same comm as the Mat. I 
> do not understand why the internal data has its own comm.
> 
> The following patch fixed the issue (just deleted this extra comm).
> 
> diff --git a/src/mat/impls/hypre/mhypre.c b/src/mat/impls/hypre/mhypre.c
> index dc19892..d8cfe3d 100644
> --- a/src/mat/impls/hypre/mhypre.c
> +++ b/src/mat/impls/hypre/mhypre.c
> @@ -74,7 +74,7 @@ static PetscErrorCode MatHYPRE_CreateFromMat(Mat A, 
> Mat_HYPRE *hA)
>rend   = A->rmap->rend;
>cstart = A->cmap->rstart;
>cend   = A->cmap->rend;
> -  
> PetscStackCallStandard(HYPRE_IJMatrixCreate,(hA->comm,rstart,rend-1,cstart,cend-1,&hA->ij));
> +  
> PetscStackCallStandard(HYPRE_IJMatrixCreate,(PetscObjectComm((PetscObject)A),rstart,rend-1,cstart,cend-1,&hA->ij));
>PetscStackCallStandard(HYPRE_IJMatrixSetObjectType,(hA->ij,HYPRE_PARCSR));
>{
>  PetscBool  same;
> @@ -434,7 +434,7 @@ PetscErrorCode MatDestroy_HYPRE(Mat A)
>if (hA->x) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->x));
>if (hA->b) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->b));
>if (hA->ij) PetscStackCallStandard(HYPRE_IJMatrixDestroy,(hA->ij));
> -  if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}
> +  /*if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}*/
>ierr = 
> PetscObjectComposeFunction((PetscObject)A,"MatConvert_hypre_aij_C",NULL);CHKERRQ(ierr);
>ierr = PetscFree(A->data);CHKERRQ(ierr);
>PetscFunctionReturn(0);
> @@ -500,7 +500,8 @@ PETSC_EXTERN PetscErrorCode MatCreate_HYPRE(Mat B)
>B->ops->destroy   = MatDestroy_HYPRE;
>B->ops->assemblyend   = MatAssemblyEnd_HYPRE;
>  
> -  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr);
> +  /*ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr); */
> +  /*ierr = 
> PetscCommDuplicate(PetscObjectComm((PetscObject)B),&hB->comm,NULL);CHKERRQ(ierr);*/
>ierr = PetscObjectChangeTypeName((PetscObject)B,MATHYPRE);CHKERRQ(ierr);
>ierr = 
> PetscObjectComposeFunction((PetscObject)B,"MatConvert_hypre_aij_C",MatConvert_HYPRE_AIJ);CHKERRQ(ierr);
>PetscFunctionReturn(0);
> diff --git a/src/mat/impls/hypre/mhypre.h b/src/mat/impls/hypre/mhypre.h
> index 3d9ddd2..1189020 100644
> --- a/src/mat/impls/hypre/mhypre.h
> +++ b/src/mat/impls/hypre/mhypre.h
> @@ -10,7 +10,7 @@ typedef struct {
>HYPRE_IJMatrix ij;
>HYPRE_IJVector x;
>HYPRE_IJVector b;
> -  MPI_Comm   comm;
> +  /*MPI_Comm   comm;*/
>  } Mat_HYPRE;
>  
> 
> 
> Fande,
> 
> 
> 
> 
> On Tue, Apr 3, 2018 at 10:35 AM, Satish Balay  wrote:
> On Tue, 3 Apr 2018, Satish Balay wrote:
> 
> > On Tue, 3 Apr 2018, Derek Gaston wrote:
> >
> > > One thing I want to be clear of here: is that we're not trying to solve
> > > this particular problem (where we're creating 1000 instances of Hypre to
> > > precondition each variable independently)... this particular problem is
> > > just a test (that we've had in our test suite for a long time) to stress
> > > test some of this capability.
> > >
> > > We really do have needs for thousands (tens of thousands) of simultaneous
> > > solves (each with their own Hypre instances).  That's not what this
> > > particular problem is doing - but it is representative of a class of our
> > > problems we need to solve.
> > >
> > > Which does bring up a point: I have been able to do solves before with
> > > ~50,000 separate PETSc solves without issue.  Is it because I was working
> > > with MVAPICH on a cluster?  Does it just have a higher limit?
> >
> > Don't know - but thats easy to find out with a simple test code..
> >
> > >>
> > $ cat comm_dup_test.c
> > #include <mpi.h>
> > #include <stdio.h>
> >
> > int main(int argc, char** argv) {
> > MPI_Comm newcomm;
> > int i, err;
> > MPI_Init(NULL, NULL);
> > for (i=0; i<10; i++) {
> >   err = MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);
> >   if (err) {
> >   printf("%5d - fail\n",i);fflush(stdout);
> >   break;
> > } else {
> >   printf("%5d - success\n",i);fflush(stdout);
> >   }
> >  

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Kong, Fande wrote:

> Figured out:
> 
> The reason is that  in  MatCreate_HYPRE(Mat B), we call MPI_Comm_dup
> instead of PetscCommDuplicate. The PetscCommDuplicate is better, and it
> does not actually create a communicator if the communicator is already
> known to PETSc.
> 
> Furthermore, I do not think we should a comm in
> 
> typedef struct {
>   HYPRE_IJMatrix ij;
>   HYPRE_IJVector x;
>   HYPRE_IJVector b;
>   MPI_Comm   comm;
> } Mat_HYPRE;
> 
> It is an inner data of Mat, and it should already the same comm as the Mat.
> I do not understand why the internal data has its own comm.

As mentioned before - we create a separate comm using MPI_Comm_dup() to pass it 
down to hypre routines [corresponding to this petsc object]. 
PetscCommDuplicate() is used for all comm usage required within PETSc. 

The problem with this change is that now two libraries are using the same 
communicator but don't know how to manage 'tags' in separate spaces - thus 
potentially resulting in messages from hypre picked up by petsc, or vice versa.
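
A minimal sketch of that matching hazard (toy example, not from either library;
run with at least 2 ranks): two "libraries" share one communicator and both use
tag 0, so the receive posted on behalf of one library is matched by the other
library's unrelated message.  A dup'd comm per library (or disjoint tag ranges)
removes the ambiguity.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank, val = -1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    MPI_Request req;
    /* "libA" expects a tag-0 message from anyone on this shared comm */
    MPI_Irecv(&val, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    printf("libA received %d, which may actually be libB's message\n", val);
  } else if (rank == 1) {
    int payload = 42;
    /* "libB" happens to use the same comm and the same tag */
    MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}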

Satish

> 
> The following patch fixed the issue (just deleted this extra comm).
> 
> diff --git a/src/mat/impls/hypre/mhypre.c b/src/mat/impls/hypre/mhypre.c
> index dc19892..d8cfe3d 100644
> --- a/src/mat/impls/hypre/mhypre.c
> +++ b/src/mat/impls/hypre/mhypre.c
> @@ -74,7 +74,7 @@ static PetscErrorCode MatHYPRE_CreateFromMat(Mat A,
> Mat_HYPRE *hA)
>rend   = A->rmap->rend;
>cstart = A->cmap->rstart;
>cend   = A->cmap->rend;
> -
> PetscStackCallStandard(HYPRE_IJMatrixCreate,(hA->comm,rstart,rend-1,cstart,cend-1,&hA->ij));
> +
> PetscStackCallStandard(HYPRE_IJMatrixCreate,(PetscObjectComm((PetscObject)A),rstart,rend-1,cstart,cend-1,&hA->ij));
> 
> PetscStackCallStandard(HYPRE_IJMatrixSetObjectType,(hA->ij,HYPRE_PARCSR));
>{
>  PetscBool  same;
> @@ -434,7 +434,7 @@ PetscErrorCode MatDestroy_HYPRE(Mat A)
>if (hA->x) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->x));
>if (hA->b) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->b));
>if (hA->ij) PetscStackCallStandard(HYPRE_IJMatrixDestroy,(hA->ij));
> -  if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}
> +  /*if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}*/
>ierr =
> PetscObjectComposeFunction((PetscObject)A,"MatConvert_hypre_aij_C",NULL);CHKERRQ(ierr);
>ierr = PetscFree(A->data);CHKERRQ(ierr);
>PetscFunctionReturn(0);
> @@ -500,7 +500,8 @@ PETSC_EXTERN PetscErrorCode MatCreate_HYPRE(Mat B)
>B->ops->destroy   = MatDestroy_HYPRE;
>B->ops->assemblyend   = MatAssemblyEnd_HYPRE;
> 
> -  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr);
> +  /*ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr); */
> +  /*ierr =
> PetscCommDuplicate(PetscObjectComm((PetscObject)B),&hB->comm,NULL);CHKERRQ(ierr);*/
>ierr = PetscObjectChangeTypeName((PetscObject)B,MATHYPRE);CHKERRQ(ierr);
>ierr =
> PetscObjectComposeFunction((PetscObject)B,"MatConvert_hypre_aij_C",MatConvert_HYPRE_AIJ);CHKERRQ(ierr);
>PetscFunctionReturn(0);
> diff --git a/src/mat/impls/hypre/mhypre.h b/src/mat/impls/hypre/mhypre.h
> index 3d9ddd2..1189020 100644
> --- a/src/mat/impls/hypre/mhypre.h
> +++ b/src/mat/impls/hypre/mhypre.h
> @@ -10,7 +10,7 @@ typedef struct {
>HYPRE_IJMatrix ij;
>HYPRE_IJVector x;
>HYPRE_IJVector b;
> -  MPI_Comm   comm;
> +  /*MPI_Comm   comm;*/
>  } Mat_HYPRE;
> 
> 
> 
> Fande,
> 
> 
> 
> 
> On Tue, Apr 3, 2018 at 10:35 AM, Satish Balay  wrote:
> 
> > On Tue, 3 Apr 2018, Satish Balay wrote:
> >
> > > On Tue, 3 Apr 2018, Derek Gaston wrote:
> > >
> > > > One thing I want to be clear of here: is that we're not trying to solve
> > > > this particular problem (where we're creating 1000 instances of Hypre
> > to
> > > > precondition each variable independently)... this particular problem is
> > > > just a test (that we've had in our test suite for a long time) to
> > stress
> > > > test some of this capability.
> > > >
> > > > We really do have needs for thousands (tens of thousands) of
> > simultaneous
> > > > solves (each with their own Hypre instances).  That's not what this
> > > > particular problem is doing - but it is representative of a class of
> > our
> > > > problems we need to solve.
> > > >
> > > > Which does bring up a point: I have been able to do solves before with
> > > > ~50,000 separate PETSc solves without issue.  Is it because I was
> > working
> > > > with MVAPICH on a cluster?  Does it just have a higher limit?
> > >
> > > Don't know - but thats easy to find out with a simple test code..
> > >
> > > >>
> > > $ cat comm_dup_test.c
> > > #include <mpi.h>
> > > #include <stdio.h>
> > >
> > > int main(int argc, char** argv) {
> > > MPI_Comm newcomm;
> > > int i, err;
> > > MPI_Init(NULL, NULL);
> > > for (i=0; i<10; i++) {
> > >   err = MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);
> > >   if (err) {
> > >   printf("%5d - 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.

   Fande,

  Please try the branch 
https://bitbucket.org/petsc/petsc/pull-requests/921/boomeramg-unlike-2-other-hypre/diff
  

   It does not "solve" the problem but it should get your current test that 
now fails to run again,

   Barry


> On Apr 3, 2018, at 10:14 AM, Kong, Fande  wrote:
> 
> The first bad commit:
> 
> commit 49a781f5cee36db85e8d5b951eec29f10ac13593
> Author: Stefano Zampini 
> Date:   Sat Nov 5 20:15:19 2016 +0300
> 
> PCHYPRE: use internal Mat of type MatHYPRE
> 
> hpmat already stores two HYPRE vectors
> 
> 
> Hypre version:
> 
> ~/projects/petsc/arch-darwin-c-opt-bisect_bad/externalpackages/git.hypre]> 
> git branch 
> * (HEAD detached at 83b1f19)
> 
> 
> 
> The last good commit:
> 
> commit 63c07aad33d943fe85193412d077a1746a7c55aa
> Author: Stefano Zampini 
> Date:   Sat Nov 5 19:30:12 2016 +0300
> 
> MatHYPRE: create new matrix type
> 
> The conversion from AIJ to HYPRE has been taken from 
> src/dm/impls/da/hypre/mhyp.c
> HYPRE to AIJ is new
> 
> Hypre version:
> 
> /projects/petsc/arch-darwin-c-opt-bisect/externalpackages/git.hypre]> git 
> branch 
> * (HEAD detached at 83b1f19)
> 
> 
> 
> 
> 
> We are using the same HYPRE version.
> 
> 
> I will narrow down line-by-line.
> 
> 
> Fande,
> 
> 
> On Tue, Apr 3, 2018 at 9:50 AM, Stefano Zampini  
> wrote:
> 
>> On Apr 3, 2018, at 5:43 PM, Fande Kong  wrote:
>> 
>> 
>> 
>> On Tue, Apr 3, 2018 at 9:12 AM, Stefano Zampini  
>> wrote:
>> 
>>> On Apr 3, 2018, at 4:58 PM, Satish Balay  wrote:
>>> 
>>> On Tue, 3 Apr 2018, Kong, Fande wrote:
>>> 
 On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F.  wrote:
 
> 
>   Each external package definitely needs its own duplicated communicator;
> cannot share between packages.
> 
>   The only problem with the dups below is if they are in a loop and get
> called many times.
> 
 
 
 The "standard test" that has this issue actually has 1K fields. MOOSE
 creates its own field-split preconditioner (not based on the PETSc
 fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
 duplicates communicators, we should easily reach the limit 2048.
 
 I also want to confirm what extra communicators are introduced in the bad
 commit.
>>> 
>>> To me it looks like there is 1 extra comm created [for MATHYPRE] for each 
>>> PCHYPRE that is created [which also creates one comm for this object].
>>> 
>> 
>> You’re right; however, it was the same before the commit.
>> I don’t understand how this specific commit is related with this issue, 
>> being the error not in the MPI_Comm_Dup which is inside MatCreate_MATHYPRE. 
>> Actually, the error comes from MPI_Comm_create
>> 
>> frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
>> frame #6: 0x0001061345d9 
>> libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
>> participate=, new_comm_ptr=) + 409 at 
>> gen_redcs_mat.c:531 [opt]
>> frame #7: 0x00010618f8ba 
>> libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, 
>> level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
>> frame #8: 0x000106140e93 
>> libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, 
>> A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at 
>> par_amg_setup.c:2108 [opt]
>> frame #9: 0x000105ec773c 
>> libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 
>> [opt
>> 
>> How did you perform the bisection? make clean + make all ? Which version of 
>> HYPRE are you using?
>> 
>> I did more aggressively.  
>> 
>> "rm -rf  arch-darwin-c-opt-bisect   "
>> 
>> "./configure  --optionsModule=config.compilerOptions -with-debugging=no 
>> --with-shared-libraries=1 --with-mpi=1 --download-fblaslapack=1 
>> --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 
>> --download-hypre=1 --download-mumps=1 --download-scalapack=1 
>> PETSC_ARCH=arch-darwin-c-opt-bisect"
>> 
> 
> Good, so this removes some possible sources of errors
>> 
>> HYPRE verison:
>> 
>> 
>> self.gitcommit = 'v2.11.1-55-g2ea0e43'
>> self.download  = 
>> ['git://https://github.com/LLNL/hypre','https://github.com/LLNL/hypre/archive/'+self.gitcommit+'.tar.gz']
>> 
>> 
> 
> When reconfiguring, the  HYPRE version can be different too (that commit is 
> from 11/2016, so the HYPRE version used by the PETSc configure can have been 
> upgraded too)
> 
>> I do not think this is caused by HYPRE.
>> 
>> Fande,
>> 
>>  
>> 
>>> But you might want to verify [by linking with mpi trace library?]
>>> 
>>> 
>>> There are some debugging hints at 
>>> https://lists.mpich.org/pipermail/discuss/2012-December/000148.html [wrt 
>>> mpich] - which I haven't checked..
>>> 
>>> Satish
>>> 
 
 
 Fande,
 
 
 
> 
>  

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
Figured out:

The reason is that  in  MatCreate_HYPRE(Mat B), we call MPI_Comm_dup
instead of PetscCommDuplicate. The PetscCommDuplicate is better, and it
does not actually create a communicator if the communicator is already
known to PETSc.
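
A minimal sketch of that difference (assumes a PETSc build; error checking
omitted): repeated PetscCommDuplicate() calls on the same user communicator
hand back the same cached inner communicator, so they do not consume a new MPI
context each time the way repeated MPI_Comm_dup() calls do.

#include <petscsys.h>

int main(int argc, char **argv)
{
  MPI_Comm    a, b;
  PetscMPIInt tag;

  PetscInitialize(&argc, &argv, NULL, NULL);
  PetscCommDuplicate(PETSC_COMM_WORLD, &a, &tag);
  PetscCommDuplicate(PETSC_COMM_WORLD, &b, &tag);   /* same inner comm as 'a' */
  if (a == b) PetscPrintf(PETSC_COMM_WORLD, "inner comm reused, no new context\n");
  PetscCommDestroy(&a);   /* decrements the reference count */
  PetscCommDestroy(&b);
  PetscFinalize();
  return 0;
}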

Furthermore, I do not think we should have a comm in

typedef struct {
  HYPRE_IJMatrix ij;
  HYPRE_IJVector x;
  HYPRE_IJVector b;
  MPI_Comm   comm;
} Mat_HYPRE;

It is inner data of the Mat, and it should already have the same comm as the Mat.
I do not understand why the internal data has its own comm.

The following patch fixed the issue (just deleted this extra comm).

diff --git a/src/mat/impls/hypre/mhypre.c b/src/mat/impls/hypre/mhypre.c
index dc19892..d8cfe3d 100644
--- a/src/mat/impls/hypre/mhypre.c
+++ b/src/mat/impls/hypre/mhypre.c
@@ -74,7 +74,7 @@ static PetscErrorCode MatHYPRE_CreateFromMat(Mat A,
Mat_HYPRE *hA)
   rend   = A->rmap->rend;
   cstart = A->cmap->rstart;
   cend   = A->cmap->rend;
-
PetscStackCallStandard(HYPRE_IJMatrixCreate,(hA->comm,rstart,rend-1,cstart,cend-1,&hA->ij));
+
PetscStackCallStandard(HYPRE_IJMatrixCreate,(PetscObjectComm((PetscObject)A),rstart,rend-1,cstart,cend-1,&hA->ij));

PetscStackCallStandard(HYPRE_IJMatrixSetObjectType,(hA->ij,HYPRE_PARCSR));
   {
 PetscBool  same;
@@ -434,7 +434,7 @@ PetscErrorCode MatDestroy_HYPRE(Mat A)
   if (hA->x) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->x));
   if (hA->b) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->b));
   if (hA->ij) PetscStackCallStandard(HYPRE_IJMatrixDestroy,(hA->ij));
-  if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}
+  /*if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}*/
   ierr =
PetscObjectComposeFunction((PetscObject)A,"MatConvert_hypre_aij_C",NULL);CHKERRQ(ierr);
   ierr = PetscFree(A->data);CHKERRQ(ierr);
   PetscFunctionReturn(0);
@@ -500,7 +500,8 @@ PETSC_EXTERN PetscErrorCode MatCreate_HYPRE(Mat B)
   B->ops->destroy   = MatDestroy_HYPRE;
   B->ops->assemblyend   = MatAssemblyEnd_HYPRE;

-  ierr =
MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr);
+  /*ierr =
MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr); */
+  /*ierr =
PetscCommDuplicate(PetscObjectComm((PetscObject)B),&hB->comm,NULL);CHKERRQ(ierr);*/
   ierr = PetscObjectChangeTypeName((PetscObject)B,MATHYPRE);CHKERRQ(ierr);
   ierr =
PetscObjectComposeFunction((PetscObject)B,"MatConvert_hypre_aij_C",MatConvert_HYPRE_AIJ);CHKERRQ(ierr);
   PetscFunctionReturn(0);
diff --git a/src/mat/impls/hypre/mhypre.h b/src/mat/impls/hypre/mhypre.h
index 3d9ddd2..1189020 100644
--- a/src/mat/impls/hypre/mhypre.h
+++ b/src/mat/impls/hypre/mhypre.h
@@ -10,7 +10,7 @@ typedef struct {
   HYPRE_IJMatrix ij;
   HYPRE_IJVector x;
   HYPRE_IJVector b;
-  MPI_Comm   comm;
+  /*MPI_Comm   comm;*/
 } Mat_HYPRE;



Fande,




On Tue, Apr 3, 2018 at 10:35 AM, Satish Balay  wrote:

> On Tue, 3 Apr 2018, Satish Balay wrote:
>
> > On Tue, 3 Apr 2018, Derek Gaston wrote:
> >
> > > One thing I want to be clear of here: is that we're not trying to solve
> > > this particular problem (where we're creating 1000 instances of Hypre
> to
> > > precondition each variable independently)... this particular problem is
> > > just a test (that we've had in our test suite for a long time) to
> stress
> > > test some of this capability.
> > >
> > > We really do have needs for thousands (tens of thousands) of
> simultaneous
> > > solves (each with their own Hypre instances).  That's not what this
> > > particular problem is doing - but it is representative of a class of
> our
> > > problems we need to solve.
> > >
> > > Which does bring up a point: I have been able to do solves before with
> > > ~50,000 separate PETSc solves without issue.  Is it because I was
> working
> > > with MVAPICH on a cluster?  Does it just have a higher limit?
> >
> > Don't know - but thats easy to find out with a simple test code..
> >
> > >>
> > $ cat comm_dup_test.c
> > #include <mpi.h>
> > #include <stdio.h>
> >
> > int main(int argc, char** argv) {
> >   MPI_Comm newcomm;
> >   int i, err;
> >   MPI_Init(NULL, NULL);
> >   /* bound chosen large enough to exhaust the MPI context-id limit */
> >   for (i=0; i<100000; i++) {
> >     err = MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);
> >     if (err) {
> >       printf("%5d - fail\n",i);fflush(stdout);
> >       break;
> >     } else {
> >       printf("%5d - success\n",i);fflush(stdout);
> >     }
> >   }
> >   MPI_Finalize();
> > }
> > <<<
> >
> > OpenMPI fails after '65531' and mpich after '2044'. MVAPICH is derived
> > off MPICH - but its possible they have a different limit than MPICH.
>
> BTW: the above is  with: openmpi-2.1.2 and mpich-3.3b1
>
> mvapich2-1.9.5 - and I get error after '2044' comm dupes
>
> Satish
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Satish Balay wrote:

> On Tue, 3 Apr 2018, Derek Gaston wrote:
> 
> > One thing I want to be clear of here: is that we're not trying to solve
> > this particular problem (where we're creating 1000 instances of Hypre to
> > precondition each variable independently)... this particular problem is
> > just a test (that we've had in our test suite for a long time) to stress
> > test some of this capability.
> > 
> > We really do have needs for thousands (tens of thousands) of simultaneous
> > solves (each with their own Hypre instances).  That's not what this
> > particular problem is doing - but it is representative of a class of our
> > problems we need to solve.
> > 
> > Which does bring up a point: I have been able to do solves before with
> > ~50,000 separate PETSc solves without issue.  Is it because I was working
> > with MVAPICH on a cluster?  Does it just have a higher limit?
> 
> Don't know - but thats easy to find out with a simple test code..
> 
> >>
> $ cat comm_dup_test.c
> #include <mpi.h>
> #include <stdio.h>
> 
> int main(int argc, char** argv) {
>   MPI_Comm newcomm;
>   int i, err;
>   MPI_Init(NULL, NULL);
>   /* bound chosen large enough to exhaust the MPI context-id limit */
>   for (i=0; i<100000; i++) {
>     err = MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);
>     if (err) {
>       printf("%5d - fail\n",i);fflush(stdout);
>       break;
>     } else {
>       printf("%5d - success\n",i);fflush(stdout);
>     }
>   }
>   MPI_Finalize();
> }
> <<<
> 
> OpenMPI fails after '65531' and mpich after '2044'. MVAPICH is derived
> off MPICH - but its possible they have a different limit than MPICH.

BTW: the above is with openmpi-2.1.2 and mpich-3.3b1

mvapich2-1.9.5 - and I get error after '2044' comm dupes

Satish


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Derek Gaston wrote:

> One thing I want to be clear of here: is that we're not trying to solve
> this particular problem (where we're creating 1000 instances of Hypre to
> precondition each variable independently)... this particular problem is
> just a test (that we've had in our test suite for a long time) to stress
> test some of this capability.
> 
> We really do have needs for thousands (tens of thousands) of simultaneous
> solves (each with their own Hypre instances).  That's not what this
> particular problem is doing - but it is representative of a class of our
> problems we need to solve.
> 
> Which does bring up a point: I have been able to do solves before with
> ~50,000 separate PETSc solves without issue.  Is it because I was working
> with MVAPICH on a cluster?  Does it just have a higher limit?

Don't know - but thats easy to find out with a simple test code..

>>
$ cat comm_dup_test.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  MPI_Comm newcomm;
  int i, err;
  MPI_Init(NULL, NULL);
  /* bound chosen large enough to exhaust the MPI context-id limit */
  for (i=0; i<100000; i++) {
    err = MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);
    if (err) {
      printf("%5d - fail\n",i);fflush(stdout);
      break;
    } else {
      printf("%5d - success\n",i);fflush(stdout);
    }
  }
  MPI_Finalize();
}
<<<

OpenMPI fails after '65531' and mpich after '2044'. MVAPICH is derived
off MPICH - but it's possible they have a different limit than MPICH.
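
For reference, one way to build and run the snippet above (assuming an MPI
compiler wrapper and launcher are available; exact names vary by MPI install):

$ mpicc comm_dup_test.c -o comm_dup_test
$ mpiexec -n 1 ./comm_dup_test | tail -3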

Satish


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
The first bad commit:








commit 49a781f5cee36db85e8d5b951eec29f10ac13593
Author: Stefano Zampini
Date:   Sat Nov 5 20:15:19 2016 +0300

    PCHYPRE: use internal Mat of type MatHYPRE

    hpmat already stores two HYPRE vectors

Hypre version:

~/projects/petsc/arch-darwin-c-opt-bisect_bad/externalpackages/git.hypre]>
git branch
* (HEAD detached at 83b1f19)



The last good commit:








commit 63c07aad33d943fe85193412d077a1746a7c55aa
Author: Stefano Zampini
Date:   Sat Nov 5 19:30:12 2016 +0300

    MatHYPRE: create new matrix type

    The conversion from AIJ to HYPRE has been taken from src/dm/impls/da/hypre/mhyp.c
    HYPRE to AIJ is new

Hypre version:

/projects/petsc/arch-darwin-c-opt-bisect/externalpackages/git.hypre]> git
branch
* (HEAD detached at 83b1f19)





We are using the same HYPRE version.


I will narrow down line-by-line.


Fande,


On Tue, Apr 3, 2018 at 9:50 AM, Stefano Zampini 
wrote:

>
> On Apr 3, 2018, at 5:43 PM, Fande Kong  wrote:
>
>
>
> On Tue, Apr 3, 2018 at 9:12 AM, Stefano Zampini  > wrote:
>
>>
>> On Apr 3, 2018, at 4:58 PM, Satish Balay  wrote:
>>
>> On Tue, 3 Apr 2018, Kong, Fande wrote:
>>
>> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. 
>> wrote:
>>
>>
>>   Each external package definitely needs its own duplicated communicator;
>> cannot share between packages.
>>
>>   The only problem with the dups below is if they are in a loop and get
>> called many times.
>>
>>
>>
>> The "standard test" that has this issue actually has 1K fields. MOOSE
>> creates its own field-split preconditioner (not based on the PETSc
>> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
>> duplicates communicators, we should easily reach the limit 2048.
>>
>> I also want to confirm what extra communicators are introduced in the bad
>> commit.
>>
>>
>> To me it looks like there is 1 extra comm created [for MATHYPRE] for each
>> PCHYPRE that is created [which also creates one comm for this object].
>>
>>
>> You’re right; however, it was the same before the commit.
>> I don’t understand how this specific commit is related with this issue,
>> being the error not in the MPI_Comm_Dup which is inside MatCreate_MATHYPRE.
>> Actually, the error comes from MPI_Comm_create
>>
>>
>>
>>
>>
>> frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
>> frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, participate=, new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt]
>> frame #7: 0x00010618f8ba libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
>> frame #8: 0x000106140e93 libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at par_amg_setup.c:2108 [opt]
>> frame #9: 0x000105ec773c libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 [opt]
>>
>> How did you perform the bisection? make clean + make all ? Which version
>> of HYPRE are you using?
>>
>
> I did more aggressively.
>
> "rm -rf  arch-darwin-c-opt-bisect   "
>
> "./configure  --optionsModule=config.compilerOptions -with-debugging=no
> --with-shared-libraries=1 --with-mpi=1 --download-fblaslapack=1
> --download-metis=1 --download-parmetis=1 --download-superlu_dist=1
> --download-hypre=1 --download-mumps=1 --download-scalapack=1
> PETSC_ARCH=arch-darwin-c-opt-bisect"
>
>
> Good, so this removes some possible sources of errors
>
>
> HYPRE verison:
>
>
> self.gitcommit = 'v2.11.1-55-g2ea0e43'
> self.download  = ['git://https://github.com/LLNL/hypre
> 
> ','https://github.com/LLNL/hypre/archive/'+self.gitcommit+'.tar.gz
> 
> ']
>
>
>
> When reconfiguring, the  HYPRE version can be different too (that commit
> is from 11/2016, so the HYPRE version used by the PETSc configure can have
> been upgraded too)
>
> I do not think this is caused by HYPRE.
>
>
> Fande,
>
>
>
>>
>> But you might want to verify [by linking with mpi trace library?]
>>
>>
>> There are some debugging hints at https://lists.mpich.org/piperm
>> ail/discuss/2012-December/000148.html
>> 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Stefano Zampini

> On Apr 3, 2018, at 5:43 PM, Fande Kong  wrote:
> 
> 
> 
> On Tue, Apr 3, 2018 at 9:12 AM, Stefano Zampini  > wrote:
> 
>> On Apr 3, 2018, at 4:58 PM, Satish Balay > > wrote:
>> 
>> On Tue, 3 Apr 2018, Kong, Fande wrote:
>> 
>>> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. >> > wrote:
>>> 
 
   Each external package definitely needs its own duplicated communicator;
 cannot share between packages.
 
   The only problem with the dups below is if they are in a loop and get
 called many times.
 
>>> 
>>> 
>>> The "standard test" that has this issue actually has 1K fields. MOOSE
>>> creates its own field-split preconditioner (not based on the PETSc
>>> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
>>> duplicates communicators, we should easily reach the limit 2048.
>>> 
>>> I also want to confirm what extra communicators are introduced in the bad
>>> commit.
>> 
>> To me it looks like there is 1 extra comm created [for MATHYPRE] for each 
>> PCHYPRE that is created [which also creates one comm for this object].
>> 
> 
> You’re right; however, it was the same before the commit.
> I don’t understand how this specific commit is related with this issue, being 
> the error not in the MPI_Comm_Dup which is inside MatCreate_MATHYPRE. 
> Actually, the error comes from MPI_Comm_create
> 
> frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
> frame #6: 0x0001061345d9 
> libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
> participate=, new_comm_ptr=) + 409 at 
> gen_redcs_mat.c:531 [opt]
> frame #7: 0x00010618f8ba 
> libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, 
> level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
> frame #8: 0x000106140e93 
> libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, 
> A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at 
> par_amg_setup.c:2108 [opt]
> frame #9: 0x000105ec773c 
> libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 [opt
> 
> How did you perform the bisection? make clean + make all ? Which version of 
> HYPRE are you using?
> 
> I did more aggressively.  
> 
> "rm -rf  arch-darwin-c-opt-bisect   "
> 
> "./configure  --optionsModule=config.compilerOptions -with-debugging=no 
> --with-shared-libraries=1 --with-mpi=1 --download-fblaslapack=1 
> --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 
> --download-hypre=1 --download-mumps=1 --download-scalapack=1 
> PETSC_ARCH=arch-darwin-c-opt-bisect"
> 

Good, so this removes some possible sources of errors
> 
> HYPRE verison:
> 
> 
> self.gitcommit = 'v2.11.1-55-g2ea0e43'
> self.download  = ['git://https://github.com/LLNL/hypre 
> ','https://github.com/LLNL/hypre/archive/'+self.gitcommit+'.tar.gz
>  ']
> 
> 

When reconfiguring, the  HYPRE version can be different too (that commit is 
from 11/2016, so the HYPRE version used by the PETSc configure can have been 
upgraded too)

> I do not think this is caused by HYPRE.
> 
> Fande,
> 
>  
> 
>> But you might want to verify [by linking with mpi trace library?]
>> 
>> 
>> There are some debugging hints at 
>> https://lists.mpich.org/pipermail/discuss/2012-December/000148.html 
>>  [wrt 
>> mpich] - which I haven't checked..
>> 
>> Satish
>> 
>>> 
>>> 
>>> Fande,
>>> 
>>> 
>>> 
 
To debug the hypre/duplication issue in MOOSE I would run in the
 debugger with a break point in MPI_Comm_dup() and see
 who keeps calling it an unreasonable amount of times. (My guess is this is
 a new "feature" in hypre that they will need to fix but only debugging will
 tell)
 
   Barry
 
 
> On Apr 2, 2018, at 7:44 PM, Balay, Satish  > wrote:
> 
> We do a MPI_Comm_dup() for objects related to externalpackages.
> 
> Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
> 
> src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
 PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
 PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);
 CHKERRQ(ierr);
> src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
 PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
> src/ksp/pc/impls/hypre/hypre.c:  ierr = 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
One thing I want to be clear of here: is that we're not trying to solve
this particular problem (where we're creating 1000 instances of Hypre to
precondition each variable independently)... this particular problem is
just a test (that we've had in our test suite for a long time) to stress
test some of this capability.

We really do have needs for thousands (tens of thousands) of simultaneous
solves (each with their own Hypre instances).  That's not what this
particular problem is doing - but it is representative of a class of our
problems we need to solve.

Which does bring up a point: I have been able to do solves before with
~50,000 separate PETSc solves without issue.  Is it because I was working
with MVAPICH on a cluster?  Does it just have a higher limit?

Derek

On Tue, Apr 3, 2018 at 9:13 AM Stefano Zampini 
wrote:

> On Apr 3, 2018, at 4:58 PM, Satish Balay  wrote:
>
> On Tue, 3 Apr 2018, Kong, Fande wrote:
>
> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. 
> wrote:
>
>
>   Each external package definitely needs its own duplicated communicator;
> cannot share between packages.
>
>   The only problem with the dups below is if they are in a loop and get
> called many times.
>
>
>
> The "standard test" that has this issue actually has 1K fields. MOOSE
> creates its own field-split preconditioner (not based on the PETSc
> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
> duplicates communicators, we should easily reach the limit 2048.
>
> I also want to confirm what extra communicators are introduced in the bad
> commit.
>
>
> To me it looks like there is 1 extra comm created [for MATHYPRE] for each
> PCHYPRE that is created [which also creates one comm for this object].
>
>
> You’re right; however, it was the same before the commit.
> I don’t understand how this specific commit is related with this issue,
> being the error not in the MPI_Comm_Dup which is inside MatCreate_MATHYPRE.
> Actually, the error comes from MPI_Comm_create
>
>
>
>
>
> frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
> frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, participate=, new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt]
> frame #7: 0x00010618f8ba libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
> frame #8: 0x000106140e93 libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at par_amg_setup.c:2108 [opt]
> frame #9: 0x000105ec773c libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 [opt]
>
> How did you perform the bisection? make clean + make all ? Which version
> of HYPRE are you using?
>
> But you might want to verify [by linking with mpi trace library?]
>
>
> There are some debugging hints at
> https://lists.mpich.org/pipermail/discuss/2012-December/000148.html [wrt
> mpich] - which I haven't checked..
>
> Satish
>
>
>
> Fande,
>
>
>
>
>To debug the hypre/duplication issue in MOOSE I would run in the
> debugger with a break point in MPI_Comm_dup() and see
> who keeps calling it an unreasonable amount of times. (My guess is this is
> a new "feature" in hypre that they will need to fix but only debugging will
> tell)
>
>   Barry
>
>
> On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
>
> We do a MPI_Comm_dup() for objects related to externalpackages.
>
> Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
>
> src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>
> src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>
> src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);
>
> CHKERRQ(ierr);
>
> src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
>
> src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
>
> src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
>
> src/ksp/pc/impls/spai/ispai.c:  ierr  =
>
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_
> spai));CHKERRQ(ierr);
>
> src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD,
>
> );CHKERRQ(ierr);
>
> src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr =
>
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_
> cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
>
> src/mat/impls/aij/mpi/mumps/mumps.c:  ierr =
>
> 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Fande Kong
On Tue, Apr 3, 2018 at 9:12 AM, Stefano Zampini 
wrote:

>
> On Apr 3, 2018, at 4:58 PM, Satish Balay  wrote:
>
> On Tue, 3 Apr 2018, Kong, Fande wrote:
>
> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. 
> wrote:
>
>
>   Each external package definitely needs its own duplicated communicator;
> cannot share between packages.
>
>   The only problem with the dups below is if they are in a loop and get
> called many times.
>
>
>
> The "standard test" that has this issue actually has 1K fields. MOOSE
> creates its own field-split preconditioner (not based on the PETSc
> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
> duplicates communicators, we should easily reach the limit 2048.
>
> I also want to confirm what extra communicators are introduced in the bad
> commit.
>
>
> To me it looks like there is 1 extra comm created [for MATHYPRE] for each
> PCHYPRE that is created [which also creates one comm for this object].
>
>
> You’re right; however, it was the same before the commit.
> I don’t understand how this specific commit is related with this issue,
> being the error not in the MPI_Comm_Dup which is inside MatCreate_MATHYPRE.
> Actually, the error comes from MPI_Comm_create
>
>
>
>
>
> frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
> frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, participate=, new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt]
> frame #7: 0x00010618f8ba libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
> frame #8: 0x000106140e93 libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at par_amg_setup.c:2108 [opt]
> frame #9: 0x000105ec773c libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 [opt]
>
> How did you perform the bisection? make clean + make all ? Which version
> of HYPRE are you using?
>

I did it more aggressively:

"rm -rf  arch-darwin-c-opt-bisect   "

"./configure  --optionsModule=config.compilerOptions -with-debugging=no
--with-shared-libraries=1 --with-mpi=1 --download-fblaslapack=1
--download-metis=1 --download-parmetis=1 --download-superlu_dist=1
--download-hypre=1 --download-mumps=1 --download-scalapack=1
PETSC_ARCH=arch-darwin-c-opt-bisect"
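
(For context: this kind of rebuild-from-scratch bisection could be automated
with git bisect run; a hedged sketch, where the test command is a placeholder
and the configure options are abbreviated from the line above:)

git bisect start 49a781f5cee36db85e8d5b951eec29f10ac13593 63c07aad33d943fe85193412d077a1746a7c55aa
git bisect run sh -c 'rm -rf arch-darwin-c-opt-bisect && \
  ./configure --optionsModule=config.compilerOptions --with-debugging=no \
    --with-shared-libraries=1 --with-mpi=1 --download-hypre=1 \
    PETSC_ARCH=arch-darwin-c-opt-bisect && \
  make PETSC_ARCH=arch-darwin-c-opt-bisect all && \
  /path/to/moose/run_failing_test.sh'   # placeholder for the MOOSE test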


HYPRE version:


self.gitcommit = 'v2.11.1-55-g2ea0e43'
self.download  = ['git://https://github.com/LLNL/hypre','
https://github.com/LLNL/hypre/archive/'+self.gitcommit+'.tar.gz']


I do not think this is caused by HYPRE.

Fande,



>
> But you might want to verify [by linking with mpi trace library?]
>
>
> There are some debugging hints at https://lists.mpich.org/
> pipermail/discuss/2012-December/000148.html [wrt mpich] - which I haven't
> checked..
>
> Satish
>
>
>
> Fande,
>
>
>
>
>To debug the hypre/duplication issue in MOOSE I would run in the
> debugger with a break point in MPI_Comm_dup() and see
> who keeps calling it an unreasonable amount of times. (My guess is this is
> a new "feature" in hypre that they will need to fix but only debugging will
> tell)
>
>   Barry
>
>
> On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
>
> We do a MPI_Comm_dup() for objects related to externalpackages.
>
> Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
>
> src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>
> src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>
> src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);
>
> CHKERRQ(ierr);
>
> src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
>
> src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
>
> src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>
> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
>
> src/ksp/pc/impls/spai/ispai.c:  ierr  =
>
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_
> spai));CHKERRQ(ierr);
>
> src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD,
>
> );CHKERRQ(ierr);
>
> src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr =
>
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_
> cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
>
> src/mat/impls/aij/mpi/mumps/mumps.c:  ierr =
>
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_
> mumps));CHKERRQ(ierr);
>
> src/mat/impls/aij/mpi/pastix/pastix.c:ierr =
>
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_
> 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Kong, Fande wrote:

> > I thought this trace comes up after applying your patch
> >
> 
> This trace comes from Mac

> > Too many communicators (0/2048 free on this process; ignore_id=0)

> This comes from a Linux (it is a test box), and I do not have access to it.

Then it's hard to compare and infer what caused either of these traces. Changing
only one variable at a time is the best way to narrow it down.

> > > How did you perform the bisection? make clean + make all ? Which version 
> > > of HYPRE are you using?

Also its good to know this.

Satish


Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
On Tue, Apr 3, 2018 at 9:32 AM, Satish Balay  wrote:

> On Tue, 3 Apr 2018, Stefano Zampini wrote:
>
> >
> > > On Apr 3, 2018, at 4:58 PM, Satish Balay  wrote:
> > >
> > > On Tue, 3 Apr 2018, Kong, Fande wrote:
> > >
> > >> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. 
> wrote:
> > >>
> > >>>
> > >>>   Each external package definitely needs its own duplicated
> communicator;
> > >>> cannot share between packages.
> > >>>
> > >>>   The only problem with the dups below is if they are in a loop and
> get
> > >>> called many times.
> > >>>
> > >>
> > >>
> > >> The "standard test" that has this issue actually has 1K fields. MOOSE
> > >> creates its own field-split preconditioner (not based on the PETSc
> > >> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
> > >> duplicates communicators, we should easily reach the limit 2048.
> > >>
> > >> I also want to confirm what extra communicators are introduced in the
> bad
> > >> commit.
> > >
> > > To me it looks like there is 1 extra comm created [for MATHYPRE] for
> each PCHYPRE that is created [which also creates one comm for this object].
> > >
> >
> > You’re right; however, it was the same before the commit.
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__
> bitbucket.org_petsc_petsc_commits_49a781f5cee36db85e8d5b951eec29
> f10ac13593=DwIDaQ=54IZrppPQZKX9mLzcGdPfFD1hxrcB_
> _aEkJFOKJFd00=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmi
> CY=6_ukwovpDrK5BL_94S4ezasw2a3S15SM59R41rSY-Yw=
> r8xHYLKF9LtJHReR6Jmfeei3OfwkQNiGrKXAgeqPVQ8=
> Before the commit - PCHYPRE was not calling MatConvert(MATHYPRE) [this
> results in an additional call to MPI_Comm_dup() for hypre calls] PCHYPRE
> was calling MatHYPRE_IJMatrixCreate() directly [which I presume reusing the
> comm created by the call to MPI_Comm_dup() in PCHYPRE - for hypre calls]
>
>
>
> > I don’t understand how this specific commit is related with this issue,
> being the error not in the MPI_Comm_Dup which is inside MatCreate_MATHYPRE.
> Actually, the error comes from MPI_Comm_create
> >
> > frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
> > frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_
> GenerateSubComm(comm=-1006627852, participate=,
> new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt]
> > frame #7: 0x00010618f8ba libpetsc.3.07.dylib`hypre_
> GaussElimSetup(amg_data=0x7fe7ff857a00, level=,
> relax_type=9) + 74 at par_relax.c:4209 [opt]
> > frame #8: 0x000106140e93 libpetsc.3.07.dylib`hypre_
> BoomerAMGSetup(amg_vdata=, A=0x7fe80842aff0,
> f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at par_amg_setup.c:2108
> [opt]
> > frame #9: 0x000105ec773c 
> > libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=)
> + 2540 at hypre.c:226 [opt
>
> I thought this trace comes up after applying your patch
>

This trace comes from a Mac.



>
> -ierr = MatDestroy(&jac->hpmat);CHKERRQ(ierr);
> -ierr = MatConvert(pc->pmat,MATHYPRE,MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);
> +ierr = MatConvert(pc->pmat,MATHYPRE,jac->hpmat ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);
>
> The stack before this patch was: [its a different format - so it was
> obtained in a different way than the above method?]
>
> preconditioners/pbp.lots_of_variables: Other MPI error, error stack:
> preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(177)..:
> MPI_Comm_dup(comm=0x8401, new_comm=0x97d1068) failed
> preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(162)
> ..:
> preconditioners/pbp.lots_of_variables: MPIR_Comm_dup_impl(57)
> ..:
> preconditioners/pbp.lots_of_variables: MPIR_Comm_copy(739)...
> ..:
> preconditioners/pbp.lots_of_variables: MPIR_Get_contextid_sparse_group(614):
> Too many communicators (0/2048 free on this process; ignore_id=0)
>

This comes from a Linux machine (it is a test box), and I do not have access to it.


Fande,



>
> Satish
>
> >
> > How did you perform the bisection? make clean + make all ? Which version
> of HYPRE are you using?
> >
> > > But you might want to verify [by linking with mpi trace library?]
> > >
> > >
> > > There are some debugging hints at https://urldefense.proofpoint.
> com/v2/url?u=https-3A__lists.mpich.org_pipermail_discuss_
> 2012-2DDecember_000148.html=DwIDaQ=54IZrppPQZKX9mLzcGdPfFD1hxrcB_
> _aEkJFOKJFd00=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmi
> CY=6_ukwovpDrK5BL_94S4ezasw2a3S15SM59R41rSY-Yw&
> s=XUy9n2kmdq262Gwrn_RMXYR-bIyiKViCvp4fRfGCP9w= [wrt mpich] - which I
> haven't checked..
> > >
> > > Satish
> > >
> > >>
> > >>
> > >> Fande,
> > >>
> > >>
> > >>
> > >>>
> > >>>To debug the hypre/duplication issue in MOOSE I would run in the
> > >>> debugger with a break point in MPI_Comm_dup() and see
> > >>> who keeps calling it an unreasonable amount of times. (My guess is
> this is
> > >>> a new "feature" in hypre that they will need to fix but only
> debugging will
> > >>> tell)

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Stefano Zampini wrote:

> 
> > On Apr 3, 2018, at 4:58 PM, Satish Balay  wrote:
> > 
> > On Tue, 3 Apr 2018, Kong, Fande wrote:
> > 
> >> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F.  wrote:
> >> 
> >>> 
> >>>   Each external package definitely needs its own duplicated communicator;
> >>> cannot share between packages.
> >>> 
> >>>   The only problem with the dups below is if they are in a loop and get
> >>> called many times.
> >>> 
> >> 
> >> 
> >> The "standard test" that has this issue actually has 1K fields. MOOSE
> >> creates its own field-split preconditioner (not based on the PETSc
> >> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
> >> duplicates communicators, we should easily reach the limit 2048.
> >> 
> >> I also want to confirm what extra communicators are introduced in the bad
> >> commit.
> > 
> > To me it looks like there is 1 extra comm created [for MATHYPRE] for each 
> > PCHYPRE that is created [which also creates one comm for this object].
> > 
> 
> You’re right; however, it was the same before the commit.

https://bitbucket.org/petsc/petsc/commits/49a781f5cee36db85e8d5b951eec29f10ac13593
Before the commit, PCHYPRE was not calling MatConvert(MATHYPRE) [which results
in an additional call to MPI_Comm_dup() for hypre calls]; PCHYPRE was calling
MatHYPRE_IJMatrixCreate() directly [which, I presume, reuses the comm created by
the call to MPI_Comm_dup() in PCHYPRE for hypre calls]



> I don’t understand how this specific commit is related with this issue, being 
> the error not in the MPI_Comm_Dup which is inside MatCreate_MATHYPRE. 
> Actually, the error comes from MPI_Comm_create
> 
> frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
> frame #6: 0x0001061345d9 
> libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
> participate=, new_comm_ptr=) + 409 at 
> gen_redcs_mat.c:531 [opt]
> frame #7: 0x00010618f8ba 
> libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, 
> level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
> frame #8: 0x000106140e93 
> libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, 
> A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at 
> par_amg_setup.c:2108 [opt]
> frame #9: 0x000105ec773c 
> libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 [opt

I thought this trace comes up after applying your patch

-ierr = MatDestroy(&jac->hpmat);CHKERRQ(ierr);
-ierr = MatConvert(pc->pmat,MATHYPRE,MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);
+ierr = MatConvert(pc->pmat,MATHYPRE,jac->hpmat ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);

The stack before this patch was: [it's a different format - so it was obtained
in a different way than the above method?]

preconditioners/pbp.lots_of_variables: Other MPI error, error stack:
preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(177)..: 
MPI_Comm_dup(comm=0x8401, new_comm=0x97d1068) failed
preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(162)..:
preconditioners/pbp.lots_of_variables: MPIR_Comm_dup_impl(57)..:
preconditioners/pbp.lots_of_variables: MPIR_Comm_copy(739).:
preconditioners/pbp.lots_of_variables: MPIR_Get_contextid_sparse_group(614): 
Too many communicators (0/2048 free on this process; ignore_id=0)

Satish

> 
> How did you perform the bisection? make clean + make all ? Which version of 
> HYPRE are you using?
> 
> > But you might want to verify [by linking with mpi trace library?]
> > 
> > 
> > There are some debugging hints at 
> > https://lists.mpich.org/pipermail/discuss/2012-December/000148.html [wrt 
> > mpich] - which I haven't checked..
> > 
> > Satish
> > 
> >> 
> >> 
> >> Fande,
> >> 
> >> 
> >> 
> >>> 
> >>>To debug the hypre/duplication issue in MOOSE I would run in the
> >>> debugger with a break point in MPI_Comm_dup() and see
> >>> who keeps calling it an unreasonable amount of times. (My guess is this is
> >>> a new "feature" in hypre that they will need to fix but only debugging 
> >>> will
> >>> tell)
> >>> 
> >>>   Barry
> >>> 
> >>> 
>  On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
>  
>  We do a MPI_Comm_dup() for objects related to externalpackages.
>  
>  Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
>  using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
>  is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
>  
>  src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> >>> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>  src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> >>> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>  src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);
> >>> CHKERRQ(ierr);
>  src/ksp/pc/impls/hypre/hypre.c:  

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Stefano Zampini

> On Apr 3, 2018, at 4:58 PM, Satish Balay  wrote:
> 
> On Tue, 3 Apr 2018, Kong, Fande wrote:
> 
>> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F.  wrote:
>> 
>>> 
>>>   Each external package definitely needs its own duplicated communicator;
>>> cannot share between packages.
>>> 
>>>   The only problem with the dups below is if they are in a loop and get
>>> called many times.
>>> 
>> 
>> 
>> The "standard test" that has this issue actually has 1K fields. MOOSE
>> creates its own field-split preconditioner (not based on the PETSc
>> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
>> duplicates communicators, we should easily reach the limit 2048.
>> 
>> I also want to confirm what extra communicators are introduced in the bad
>> commit.
> 
> To me it looks like there is 1 extra comm created [for MATHYPRE] for each 
> PCHYPRE that is created [which also creates one comm for this object].
> 

You’re right; however, it was the same before the commit.
I don’t understand how this specific commit is related to this issue, since the
error is not in the MPI_Comm_dup that is inside MatCreate_MATHYPRE. Actually,
the error comes from MPI_Comm_create:

frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
frame #6: 0x0001061345d9 
libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
participate=, new_comm_ptr=) + 409 at 
gen_redcs_mat.c:531 [opt]
frame #7: 0x00010618f8ba 
libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, 
level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
frame #8: 0x000106140e93 
libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, 
A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at 
par_amg_setup.c:2108 [opt]
frame #9: 0x000105ec773c 
libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 [opt

How did you perform the bisection? make clean + make all ? Which version of 
HYPRE are you using?

> But you might want to verify [by linking with mpi trace library?]
> 
> 
> There are some debugging hints at 
> https://lists.mpich.org/pipermail/discuss/2012-December/000148.html [wrt 
> mpich] - which I haven't checked..
> 
> Satish
> 
>> 
>> 
>> Fande,
>> 
>> 
>> 
>>> 
>>>To debug the hypre/duplication issue in MOOSE I would run in the
>>> debugger with a break point in MPI_Comm_dup() and see
>>> who keeps calling it an unreasonable amount of times. (My guess is this is
>>> a new "feature" in hypre that they will need to fix but only debugging will
>>> tell)
>>> 
>>>   Barry
>>> 
>>> 
 On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
 
 We do a MPI_Comm_dup() for objects related to externalpackages.
 
 Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
 using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
 is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
 
 src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>>> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
 src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>>> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
 src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);
>>> CHKERRQ(ierr);
 src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>>> PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
 src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>>> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
 src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>>> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
 src/ksp/pc/impls/spai/ispai.c:  ierr  =
>>> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_
>>> spai));CHKERRQ(ierr);
 src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD,
>>> );CHKERRQ(ierr);
 src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr =
>>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_
>>> cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
 src/mat/impls/aij/mpi/mumps/mumps.c:  ierr =
>>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_
>>> mumps));CHKERRQ(ierr);
 src/mat/impls/aij/mpi/pastix/pastix.c:ierr =
>>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_
>>> comm));CHKERRQ(ierr);
 src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr =
>>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_
>>> superlu));CHKERRQ(ierr);
 src/mat/impls/hypre/mhypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
>>> PetscObject)B),>comm);CHKERRQ(ierr);
 src/mat/partition/impls/pmetis/pmetis.c:ierr   =
>>> MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
 src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a
>>> MPI_Comm_dup() of each of these (duplicates of duplicates return the same
>>> communictor)
 src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
On Tue, 3 Apr 2018, Kong, Fande wrote:

> On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F.  wrote:
> 
> >
> >Each external package definitely needs its own duplicated communicator;
> > cannot share between packages.
> >
> >The only problem with the dups below is if they are in a loop and get
> > called many times.
> >
> 
> 
> The "standard test" that has this issue actually has 1K fields. MOOSE
> creates its own field-split preconditioner (not based on the PETSc
> fieldsplit), and each filed is associated with one PC HYPRE.  If PETSc
> duplicates communicators, we should easily reach the limit 2048.
> 
> I also want to confirm what extra communicators are introduced in the bad
> commit.

To me it looks like there is 1 extra comm created [for MATHYPRE] for each 
PCHYPRE that is created [which also creates one comm for this object].

But you might want to verify [by linking with mpi trace library?]


There are some debugging hints at 
https://lists.mpich.org/pipermail/discuss/2012-December/000148.html [wrt mpich] 
- which I haven't checked..

Satish

> 
> 
> Fande,
> 
> 
> 
> >
> > To debug the hypre/duplication issue in MOOSE I would run in the
> > debugger with a break point in MPI_Comm_dup() and see
> > who keeps calling it an unreasonable amount of times. (My guess is this is
> > a new "feature" in hypre that they will need to fix but only debugging will
> > tell)
> >
> >Barry
> >
> >
> > > On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
> > >
> > > We do a MPI_Comm_dup() for objects related to externalpackages.
> > >
> > > Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> > > using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> > > is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
> > >
> > > src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> > PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > > src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> > PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > > src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);
> > CHKERRQ(ierr);
> > > src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> > PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
> > > src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> > PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > > src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> > PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > > src/ksp/pc/impls/spai/ispai.c:  ierr  =
> > MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_
> > spai));CHKERRQ(ierr);
> > > src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD,
> > );CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr =
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_
> > cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/mumps/mumps.c:  ierr =
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_
> > mumps));CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/pastix/pastix.c:ierr =
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_
> > comm));CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr =
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_
> > superlu));CHKERRQ(ierr);
> > > src/mat/impls/hypre/mhypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> > PetscObject)B),>comm);CHKERRQ(ierr);
> > > src/mat/partition/impls/pmetis/pmetis.c:ierr   =
> > MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
> > > src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a
> > MPI_Comm_dup() of each of these (duplicates of duplicates return the same
> > communictor)
> > > src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
> > > src/sys/objects/pinit.c:  ierr = MPI_Comm_dup(MPI_COMM_WORLD,&
> > local_comm);CHKERRQ(ierr);
> > > src/sys/objects/pinit.c:  ierr = MPI_Comm_dup(MPI_COMM_WORLD,&
> > local_comm);CHKERRQ(ierr);
> > > src/sys/objects/tagm.c:  ierr = MPI_Comm_dup(comm_in,comm_out)
> > ;CHKERRQ(ierr);
> > > src/sys/utils/mpiu.c:  ierr = MPI_Comm_dup(comm,_comm)
> > ;CHKERRQ(ierr);
> > > src/ts/impls/implicit/sundials/sundials.c:  ierr =
> > MPI_Comm_dup(PetscObjectComm((PetscObject)ts),&(cvode->comm_
> > sundials));CHKERRQ(ierr);
> > >
> > > Perhaps we need a PetscCommDuplicateExternalPkg() to somehow avoid
> > these MPI_Comm_dup() calls?
> > >
> > > Satish
> > >
> > > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> > >
> > >>
> > >>  Are we sure this is a PETSc comm issue and not a hypre comm
> > duplication issue
> > >>
> > >> frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_
> > GenerateSubComm(comm=-1006627852, participate=,
> > new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt]
> > >>
> > >> Looks like hypre is needed to generate subcomms, perhaps it generates
> > too many?
> > >>
> > >>   Barry
> > >>
> > >>
> > >>> 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Jed Brown
The PETSc model is that the "outer" communicator (passed by the caller)
is dup'd to create an "inner" communicator which as attached (using MPI
attributes) to the outer communicator.  In the future, PETSc will find
the inner communicator and use that, instead of dup'ing again.
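
(A hedged illustration of that inner/outer pattern using plain MPI attribute
caching - this is not PETSc's actual implementation, just a sketch of the idea:)

#include <mpi.h>
#include <stdlib.h>

static int inner_keyval = MPI_KEYVAL_INVALID;

/* Return the inner comm cached on "outer", duplicating it only on first use. */
static MPI_Comm get_inner_comm(MPI_Comm outer)
{
  void *attr;
  int   found;
  if (inner_keyval == MPI_KEYVAL_INVALID) {
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN, &inner_keyval, NULL);
  }
  MPI_Comm_get_attr(outer, inner_keyval, &attr, &found);
  if (found) return *(MPI_Comm*)attr;                 /* reuse: no new dup */
  MPI_Comm *inner = (MPI_Comm*)malloc(sizeof(*inner));
  MPI_Comm_dup(outer, inner);                         /* dup exactly once per outer comm */
  MPI_Comm_set_attr(outer, inner_keyval, inner);      /* attach it for later lookups */
  return *inner;
}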

Derek Gaston  writes:

> I like the idea that Hypre (as a package) would get _one_ comm (for all the
> solvers/matrices created) that was duped from the one given to PETSc in
> Vec/MatCreate().
>
> Seems like the tricky part would be figuring out _which_ comm that is based
> on the incoming comm.  For instance - we would definitely have the case
> where we are doing a Hypre solve on effectively COMM_WORLD… and then many
> Hypre solves on sub-communicators (and even Hypre solves on
> sub-communicators of those sub-communicators).  The system for getting
> “the” Hypre Comm would have to match up the incoming Comm in the
> Vec/MatCreate() call and find the correct Hypre comm to use.
>
> Derek
>
>
>
> On Tue, Apr 3, 2018 at 7:46 AM Satish Balay  wrote:
>
>> Fande claimed 49a781f5cee36db85e8d5b951eec29f10ac13593 made a difference.
>> [so assuming same hypre version was used before and after this commit - for
>> this bisection]
>>
>> So the extra MPI_Comm_dup() calls due to MATHYPRE must be pushing the
>> total communicators over the limit.
>>
>> And wrt debugging - perhaps we need to  check MPI_Comm_free() aswell?
>> Presumably freed communicators can get reused so we have to look for
>> outstanding/unfreed communicators?
>>
>> Per message below - MPICH[?] provides a max of 2048 communicators. And
>> there is some discussion of this issue at:
>> https://lists.mpich.org/pipermail/discuss/2012-December/000148.html
>>
>> And wrt 'sharing' - I was thining in terms of: Can one use MPI_COMM_WORLD
>> with all hypre objects we create? If so - we could somehow attach one more
>> inner-comm - that could be obtained and reused with multiple hypre objects
>> [that got created off the same petsc_comm?]
>>
>> Satish
>>
>> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>>
>> >
>> >Each external package definitely needs its own duplicated
>> communicator; cannot share between packages.
>> >
>> >The only problem with the dups below is if they are in a loop and get
>> called many times.
>> >
>> > To debug the hypre/duplication issue in MOOSE I would run in the
>> debugger with a break point in MPI_Comm_dup() and see
>> > who keeps calling it an unreasonable amount of times. (My guess is this
>> is a new "feature" in hypre that they will need to fix but only debugging
>> will tell)
>> >
>> >Barry
>> >
>> >
>> > > On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
>> > >
>> > > We do a MPI_Comm_dup() for objects related to externalpackages.
>> > >
>> > > Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
>> > > using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
>> > > is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
>> > >
>> > > src/dm/impls/da/hypre/mhyp.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>> > > src/dm/impls/da/hypre/mhyp.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
>> > > src/dm/impls/swarm/data_ex.c:  ierr =
>> MPI_Comm_dup(comm,>comm);CHKERRQ(ierr);
>> > > src/ksp/pc/impls/hypre/hypre.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
>> > > src/ksp/pc/impls/hypre/hypre.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
>> > > src/ksp/pc/impls/hypre/hypre.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
>> > > src/ksp/pc/impls/spai/ispai.c:  ierr  =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_spai));CHKERRQ(ierr);
>> > > src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD,
>> );CHKERRQ(ierr);
>> > > src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
>> > > src/mat/impls/aij/mpi/mumps/mumps.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_mumps));CHKERRQ(ierr);
>> > > src/mat/impls/aij/mpi/pastix/pastix.c:ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_comm));CHKERRQ(ierr);
>> > > src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_superlu));CHKERRQ(ierr);
>> > > src/mat/impls/hypre/mhypre.c:  ierr =
>> MPI_Comm_dup(PetscObjectComm((PetscObject)B),>comm);CHKERRQ(ierr);
>> > > src/mat/partition/impls/pmetis/pmetis.c:ierr   =
>> MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
>> > > src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a
>> MPI_Comm_dup() of each of these (duplicates of duplicates return the same
>> communictor)
>> > > 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Kong, Fande
On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F.  wrote:

>
>Each external package definitely needs its own duplicated communicator;
> cannot share between packages.
>
>The only problem with the dups below is if they are in a loop and get
> called many times.
>


The "standard test" that has this issue actually has 1K fields. MOOSE
creates its own field-split preconditioner (not based on the PETSc
fieldsplit), and each field is associated with one PC HYPRE.  If PETSc
duplicates communicators, we should easily reach the limit 2048.

I also want to confirm what extra communicators are introduced in the bad
commit.


Fande,



>
> To debug the hypre/duplication issue in MOOSE I would run in the
> debugger with a break point in MPI_Comm_dup() and see
> who keeps calling it an unreasonable amount of times. (My guess is this is
> a new "feature" in hypre that they will need to fix but only debugging will
> tell)
>
>Barry
>
>
> > On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
> >
> > We do a MPI_Comm_dup() for objects related to externalpackages.
> >
> > Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> > using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> > is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
> >
> > src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);
> CHKERRQ(ierr);
> > src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
> > src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > src/ksp/pc/impls/spai/ispai.c:  ierr  =
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_
> spai));CHKERRQ(ierr);
> > src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD,
> );CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_
> cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/mumps/mumps.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_
> mumps));CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/pastix/pastix.c:ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_
> comm));CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_
> superlu));CHKERRQ(ierr);
> > src/mat/impls/hypre/mhypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((
> PetscObject)B),>comm);CHKERRQ(ierr);
> > src/mat/partition/impls/pmetis/pmetis.c:ierr   =
> MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
> > src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a
> MPI_Comm_dup() of each of these (duplicates of duplicates return the same
> communictor)
> > src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
> > src/sys/objects/pinit.c:  ierr = MPI_Comm_dup(MPI_COMM_WORLD,&
> local_comm);CHKERRQ(ierr);
> > src/sys/objects/pinit.c:  ierr = MPI_Comm_dup(MPI_COMM_WORLD,&
> local_comm);CHKERRQ(ierr);
> > src/sys/objects/tagm.c:  ierr = MPI_Comm_dup(comm_in,comm_out)
> ;CHKERRQ(ierr);
> > src/sys/utils/mpiu.c:  ierr = MPI_Comm_dup(comm,_comm)
> ;CHKERRQ(ierr);
> > src/ts/impls/implicit/sundials/sundials.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)ts),&(cvode->comm_
> sundials));CHKERRQ(ierr);
> >
> > Perhaps we need a PetscCommDuplicateExternalPkg() to somehow avoid
> these MPI_Comm_dup() calls?
> >
> > Satish
> >
> > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> >
> >>
> >>  Are we sure this is a PETSc comm issue and not a hypre comm
> duplication issue
> >>
> >> frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_
> GenerateSubComm(comm=-1006627852, participate=,
> new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt]
> >>
> >> Looks like hypre is needed to generate subcomms, perhaps it generates
> too many?
> >>
> >>   Barry
> >>
> >>
> >>> On Apr 2, 2018, at 7:07 PM, Derek Gaston  wrote:
> >>>
> >>> I’m working with Fande on this and I would like to add a bit more.
> There are many circumstances where we aren’t working on COMM_WORLD at all
> (e.g. working on a sub-communicator) but PETSc was initialized using
> MPI_COMM_WORLD (think multi-level solves)… and we need to create
> arbitrarily many PETSc vecs/mats/solvers/preconditioners and solve.  We
> definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering
> duplication.
> >>>
> >>> Can you explain why PETSc needs to duplicate the communicator so much?
> >>>
> >>> Thanks for your 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Derek Gaston
I like the idea that Hypre (as a package) would get _one_ comm (for all the
solvers/matrices created) that was duped from the one given to PETSc in
Vec/MatCreate().

Seems like the tricky part would be figuring out _which_ comm that is based
on the incoming comm.  For instance - we would definitely have the case
where we are doing a Hypre solve on effectively COMM_WORLD… and then many
Hypre solves on sub-communicators (and even Hypre solves on
sub-communicators of those sub-communicators).  The system for getting
“the” Hypre Comm would have to match up the incoming Comm in the
Vec/MatCreate() call and find the correct Hypre comm to use.

Derek



On Tue, Apr 3, 2018 at 7:46 AM Satish Balay  wrote:

> Fande claimed 49a781f5cee36db85e8d5b951eec29f10ac13593 made a difference.
> [so assuming same hypre version was used before and after this commit - for
> this bisection]
>
> So the extra MPI_Comm_dup() calls due to MATHYPRE must be pushing the
> total communicators over the limit.
>
> And wrt debugging - perhaps we need to  check MPI_Comm_free() aswell?
> Presumably freed communicators can get reused so we have to look for
> outstanding/unfreed communicators?
>
> Per message below - MPICH[?] provides a max of 2048 communicators. And
> there is some discussion of this issue at:
> https://lists.mpich.org/pipermail/discuss/2012-December/000148.html
>
> And wrt 'sharing' - I was thining in terms of: Can one use MPI_COMM_WORLD
> with all hypre objects we create? If so - we could somehow attach one more
> inner-comm - that could be obtained and reused with multiple hypre objects
> [that got created off the same petsc_comm?]
>
> Satish
>
> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
>
> >
> >Each external package definitely needs its own duplicated
> communicator; cannot share between packages.
> >
> >The only problem with the dups below is if they are in a loop and get
> called many times.
> >
> > To debug the hypre/duplication issue in MOOSE I would run in the
> debugger with a break point in MPI_Comm_dup() and see
> > who keeps calling it an unreasonable amount of times. (My guess is this
> is a new "feature" in hypre that they will need to fix but only debugging
> will tell)
> >
> >Barry
> >
> >
> > > On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
> > >
> > > We do a MPI_Comm_dup() for objects related to externalpackages.
> > >
> > > Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> > > using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> > > is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
> > >
> > > src/dm/impls/da/hypre/mhyp.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > > src/dm/impls/da/hypre/mhyp.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > > src/dm/impls/swarm/data_ex.c:  ierr =
> MPI_Comm_dup(comm,>comm);CHKERRQ(ierr);
> > > src/ksp/pc/impls/hypre/hypre.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
> > > src/ksp/pc/impls/hypre/hypre.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > > src/ksp/pc/impls/hypre/hypre.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > > src/ksp/pc/impls/spai/ispai.c:  ierr  =
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_spai));CHKERRQ(ierr);
> > > src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD,
> );CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/mumps/mumps.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_mumps));CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/pastix/pastix.c:ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_comm));CHKERRQ(ierr);
> > > src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_superlu));CHKERRQ(ierr);
> > > src/mat/impls/hypre/mhypre.c:  ierr =
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),>comm);CHKERRQ(ierr);
> > > src/mat/partition/impls/pmetis/pmetis.c:ierr   =
> MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
> > > src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a
> MPI_Comm_dup() of each of these (duplicates of duplicates return the same
> communictor)
> > > src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
> > > src/sys/objects/pinit.c:  ierr =
> MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
> > > src/sys/objects/pinit.c:  ierr =
> MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
> > > src/sys/objects/tagm.c:  ierr =
> MPI_Comm_dup(comm_in,comm_out);CHKERRQ(ierr);
> > > src/sys/utils/mpiu.c:  ierr =
> MPI_Comm_dup(comm,_comm);CHKERRQ(ierr);
> > > 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Satish Balay
Fande claimed 49a781f5cee36db85e8d5b951eec29f10ac13593 made a difference. [so 
assuming same hypre version was used before and after this commit - for this 
bisection]

So the extra MPI_Comm_dup() calls due to MATHYPRE must be pushing the total 
communicators over the limit.

And wrt debugging - perhaps we need to check MPI_Comm_free() as well? 
Presumably freed communicators can get reused so we have to look for 
outstanding/unfreed communicators?

Per message below - MPICH[?] provides a max of 2048 communicators. And there is 
some discussion of this issue at: 
https://lists.mpich.org/pipermail/discuss/2012-December/000148.html
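
As a quick probe of where that limit actually sits for a given MPI, a
throwaway test like the sketch below just duplicates MPI_COMM_WORLD until the
implementation refuses; the exact count is implementation dependent (the
~2048 figure is what MPICH reportedly provides per process):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm dup;
  int      n = 0;

  MPI_Init(&argc, &argv);
  /* Return error codes instead of aborting so we can count how far we get. */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  while (MPI_Comm_dup(MPI_COMM_WORLD, &dup) == MPI_SUCCESS) n++;  /* never freed, on purpose */
  printf("MPI_Comm_dup() succeeded %d times before running out\n", n);
  /* MPI state after an error is technically undefined; this is only a probe. */
  MPI_Finalize();
  return 0;
}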

And wrt 'sharing' - I was thinking in terms of: Can one use MPI_COMM_WORLD with 
all hypre objects we create? If so - we could somehow attach one more 
inner-comm - that could be obtained and reused with multiple hypre objects 
[that got created off the same petsc_comm?]

Satish

On Tue, 3 Apr 2018, Smith, Barry F. wrote:

> 
>Each external package definitely needs its own duplicated communicator; 
> cannot share between packages.
> 
>The only problem with the dups below is if they are in a loop and get 
> called many times.
> 
> To debug the hypre/duplication issue in MOOSE I would run in the debugger 
> with a break point in MPI_Comm_dup() and see
> who keeps calling it an unreasonable amount of times. (My guess is this is a 
> new "feature" in hypre that they will need to fix but only debugging will 
> tell)
> 
>Barry
> 
> 
> > On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
> > 
> > We do a MPI_Comm_dup() for objects related to externalpackages.
> > 
> > Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> > using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> > is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
> > 
> > src/dm/impls/da/hypre/mhyp.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > src/dm/impls/da/hypre/mhyp.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> > src/dm/impls/swarm/data_ex.c:  ierr = 
> > MPI_Comm_dup(comm,>comm);CHKERRQ(ierr);
> > src/ksp/pc/impls/hypre/hypre.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
> > src/ksp/pc/impls/hypre/hypre.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > src/ksp/pc/impls/hypre/hypre.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> > src/ksp/pc/impls/spai/ispai.c:  ierr  = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_spai));CHKERRQ(ierr);
> > src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD, 
> > );CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/mumps/mumps.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_mumps));CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/pastix/pastix.c:ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_comm));CHKERRQ(ierr);
> > src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_superlu));CHKERRQ(ierr);
> > src/mat/impls/hypre/mhypre.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)B),>comm);CHKERRQ(ierr);
> > src/mat/partition/impls/pmetis/pmetis.c:ierr   = 
> > MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
> > src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a 
> > MPI_Comm_dup() of each of these (duplicates of duplicates return the same 
> > communictor)
> > src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
> > src/sys/objects/pinit.c:  ierr = 
> > MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
> > src/sys/objects/pinit.c:  ierr = 
> > MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
> > src/sys/objects/tagm.c:  ierr = 
> > MPI_Comm_dup(comm_in,comm_out);CHKERRQ(ierr);
> > src/sys/utils/mpiu.c:  ierr = MPI_Comm_dup(comm,_comm);CHKERRQ(ierr);
> > src/ts/impls/implicit/sundials/sundials.c:  ierr = 
> > MPI_Comm_dup(PetscObjectComm((PetscObject)ts),&(cvode->comm_sundials));CHKERRQ(ierr);
> > 
> > Perhaps we need a PetscCommDuplicateExternalPkg() to somehow avoid these 
> > MPI_Comm_dup() calls?
> > 
> > Satish
> > 
> > On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> > 
> >> 
> >>  Are we sure this is a PETSc comm issue and not a hypre comm duplication 
> >> issue
> >> 
> >> frame #6: 0x0001061345d9 
> >> libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
> >> participate=, new_comm_ptr=) + 409 at 
> >> gen_redcs_mat.c:531 [opt]
> >> 
> >> Looks like hypre is needed to generate subcomms, perhaps it generates too 
> >> many?
> >> 
> >>   Barry
> >> 
> >> 
> >>> On Apr 2, 2018, at 7:07 PM, Derek Gaston 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-03 Thread Smith, Barry F.

   Each external package definitely needs its own duplicated communicator; 
cannot share between packages.

   The only problem with the dups below is if they are in a loop and get called 
many times.

To debug the hypre/duplication issue in MOOSE I would run in the debugger 
with a breakpoint in MPI_Comm_dup() and see
who keeps calling it an unreasonable number of times. (My guess is this is a 
new "feature" in hypre that they will need to fix, but only debugging will tell.)

   Barry


> On Apr 2, 2018, at 7:44 PM, Balay, Satish  wrote:
> 
> We do a MPI_Comm_dup() for objects related to externalpackages.
> 
> Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is
> using. Previously there was one MPI_Comm_dup() PCHYPRE - now I think
> is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7
> 
> src/dm/impls/da/hypre/mhyp.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> src/dm/impls/da/hypre/mhyp.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
> src/dm/impls/swarm/data_ex.c:  ierr = 
> MPI_Comm_dup(comm,>comm);CHKERRQ(ierr);
> src/ksp/pc/impls/hypre/hypre.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
> src/ksp/pc/impls/hypre/hypre.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> src/ksp/pc/impls/hypre/hypre.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
> src/ksp/pc/impls/spai/ispai.c:  ierr  = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_spai));CHKERRQ(ierr);
> src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD, 
> );CHKERRQ(ierr);
> src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
> src/mat/impls/aij/mpi/mumps/mumps.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_mumps));CHKERRQ(ierr);
> src/mat/impls/aij/mpi/pastix/pastix.c:ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_comm));CHKERRQ(ierr);
> src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_superlu));CHKERRQ(ierr);
> src/mat/impls/hypre/mhypre.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)B),>comm);CHKERRQ(ierr);
> src/mat/partition/impls/pmetis/pmetis.c:ierr   = 
> MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
> src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a MPI_Comm_dup() 
> of each of these (duplicates of duplicates return the same communictor)
> src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
> src/sys/objects/pinit.c:  ierr = 
> MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
> src/sys/objects/pinit.c:  ierr = 
> MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
> src/sys/objects/tagm.c:  ierr = 
> MPI_Comm_dup(comm_in,comm_out);CHKERRQ(ierr);
> src/sys/utils/mpiu.c:  ierr = MPI_Comm_dup(comm,_comm);CHKERRQ(ierr);
> src/ts/impls/implicit/sundials/sundials.c:  ierr = 
> MPI_Comm_dup(PetscObjectComm((PetscObject)ts),&(cvode->comm_sundials));CHKERRQ(ierr);
> 
> Perhaps we need a PetscCommDuplicateExternalPkg() to somehow avoid these 
> MPI_Comm_dup() calls?
> 
> Satish
> 
> On Tue, 3 Apr 2018, Smith, Barry F. wrote:
> 
>> 
>>  Are we sure this is a PETSc comm issue and not a hypre comm duplication 
>> issue
>> 
>> frame #6: 0x0001061345d9 
>> libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
>> participate=, new_comm_ptr=) + 409 at 
>> gen_redcs_mat.c:531 [opt]
>> 
>> Looks like hypre is needed to generate subcomms, perhaps it generates too 
>> many?
>> 
>>   Barry
>> 
>> 
>>> On Apr 2, 2018, at 7:07 PM, Derek Gaston  wrote:
>>> 
>>> I’m working with Fande on this and I would like to add a bit more.  There 
>>> are many circumstances where we aren’t working on COMM_WORLD at all (e.g. 
>>> working on a sub-communicator) but PETSc was initialized using 
>>> MPI_COMM_WORLD (think multi-level solves)… and we need to create 
>>> arbitrarily many PETSc vecs/mats/solvers/preconditioners and solve.  We 
>>> definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering 
>>> duplication.
>>> 
>>> Can you explain why PETSc needs to duplicate the communicator so much?
>>> 
>>> Thanks for your help in tracking this down!
>>> 
>>> Derek
>>> 
>>> On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande  wrote:
>>> Why we do not use user-level MPI communicators directly? What are potential 
>>> risks here? 
>>> 
>>> 
>>> Fande,
>>> 
>>> On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay  wrote:
>>> PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to 
>>> MPI_Comm_dup() - thus potentially avoiding such errors
>>> 
>>> 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
We do an MPI_Comm_dup() for objects related to external packages.

Looks like we added a new mat type MATHYPRE in 3.8 that PCHYPRE is
using. Previously there was one MPI_Comm_dup() for PCHYPRE - now I think
there is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7

src/dm/impls/da/hypre/mhyp.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
src/dm/impls/da/hypre/mhyp.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,>comm);CHKERRQ(ierr);
src/ksp/pc/impls/hypre/hypre.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
src/ksp/pc/impls/hypre/hypre.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
src/ksp/pc/impls/hypre/hypre.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
src/ksp/pc/impls/spai/ispai.c:  ierr  = 
MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_spai));CHKERRQ(ierr);
src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD, 
);CHKERRQ(ierr);
src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
src/mat/impls/aij/mpi/mumps/mumps.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_mumps));CHKERRQ(ierr);
src/mat/impls/aij/mpi/pastix/pastix.c:ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_comm));CHKERRQ(ierr);
src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_superlu));CHKERRQ(ierr);
src/mat/impls/hypre/mhypre.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)B),>comm);CHKERRQ(ierr);
src/mat/partition/impls/pmetis/pmetis.c:ierr   = 
MPI_Comm_dup(pcomm,);CHKERRQ(ierr);
src/sys/mpiuni/mpi.c:MPI_COMM_SELF, MPI_COMM_WORLD, and a MPI_Comm_dup() of 
each of these (duplicates of duplicates return the same communictor)
src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
src/sys/objects/pinit.c:  ierr = 
MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
src/sys/objects/pinit.c:  ierr = 
MPI_Comm_dup(MPI_COMM_WORLD,_comm);CHKERRQ(ierr);
src/sys/objects/tagm.c:  ierr = 
MPI_Comm_dup(comm_in,comm_out);CHKERRQ(ierr);
src/sys/utils/mpiu.c:  ierr = MPI_Comm_dup(comm,_comm);CHKERRQ(ierr);
src/ts/impls/implicit/sundials/sundials.c:  ierr = 
MPI_Comm_dup(PetscObjectComm((PetscObject)ts),&(cvode->comm_sundials));CHKERRQ(ierr);

Perhaps we need a PetscCommDuplicateExternalPkg() to somehow avoid these 
MPI_Comm_dup() calls?

Satish

On Tue, 3 Apr 2018, Smith, Barry F. wrote:

> 
>   Are we sure this is a PETSc comm issue and not a hypre comm duplication 
> issue
> 
>  frame #6: 0x0001061345d9 
> libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
> participate=, new_comm_ptr=) + 409 at 
> gen_redcs_mat.c:531 [opt]
> 
> Looks like hypre is needed to generate subcomms, perhaps it generates too 
> many?
> 
>Barry
> 
> 
> > On Apr 2, 2018, at 7:07 PM, Derek Gaston  wrote:
> > 
> > I’m working with Fande on this and I would like to add a bit more.  There 
> > are many circumstances where we aren’t working on COMM_WORLD at all (e.g. 
> > working on a sub-communicator) but PETSc was initialized using 
> > MPI_COMM_WORLD (think multi-level solves)… and we need to create 
> > arbitrarily many PETSc vecs/mats/solvers/preconditioners and solve.  We 
> > definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering 
> > duplication.
> > 
> > Can you explain why PETSc needs to duplicate the communicator so much?
> > 
> > Thanks for your help in tracking this down!
> > 
> > Derek
> > 
> > On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande  wrote:
> > Why we do not use user-level MPI communicators directly? What are potential 
> > risks here? 
> > 
> > 
> > Fande,
> > 
> > On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay  wrote:
> > PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to 
> > MPI_Comm_dup() - thus potentially avoiding such errors
> > 
> > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
> > 
> > 
> > Satish
> > 
> > On Mon, 2 Apr 2018, Kong, Fande wrote:
> > 
> > > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay  wrote:
> > >
> > > > Does this 'standard test' use MPI_COMM_WORLD' to crate PETSc objects?
> > > >
> > > > If so - you could try changing to PETSC_COMM_WORLD
> > > >
> > >
> > >
> > > I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
> > > Why we can not use MPI_COMM_WORLD?
> > >
> > >
> > > 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
Sorry - I was going down the wrong path..

Sure MPI_COMM_WORLD vs PETSC_COMM_WORLD shouldn't make a difference
[except for a couple of extra mpi_comm_dup() calls.]

Satish

On Tue, 3 Apr 2018, Derek Gaston wrote:

> I’m working with Fande on this and I would like to add a bit more.  There
> are many circumstances where we aren’t working on COMM_WORLD at all (e.g.
> working on a sub-communicator) but PETSc was initialized using
> MPI_COMM_WORLD (think multi-level solves)… and we need to create
> arbitrarily many PETSc vecs/mats/solvers/preconditioners and solve.  We
> definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering
> duplication.
> 
> Can you explain why PETSc needs to duplicate the communicator so much?
> 
> Thanks for your help in tracking this down!
> 
> Derek
> 
> On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande  wrote:
> 
> > Why we do not use user-level MPI communicators directly? What are
> > potential risks here?
> >
> >
> > Fande,
> >
> > On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay  wrote:
> >
> >> PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to
> >> MPI_Comm_dup() - thus potentially avoiding such errors
> >>
> >>
> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
> >
> >
> >>
> >> Satish
> >>
> >> On Mon, 2 Apr 2018, Kong, Fande wrote:
> >>
> >> > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay  wrote:
> >> >
> >> > > Does this 'standard test' use MPI_COMM_WORLD' to crate PETSc objects?
> >> > >
> >> > > If so - you could try changing to PETSC_COMM_WORLD
> >> > >
> >> >
> >> >
> >> > I do not think we are using PETSC_COMM_WORLD when creating PETSc
> >> objects.
> >> > Why we can not use MPI_COMM_WORLD?
> >> >
> >> >
> >> > Fande,
> >> >
> >> >
> >> > >
> >> > > Satish
> >> > >
> >> > >
> >> > > On Mon, 2 Apr 2018, Kong, Fande wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> >> > > > applications. I have a error message for a standard test:
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > *preconditioners/pbp.lots_of_variables: MPI had an
> >> > > > errorpreconditioners/pbp.lots_of_variables:
> >> > > > 
> >> > > preconditioners/pbp.lots_of_variables:
> >> > > > Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
> >> > > > PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
> >> > > > new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
> >> > > > PMPI_Comm_dup(162)..:
> >> > > > preconditioners/pbp.lots_of_variables:
> >> > > > MPIR_Comm_dup_impl(57)..:
> >> > > > preconditioners/pbp.lots_of_variables:
> >> > > > MPIR_Comm_copy(739).:
> >> > > > preconditioners/pbp.lots_of_variables:
> >> > > > MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048
> >> > > free
> >> > > > on this process; ignore_id=0)*
> >> > > >
> >> > > >
> >> > > > I did "git bisect', and the following commit introduces this issue:
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano
> >> Zampini
> >> > > > >Date:   Sat
> >> Nov 5
> >> > > > 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
> >> > > > hpmat already stores two HYPRE vectors*
> >> > > >
> >> > > > Before I debug line-by-line, anyone has a clue on this?
> >> > > >
> >> > > >
> >> > > > Fande,
> >> > > >
> >> > >
> >> > >
> >> >
> >>
> >>
> 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Smith, Barry F.

  Are we sure this is a PETSc comm issue and not a hypre comm duplication issue?

 frame #6: 0x0001061345d9 
libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, 
participate=, new_comm_ptr=) + 409 at 
gen_redcs_mat.c:531 [opt]

Looks like hypre needs to generate subcomms; perhaps it generates too many?

   Barry


> On Apr 2, 2018, at 7:07 PM, Derek Gaston  wrote:
> 
> I’m working with Fande on this and I would like to add a bit more.  There are 
> many circumstances where we aren’t working on COMM_WORLD at all (e.g. working 
> on a sub-communicator) but PETSc was initialized using MPI_COMM_WORLD (think 
> multi-level solves)… and we need to create arbitrarily many PETSc 
> vecs/mats/solvers/preconditioners and solve.  We definitely can’t rely on 
> using PETSC_COMM_WORLD to avoid triggering duplication.
> 
> Can you explain why PETSc needs to duplicate the communicator so much?
> 
> Thanks for your help in tracking this down!
> 
> Derek
> 
> On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande  wrote:
> Why we do not use user-level MPI communicators directly? What are potential 
> risks here? 
> 
> 
> Fande,
> 
> On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay  wrote:
> PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to 
> MPI_Comm_dup() - thus potentially avoiding such errors
> 
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
> 
> 
> Satish
> 
> On Mon, 2 Apr 2018, Kong, Fande wrote:
> 
> > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay  wrote:
> >
> > > Does this 'standard test' use MPI_COMM_WORLD' to crate PETSc objects?
> > >
> > > If so - you could try changing to PETSC_COMM_WORLD
> > >
> >
> >
> > I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
> > Why we can not use MPI_COMM_WORLD?
> >
> >
> > Fande,
> >
> >
> > >
> > > Satish
> > >
> > >
> > > On Mon, 2 Apr 2018, Kong, Fande wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> > > > applications. I have a error message for a standard test:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > *preconditioners/pbp.lots_of_variables: MPI had an
> > > > errorpreconditioners/pbp.lots_of_variables:
> > > > 
> > > preconditioners/pbp.lots_of_variables:
> > > > Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
> > > > PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
> > > > new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
> > > > PMPI_Comm_dup(162)..:
> > > > preconditioners/pbp.lots_of_variables:
> > > > MPIR_Comm_dup_impl(57)..:
> > > > preconditioners/pbp.lots_of_variables:
> > > > MPIR_Comm_copy(739).:
> > > > preconditioners/pbp.lots_of_variables:
> > > > MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048
> > > free
> > > > on this process; ignore_id=0)*
> > > >
> > > >
> > > > I did "git bisect', and the following commit introduces this issue:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano Zampini
> > > > >Date:   Sat Nov 5
> > > > 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
> > > > hpmat already stores two HYPRE vectors*
> > > >
> > > > Before I debug line-by-line, anyone has a clue on this?
> > > >
> > > >
> > > > Fande,
> > > >
> > >
> > >
> >
> 



Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Derek Gaston
I’m working with Fande on this and I would like to add a bit more.  There
are many circumstances where we aren’t working on COMM_WORLD at all (e.g.
working on a sub-communicator) but PETSc was initialized using
MPI_COMM_WORLD (think multi-level solves)… and we need to create
arbitrarily many PETSc vecs/mats/solvers/preconditioners and solve.  We
definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering
duplication.

Can you explain why PETSc needs to duplicate the communicator so much?

Thanks for your help in tracking this down!

Derek

On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande  wrote:

> Why we do not use user-level MPI communicators directly? What are
> potential risks here?
>
>
> Fande,
>
> On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay  wrote:
>
>> PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to
>> MPI_Comm_dup() - thus potentially avoiding such errors
>>
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
>
>
>>
>> Satish
>>
>> On Mon, 2 Apr 2018, Kong, Fande wrote:
>>
>> > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay  wrote:
>> >
>> > > Does this 'standard test' use MPI_COMM_WORLD' to crate PETSc objects?
>> > >
>> > > If so - you could try changing to PETSC_COMM_WORLD
>> > >
>> >
>> >
>> > I do not think we are using PETSC_COMM_WORLD when creating PETSc
>> objects.
>> > Why we can not use MPI_COMM_WORLD?
>> >
>> >
>> > Fande,
>> >
>> >
>> > >
>> > > Satish
>> > >
>> > >
>> > > On Mon, 2 Apr 2018, Kong, Fande wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
>> > > > applications. I have a error message for a standard test:
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > *preconditioners/pbp.lots_of_variables: MPI had an
>> > > > errorpreconditioners/pbp.lots_of_variables:
>> > > > 
>> > > preconditioners/pbp.lots_of_variables:
>> > > > Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
>> > > > PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
>> > > > new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
>> > > > PMPI_Comm_dup(162)..:
>> > > > preconditioners/pbp.lots_of_variables:
>> > > > MPIR_Comm_dup_impl(57)..:
>> > > > preconditioners/pbp.lots_of_variables:
>> > > > MPIR_Comm_copy(739).:
>> > > > preconditioners/pbp.lots_of_variables:
>> > > > MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048
>> > > free
>> > > > on this process; ignore_id=0)*
>> > > >
>> > > >
>> > > > I did "git bisect', and the following commit introduces this issue:
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano
>> Zampini
>> > > > >Date:   Sat
>> Nov 5
>> > > > 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
>> > > > hpmat already stores two HYPRE vectors*
>> > > >
>> > > > Before I debug line-by-line, anyone has a clue on this?
>> > > >
>> > > >
>> > > > Fande,
>> > > >
>> > >
>> > >
>> >
>>
>>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
Why do we not use user-level MPI communicators directly? What are the
potential risks here?


Fande,

On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay  wrote:

> PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to
> MPI_Comm_dup() - thus potentially avoiding such errors
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
>
> Satish
>
> On Mon, 2 Apr 2018, Kong, Fande wrote:
>
> > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay  wrote:
> >
> > > Does this 'standard test' use MPI_COMM_WORLD' to crate PETSc objects?
> > >
> > > If so - you could try changing to PETSC_COMM_WORLD
> > >
> >
> >
> > I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
> > Why we can not use MPI_COMM_WORLD?
> >
> >
> > Fande,
> >
> >
> > >
> > > Satish
> > >
> > >
> > > On Mon, 2 Apr 2018, Kong, Fande wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> > > > applications. I have a error message for a standard test:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > *preconditioners/pbp.lots_of_variables: MPI had an
> > > > errorpreconditioners/pbp.lots_of_variables:
> > > > 
> > > preconditioners/pbp.lots_of_variables:
> > > > Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
> > > > PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
> > > > new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
> > > > PMPI_Comm_dup(162)..:
> > > > preconditioners/pbp.lots_of_variables:
> > > > MPIR_Comm_dup_impl(57)..:
> > > > preconditioners/pbp.lots_of_variables:
> > > > MPIR_Comm_copy(739).:
> > > > preconditioners/pbp.lots_of_variables:
> > > > MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048
> > > free
> > > > on this process; ignore_id=0)*
> > > >
> > > >
> > > > I did "git bisect', and the following commit introduces this issue:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano
> Zampini
> > > > >Date:   Sat
> Nov 5
> > > > 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
> > > > hpmat already stores two HYPRE vectors*
> > > >
> > > > Before I debug line-by-line, anyone has a clue on this?
> > > >
> > > >
> > > > Fande,
> > > >
> > >
> > >
> >
>
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to 
MPI_Comm_dup() - thus potentially avoiding such errors

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
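
As to why a library duplicates the communicator it is handed at all (the
question asked upthread): the usual reason is to obtain a private context/tag
space, so the library's internal messages can never be matched by the
caller's own receives. A small illustration, meant to be run on 2 ranks; all
names here are made up for the example:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm    user_comm = MPI_COMM_WORLD, lib_comm;
  int         rank, user_msg = 0, lib_msg = 0, one = 1, seven = 7;
  MPI_Request req;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(user_comm, &rank);
  MPI_Comm_dup(user_comm, &lib_comm);        /* the "library's" private copy */

  if (rank == 0) {
    /* The caller posts a wildcard receive on its own communicator ...       */
    MPI_Irecv(&user_msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, user_comm, &req);
    /* ... while the "library" exchanges an internal message.  Because that
       goes over lib_comm, the wildcard receive cannot intercept it; had the
       library reused user_comm, either send could have matched it.          */
    MPI_Recv(&lib_msg, 1, MPI_INT, 1, 0, lib_comm, MPI_STATUS_IGNORE);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    printf("library message = %d, user message = %d\n", lib_msg, user_msg);
  } else if (rank == 1) {
    MPI_Send(&one, 1, MPI_INT, 0, 0, lib_comm);     /* library-internal send */
    MPI_Send(&seven, 1, MPI_INT, 0, 0, user_comm);  /* caller's own send     */
  }

  MPI_Comm_free(&lib_comm);
  MPI_Finalize();
  return 0;
}

PetscCommDuplicate() amortizes this: the duplicate (plus a private tag
counter) is cached on the user's communicator, so repeated object creation on
the same communicator does not keep calling MPI_Comm_dup().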

Satish

On Mon, 2 Apr 2018, Kong, Fande wrote:

> On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay  wrote:
> 
> > Does this 'standard test' use MPI_COMM_WORLD' to crate PETSc objects?
> >
> > If so - you could try changing to PETSC_COMM_WORLD
> >
> 
> 
> I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
> Why we can not use MPI_COMM_WORLD?
> 
> 
> Fande,
> 
> 
> >
> > Satish
> >
> >
> > On Mon, 2 Apr 2018, Kong, Fande wrote:
> >
> > > Hi All,
> > >
> > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> > > applications. I have a error message for a standard test:
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > *preconditioners/pbp.lots_of_variables: MPI had an
> > > errorpreconditioners/pbp.lots_of_variables:
> > > 
> > preconditioners/pbp.lots_of_variables:
> > > Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
> > > PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
> > > new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
> > > PMPI_Comm_dup(162)..:
> > > preconditioners/pbp.lots_of_variables:
> > > MPIR_Comm_dup_impl(57)..:
> > > preconditioners/pbp.lots_of_variables:
> > > MPIR_Comm_copy(739).:
> > > preconditioners/pbp.lots_of_variables:
> > > MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048
> > free
> > > on this process; ignore_id=0)*
> > >
> > >
> > > I did "git bisect', and the following commit introduces this issue:
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano Zampini
> > > >Date:   Sat Nov 5
> > > 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
> > > hpmat already stores two HYPRE vectors*
> > >
> > > Before I debug line-by-line, anyone has a clue on this?
> > >
> > >
> > > Fande,
> > >
> >
> >
> 



Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay  wrote:

> Does this 'standard test' use MPI_COMM_WORLD' to crate PETSc objects?
>
> If so - you could try changing to PETSC_COMM_WORLD
>


I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
Why can we not use MPI_COMM_WORLD?


Fande,


>
> Satish
>
>
> On Mon, 2 Apr 2018, Kong, Fande wrote:
>
> > Hi All,
> >
> > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> > applications. I have a error message for a standard test:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > *preconditioners/pbp.lots_of_variables: MPI had an
> > errorpreconditioners/pbp.lots_of_variables:
> > 
> preconditioners/pbp.lots_of_variables:
> > Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
> > PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
> > new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
> > PMPI_Comm_dup(162)..:
> > preconditioners/pbp.lots_of_variables:
> > MPIR_Comm_dup_impl(57)..:
> > preconditioners/pbp.lots_of_variables:
> > MPIR_Comm_copy(739).:
> > preconditioners/pbp.lots_of_variables:
> > MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048
> free
> > on this process; ignore_id=0)*
> >
> >
> > I did "git bisect', and the following commit introduces this issue:
> >
> >
> >
> >
> >
> >
> >
> >
> > *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano Zampini
> > >Date:   Sat Nov 5
> > 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
> > hpmat already stores two HYPRE vectors*
> >
> > Before I debug line-by-line, anyone has a clue on this?
> >
> >
> > Fande,
> >
>
>


Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Satish Balay
Does this 'standard test' use MPI_COMM_WORLD to create PETSc objects?

If so - you could try changing to PETSC_COMM_WORLD
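
For reference, the suggestion amounts to something like the sketch below -
create PETSc objects on PETSC_COMM_WORLD (set up by PetscInitialize()) rather
than on raw MPI_COMM_WORLD - although, as Satish notes in a follow-up, it
should not make much difference beyond a couple of extra MPI_Comm_dup() calls:

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            v;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = VecCreate(PETSC_COMM_WORLD, &v);CHKERRQ(ierr);  /* not MPI_COMM_WORLD */
  ierr = VecSetSizes(v, PETSC_DECIDE, 100);CHKERRQ(ierr);
  ierr = VecSetFromOptions(v);CHKERRQ(ierr);
  ierr = VecDestroy(&v);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}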

Satish


On Mon, 2 Apr 2018, Kong, Fande wrote:

> Hi All,
> 
> I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> applications. I have a error message for a standard test:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> *preconditioners/pbp.lots_of_variables: MPI had an
> errorpreconditioners/pbp.lots_of_variables:
> preconditioners/pbp.lots_of_variables:
> Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
> PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
> new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
> PMPI_Comm_dup(162)..:
> preconditioners/pbp.lots_of_variables:
> MPIR_Comm_dup_impl(57)..:
> preconditioners/pbp.lots_of_variables:
> MPIR_Comm_copy(739).:
> preconditioners/pbp.lots_of_variables:
> MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048 free
> on this process; ignore_id=0)*
> 
> 
> I did "git bisect', and the following commit introduces this issue:
> 
> 
> 
> 
> 
> 
> 
> 
> *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano Zampini
> >Date:   Sat Nov 5
> 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
> hpmat already stores two HYPRE vectors*
> 
> Before I debug line-by-line, anyone has a clue on this?
> 
> 
> Fande,
> 



Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
Nope.

There is a back trace:

thread #1: tid = 0x3b477b4, 0x7fffb306cd42 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x7fffb306cd42 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x7fffb315a457 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x7fffb2fd2420 libsystem_c.dylib`abort + 129
    frame #3: 0x0001057ff30a libpetsc.3.07.dylib`Petsc_MPI_AbortOnError(comm=, flag=) + 26 at init.c:185 [opt]
    frame #4: 0x000106bd3245 libpmpi.12.dylib`MPIR_Err_return_comm + 533
    frame #5: 0x0001068defd4 libmpi.12.dylib`MPI_Comm_create + 3492
    frame #6: 0x0001061345d9 libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, participate=, new_comm_ptr=) + 409 at gen_redcs_mat.c:531 [opt]
    frame #7: 0x00010618f8ba libpetsc.3.07.dylib`hypre_GaussElimSetup(amg_data=0x7fe7ff857a00, level=, relax_type=9) + 74 at par_relax.c:4209 [opt]
    frame #8: 0x000106140e93 libpetsc.3.07.dylib`hypre_BoomerAMGSetup(amg_vdata=, A=0x7fe80842aff0, f=0x7fe80842a980, u=0x7fe80842a510) + 17699 at par_amg_setup.c:2108 [opt]
    frame #9: 0x000105ec773c libpetsc.3.07.dylib`PCSetUp_HYPRE(pc=) + 2540 at hypre.c:226 [opt]
    frame #10: 0x000105eea68d libpetsc.3.07.dylib`PCSetUp(pc=0x7fe805553f50) + 797 at precon.c:968 [opt]
    frame #11: 0x000105ee9fe5 libpetsc.3.07.dylib`PCApply(pc=0x7fe805553f50, x=0x7fe80052d420, y=0x7fe800522c20) + 181 at precon.c:478 [opt]
    frame #12: 0x0001015cf218 libmesh_opt.0.dylib`libMesh::PetscPreconditioner::apply(libMesh::NumericVector const&, libMesh::NumericVector&) + 24
    frame #13: 0x0001009c7998 libmoose-opt.0.dylib`PhysicsBasedPreconditioner::apply(libMesh::NumericVector const&, libMesh::NumericVector&) + 520
    frame #14: 0x0001016ad701 libmesh_opt.0.dylib`libmesh_petsc_preconditioner_apply + 129
    frame #15: 0x000105e7e715 libpetsc.3.07.dylib`PCApply_Shell(pc=0x7fe8052623f0, x=0x7fe806805a20, y=0x7fe806805420) + 117 at shellpc.c:123 [opt]
    frame #16: 0x000105eea079 libpetsc.3.07.dylib`PCApply(pc=0x7fe8052623f0, x=0x7fe806805a20, y=0x7fe806805420) + 329 at precon.c:482 [opt]
    frame #17: 0x000105eeb611 libpetsc.3.07.dylib`PCApplyBAorAB(pc=0x7fe8052623f0, side=PC_RIGHT, x=0x7fe806805a20, y=0x7fe806806020, work=0x7fe806805420) + 945 at precon.c:714 [opt]
    frame #18: 0x000105f31658 libpetsc.3.07.dylib`KSPGMRESCycle [inlined] KSP_PCApplyBAorAB(ksp=0x7fe80600, x=, y=0x7fe806806020, w=) + 191 at kspimpl.h:295 [opt]
    frame #19: 0x000105f31599 libpetsc.3.07.dylib`KSPGMRESCycle(itcount=, ksp=) + 553 at gmres.c:156 [opt]
    frame #20: 0x000105f326bd libpetsc.3.07.dylib`KSPSolve_GMRES(ksp=) + 221 at gmres.c:240 [opt]
    frame #21: 0x000105f5f671 libpetsc.3.07.dylib`KSPSolve(ksp=0x7fe80600, b=0x7fe7fd946220, x=) + 1345 at itfunc.c:677 [opt]
    frame #22: 0x000105fd0251 libpetsc.3.07.dylib`SNESSolve_NEWTONLS(snes=) + 1425 at ls.c:230 [opt]
    frame #23: 0x000105fa10ca libpetsc.3.07.dylib`SNESSolve(snes=, b=, x=0x7fe7fd865e20) + 858 at snes.c:4128 [opt]
    frame #24: 0x0001016b63c3 libmesh_opt.0.dylib`libMesh::PetscNonlinearSolver::solve(libMesh::SparseMatrix&, libMesh::NumericVector&, libMesh::NumericVector&, double, unsigned int) + 835
    frame #25: 0x0001016fc244 libmesh_opt.0.dylib`libMesh::NonlinearImplicitSystem::solve() + 324
    frame #26: 0x000100a71dc8 libmoose-opt.0.dylib`NonlinearSystem::solve() + 472
    frame #27: 0x0001009fe815 libmoose-opt.0.dylib`FEProblemBase::solve() + 117
    frame #28: 0x000100761fba libmoose-opt.0.dylib`Steady::execute() + 266
    frame #29: 0x000100b78ac3 libmoose-opt.0.dylib`MooseApp::run() + 259
    frame #30: 0x0001003843aa moose_test-opt`main + 122
    frame #31: 0x7fffb2f3e235 libdyld.dylib`start + 1

Fande,


On Mon, Apr 2, 2018 at 4:02 PM, Stefano Zampini 
wrote:

> maybe this will fix ?
>
>
> diff --git a/src/ksp/pc/impls/hypre/hypre.c b/src/ksp/pc/impls/hypre/hypre.c
> index 28addcf533..6a756d4c57 100644
> --- a/src/ksp/pc/impls/hypre/hypre.c
> +++ b/src/ksp/pc/impls/hypre/hypre.c
> @@ -142,8 +142,7 @@ static PetscErrorCode PCSetUp_HYPRE(PC pc)
>
>    ierr = PetscObjectTypeCompare((PetscObject)pc->pmat,MATHYPRE,&ishypre);CHKERRQ(ierr);
>    if (!ishypre) {
> -    ierr = MatDestroy(&jac->hpmat);CHKERRQ(ierr);
> -    ierr = MatConvert(pc->pmat,MATHYPRE,MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);
> +    ierr = MatConvert(pc->pmat,MATHYPRE,jac->hpmat ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);
>    } else {
>      ierr = PetscObjectReference((PetscObject)pc->pmat);CHKERRQ(ierr);
>      ierr = MatDestroy(&jac->hpmat);CHKERRQ(ierr);
>
>
>
> 2018-04-02 23:46 GMT+02:00 Kong, Fande :
>
>> Hi All,
>>
>> I am trying to upgrade PETSc from 

Re: [petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Stefano Zampini
maybe this will fix ?


diff --git a/src/ksp/pc/impls/hypre/hypre.c b/src/ksp/pc/impls/hypre/hypre.c
index 28addcf533..6a756d4c57 100644
--- a/src/ksp/pc/impls/hypre/hypre.c
+++ b/src/ksp/pc/impls/hypre/hypre.c
@@ -142,8 +142,7 @@ static PetscErrorCode PCSetUp_HYPRE(PC pc)
 
   ierr = PetscObjectTypeCompare((PetscObject)pc->pmat,MATHYPRE,&ishypre);CHKERRQ(ierr);
   if (!ishypre) {
-    ierr = MatDestroy(&jac->hpmat);CHKERRQ(ierr);
-    ierr = MatConvert(pc->pmat,MATHYPRE,MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);
+    ierr = MatConvert(pc->pmat,MATHYPRE,jac->hpmat ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX,&jac->hpmat);CHKERRQ(ierr);
   } else {
     ierr = PetscObjectReference((PetscObject)pc->pmat);CHKERRQ(ierr);
     ierr = MatDestroy(&jac->hpmat);CHKERRQ(ierr);



2018-04-02 23:46 GMT+02:00 Kong, Fande :

> Hi All,
>
> I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> applications. I have a error message for a standard test:
>
>
>
>
>
>
>
>
>
> *preconditioners/pbp.lots_of_variables: MPI had an
> errorpreconditioners/pbp.lots_of_variables:
> preconditioners/pbp.lots_of_variables:
> Other MPI error, error stack:preconditioners/pbp.lots_of_variables:
> PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401,
> new_comm=0x97d1068) failedpreconditioners/pbp.lots_of_variables:
> PMPI_Comm_dup(162)..:
> preconditioners/pbp.lots_of_variables:
> MPIR_Comm_dup_impl(57)..:
> preconditioners/pbp.lots_of_variables:
> MPIR_Comm_copy(739).:
> preconditioners/pbp.lots_of_variables:
> MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048 free
> on this process; ignore_id=0)*
>
>
> I did "git bisect', and the following commit introduces this issue:
>
>
>
>
>
>
>
>
> *commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano Zampini
> >Date:   Sat Nov 5
> 20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
> hpmat already stores two HYPRE vectors*
>
> Before I debug line-by-line, anyone has a clue on this?
>
>
> Fande,
>



-- 
Stefano


[petsc-users] A bad commit affects MOOSE

2018-04-02 Thread Kong, Fande
Hi All,

I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
applications. I have an error message for a standard test:

preconditioners/pbp.lots_of_variables: MPI had an error
preconditioners/pbp.lots_of_variables: 
preconditioners/pbp.lots_of_variables: Other MPI error, error stack:
preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(177)..: MPI_Comm_dup(comm=0x8401, new_comm=0x97d1068) failed
preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(162)..:
preconditioners/pbp.lots_of_variables: MPIR_Comm_dup_impl(57)..:
preconditioners/pbp.lots_of_variables: MPIR_Comm_copy(739).:
preconditioners/pbp.lots_of_variables: MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048 free on this process; ignore_id=0)


I did "git bisect', and the following commit introduces this issue:








*commit 49a781f5cee36db85e8d5b951eec29f10ac13593Author: Stefano Zampini
>Date:   Sat Nov 5
20:15:19 2016 +0300PCHYPRE: use internal Mat of type MatHYPRE
hpmat already stores two HYPRE vectors*

Before I debug line-by-line, does anyone have a clue on this?


Fande,