Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-11-08 Thread Jed Brown via petsc-dev
This bug is present in Ubuntu 18.10, which distributes unpatched
mpich-3.3b2.  I just submitted a bug report:

https://bugs.launchpad.net/ubuntu/+source/mpich/+bug/1802372

Eric Chamberland  writes:

> Hi,
>
> mainly for PETSc users: please do not waste your time using the MPI released 
> with Intel Parallel Studio 2019, since it is the buggy MPICH 3.3b2 for 
> which this thread was originally created...
>
> I also just posted a reminder about this on the Intel forum:
>
> https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/797761
>
> Eric

Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-10-02 Thread Eric Chamberland

Hi,

mainly for PETSc users: please do not waste your time using the MPI released 
with Intel Parallel Studio 2019, since it is the buggy MPICH 3.3b2 for 
which this thread was originally created...

I also just posted a reminder about this on the Intel forum:

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/797761

Eric


Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-04-19 Thread Min Si

Hi Junchao,

This is a great idea. We will add large-tag tests to our test suite!

Min

On 2018/04/17 18:17, Junchao Zhang wrote:
> Min,
>   I suggest MPICH add tests that exercise the maximum MPI tag (obtained
> through the MPI_TAG_UB attribute).
>   PETSc allocates tags starting from the maximum and going downwards. I
> guess the MPICH tests use small tags, which is why the bug only showed up
> with PETSc.
>
> --Junchao Zhang


Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-04-19 Thread Eric Chamberland

Hi,

this morning, mpich/master with PETSc is 100% working again for us.

Thanks to both commits:

https://github.com/pmodels/mpich/commit/c597c8d79deea220a42751fda0f01ce70764c260

https://github.com/pmodels/mpich/commit/8edabc7373b82dd660019e53d246131765819294

and thanks to everybody who helped:

Satish
Min
Wesley
Ken
Rob
Jed

:)

Eric


Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-04-17 Thread Junchao Zhang
Min,
  I suggest MPICH add tests that exercise the maximum MPI tag (obtained
through the MPI_TAG_UB attribute).
  PETSc allocates tags starting from the maximum and going downwards. I
guess the MPICH tests use small tags, which is why the bug only showed up
with PETSc.

--Junchao Zhang
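
To illustrate why a library that hands out tags from the top of the range would be the first to trip over a wrong upper bound, here is a minimal sketch in plain C (no MPI required). The 31-bit tag field, the three reserved bits, and the names `tag_pool`, `tag_pool_init`, `tag_pool_next` and `tag_collides` are assumptions made for illustration only; this is not PETSc or MPICH code.

```c
#include <assert.h>

/* Hypothetical layout: a 31-bit tag field whose top 3 bits are reserved
 * by the MPI library for internal use (an assumption for illustration). */
enum { TAG_FIELD_BITS = 31, RESERVED_BITS = 3 };

/* Largest tag that stays clear of the reserved bits. */
static int safe_tag_ub(void)
{
    return (int) ((1u << (TAG_FIELD_BITS - RESERVED_BITS)) - 1u);
}

/* Mask of the bits the library keeps for itself. */
static unsigned reserved_mask(void)
{
    unsigned field = (1u << TAG_FIELD_BITS) - 1u;
    return field & ~(unsigned) safe_tag_ub();
}

/* A tag pool that hands out tags from the advertised upper bound downwards. */
typedef struct { int next; } tag_pool;

static void tag_pool_init(tag_pool *p, int advertised_ub)
{
    p->next = advertised_ub;
}

static int tag_pool_next(tag_pool *p)
{
    assert(p->next > 0);        /* pool exhausted otherwise */
    return p->next--;
}

/* Nonzero if a tag overlaps the reserved bits. */
static int tag_collides(int tag)
{
    return ((unsigned) tag & reserved_mask()) != 0;
}
```

Under these assumptions, a pool initialised with `safe_tag_ub()` never returns a colliding tag, while a pool initialised with an inflated bound such as the full field collides on its very first tag. A small tag such as 100 is unaffected either way, which is consistent with tests that only use small tags not noticing the problem.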

On Tue, Apr 17, 2018 at 3:58 PM, Min Si  wrote:

> Hi all,
>
> Thanks for narrowing down the problem. I checked the MPICH code and
> believe this is a bug in MPICH. I just created a PR to fix it:
> https://github.com/pmodels/mpich/pull/3097
>
> It should be merged into MPICH master branch soon.
>
> Thanks,
> Min


Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-04-17 Thread Satish Balay
Thanks! I tried the patch, and this test case doesn't hang anymore.

Satish

On Tue, 17 Apr 2018, Min Si wrote:

> Hi all,
> 
> Thanks for narrowing down the problem. I checked the MPICH code and believe
> this is a bug in MPICH. I just created a PR to fix it:
> https://github.com/pmodels/mpich/pull/3097
> 
> It should be merged into MPICH master branch soon.
> 
> Thanks,
> Min


Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-04-17 Thread Min Si

Hi all,

Thanks for narrowing down the problem. I checked the MPICH code and 
believe this is a bug in MPICH. I just created a PR to fix it:

https://github.com/pmodels/mpich/pull/3097

It should be merged into MPICH master branch soon.

Thanks,
Min


Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-04-17 Thread Eric Chamberland

Hi,

are we talking about the "tag" argument passed to MPI_Isend, for example?

If so, does that mean something has to change for every MPI call that 
uses tags, or is the problem only a "bad" use of tags in PETSc?


thanks Satish for your finding!

Eric


Re: [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-04-16 Thread Satish Balay
On Tue, 13 Mar 2018, Eric Chamberland wrote:

> Hi,
> 
> Each night we test mpich/master with our PETSc-based code.  I don't
> know whether the PETSc team does the same with mpich/master?  (Maybe it
> would be a good idea?)
> 
> Everything was fine (except for the issue
> https://github.com/pmodels/mpich/issues/2892) up to commit 7b8d64debd, but
> since commit a8a2b30fd21 I get a segfault on any parallel nightly
> test.

I attempted a bisect of the above range of commits - and narrowed down to:

>>>
db11d4c4a70e39a28be88ed32f00542301699e08 is the first bad commit
<<<

balay@asterix /home/balay/soft/build/mpich ((db11d4c4a...)|BISECTING)
$ git show db11d4c4a70e39a28be88ed32f00542301699e08
commit db11d4c4a70e39a28be88ed32f00542301699e08 (HEAD, refs/bisect/bad)
Author: Ken Raffenetti 
Date:   Thu Feb 15 11:37:59 2018 -0600

    init: Fix tag upper limit initialization

    The starting point for this value is equivalent to the usable tag bits
    macro. This value should be set before device initialization,
    otherwise devices will assume they have more bits than are actually
    available.

    Signed-off-by: Wesley Bland

diff --git a/src/mpi/init/initthread.c b/src/mpi/init/initthread.c
index cbc41f4d5..b31ae2f07 100644
--- a/src/mpi/init/initthread.c
+++ b/src/mpi/init/initthread.c
@@ -403,7 +403,7 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
     MPIR_Process.attrs.host = MPI_PROC_NULL;
     MPIR_Process.attrs.io = MPI_PROC_NULL;
     MPIR_Process.attrs.lastusedcode = MPI_ERR_LASTCODE;
-    MPIR_Process.attrs.tag_ub = 0;
+    MPIR_Process.attrs.tag_ub = MPIR_TAG_USABLE_BITS;
     MPIR_Process.attrs.universe = MPIR_UNIVERSE_SIZE_NOT_SET;
     MPIR_Process.attrs.wtime_is_global = 0;
 
@@ -531,13 +531,6 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
     MPIR_Assert(((unsigned) MPIR_Process.
                  attrs.tag_ub & ((unsigned) MPIR_Process.attrs.tag_ub + 1)) == 0);
 
-    /* Set aside tag space for tagged collectives and failure notification */
-#ifdef HAVE_TAG_ERROR_BITS
-    MPIR_Process.attrs.tag_ub >>= 3;
-#else
-    MPIR_Process.attrs.tag_ub >>= 1;
-#endif
-
     /* Assert: tag_ub is at least the minimum asked for in the MPI spec */
     MPIR_Assert(MPIR_Process.attrs.tag_ub >= 32767);
 
<

Reverting this patch gets mpich-3.3b2 working with PETSc.

Satish
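
As a rough illustration of the arithmetic in the removed lines, the following plain C sketch (no MPICH headers) shows how shifting an all-ones upper bound right by 1 or 3 bits keeps it of the form 2^k - 1 and above the 32767 minimum checked by the second assertion. The function names and the 31-bit starting value used in the usage note are assumptions made for this sketch; the actual value of MPIR_TAG_USABLE_BITS is not shown in this thread.

```c
#include <assert.h>

/* Mirrors the first assertion in the diff: x & (x + 1) == 0 holds exactly
 * when x has the form 2^k - 1 (all low bits set). */
static int is_all_ones(int tag_ub)
{
    return (((unsigned) tag_ub) & ((unsigned) tag_ub + 1u)) == 0u;
}

/* Mirrors the removed lines: give up `reserved_bits` high bits of the
 * upper bound (3 with HAVE_TAG_ERROR_BITS, 1 otherwise in the diff). */
static int reserve_high_bits(int tag_ub, int reserved_bits)
{
    assert(reserved_bits >= 0 && reserved_bits < 31);
    return tag_ub >> reserved_bits;
}

/* Mirrors the second assertion: the bound must be at least 32767. */
static int meets_mpi_minimum(int tag_ub)
{
    return tag_ub >= 32767;
}
```

For example, starting from a hypothetical 0x7FFFFFFF, `reserve_high_bits(0x7FFFFFFF, 3)` gives 0x0FFFFFFF and `reserve_high_bits(0x7FFFFFFF, 1)` gives 0x3FFFFFFF; both still pass `is_all_ones` and `meets_mpi_minimum`. The point of the sketch is only that the advertised bound shrinks when bits are set aside, so skipping that step leaves the bound larger than intended.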

[petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

2018-03-13 Thread Eric Chamberland

Hi,

Each night we test mpich/master with our PETSc-based code.  I don't 
know whether the PETSc team does the same with mpich/master?  (Maybe it 
would be a good idea?)


Everything was fine (except for the issue 
https://github.com/pmodels/mpich/issues/2892) up to commit 7b8d64debd, 
but since commit a8a2b30fd21 I get a segfault on any parallel 
nightly test.


For example, a 2-process test ends at a different execution point on each rank:

rank 0:

#003: /lib64/libpthread.so.0(+0xf870) [0x7f25bf908870]
#004: /pmi/cmpbib/compilation_BIB_dernier_mpich/COMPILE_AUTO/BIB/bin/BIBMEFGD.opt() [0x64a788]
#005: /lib64/libc.so.6(+0x35140) [0x7f25bca18140]
#006: /lib64/libc.so.6(__poll+0x2d) [0x7f25bcabfbfd]
#007: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x1e4cc9) [0x7f25bd90ccc9]
#008: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x1ea55c) [0x7f25bd91255c]
#009: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0xba657) [0x7f25bd7e2657]
#010: /opt/mpich-3.x_debug/lib/libmpi.so.0(PMPI_Waitall+0xe3) [0x7f25bd7e3343]
#011: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(PetscGatherMessageLengths+0x654) [0x7f25c4bb3193]
#012: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecScatterCreate_PtoS+0x859) [0x7f25c4e82d7f]
#013: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecScatterCreate+0x5684) [0x7f25c4e4d055]
#014: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecCreateGhostWithArray+0x688) [0x7f25c4e01a39]
#015: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecCreateGhost+0x179) [0x7f25c4e020f6]

rank 1:

#002: /pmi/cmpbib/compilation_BIB_dernier_mpich/COMPILE_AUTO/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x2bd0) [0x7f62df8e7310]
#003: /lib64/libc.so.6(+0x35140) [0x7f62d3bc9140]
#004: /lib64/libc.so.6(__poll+0x2d) [0x7f62d3c70bfd]
#005: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x1e4cc9) [0x7f62d4abdcc9]
#006: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x1ea55c) [0x7f62d4ac355c]
#007: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x12c9c5) [0x7f62d4a059c5]
#008: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x12e102) [0x7f62d4a07102]
#009: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0xf17a1) [0x7f62d49ca7a1]
#010: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x3facf) [0x7f62d4918acf]
#011: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x3fc3d) [0x7f62d4918c3d]
#012: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0xf18d8) [0x7f62d49ca8d8]
#013: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x3fb88) [0x7f62d4918b88]
#014: /opt/mpich-3.x_debug/lib/libmpi.so.0(+0x3fc3d) [0x7f62d4918c3d]
#015: /opt/mpich-3.x_debug/lib/libmpi.so.0(MPI_Barrier+0x27b) [0x7f62d4918edb]
#016: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(PetscCommGetNewTag+0x3ff) [0x7f62dbceb055]
#017: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(PetscObjectGetNewTag+0x15d) [0x7f62dbceaadb]
#018: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecScatterCreateCommon_PtoS+0x1ee) [0x7f62dc03625c]
#019: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecScatterCreate_PtoS+0x29c4) [0x7f62dc035eea]
#020: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecScatterCreate+0x5684) [0x7f62dbffe055]
#021: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecCreateGhostWithArray+0x688) [0x7f62dbfb2a39]
#022: /opt/petsc-3.8.3_debug_mpich-3.x_debug/lib/libpetsc.so.3.8(VecCreateGhost+0x179) [0x7f62dbfb30f6]

Have any other users (PETSc users?) reported this problem?

Thanks,

Eric

PS: the usual information:

mpich logs:

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_config.system
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_mpich_version.txt
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_c.txt
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_m.txt
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_mi.txt
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_openmpa_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_mpl_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_pm_hydra_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_pm_hydra_tools_topo_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_mpiexec_info.txt

Petsc logs:
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_configure.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_make.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_default.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_RDict.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2018.03.12.05h39m54s_CMakeLists.txt