Re: [OMPI users] ipath_userinit errors

2014-11-06 Thread Friedley, Andrew
Michael,

I can only guess, but perhaps it is due to some nodes with differing software 
installations, or not all nodes using context sharing?
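
One way to check the first guess is to tally the version pairs reported in the
warnings of a large job: if every warning shows the same (user, driver) pair,
differing installations across nodes look less likely. A minimal Python sketch,
assuming the job's stderr has been captured as text; the function name
tally_mismatches and the sample text are made up for illustration, and the
pattern is based on the warning quoted further down in this thread:

```python
import re
from collections import Counter

# Pattern based on the warning text quoted in this thread; \s+ tolerates
# line wrapping introduced by mail clients or log viewers.
WARNING_RE = re.compile(
    r"Mismatched user minor version \((\d+)\)\s+and driver minor\s+version\s+\((\d+)\)"
)

def tally_mismatches(job_output):
    """Count how often each (user_minor, driver_minor) pair is reported."""
    pairs = Counter()
    for user, driver in WARNING_RE.findall(job_output):
        pairs[(int(user), int(driver))] += 1
    return pairs

# Hypothetical captured output: the same warning printed by two ranks.
sample = (
    "ipath_userinit: Mismatched user minor version (12) and driver minor "
    "version (11) while context sharing. Ensure that driver and library are "
    "from the same release\n"
) * 2

for (user, driver), count in tally_mismatches(sample).items():
    print(f"user minor {user}, driver minor {driver}: {count} warning(s)")
```

More than one distinct pair in the tally would point towards nodes running
different driver or library builds.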

You might try contacting Redhat support to see if they know more since the RPMs 
are sourced from them, or even try Intel support via ibsupp...@intel.com.

Andrew

> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di
> Domenico
> Sent: Thursday, November 6, 2014 4:38 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] ipath_userinit errors
> 
> Andrew,
> 
> Thanks.  We're using the RHEL version because it was less complicated for our
> environment in the past, but sounds like we might want to reconsider that
> decision.
> 
> Do you know why we don't see the message with lower node count
> allocations?  It only seems to happen when the node count gets over a
> certain point?
> 
> thanks
> 
> On Wed, Nov 5, 2014 at 5:51 PM, Friedley, Andrew
>  wrote:
> > Hi Michael,
> >
> > From what I understand, this is an issue with the qib driver and PSM from
> > RHEL 6.5 and 6.6, and will be fixed for 6.7.  There is no functional change
> > between qib->PSM API versions 11 and 12, so the message is harmless.  I
> > presume you're using the RHEL sourced package for a reason, but using an
> > IFS release would fix the problem until RHEL 6.7 is ready.
> >
> > Andrew
> >
> >> -----Original Message-----
> >> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael
> >> Di Domenico
> >> Sent: Tuesday, November 4, 2014 8:35 AM
> >> To: Open MPI Users
> >> Subject: [OMPI users] ipath_userinit errors
> >>
> >> I'm getting the below message on my cluster(s).  It seems to only
> >> happen when I try to use more than 64 nodes (16 cores each).  The
> >> clusters are running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM.
> >> I'm using the OFED versions included with RHEL for infiniband support.
> >>
> >> ipath_userinit: Mismatched user minor version (12) and driver minor
> >> version (11) while context sharing. Ensure that driver and library are
> >> from the same release
> >>
> >> I already realize this is a warning message and the jobs complete.
> >> Another user a little over a year ago had a similar issue that was
> >> tracked to mismatched OFED versions.  Since I have a diskless cluster,
> >> all my nodes are identical.
> >>
> >> I'm not averse to thinking there might be something unique about
> >> my machine, but since I have two separate machines doing it, I'm not
> >> really sure where to look to triage the issue and see what might be set
> >> incorrectly.
> >>
> >> Any thoughts on where to start checking would be helpful, thanks...
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/users/2014/11/25667.php
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2014/11/25694.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/11/25698.php


Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of MPI_Ibcast

2014-11-06 Thread Rolf vandeVaart
The CUDA person is now responding.  I will try and reproduce.  I looked through 
the zip file but did not see the mpirun command.   Can this be reproduced with 
-np 4 running across four nodes?
Also, in your original message you wrote "Likewise, it doesn't matter if I 
enable CUDA support or not. "  Can you provide more detail about what that 
means?
Thanks

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 06, 2014 1:05 PM
To: Open MPI Users
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

I was hoping our CUDA person would respond, but in the interim - I would 
suggest trying the nightly 1.8.4 tarball as we are getting ready to release it, 
and I know there were some CUDA-related patches since 1.8.1

http://www.open-mpi.org/nightly/v1.8/


On Nov 5, 2014, at 4:45 PM, Steven Eliuk wrote:

OpenMPI: 1.8.1 with CUDA RDMA...

Thanks sir and sorry for the late response,

Kindest Regards,
-
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Ralph Castain
Reply-To: Open MPI Users
Date: Monday, November 3, 2014 at 10:02 AM
To: Open MPI Users
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
MPI_Ibcast

Which version of OMPI were you testing?

On Nov 3, 2014, at 9:14 AM, Steven Eliuk wrote:

Hello,

We were using OpenMPI for some testing. Everything works fine, but at random
MPI_Ibcast() takes a long time to finish. We have a standalone program just to
test it. The following are the profiling results of the simple test program on
our cluster:

Ibcast 604 mb takes 103 ms
Ibcast 608 mb takes 106 ms
Ibcast 612 mb takes 105 ms
Ibcast 616 mb takes 105 ms
Ibcast 620 mb takes 107 ms
Ibcast 624 mb takes 107 ms
Ibcast 628 mb takes 108 ms
Ibcast 632 mb takes 110 ms
Ibcast 636 mb takes 110 ms
Ibcast 640 mb takes 7437 ms
Ibcast 644 mb takes 115 ms
Ibcast 648 mb takes 111 ms
Ibcast 652 mb takes 112 ms
Ibcast 656 mb takes 112 ms
Ibcast 660 mb takes 114 ms
Ibcast 664 mb takes 114 ms
Ibcast 668 mb takes 115 ms
Ibcast 672 mb takes 116 ms
Ibcast 676 mb takes 116 ms
Ibcast 680 mb takes 116 ms
Ibcast 684 mb takes 122 ms
Ibcast 688 mb takes 7385 ms
Ibcast 692 mb takes 8729 ms
Ibcast 696 mb takes 120 ms
Ibcast 700 mb takes 124 ms
Ibcast 704 mb takes 121 ms
Ibcast 708 mb takes 8240 ms
Ibcast 712 mb takes 122 ms
Ibcast 716 mb takes 123 ms
Ibcast 720 mb takes 123 ms
Ibcast 724 mb takes 124 ms
Ibcast 728 mb takes 125 ms
Ibcast 732 mb takes 125 ms
Ibcast 736 mb takes 126 ms
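
The slow broadcasts in a log like the one above can be picked out mechanically.
A minimal Python sketch, assuming the log lines keep the
"Ibcast <size> mb takes <time> ms" shape shown here; the 10x-median threshold
and the function name find_outliers are arbitrary choices for illustration,
and the sample is a subset of the timings above:

```python
import re
from statistics import median

LINE_RE = re.compile(r"Ibcast (\d+) mb takes (\d+) ms")

def find_outliers(log_text, factor=10):
    """Return (size_mb, time_ms) pairs slower than factor * median time."""
    samples = [(int(s), int(t)) for s, t in LINE_RE.findall(log_text)]
    if not samples:
        return []
    typical = median(t for _, t in samples)
    return [(s, t) for s, t in samples if t > factor * typical]

# Subset of the timings reported in this message.
log = """\
Ibcast 604 mb takes 103 ms
Ibcast 608 mb takes 106 ms
Ibcast 612 mb takes 105 ms
Ibcast 636 mb takes 110 ms
Ibcast 640 mb takes 7437 ms
Ibcast 644 mb takes 115 ms
Ibcast 684 mb takes 122 ms
Ibcast 688 mb takes 7385 ms
Ibcast 692 mb takes 8729 ms
Ibcast 696 mb takes 120 ms
Ibcast 704 mb takes 121 ms
Ibcast 708 mb takes 8240 ms
Ibcast 712 mb takes 122 ms
"""

for size, ms in find_outliers(log):
    print(f"{size} mb: {ms} ms")
```

Using the median rather than the mean keeps the baseline close to the typical
100 to 125 ms range even when several multi-second samples are present.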

As you can see, Ibcast occasionally takes a long time to finish, and it's
totally random.
The same program was compiled and tested with MVAPICH2-gdr, where it ran
smoothly.
Both tests were running exclusively on our four-node cluster without
contention. Likewise, it doesn't matter if I enable CUDA support or not.
The following is the configuration of our servers:

We have four nodes in this test, each with one K40 GPU and connected with
Mellanox IB.

Please find attached config details and some sample code...

Kindest Regards,
-
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25662.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25695.php




Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of MPI_Ibcast

2014-11-06 Thread Ralph Castain
I was hoping our CUDA person would respond, but in the interim - I would 
suggest trying the nightly 1.8.4 tarball as we are getting ready to release it, 
and I know there were some CUDA-related patches since 1.8.1

http://www.open-mpi.org/nightly/v1.8/ 


> On Nov 5, 2014, at 4:45 PM, Steven Eliuk  wrote:
> 
> OpenMPI: 1.8.1 with CUDA RDMA…
> 
> Thanks sir and sorry for the late response,
> 
> Kindest Regards,
> —
> Steven Eliuk, Ph.D. Comp Sci,
> Advanced Software Platforms Lab,
> SRA - SV,
> Samsung Electronics,
> 1732 North First Street,
> San Jose, CA 95112,
> Work: +1 408-652-1976,
> Work: +1 408-544-5781 Wednesdays,
> Cell: +1 408-819-4407.
> 
> 
> From: Ralph Castain
> Reply-To: Open MPI Users
> Date: Monday, November 3, 2014 at 10:02 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of 
> MPI_Ibcast
> 
> Which version of OMPI were you testing?
> 
>> On Nov 3, 2014, at 9:14 AM, Steven Eliuk wrote:
>> 
>> Hello,
>> 
>> We were using OpenMPI for some testing. Everything works fine, but at random
>> MPI_Ibcast() takes a long time to finish. We have a standalone program just
>> to test it. The following are the profiling results of the simple test
>> program on our cluster:
>> 
>> Ibcast 604 mb takes 103 ms
>> Ibcast 608 mb takes 106 ms
>> Ibcast 612 mb takes 105 ms
>> Ibcast 616 mb takes 105 ms
>> Ibcast 620 mb takes 107 ms
>> Ibcast 624 mb takes 107 ms
>> Ibcast 628 mb takes 108 ms
>> Ibcast 632 mb takes 110 ms
>> Ibcast 636 mb takes 110 ms
>> Ibcast 640 mb takes 7437 ms
>> Ibcast 644 mb takes 115 ms
>> Ibcast 648 mb takes 111 ms
>> Ibcast 652 mb takes 112 ms
>> Ibcast 656 mb takes 112 ms
>> Ibcast 660 mb takes 114 ms
>> Ibcast 664 mb takes 114 ms
>> Ibcast 668 mb takes 115 ms
>> Ibcast 672 mb takes 116 ms
>> Ibcast 676 mb takes 116 ms
>> Ibcast 680 mb takes 116 ms
>> Ibcast 684 mb takes 122 ms
>> Ibcast 688 mb takes 7385 ms
>> Ibcast 692 mb takes 8729 ms
>> Ibcast 696 mb takes 120 ms
>> Ibcast 700 mb takes 124 ms
>> Ibcast 704 mb takes 121 ms
>> Ibcast 708 mb takes 8240 ms
>> Ibcast 712 mb takes 122 ms
>> Ibcast 716 mb takes 123 ms
>> Ibcast 720 mb takes 123 ms
>> Ibcast 724 mb takes 124 ms
>> Ibcast 728 mb takes 125 ms
>> Ibcast 732 mb takes 125 ms
>> Ibcast 736 mb takes 126 ms
>> 
>> As you can see, Ibcast occasionally takes a long time to finish, and it's
>> totally random.
>> The same program was compiled and tested with MVAPICH2-gdr, where it ran
>> smoothly.
>> Both tests were running exclusively on our four-node cluster without
>> contention. Likewise, it doesn't matter if I enable CUDA support or not.
>> The following is the configuration of our servers:
>>
>> We have four nodes in this test, each with one K40 GPU and connected with
>> Mellanox IB.
>> 
>> Please find attached config details and some sample code…
>> 
>> Kindest Regards,
>> —
>> Steven Eliuk, Ph.D. Comp Sci,
>> Advanced Software Platforms Lab,
>> SRA - SV,
>> Samsung Electronics,
>> 1732 North First Street,
>> San Jose, CA 95112,
>> Work: +1 408-652-1976,
>> Work: +1 408-544-5781 Wednesdays,
>> Cell: +1 408-819-4407.
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org 
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users 
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25662.php 
>> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25695.php



Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Michael.Rachner
Dear Mr. Squyres,

Thank you for your clear answer on the state of the interfaces in the mpi
modules of OPENMPI.  A good state!
And I have coded enough bugs myself, so I do not get too angry about the bugs
of others.
If I should stumble upon missing Ftn-bindings in the future, I will let you
know.

Greetings to you all!
 Michael Rachner


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Thursday, November 6, 2014 15:10
To: Open MPI User's List
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

On Nov 6, 2014, at 8:55 AM, Michael.Rachner wrote:

> I agree fully with omitting the explicit interfaces from mpif.h. It is an
> important resort for legacy codes.
> But in the mpi and mpi_f08 modules, explicit interfaces are required for
> all(!) MPI routines.
> So far, this is not fulfilled in the MPI versions I know.

Bugs happen.

I think you're saying that we don't intend to have all the routines in the mpi 
and mpi_f08 modules.  That's not correct.  We *do* have explicit interfaces 
for all MPI routines in the mpi and mpi_f08 modules.  If some are missing -- 
like WIN_ALLOCATE was just discovered to be missing in the 1.8.3 release -- 
those are bugs.  We try really hard to avoid bugs, but sometimes they happen.  
:-(

Are you aware of other routines that are missing from the OMPI mpi / mpi_f08 
modules?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25700.php


Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-06 Thread Saliya Ekanayake
Hi Jeff,

I've attached a tar file with information.

Thank you,
Saliya

On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) 
wrote:

> Looks like it's failing in the openib BTL setup.
>
> Can you send the info listed here?
>
> http://www.open-mpi.org/community/help/
>
>
>
> On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake  wrote:
>
> > Hi,
> >
> > I am using OpenMPI 1.8.1 in a Linux cluster that we recently setup. It
> builds fine, but when I try to run even the simplest hello.c program it'll
> cause a segfault. Any suggestions on how to correct this?
> >
> > The steps I did and error message are below.
> >
> > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info is attached.
> > 2. cd to examples directory and mpicc hello_c.c
> > 3. mpirun -np 2 ./a.out
> > 4. Error text is attached.
> >
> > Please let me know if you need more info.
> >
> > Thank you,
> > Saliya
> >
> >
> > --
> > Saliya Ekanayake esal...@gmail.com
> > Cell 812-391-4914 Home 812-961-6383
> > http://saliya.org
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/11/25668.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/11/25672.php
>



-- 
Saliya Ekanayake esal...@gmail.com
Cell 812-391-4914 Home 812-961-6383
http://saliya.org


ompi-segfault.tar.bz2
Description: BZip2 compressed data


Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Jeff Squyres (jsquyres)
On Nov 6, 2014, at 8:55 AM, Michael.Rachner wrote:

> I agree fully with omitting the explicit interfaces from mpif.h. It is an
> important resort for legacy codes.
> But in the mpi and mpi_f08 modules, explicit interfaces are required for
> all(!) MPI routines.
> So far, this is not fulfilled in the MPI versions I know.

Bugs happen.

I think you're saying that we don't intend to have all the routines in the mpi 
and mpi_f08 modules.  That's not correct.  We *do* have explicit interfaces 
for all MPI routines in the mpi and mpi_f08 modules.  If some are missing -- 
like WIN_ALLOCATE was just discovered to be missing in the 1.8.3 release -- 
those are bugs.  We try really hard to avoid bugs, but sometimes they happen.  
:-(

Are you aware of other routines that are missing from the OMPI mpi / mpi_f08 
modules?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Michael.Rachner
Dear Mr. Squyres,

I agree fully with omitting the explicit interfaces from mpif.h. It is an
important resort for legacy codes.
But in the mpi and mpi_f08 modules, explicit interfaces are required for
all(!) MPI routines.
So far, this is not fulfilled in the MPI versions I know.
I want to point out here that this has a negative consequence for Fortran
coding:
  'If someone uses the mpi (or mpi_f08) module, then he cannot put the name
  of an MPI routine in the "only" list of the mpi module.'

I will explain that now:
   The following statement is an example of a desirable statement, because
   the programmer sees at a glance which quantities from this module are used
   in his subroutine, and the statement limits the quantities from the mpi
   module to only those actually needed in the subroutine:

      use MPI, only:   MPI_COMM_WORLD, MPI_IN_PLACE, MPI_REDUCE

   However, this statement will work only if the explicit interface for
   MPI_REDUCE is actually present in the mpi module.
   Unfortunately, the explicit interfaces are not complete in the MPI
   distributions I know, so the programmer must use instead either:

   a)   use MPI, only:  MPI_COMM_WORLD, MPI_IN_PLACE

        This has the drawback that the implicit interface for MPI_REDUCE will
        always be used, i.e. there is no checking of the argument list by the
        explicit interface, even if an explicit interface exists in the mpi
        module,

   or:

   b)   use MPI

        Here the explicit interface will be used if present in the module;
        otherwise the implicit interface will be used. This is o.k., but the
        drawback now is that the whole MPI world is (silently) present in the
        subroutine, and the programmer cannot see at a glance which
        quantities from the module are really used in the subroutine.

  
Greetings 
  Michael Rachner


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Thursday, November 6, 2014 12:42
To: Open MPI User's List
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

On Nov 6, 2014, at 5:37 AM, Michael.Rachner wrote:

> a) When looking in your mpi_sizeof_mpifh.f90 test program I found a little
> thing: You may (but need not) change the name of the integer variable size
> to e.g. isize, because size is already an intrinsic function in Fortran (you
> may see it already, if you have an editor with Fortran highlighting).
> Although your type declaration overrides the intrinsic function, a renaming
> would make the coding unambiguous.

Good catch.  I'll do that.

> b)  My idea was that OPENMPI should always provide a declaration
> (interface) for each MPI routine
> (and that's what the MPI-3.0 Standard document (Sept. 21, 2012) prescribes
> (p. 599+601+603)),

Note that MPI-3 p603 says (about mpif.h):

"For each MPI routine, an implementation can choose to use an implicit or 
explicit interface..."

I.e., it is *not* mandated that MPI implementations have explicit interfaces 
for mpif.h (although, obviously, it *is* mandated for the mpi and mpi_f08 
modules).

There are several reasons why MPI implementations have not added explicit 
interfaces to their mpif.h files, mostly boiling down to: they may/will break 
real world MPI programs.

1. All modern compilers have ignore-TKR syntax, so it's at least not a problem 
for subroutines like MPI_SEND (with a choice buffer).  However: a) this was not 
true at the time when MPI-3 was written, and b) it's not standard Fortran.

2. There are (very) likely real-world programs out there that aren't quite 
right (i.e., would fail to compile with explicit interfaces), but still work 
fine.  On the one hand, it's terrible that we implementers continue to allow 
people to run "incorrect" programs.  But on the other hand, users *have* a very 
simple option to run their codes through explicit interfaces (the mpi module), 
and can do so if they choose to.  Hence, the MPI Forum has decided that 
backwards compatibility is important enough for legacy codes -- some of which 
are tens of thousands of lines long (and more!), and there are no maintainers 
for them any more (!) -- to allow the "good enough" to keep going.

3. But #1 and #2 are mostly trumped by: the goal is to deprecate mpif.h, anyway 
(perhaps in MPI-4?) -- so why bother spending any 

Re: [OMPI users] ipath_userinit errors

2014-11-06 Thread Michael Di Domenico
Andrew,

Thanks.  We're using the RHEL version because it was less complicated
for our environment in the past, but sounds like we might want to
reconsider that decision.

Do you know why we don't see the message with lower node count
allocations?  It only seems to happen when the node count gets over a
certain point?

thanks

On Wed, Nov 5, 2014 at 5:51 PM, Friedley, Andrew
 wrote:
> Hi Michael,
>
> From what I understand, this is an issue with the qib driver and PSM from 
> RHEL 6.5 and 6.6, and will be fixed for 6.7.  There is no functional change 
> between qib->PSM API versions 11 and 12, so the message is harmless.  I 
> presume you're using the RHEL sourced package for a reason, but using an IFS 
> release would fix the problem until RHEL 6.7 is ready.
>
> Andrew
>
>> -Original Message-
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di
>> Domenico
>> Sent: Tuesday, November 4, 2014 8:35 AM
>> To: Open MPI Users
>> Subject: [OMPI users] ipath_userinit errors
>>
>> I'm getting the below message on my cluster(s).  It seems to only happen
>> when I try to use more than 64 nodes (16 cores each).  The clusters are
>> running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM.
>> I'm using the OFED versions included with RHEL for infiniband support.
>>
>> ipath_userinit: Mismatched user minor version (12) and driver minor version
>> (11) while context sharing. Ensure that driver and library are from the same
>> release
>>
>> I already realize this is a warning message and the jobs complete.
>> Another user a little over a year ago had a similar issue that was tracked to
>> mismatched OFED versions.  Since I have a diskless cluster, all my nodes are
>> identical.
>>
>> I'm not averse to thinking there might be something unique about my
>> machine, but since I have two separate machines doing it, I'm not really sure
>> where to look to triage the issue and see what might be set incorrectly.
>>
>> Any thoughts on where to start checking would be helpful, thanks...
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/11/25667.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25694.php


Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Jeff Squyres (jsquyres)
On Nov 6, 2014, at 5:37 AM, Michael.Rachner wrote:

> a) When looking in your mpi_sizeof_mpifh.f90 test program I found a little
> thing: You may (but need not) change the name of the integer variable size
> to e.g. isize, because size is already an intrinsic function in Fortran (you
> may see it already, if you have an editor with Fortran highlighting).
> Although your type declaration overrides the intrinsic function, a renaming
> would make the coding unambiguous.

Good catch.  I'll do that.

> b)  My idea was that OPENMPI should always provide a declaration
> (interface) for each MPI routine
> (and that's what the MPI-3.0 Standard document (Sept. 21, 2012) prescribes
> (p. 599+601+603)),

Note that MPI-3 p603 says (about mpif.h):

"For each MPI routine, an implementation can choose to use an implicit or 
explicit interface..."

I.e., it is *not* mandated that MPI implementations have explicit interfaces 
for mpif.h (although, obviously, it *is* mandated for the mpi and mpi_f08 
modules).

There are several reasons why MPI implementations have not added explicit 
interfaces to their mpif.h files, mostly boiling down to: they may/will break 
real world MPI programs.

1. All modern compilers have ignore-TKR syntax, so it's at least not a problem 
for subroutines like MPI_SEND (with a choice buffer).  However: a) this was not 
true at the time when MPI-3 was written, and b) it's not standard Fortran.

2. There are (very) likely real-world programs out there that aren't quite 
right (i.e., would fail to compile with explicit interfaces), but still work 
fine.  On the one hand, it's terrible that we implementers continue to allow 
people to run "incorrect" programs.  But on the other hand, users *have* a very 
simple option to run their codes through explicit interfaces (the mpi module), 
and can do so if they choose to.  Hence, the MPI Forum has decided that 
backwards compatibility is important enough for legacy codes -- some of which 
are tens of thousands of lines long (and more!), and there are no maintainers 
for them any more (!) -- to allow the "good enough" to keep going.

3. But #1 and #2 are mostly trumped by: the goal is to deprecate mpif.h, anyway 
(perhaps in MPI-4?) -- so why bother spending any more time on it than we have 
to?  Ultimately, we'd like to get rid of the mpi module too (maybe in MPI-5?) 
-- the mpi_f08 module is the True Way Forward.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Michael.Rachner
Dear Mr. Squyres,

a) When looking in your mpi_sizeof_mpifh.f90 test program I found a little
   thing: You may (but need not) change the name of the integer variable size
   to e.g. isize, because size is already an intrinsic function in Fortran
   (you may see it already, if you have an editor with Fortran highlighting).
   Although your type declaration overrides the intrinsic function, a renaming
   would make the coding unambiguous.

b) My idea was that OPENMPI should always provide a declaration (interface)
   for each MPI routine (and that's what the MPI-3.0 Standard document
   (Sept. 21, 2012) prescribes (p. 599+601+603)), independent of whether you
   already have a test program in your suite for an MPI routine or not.
   Because: if all the interfaces are present, you will a priori avoid
   "2-step" user messages: first the user complains about a missing
   MPI routine, and when the MPI routine is made available, possibly later
   about a bug in that MPI routine.
   So bugs in MPI routines will be detected and removed faster in the course
   of the OPENMPI development. Good for all.

Greetings 
 Michael Rachner 





-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Wednesday, November 5, 2014 16:48
To: Open MPI User's List
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

Meh.  I forgot to attach the test.  :-)

Here it is.

On Nov 5, 2014, at 10:46 AM, Jeff Squyres (jsquyres)  wrote:

> On Nov 5, 2014, at 9:59 AM, Michael.Rachner wrote:
> 
>> In my sharedmemtest.f90 code just sent to you,
>> I have added a call of MPI_SIZEOF (at present it is deactivated, because of
>> the missing Ftn-binding in OPENMPI-1.8.3).
> 
> FWIW, I attached one of the tests I put in our test suite for SIZEOF issues 
> after the last bug was found.  I have that same test replicated essentially 
> three times:
> 
> - once for mpif.h
> - once for "use mpi"
> - once for "use mpi_f08"
> 
>> I suggest that you activate the 2 respective statements in the code and
>> use the program yourself to test whether MPI_SIZEOF now works in the
>> upcoming 1.8.4 version.
>> For me, installing a tarball version is not as easy as it is for you, and
>> the problem with the missing Ftn-bindings is not limited to a particular
>> machine.
> 
> Right; it was a larger problem.
> 
>> Can you tell me from which OPENMPI version on the bug will be removed?
> 
> 1.8.4 will contain the fix.
> 
>> To generalize the problem with the Ftn-bindings:
>> I think OPENMPI development should go the whole hog and check
>> whether the Ftn-bindings exist for all MPI routines.
>> This is not so much a complicated task as a somewhat time-consuming one.
>> But otherwise, for a long time to come, more or less angry users will write
>> emails about missing Ftn-bindings and grumble about "that buggy OPENMPI".
>> And you will have to write the answers on and on... .
>> This will ultimately take more time for developers and users than doing
>> that work now once and for all.
> 
> We do have a bunch of fortran tests, but I admit that our coverage is 
> not complete.  SIZEOF was not tested at all, for example, until 
> recently.  :-(
> 
> SIZEOF is also a bit of a special case in the MPI API because it *must* be 
> polymorphic (I don't think any other MPI API is) -- even for mpif.h.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25689.php


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/