Re: [OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread Ralph Castain

On Jul 30, 2014, at 5:49 PM, George Bosilca  wrote:

> 
> On Jul 30, 2014, at 20:37 , Ralph Castain  wrote:
> 
>> 
>> On Jul 30, 2014, at 5:25 PM, George Bosilca  wrote:
>> 
>>> 
>>> On Jul 30, 2014, at 18:00 , Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
 WHAT: Should we make the job size (i.e., initial number of procs) 
 available in OPAL?
 
 WHY: At least 2 BTLs are using this info (*more below)
 
 WHERE: usnic and ugni
 
 TIMEOUT: there's already been some inflammatory emails about this; let's 
 discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
 
 MORE DETAIL:
 
 This is an open question.  We *have* the information at the time that the 
 BTLs are initialized: do we allow that information to go down to OPAL?
 
 Ralph added this info down in OPAL in r32355, but George reverted it in 
 r32361.
 
 Points for: YES, WE SHOULD
 +++ 2 BTLs were using it (usnic, ugni)
 +++ Other RTE job-related info are already in OPAL (num local ranks, local 
 rank)
 
 Points for: NO, WE SHOULD NOT
 --- What exactly is this number (e.g., num currently-connected procs?), 
 and when is it updated?
 --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>>> --- Using this information to configure the communication environment 
>>> limits the scope of the communication substrate to a static application (in 
>>> number of participants). Under this assumption, one can simply wait until 
>>> the first add_procs call to compute the number of processes, a solution as 
>>> “correct” as the current one.
>> 
>> Not necessarily - it depends on how it is used, and how it is communicated. 
>> Some of us have explored other options for using this that aren’t static, 
>> but where the info is of use.
> 
> This is a little bit too much hand waving to be constructive. Some other 
> folks in the field have developed many communications libraries, and none of 
> them needed a random number of potential processes to initialize themselves 
> correctly.

That's fine - everyone innovates and does something new. I'm not about to 
divulge proprietary, competitive info to you in advance just to justify our 
needs. I'll only note that notification of change isn't the sole jurisdiction 
of the FT group, and some of us have other uses for it.


> 
>>> The other “global” pieces of information that were made available in OPAL 
>>> (num_local_peers and my_local_rank) are only used by the local BTLs (SM, 
>>> SMCUDA and VADER). Moreover, my_local_rank is only used to decide who 
>>> initializes the backend file, something that can easily be done using an 
>>> atomic operation. The number of local processes is used to prevent SM from 
>>> activating itself if we don’t have at least 2 processes per node. So their 
>>> usage is minimally invasive, and can eventually be phased out with a little effort.
>> 
>> FWIW: the new PMI abstraction is in OPAL because it is RTE-agnostic. So all 
>> the info being discussed will actually be captured originally in the OPAL 
>> layer,  and stored in the OPAL dstore framework. In the current code, the 
>> RTE grabs the data and exposes it to the OMPI layer, which then pushes it 
>> back down to the OPAL proc.h struct.
>> 
>> In any event, since anyone can freely query the info from opal/pmix or 
>> opal/dstore, it is really irrelevant in some ways. The info is there, in the 
>> OPAL layer, prior to the BTLs being initialized. If you don't want it in 
>> global storage, people can just get it from the appropriate OPAL API.
>> 
>> So what are we actually debating here? Global storage vs API call?
> 
> Our goals in this project are clearly orthogonal. I put a lot of effort into 
> this move because I need to use the BTLs without PMI, without RTE.

And you are certainly free to do so. Nobody is putting a gun to your head and 
demanding that your BTLs use it.

> In fact the question boils down to: Do you want to be able to use the BTL to 
> bootstrap the RTE or not? If yes, then the number of processes is out of the 
> picture, either as an API or as a global storage.

Yes, I do - and no, it isn't a black/white question. I can use the BTLs to 
bootstrap just fine, even when someone uses that info for an initial 
optimization. I can always notify them later when things change, and they can 
make adjustments if necessary.

Again, nobody is forcing you to use any of the data in the opal dstore. It is 
just there if someone *wants* to use it. I fail to understand why you want to 
tell everyone else what they can do in their BTL. If you don't like how they 
wrote it, you are always free to write your own version of it. Nobody will stop 
you.

So what is the issue here?


> 
>   George.
> 
> 
>> 
>>> 
>>>  George.
>>> 
>>> 
 FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
 down to OPAL:
 
 - usnic: for a minor latency optimization / sizing of a shared receive 
 buffer queue length, and for the initial size of a peer lookup hash
 - ugni: to determine the size of the per-peer buffers used for send/recv 
 communication
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Paul Hargrove
On Wed, Jul 30, 2014 at 6:15 PM, tmish...@jcity.maeda.co.jp wrote:
[...]

> The strange thing is that openmpi-1.8 with PGI14.7 works fine.
> What's the difference between openmpi-1.8 and openmpi-1.8.2rc2?
>
[...]

Tetsuya,

Now that I can reproduce the problem you have reported, I am building 1.8.1
with PGI14.4.
Then I may be able to answer the question about what is different.

-Paul




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Paul Hargrove
Jeff,

I can now reproduce Tetsuya's original problem, using a build of 1.8.2rc2
with PGI 14.4.

$ INST/bin/mpifort  ../test.f
/scratch/scratchdirs/hargrove/pgf90pdegT3bhBmEq.o: In function `.C1_283':
test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
/usr/bin/ld: link errors found, deleting executable `a.out'

And here is the showme:

$ INST/bin/mpifort  ../test.f --showme
pgf90 ../test.f
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/include
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib
-Wl,-rpath
-Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib
-L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib
-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi


It may be relevant to note that the 4 undefined references each name a
module.
There does not appear to be any definition of these in any library:

$ for x in INST/lib/*.{a,so}; do nm $x; done | grep -i mpi_f08_sizeof
 U mpi_f08_sizeof_

That undefined reference is in libmpi_usempif08.so along with the other
three in the linker error.
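
The same check can be run per-library to see which .so actually carries
each of these references (plain nm and grep again; adjust the glob for
your install):

$ for x in INST/lib/*.so; do echo "== $x"; nm $x | grep -i 'mpi_f08_interfaces\|mpi_f08_sizeof'; done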


I am essentially illiterate with respect to any feature added to fortran
after F77.
So, I am happy to run tests but have no suggestions as to a resolution.

-Paul

On Wed, Jul 30, 2014 at 5:24 PM, Jeff Squyres (jsquyres)  wrote:

> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
> `mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
>
> Just to go back to the original post here: can you send the results of
>
>   mpifort test.f -o test.ex --showme
>
> I'd like to see what fortran libraries are being linked in.  Here's what I
> get when I compile OMPI with the Intel suite:
>
> -
> $ mpifort hello_usempif08.f90 -o hello --showme
> ifort hello_usempif08.f90 -o hello -I/home/jsquyres/bogus/include
> -I/home/jsquyres/bogus/lib -Wl,-rpath -Wl,/home/jsquyres/bogus/lib
> -Wl,--enable-new-dtags -L/home/jsquyres/bogus/lib -lmpi_usempif08
> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
> 
>
> I note that with the Intel compiler, the Fortran module files are created
> in the lib directory (i.e., $prefix/lib), which is -L'ed on the link line.
>  Does the PGI compiler require something different?  Does the PGI 14
> compiler make an additional library for modules that we need to link in?
>
> We didn't use CONTAINS, and it supposedly works fine with the mpi module
> (right, guys?), so I'm not sure why the same scheme wouldn't work for the
> mpi_f08 module...?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15377.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Paul and Jeff,

I additionally installed PGI 14.4 and checked the behavior.
I confirmed that both versions produce the same results.

PGI14.7:
[mishima@manage work]$ mpif90 test.f -o test.ex --showme
pgfortran test.f -o test.ex
-I/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/include
-I/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/lib
-Wl,-rpath -Wl,/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/lib
-L/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/lib
-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
[mishima@manage work]$ mpif90 test.f -o test.ex
/tmp/pgfortranD-vdxk_lnPL3.o: In function `.C1_283':
test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'

PGI14.4:
[mishima@manage work]$ mpif90 test.f -o test.ex --showme
pgfortran test.f -o test.ex
-I/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.4/include
-I/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.4/lib
-Wl,-rpath -Wl,/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.4/lib
-L/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.4/lib
-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
[mishima@manage work]$ mpif90 test.f -o test.ex
/tmp/pgfortranm9sdKiZYkrMy.o: In function `.C1_283':
test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'

As I reported before, mpi_f08*.mod is created in $prefix/lib.

[mishima@manage openmpi-1.8.2rc2-pgi14.7]$ ll lib/mpi_f08*
-rwxr-xr-x 1 mishima mishima327 Jul 30 12:27 lib/mpi_f08_ext.mod
-rwxr-xr-x 1 mishima mishima  11716 Jul 30 12:27
lib/mpi_f08_interfaces_callbacks.mod
-rwxr-xr-x 1 mishima mishima 374813 Jul 30 12:27 lib/mpi_f08_interfaces.mod
-rwxr-xr-x 1 mishima mishima 715615 Jul 30 12:27 lib/mpi_f08.mod
-rwxr-xr-x 1 mishima mishima  14730 Jul 30 12:27 lib/mpi_f08_sizeof.mod
-rwxr-xr-x 1 mishima mishima  77141 Jul 30 12:27 lib/mpi_f08_types.mod


The strange thing is that openmpi-1.8 with PGI14.7 works fine.
What's the difference between openmpi-1.8 and openmpi-1.8.2rc2?

[mishima@manage work]$ mpif90 test.f -o test.ex --showme
pgfortran test.f -o test.ex
-I/home/mishima/opt/mpi/openmpi-1.8-pgi14.7/include
-I/home/mishima/opt/mpi/openmpi-1.8-pgi14.7/lib
-Wl,-rpath -Wl,/home/mishima/opt/mpi/openmpi-1.8-pgi14.7/lib
-L/home/mishima/opt/mpi/openmpi-1.8-pgi14.7/lib
-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
[mishima@manage work]$ mpif90 test.f -o test.ex
[mishima@manage work]$

Tetsuya

> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
>
> Just to go back to the original post here: can you send the results of
>
> mpifort test.f -o test.ex --showme
>
> I'd like to see what fortran libraries are being linked in.  Here's what
> I get when I compile OMPI with the Intel suite:
>
> -
> $ mpifort hello_usempif08.f90 -o hello --showme
> ifort hello_usempif08.f90 -o hello -I/home/jsquyres/bogus/include
> -I/home/jsquyres/bogus/lib -Wl,-rpath -Wl,/home/jsquyres/bogus/lib
> -Wl,--enable-new-dtags -L/home/jsquyres/bogus/lib -lmpi_usempif08
> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
> 
>
> I note that with the Intel compiler, the Fortran module files are created
> in the lib directory (i.e., $prefix/lib), which is -L'ed on the link line.
> Does the PGI compiler require something different?  Does the PGI 14
> compiler make an additional library for modules that we need to link in?
>
> We didn't use CONTAINS, and it supposedly works fine with the mpi module
> (right, guys?), so I'm not sure why the same scheme wouldn't work for the
> mpi_f08 module...?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15377.php



Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-30 Thread George Bosilca
I can also picture an environment where different projects supply
components that technically belong to a framework from another
project. Let me take an example. Imagine we decide to keep the RML-based
connection setup for SM, something that is not currently possible in the OPAL
layer. In this case the default OPAL build will only offer generic
connection capabilities, such as a connection method using an atomic file
opening operation. However, the OMPI layer could provide a connector
component that exposes the same interface as the OPAL connectors, but
has access to the RML communications via the selected RTE. Today,
because the project name is not in the naming scheme, such an approach is
possible...

  George.



On Wed, Jul 30, 2014 at 5:40 PM, Ralph Castain  wrote:

> We've run into the same problem with frameworks in different projects
> having overlapping names, let alone symbols. So if you have an easy
> solution, please go for it. What we need is for not only the symbols, but
> the mca libs to contain the project names so they don't overlap each other.
>
>
> On Jul 30, 2014, at 2:34 PM, Dave Goodell (dgoodell) 
> wrote:
>
> > Jeff and I were talking about some namespacing issues that have come up
> in the recent BTL move from OMPI to OPAL.  AFAIK, the current system for
> namespacing external symbols is to name them
> "mca_FRAMEWORK_COMPONENT_symbol" (e.g., "mca_btl_tcp_add_procs" in the tcp
> BTL).  Similarly, the DSO for the component is named
> "mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").
> >
> > Jeff asserted that the eventual goal is to move to a system where all
> MCA frameworks/components are also prefixed by the project name.  So the
> above examples become "mca_ompi_btl_tcp_add_procs" and
> "mca_ompi_btl_tcp.so".  Does anyone actually care about pursuing this goal?
> >
> > I ask because if nobody wants to pursue the goal of adding project names
> to namespaces then I already have an easy solution to most of our
> namespacing problems.  OTOH, if someone does wish to pursue that goal, then
> I have a namespace-related RFC that I would like to propose (in a
> subsequent email).
> >
> > -Dave
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15371.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15372.php
>


Re: [OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread Ralph Castain

On Jul 30, 2014, at 5:25 PM, George Bosilca  wrote:

> 
> On Jul 30, 2014, at 18:00 , Jeff Squyres (jsquyres)  
> wrote:
> 
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there's already been some inflammatory emails about this; let's 
>> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks, local 
>> rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> --- Using this information to configure the communication environment limits 
> the scope of the communication substrate to a static application (in number of 
> participants). Under this assumption, one can simply wait until the first 
> add_procs call to compute the number of processes, a solution as “correct” as 
> the current one.

Not necessarily - it depends on how it is used, and how it is communicated. 
Some of us have explored other options for using this that aren't static, but 
where the info is of use.

> 
>> 
> The other “global” pieces of information that were made available in OPAL 
> (num_local_peers and my_local_rank) are only used by the local BTLs (SM, SMCUDA 
> and VADER). Moreover, my_local_rank is only used to decide who initializes the 
> backend file, something that can easily be done using an atomic operation. The 
> number of local processes is used to prevent SM from activating itself if we 
> don’t have at least 2 processes per node. So their usage is minimally 
> invasive, and can eventually be phased out with a little effort.

FWIW: the new PMI abstraction is in OPAL because it is RTE-agnostic. So all the 
info being discussed will actually be captured originally in the OPAL layer,  
and stored in the OPAL dstore framework. In the current code, the RTE grabs the 
data and exposes it to the OMPI layer, which then pushes it back down to the 
OPAL proc.h struct.

In any event, since anyone can freely query the info from opal/pmix or opal/dstore, 
it is really irrelevant in some ways. The info is there, in the OPAL layer, 
prior to the BTLs being initialized. If you don't want it in global storage, 
people can just get it from the appropriate OPAL API.

So what are we actually debating here? Global storage vs API call?
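
In caricature (both names below are made up for illustration; neither is a
real OPAL symbol), the two options being debated look like this:

#include <stdint.h>
#include <stdio.h>

/* Option A: global storage, filled in once by the RTE/PMI glue at init. */
static uint32_t opal_job_size = 0;               /* hypothetical global */

/* Option B: an accessor that fetches the value on demand. The real code
   would query opal/pmix or opal/dstore rather than a static variable. */
static int opal_job_size_get(uint32_t *n)
{
    *n = opal_job_size;
    return 0;
}

int main(void)
{
    opal_job_size = 16;              /* pretend the RTE stored this at init */

    uint32_t a = opal_job_size;      /* option A: read the global */
    uint32_t b;
    opal_job_size_get(&b);           /* option B: go through an API call */

    printf("A=%u B=%u\n", a, b);
    return 0;
}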

> 
>  George.
> 
> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for send/recv 
>> communication
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15378.php



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Jeff Squyres (jsquyres)
On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:

> [mishima@manage work]$ mpif90 test.f -o test.ex
> /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
> test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'

Just to go back to the original post here: can you send the results of

  mpifort test.f -o test.ex --showme

I'd like to see what fortran libraries are being linked in.  Here's what I get 
when I compile OMPI with the Intel suite:

-
$ mpifort hello_usempif08.f90 -o hello --showme
ifort hello_usempif08.f90 -o hello -I/home/jsquyres/bogus/include 
-I/home/jsquyres/bogus/lib -Wl,-rpath -Wl,/home/jsquyres/bogus/lib 
-Wl,--enable-new-dtags -L/home/jsquyres/bogus/lib -lmpi_usempif08 
-lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi


I note that with the Intel compiler, the Fortran module files are created in 
the lib directory (i.e., $prefix/lib), which is -L'ed on the link line.  Does 
the PGI compiler require something different?  Does the PGI 14 compiler make an 
additional library for modules that we need to link in?

We didn't use CONTAINS, and it supposedly works fine with the mpi module 
(right, guys?), so I'm not sure why the same scheme wouldn't work for the 
mpi_f08 module...?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Paul Hargrove
Tetsuya,

I found that the behavior of pgf90 changed somewhere between versions 13.6
and 14.1.
My previous reports were mostly based on my testing of 13.6.
So, I have probably been seeing an issue entirely different than yours.

I am testing 14.4 now and hope to be able to reproduce the problem you
reported.

-Paul


On Wed, Jul 30, 2014 at 12:14 AM, tmish...@jcity.maeda.co.jp wrote:

> Hi Paul, thank you for your comment.
>
> I don't think my mpi_f08.mod is older one, because the time stamp is
> equal to the time when I rebuilt them today.
>
> [mishima@manage openmpi-1.8.2rc2-pgi14.7]$ ll lib/mpi*
> -rwxr-xr-x 1 mishima mishima315 Jul 30 12:27 lib/mpi_ext.mod
> -rwxr-xr-x 1 mishima mishima327 Jul 30 12:27 lib/mpi_f08_ext.mod
> -rwxr-xr-x 1 mishima mishima  11716 Jul 30 12:27
> lib/mpi_f08_interfaces_callbacks.mod
> -rwxr-xr-x 1 mishima mishima 374813 Jul 30 12:27 lib/mpi_f08_interfaces.mod
> -rwxr-xr-x 1 mishima mishima 715615 Jul 30 12:27 lib/mpi_f08.mod
> -rwxr-xr-x 1 mishima mishima  14730 Jul 30 12:27 lib/mpi_f08_sizeof.mod
> -rwxr-xr-x 1 mishima mishima  77141 Jul 30 12:27 lib/mpi_f08_types.mod
> -rwxr-xr-x 1 mishima mishima 878339 Jul 30 12:27 lib/mpi.mod
>
> Regards,
> Tetsuya
>




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Paul Hargrove
Jeff,

I am not "screaming" for a return of support for the PGI compilers.
I will also note that "use mpi" works fine; only the F2008 support is
lacking.

Rather than complain I am offering to help test any solution that might be
offered.
I will also note that Nathan and Howard both have accounts at NERSC that
allow them access to Hopper, the system I have used for testing (in
addition to whatever LANL has).

NEW INFO:

While the 13.6 version of pgf90 failed the PROCEDURE test, I find that
14.1 and 14.4 both *pass* (at least when attempted manually).
So, the issues I've had are DIFFERENT from the originally reported issue.
That is consistent with the mpi_f08.mod file having the same timestamp as
the others.
So, I am investigating the ORIGINAL problem once again with 14.4.


-Paul



On Wed, Jul 30, 2014 at 3:30 PM, Jeff Squyres (jsquyres)  wrote:

> On Jul 30, 2014, at 12:36 AM, Paul Hargrove  wrote:
>
> > Unfortunately, this (and
> https://svn.open-mpi.org/trac/ompi/changeset/31588 that followed)
> represent a REGRESSION in that between 1.8.1 and 1.8.2rc2 Open MPI has lost
> support for F08 with the PGI compilers.
>
> Yes, and the answer is for PGI to support more of the F2003 standard.
>  Then there might be a hope for supporting the MPI F08 bindings.  :-)
>
> Glib answer aside...
>
> The fact of the matter is that Fortran compilers are a nightmare of what
> specific Fortran features they support.  As part of r31587 and r31588,
> there was a simplification made to the (already quite complex) F08 bindings
> in OMPI to only support Fortran compilers that support PROCEDURE.
>
> I don't think I realized that I would be cutting off PGI support with this
> change.
>
> That being said, unless someone really screams, I would greatly prefer not
> to put back in the "support compilers who do not support PROCEDURE" code
> because a) it creates the problem that we solved by taking that stuff out,
> b) it adds more complexity to the F08 bindings, and c) we'll have to solve
> the original problem a different way... and I don't know how to do that.
>  :-\
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15374.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Jeff Squyres (jsquyres)
On Jul 30, 2014, at 12:36 AM, Paul Hargrove  wrote:

> Unfortunately, this (and https://svn.open-mpi.org/trac/ompi/changeset/31588 
> that followed) represent a REGRESSION in that between 1.8.1 and 1.8.2rc2 Open 
> MPI has lost support for F08 with the PGI compilers.

Yes, and the answer is for PGI to support more of the F2003 standard.  Then 
there might be a hope for supporting the MPI F08 bindings.  :-)

Glib answer aside...

The fact of the matter is that Fortran compilers are a nightmare of what 
specific Fortran features they support.  As part of r31587 and r31588, there 
was a simplification made to the (already quite complex) F08 bindings in OMPI 
to only support Fortran compilers that support PROCEDURE.

I don't think I realized that I would be cutting off PGI support with this 
change.

That being said, unless someone really screams, I would greatly prefer not to 
put back in the "support compilers who do not support PROCEDURE" code because 
a) it creates the problem that we solved by taking that stuff out, b) it adds 
more complexity to the F08 bindings, and c) we'll have to solve the original 
problem a different way... and I don't know how to do that.  :-\

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread Jeff Squyres (jsquyres)
WHAT: Should we make the job size (i.e., initial number of procs) available in 
OPAL?

WHY: At least 2 BTLs are using this info (*more below)

WHERE: usnic and ugni

TIMEOUT: there's already been some inflammatory emails about this; let's 
discuss next Tuesday on the teleconf: Tue, 5 Aug 2014

MORE DETAIL:

This is an open question.  We *have* the information at the time that the BTLs 
are initialized: do we allow that information to go down to OPAL?

Ralph added this info down in OPAL in r32355, but George reverted it in r32361.

Points for: YES, WE SHOULD
+++ 2 BTLs were using it (usnic, ugni)
+++ Other RTE job-related info are already in OPAL (num local ranks, local rank)

Points for: NO, WE SHOULD NOT
--- What exactly is this number (e.g., num currently-connected procs?), and 
when is it updated?
--- We need to precisely delineate what belongs in OPAL vs. above-OPAL

FWIW: here's how ompi_process_info.num_procs was used before the BTL move down 
to OPAL:

- usnic: for a minor latency optimization / sizing of a shared receive buffer 
queue length, and for the initial size of a peer lookup hash
- ugni: to determine the size of the per-peer buffers used for send/recv 
communication
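
For concreteness, that kind of init-time sizing looks roughly like the
following (a hypothetical sketch, not the actual usnic/ugni code; all
names are illustrative):

#include <stdint.h>
#include <stdio.h>

/* Round up to the next power of two -- a common way to turn a job-size
   hint into a queue length or hash-table size. */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

int main(void)
{
    uint32_t job_size = 1000;                 /* the contested hint */
    uint32_t rq_len   = next_pow2(job_size);  /* cf. usnic: recv queue length */
    uint32_t hash_sz  = next_pow2(job_size);  /* cf. usnic: peer hash size */
    printf("recv queue=%u, peer hash=%u\n", rq_len, hash_sz);
    return 0;
}

If the number is only an initial value (procs may come and go), anything
sized this way has to remain growable afterwards.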

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-30 Thread Ralph Castain
We've run into the same problem with frameworks in different projects having 
overlapping names, let alone symbols. So if you have an easy solution, please 
go for it. What we need is for not only the symbols, but the mca libs to 
contain the project names so they don't overlap each other.


On Jul 30, 2014, at 2:34 PM, Dave Goodell (dgoodell)  wrote:

> Jeff and I were talking about some namespacing issues that have come up in 
> the recent BTL move from OMPI to OPAL.  AFAIK, the current system for 
> namespacing external symbols is to name them "mca_FRAMEWORK_COMPONENT_symbol" 
> (e.g., "mca_btl_tcp_add_procs" in the tcp BTL).  Similarly, the DSO for the 
> component is named "mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").
> 
> Jeff asserted that the eventual goal is to move to a system where all MCA 
> frameworks/components are also prefixed by the project name.  So the above 
> examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".  Does 
> anyone actually care about pursuing this goal?
> 
> I ask because if nobody wants to pursue the goal of adding project names to 
> namespaces then I already have an easy solution to most of our namespacing 
> problems.  OTOH, if someone does wish to pursue that goal, then I have a 
> namespace-related RFC that I would like to propose (in a subsequent email).
> 
> -Dave
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15371.php



[OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-30 Thread Dave Goodell (dgoodell)
Jeff and I were talking about some namespacing issues that have come up in the 
recent BTL move from OMPI to OPAL.  AFAIK, the current system for namespacing 
external symbols is to name them "mca_FRAMEWORK_COMPONENT_symbol" (e.g., 
"mca_btl_tcp_add_procs" in the tcp BTL).  Similarly, the DSO for the component 
is named "mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").

Jeff asserted that the eventual goal is to move to a system where all MCA 
frameworks/components are also prefixed by the project name.  So the above 
examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".  Does 
anyone actually care about pursuing this goal?

I ask because if nobody wants to pursue the goal of adding project names to 
namespaces then I already have an easy solution to most of our namespacing 
problems.  OTOH, if someone does wish to pursue that goal, then I have a 
namespace-related RFC that I would like to propose (in a subsequent email).
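
As a concrete illustration of the collision risk (a hypothetical check,
assuming components are installed as DSOs under $prefix/lib/openmpi),
duplicated exported symbols across components would show up with:

$ nm -D --defined-only $prefix/lib/openmpi/mca_*.so | awk '$2 == "T" {print $3}' | sort | uniq -d

Empty output means no two components currently export the same symbol.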

-Dave



Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-30 Thread Nathan Hjelm

That is what I would prefer. I was trying to not disturb things too
much :). Please bring the changes over!

-Nathan

On Wed, Jul 30, 2014 at 03:18:44PM -0400, George Bosilca wrote:
>Why do you want to add new versions? This will lead to having two, almost
>identical, sets of atomics that are conceptually equivalent but different
>in terms of code. And we will have to maintain both!
>I did a similar change in a fork of OPAL in another project but instead of
>adding another flavor of atomics, I completely replaced the available ones
>with a set returning the old value. I can bring the code over.
>  George.
> 
>On Tue, Jul 29, 2014 at 5:29 PM, Paul Hargrove  wrote:
> 
>  On Tue, Jul 29, 2014 at 2:10 PM, Nathan Hjelm  wrote:
> 
>Is there a reason why the
>current implementations of opal atomics (add, cmpset) do not return
>the
>old value?
> 
>  Because some CPUs don't implement such an atomic instruction?
> 
>  On any CPU one *can* certainly synthesize the desired operation with an
>  added read before the compare-and-swap to return a value that was
>  present at some time before a failed cmpset.  That is almost certainly
>  sufficient for your purposes.  However, the added load makes it
>  (marginally) more expensive on some CPUs that only have the native
>  equivalent of gcc's __sync_bool_compare_and_swap().
> 
>  -Paul
>  --
>  Paul H. Hargrove  phhargr...@lbl.gov
>  Future Technologies Group
>  Computer and Data Sciences Department Tel: +1-510-495-2352
>  Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this post:
>  http://www.open-mpi.org/community/lists/devel/2014/07/15328.php

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15369.php





Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-30 Thread George Bosilca
Why do you want to add new versions? This will lead to having two, almost
identical, sets of atomics that are conceptually equivalent but different
in terms of code. And we will have to maintain both!

I did a similar change in a fork of OPAL in another project but instead of
adding another flavor of atomics, I completely replaced the available ones
with a set returning the old value. I can bring the code over.

  George.



On Tue, Jul 29, 2014 at 5:29 PM, Paul Hargrove  wrote:

>
> On Tue, Jul 29, 2014 at 2:10 PM, Nathan Hjelm  wrote:
>
>> Is there a reason why the
>> current implementations of opal atomics (add, cmpset) do not return the
>> old value?
>>
>
> Because some CPUs don't implement such an atomic instruction?
>
> On any CPU one *can* certainly synthesize the desired operation with an
> added read before the compare-and-swap to return a value that was present
> at some time before a failed cmpset.  That is almost certainly sufficient
> for your purposes.  However, the added load makes it (marginally) more
> expensive on some CPUs that only have the native equivalent of gcc's
> __sync_bool_compare_and_swap().
>
> -Paul
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15328.php
>
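
For the record, the synthesized operation Paul describes would look roughly
like this (a sketch in terms of the gcc builtins, not the actual opal/asm
code):

#include <stdint.h>

/* Fetching compare-and-swap built from a boolean CAS. On success the old
   value is, by definition, `oldval`; on failure we return the value read
   *before* the CAS -- i.e., a value present at some time before the failed
   cmpset, which is exactly the caveat Paul mentions. */
static inline int32_t fetch_cmpset_32(volatile int32_t *addr,
                                      int32_t oldval, int32_t newval)
{
    int32_t prev = *addr;                      /* the added load */
    if (__sync_bool_compare_and_swap(addr, oldval, newval)) {
        return oldval;
    }
    return prev;
}

On CPUs/compilers that do have a native fetching CAS, gcc's
__sync_val_compare_and_swap(addr, oldval, newval) returns the old value
directly, with no extra load.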


Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
The underlying structure changed, so a little bit of fiddling is normal.
Instead of using a field in the ompi_proc_t you are now using a field down
in opal_proc_t, a field that simply cannot have the same type as before
(orte_process_name_t).

  George.



On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain  wrote:

> George - my point was that we regularly tested using the method in that
> routine, and now we have to do something a little different. So it is an
> "issue" in that we have to make changes across the code base to ensure we
> do things the "new" way, that's all
>
> On Jul 30, 2014, at 9:17 AM, George Bosilca  wrote:
>
> No, this is not going to be an issue if the opal_identifier_t is used
> correctly (aka only via the exposed accessors).
>
>   George.
>
>
>
> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain  wrote:
>
>> Yeah, my fix won't work for big endian machines - this is going to be an
>> issue across the code base now, so we'll have to troll and fix it. I was
>> doing the minimal change required to fix the trunk in the meantime.
>>
>> On Jul 30, 2014, at 9:06 AM, George Bosilca  wrote:
>>
>> Yes. opal_process_name_t has basically no meaning by itself, it is a 64
>> bits storage location used by the upper layer to save some local key that
>> can be later used to extract information. Calling the OPAL level compare
>> function might be a better fit there.
>>
>>   George.
>>
>>
>>
>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> Ralph,
>>>
>>> was it really that simple ?
>>>
>>> proc_temp->super.proc_name has type opal_process_name_t :
>>> typedef opal_identifier_t opal_process_name_t;
>>> typedef uint64_t opal_identifier_t;
>>>
>>> *but*
>>>
>>> item_ptr->peer has type orte_process_name_t :
>>> struct orte_process_name_t {
>>>orte_jobid_t jobid;
>>>orte_vpid_t vpid;
>>> };
>>>
>>> bottom line, is r32357 still valid on a big endian arch ?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain 
>>> wrote:
>>>
 I just fixed this one - all that was required was an ampersand as the
 name was being passed into the function instead of a pointer to the name

 r32357

 On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
 gilles.gouaillar...@gmail.com> wrote:

 Rolf,

 r32353 can be seen as a suspect...
 Even if it is correct, it might have exposed the bug discussed in #4815
 even more (e.g. we hit the bug 100% after the fix)

 does the attached patch to #4815 fix the problem?

 If yes, and if you see this issue as a showstopper, feel free to commit
 it and drop a note to #4815
 ( I am afk until tomorrow)

 Cheers,

 Gilles

 Rolf vandeVaart  wrote:

 Just an FYI that my trunk version (r32355) does not work at all anymore
 if I do not include "--mca coll ^ml".  Here is a stack trace from the
 ibm/pt2pt/send test running on a single node.



 (gdb) where

 #0  0x7f6c0d1321d0 in ?? ()

 #1  <signal handler called>

 #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
 '\017', name1=0x192350001, name2=0xbaf76c) at 
 ../../orte/util/name_fns.c:522

 #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
 (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
 back_files=0x7f6bf3ffd6c8,

 comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
 "sm_payload_mem_", map_all=false) at
 ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237

 #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
 (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
 reg_data=0xba28c0)

 at
 ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302

 #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
 (ml_module=0xba5c40) at 
 ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510

 #6  0x7f6c0cced68f in ml_module_memory_initialization
 (ml_module=0xba5c40) at 
 ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558

 #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
 ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539

 #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
 priority=0x7fffe7991b58) at
 ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963

 #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
 comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)

 at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372

 #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
 comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)

 at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
George - my point was that we regularly tested using the method in that 
routine, and now we have to do something a little different. So it is an 
"issue" in that we have to make changes across the code base to ensure we do 
things the "new" way, that's all

On Jul 30, 2014, at 9:17 AM, George Bosilca  wrote:

> No, this is not going to be an issue if the opal_identifier_t is used 
> correctly (aka only via the exposed accessors).
> 
>   George.
> 
> 
> 
> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain  wrote:
> Yeah, my fix won't work for big endian machines - this is going to be an 
> issue across the code base now, so we'll have to troll and fix it. I was 
> doing the minimal change required to fix the trunk in the meantime.
> 
> On Jul 30, 2014, at 9:06 AM, George Bosilca  wrote:
> 
>> Yes. opal_process_name_t has basically no meaning by itself, it is a 64 bits 
>> storage location used by the upper layer to save some local key that can be 
>> later used to extract information. Calling the OPAL level compare function 
>> might be a better fit there.
>> 
>>   George.
>> 
>> 
>> 
>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
>>  wrote:
>> Ralph,
>> 
>> was it really that simple ?
>> 
>> proc_temp->super.proc_name has type opal_process_name_t :
>> typedef opal_identifier_t opal_process_name_t;
>> typedef uint64_t opal_identifier_t;
>> 
>> *but*
>> 
>> item_ptr->peer has type orte_process_name_t :
>> struct orte_process_name_t {
>>orte_jobid_t jobid;
>>orte_vpid_t vpid;
>> };
>> 
>> bottom line, is r32357 still valid on a big endian arch ?
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> 
>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain  wrote:
>> I just fixed this one - all that was required was an ampersand as the name 
>> was being passed into the function instead of a pointer to the name
>> 
>> r32357
>> 
>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
>>  wrote:
>> 
>>> Rolf,
>>> 
>>> r32353 can be seen as a suspect...
>>> Even if it is correct, it might have exposed the bug discussed in #4815 
>>> even more (e.g. we hit the bug 100% after the fix)
>>> 
>>> does the attached patch to #4815 fix the problem?
>>> 
>>> If yes, and if you see this issue as a showstopper, feel free to commit it 
>>> and drop a note to #4815
>>> ( I am afk until tomorrow)
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> Rolf vandeVaart  wrote:
>>> Just an FYI that my trunk version (r32355) does not work at all anymore if 
>>> I do not include "--mca coll ^ml".  Here is a stack trace from the 
>>> ibm/pt2pt/send test running on a single node.
>>> 
>>>  
>>> 
>>> (gdb) where
>>> 
>>> #0  0x7f6c0d1321d0 in ?? ()
>>> 
>>> #1  <signal handler called>
>>> 
>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
>>> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>> 
>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection 
>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, 
>>> back_files=0x7f6bf3ffd6c8,
>>> 
>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", 
>>> map_all=false) at 
>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>> 
>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti 
>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, 
>>> reg_data=0xba28c0)
>>> 
>>> at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>> 
>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) 
>>> at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>> 
>>> #6  0x7f6c0cced68f in ml_module_memory_initialization 
>>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>> 
>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at 
>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>> 
>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, 
>>> priority=0x7fffe7991b58) at 
>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>> 
>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, 
>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>> 
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>> 
>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, 
>>> priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>> 
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>> 
>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0, 
>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>> 
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>> 
>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0, 
>>> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>> 

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
No, this is not going to be an issue if the opal_identifier_t is used
correctly (aka only via the exposed accessors).

  George.



On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain  wrote:

> Yeah, my fix won't work for big endian machines - this is going to be an
> issue across the code base now, so we'll have to troll and fix it. I was
> doing the minimal change required to fix the trunk in the meantime.
>
> On Jul 30, 2014, at 9:06 AM, George Bosilca  wrote:
>
> Yes. opal_process_name_t has basically no meaning by itself, it is a 64
> bits storage location used by the upper layer to save some local key that
> can be later used to extract information. Calling the OPAL level compare
> function might be a better fit there.
>
>   George.
>
>
>
> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> was it really that simple ?
>>
>> proc_temp->super.proc_name has type opal_process_name_t :
>> typedef opal_identifier_t opal_process_name_t;
>> typedef uint64_t opal_identifier_t;
>>
>> *but*
>>
>> item_ptr->peer has type orte_process_name_t :
>> struct orte_process_name_t {
>>orte_jobid_t jobid;
>>orte_vpid_t vpid;
>> };
>>
>> bottom line, is r32357 still valid on a big endian arch ?
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain  wrote:
>>
>>> I just fixed this one - all that was required was an ampersand as the
>>> name was being passed into the function instead of a pointer to the name
>>>
>>> r32357
>>>
>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Rolf,
>>>
>>> r32353 can be seen as a suspect...
>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>> even more (e.g. we hit the bug 100% after the fix)
>>>
>>> does the attached patch to #4815 fix the problem?
>>>
>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>> it and drop a note to #4815
>>> ( I am afk until tomorrow)
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> Rolf vandeVaart  wrote:
>>>
>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>> if I do not include "--mca coll ^ml".  Here is a stack trace from the
>>> ibm/pt2pt/send test running on a single node.
>>>
>>>
>>>
>>> (gdb) where
>>>
>>> #0  0x7f6c0d1321d0 in ?? ()
>>>
>>> #1  <signal handler called>
>>>
>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>
>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
>>> back_files=0x7f6bf3ffd6c8,
>>>
>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>> "sm_payload_mem_", map_all=false) at
>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>
>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>> reg_data=0xba28c0)
>>>
>>> at
>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>
>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>
>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>
>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>
>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>> priority=0x7fffe7991b58) at
>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>
>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>
>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>
>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>
>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>
>>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>
>>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>> requested=0, provided=0x7fffe79922e8) at
>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>
>>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>> argv=0x7fffe7992340) at pinit.c:84

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
Yeah, my fix won't work for big endian machines - this is going to be an issue 
across the code base now, so we'll have to troll and fix it. I was doing the 
minimal change required to fix the trunk in the meantime.

On Jul 30, 2014, at 9:06 AM, George Bosilca  wrote:

> Yes. opal_process_name_t has basically no meaning by itself, it is a 64 bits 
> storage location used by the upper layer to save some local key that can be 
> later used to extract information. Calling the OPAL level compare function 
> might be a better fit there.
> 
>   George.
> 
> 
> 
> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
>  wrote:
> Ralph,
> 
> was it really that simple ?
> 
> proc_temp->super.proc_name has type opal_process_name_t :
> typedef opal_identifier_t opal_process_name_t;
> typedef uint64_t opal_identifier_t;
> 
> *but*
> 
> item_ptr->peer has type orte_process_name_t :
> struct orte_process_name_t {
>orte_jobid_t jobid;
>orte_vpid_t vpid;
> };
> 
> bottom line, is r32357 still valid on a big endian arch ?
> 
> Cheers,
> 
> Gilles
> 
> 
> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain  wrote:
> I just fixed this one - all that was required was an ampersand as the name 
> was being passed into the function instead of a pointer to the name
> 
> r32357
> 
> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
>  wrote:
> 
>> Rolf,
>> 
>> r32353 can be seen as a suspect...
>> Even if it is correct, it might have exposed the bug discussed in #4815 even 
>> more (e.g. we hit the bug 100% after the fix)
>> 
>> does the attached patch to #4815 fix the problem?
>> 
>> If yes, and if you see this issue as a showstopper, feel free to commit it 
>> and drop a note to #4815
>> ( I am afk until tomorrow)
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> Rolf vandeVaart  wrote:
>> Just an FYI that my trunk version (r32355) does not work at all anymore if I 
>> do not include "--mca coll ^ml".  Here is a stack trace from the 
>> ibm/pt2pt/send test running on a single node.
>> 
>>  
>> 
>> (gdb) where
>> 
>> #0  0x7f6c0d1321d0 in ?? ()
>> 
>> #1  <signal handler called>
>> 
>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
>> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>> 
>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection 
>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, 
>> back_files=0x7f6bf3ffd6c8,
>> 
>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", 
>> map_all=false) at 
>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>> 
>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti 
>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, 
>> reg_data=0xba28c0)
>> 
>> at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>> 
>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at 
>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>> 
>> #6  0x7f6c0cced68f in ml_module_memory_initialization 
>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>> 
>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at 
>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>> 
>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, 
>> priority=0x7fffe7991b58) at 
>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>> 
>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, 
>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>> 
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>> 
>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, 
>> priority=0x7fffe7991b58, module=0x7fffe7991b90)
>> 
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>> 
>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0, 
>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>> 
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>> 
>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0, 
>> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>> 
>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at 
>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>> 
>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, 
>> requested=0, provided=0x7fffe79922e8) at 
>> ../../ompi/runtime/ompi_mpi_init.c:918
>> 
>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, 
>> argv=0x7fffe7992340) at pinit.c:84
>> 
>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>> 
>> (gdb) up
>> 
>> #1  <signal handler called>
>> 
>> (gdb) up
>> 
>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
>> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
Yes. opal_process_name_t has basically no meaning by itself, it is a 64
bits storage location used by the upper layer to save some local key that
can be later used to extract information. Calling the OPAL level compare
function might be a better fit there.

  George.



On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Ralph,
>
> was it really that simple ?
>
> proc_temp->super.proc_name has type opal_process_name_t :
> typedef opal_identifier_t opal_process_name_t;
> typedef uint64_t opal_identifier_t;
>
> *but*
>
> item_ptr->peer has type orte_process_name_t :
> struct orte_process_name_t {
>orte_jobid_t jobid;
>orte_vpid_t vpid;
> };
>
> bottom line, is r32357 still valid on a big endian arch ?
>
> Cheers,
>
> Gilles
>
>
> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain  wrote:
>
>> I just fixed this one - all that was required was an ampersand as the
>> name was being passed into the function instead of a pointer to the name
>>
>> r32357
>>
>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>> Rolf,
>>
>> r32353 can be seen as a suspect...
>> Even if it is correct, it might have exposed the bug discussed in #4815
>> even more (e.g. we hit the bug 100% after the fix)
>>
>> does the attached patch to #4815 fix the problem?
>>
>> If yes, and if you see this issue as a showstopper, feel free to commit
>> it and drop a note to #4815
>> ( I am afk until tomorrow)
>>
>> Cheers,
>>
>> Gilles
>>
>> Rolf vandeVaart  wrote:
>>
>> Just an FYI that my trunk version (r32355) does not work at all anymore
>> if I do not include "--mca coll ^ml".Here is a stack trace from the
>> ibm/pt2pt/send test running on a single node.
>>
>>
>>
>> (gdb) where
>>
>> #0  0x7f6c0d1321d0 in ?? ()
>>
>> #1  
>>
>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>
>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
>> back_files=0x7f6bf3ffd6c8,
>>
>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>> "sm_payload_mem_", map_all=false) at
>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>
>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>> reg_data=0xba28c0)
>>
>> at
>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>
>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40)
>> at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>
>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>
>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>
>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>> priority=0x7fffe7991b58) at
>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>
>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>
>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0,
>> priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>
>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>
>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>
>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>
>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>> requested=0, provided=0x7fffe79922e8) at
>> ../../ompi/runtime/ompi_mpi_init.c:918
>>
>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>> argv=0x7fffe7992340) at pinit.c:84
>>
>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>>
>> (gdb) up
>>
>> #1  
>>
>> (gdb) up
>>
>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>
>> 522   if (name1->jobid < name2->jobid) {
>>
>> (gdb) print name1
>>
>> $1 = (const orte_process_name_t *) 0x192350001
>>
>> (gdb) print *name1
>>
>> Cannot access memory at address 0x192350001
>>
>> (gdb) print name2
>>
>> $2 = (const orte_process_name_t *) 0xbaf76c
>>
>> (gdb) print *name2
>>
>> $3 = {jobid = 2452946945, vpid = 1}
>>
>> (gdb)
>>
>>
>>
>>
>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
Ralph,

was it really that simple?

proc_temp->super.proc_name has type opal_process_name_t :
typedef opal_identifier_t opal_process_name_t;
typedef uint64_t opal_identifier_t;

*but*

item_ptr->peer has type orte_process_name_t :
struct orte_process_name_t {
   orte_jobid_t jobid;
   orte_vpid_t vpid;
};

bottom line: is r32357 still valid on a big-endian arch?
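
To make the endianness concern concrete, a small self-contained sketch (the
typedefs are simplified stand-ins for the real ORTE/OPAL ones, and the cast
deliberately mirrors what the fix does, strict-aliasing caveats aside):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t orte_jobid_t;
typedef uint32_t orte_vpid_t;
struct orte_process_name_t { orte_jobid_t jobid; orte_vpid_t vpid; };

int main(void)
{
    /* The OPAL-level name: one opaque 64-bit value. */
    uint64_t opal_name = ((uint64_t)0x12345678 << 32) | 0x42;

    /* Reinterpret those 8 bytes as the ORTE struct. */
    struct orte_process_name_t *view = (struct orte_process_name_t *)&opal_name;

    /* Little-endian: jobid == 0x42 (the low word) and vpid == 0x12345678.
     * Big-endian: the two fields come out swapped. */
    printf("jobid=0x%x vpid=0x%x\n", (unsigned)view->jobid, (unsigned)view->vpid);
    return 0;
}

Whether the cast is "valid" therefore depends entirely on which 32-bit half
the upper layer packed the jobid into on a given architecture.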

Cheers,

Gilles


On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain  wrote:

> ... snip ...

Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-30 Thread Nathan Hjelm

Yup, just noticed that. All component variables should be registered
with mca_base_component_var_register, but the versions were registered
with the generic register function. The code in question is the oldest
part of the MCA rewrite, so it was probably missed when the component
variable register function was added. Fixing now.
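
For readers following along, the component-aware registration looks roughly
like this (a sketch assuming the trunk-era signature in
opal/mca/base/mca_base_var.h; the helper function and its storage are
illustrative, not the actual fix):

#include "opal/mca/base/mca_base_var.h"

static int major_version;  /* registered storage must outlive the variable */

/* Registering through the component-aware call ties the variable to the
 * component's variable group, so mca_base_var_group_deregister() sweeps
 * it away in mca_base_component_unload() along with everything else the
 * component registered.  The generic register function does not make
 * that association. */
static int register_major_version(const mca_base_component_t *component)
{
    major_version = component->mca_component_major_version;
    return mca_base_component_var_register(component, "major_version",
                                           "Major version of this component",
                                           MCA_BASE_VAR_TYPE_INT, NULL, 0,
                                           MCA_BASE_VAR_FLAG_DEFAULT_ONLY,
                                           OPAL_INFO_LVL_9,
                                           MCA_BASE_VAR_SCOPE_CONSTANT,
                                           &major_version);
}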

-Nathan

On Thu, Jul 31, 2014 at 12:40:55AM +0900, KAWASHIMA Takahiro wrote:
> Nathan,
> 
> The difference seems to be the flags on registering.
> 
> Normal MCA variables shmem_sysv_priority etc. have flag
> MCA_BASE_VAR_FLAG_DWG so that they are deregistered through
> mca_base_var_group_deregister in mca_base_component_unload.
> 
> But shmem_sysv_major_version doesn't have the flag.
> 
> Regards,
> KAWASHIMA Takahiro
> 
> > ... snip ...
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Rolf vandeVaart
Thanks, Ralph and Gilles!  All is looking good for me again, and I think all
tests are passing. Will check the results again tomorrow.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 30, 2014 10:49 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

I just fixed this one - all that was required was an ampersand as the name was 
being passed into the function instead of a pointer to the name

r32357

On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
> wrote:


... snip ...

Re: [OMPI devel] OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles GOUAILLARDET
I will fix this tomorrow

Right now, --enable-mpi-fortran means --enable-mpi-fortran=yes, which in
turn means --enable-mpi-fortran=all:
So configure aborts if not all bindings can be built

In ompi_configure_options.m4 :
OMPI_FORTRAN_USER_REQUESTED=0
108 case "x$enable_mpi_fortran" in
109 x)
110 AC_MSG_RESULT([yes (all/default)])
111 OMPI_WANT_FORTRAN_MPIFH_BINDINGS=1
112 OMPI_WANT_FORTRAN_USEMPI_BINDINGS=1
113 OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS=1
114 ;;
115 
116 xyes|xall)
117 AC_MSG_RESULT([yes (all)])
118 OMPI_FORTRAN_USER_REQUESTED=1
119 OMPI_WANT_FORTRAN_MPIFH_BINDINGS=1
120 OMPI_WANT_FORTRAN_USEMPI_BINDINGS=1
121 OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS=1
122 ;;

OMPI_FORTRAN_USER_REQUESTED=1
should only happen for xall, and not for xyes

I will review this tomorrow.
In the meantime, feel free to revert the changeset or simply not use
--enable-mpi-fortran for now

Cheers,

Gilles

Ralph Castain  wrote:
>... snip ...


Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-30 Thread Nathan Hjelm

This is odd. The variable in question is registered by the MCA itself. I
will take a look and see if I can determine why it isn't being
deregistered correctly when the rest of the component's parameters are.
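
As an aside, the defensive pattern the varList code needed -- check every
MPI_T return code and treat MPI_T_ERR_INVALID_INDEX as "skip this variable"
-- can be sketched with only standard MPI-3 tool-interface calls (a sketch,
not the varList source):

#include <mpi.h>
#include <stdio.h>

/* Sketch: iterate the control variables, skipping indexes that belong to
 * components that have since been unloaded. */
int list_cvars(void)
{
    int provided, num_cvars, i, err;

    err = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    if (MPI_SUCCESS != err) return err;

    MPI_T_cvar_get_num(&num_cvars);
    for (i = 0; i < num_cvars; ++i) {
        char name[256], desc[1024];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;

        err = MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                                  &enumtype, desc, &desc_len, &bind, &scope);
        if (MPI_T_ERR_INVALID_INDEX == err) continue;  /* unloaded component */
        if (MPI_SUCCESS != err) break;                 /* real error: stop */

        printf("cvar %d: %s\n", i, name);
    }
    return MPI_T_finalize();
}

The same rule applies to MPI_T_cvar_handle_alloc(): if it returns an error,
the handle must not be used.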

-Nathan

On Wed, Jul 30, 2014 at 08:17:15AM +0900, KAWASHIMA Takahiro wrote:
> Nathan,
> 
> Thanks for your response.
> 
> Yes. My previous mail was the result of uncommented code.
> Now I also pulled the latest varList source code, which uncommented
> the section you mentioned, but the result was the same.
> 
> If MPI_T_cvar_get_info should return MPI_T_ERR_INVALID_INDEX
> for variables for unloaded components, not returning
> MPI_T_ERR_INVALID_INDEX is the problem.
> 
> I ran varList under GDB and found that MPI_T_cvar_get_info returns
> MPI_T_ERR_INVALID_INDEX for shmem_sysv_priority (this is sane).
> But it returns MPI_SUCCESS for shmem_sysv_major_version.
> The difference is mbv_flags values. mbv_flags is 0x44 for
> shmem_sysv_priority on MPI_T_cvar_get_info call so that
> mca_base_var_get function in opal/mca/base/mca_base_var.c
> returns OPAL_ERR_NOT_FOUND. But mbv_flags is 0x10003 for
> shmem_sysv_major_version so that mca_base_var_get function
> returns OPAL_SUCCESS.
> 
> Control variables for unloaded components are not deregistered
> completely?
> 
> I can track it more when I have time.
> 
> My environment:
>   OS: Debian GNU/Linux wheezy
>   CPU: x86_64
>   Run: mpiexec -n 1 varList
>   Open MPI source: trunk r32338 (almost latest)
>   Open MPI configure:
> enable_picky=yes
> enable_debug=yes
> enable_mem_debug=yes
> enable_mem_profile=yes
> enable_memchecker=no
> 
> enable_mca_no_build=btl-elan,btl-gm,btl-mx,btl-ofud,btl-portals,btl-sctp,btl-template,btl-udapl,common-mx,common-portals,ess-alps,ess-cnos,ess-lsf,ess-portals_utcp,ess-singleton,ess-slurm,grpcomm-cnos,mpool-fake,mtl,notifier,plm-alps,plm-ccp,plm-lsf,plm-process,plm-slurm,plm-submit,plm-tm,plm-xgrid,pml-cm,pml-csum,pml-example,pml-v,ras
> enable_contrib_no_build=vt
> enable_mpi_cxx=no
> enable_mpi_f77=no
> enable_mpi_f90=no
> enable_ipv6=no
> enable_mpi_io=no
> with_devel_headers=no
> with_wrapper_cflags=-g
> with_wrapper_cxxflags=-g
> with_wrapper_fflags=-g
> with_wrapper_fcflags=-g
> 
> Regards,
> KAWASHIMA Takahiro
> 
> > The problem is the code in question does not check the return code of
> > MPI_T_cvar_handle_alloc . We are returning an error and they still try
> > to use the handle (which is stale). Uncomment this section of the code:
> > 
> > 
> > //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
> > //  continue;
> > 
> > 
> > Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich
> > must not have implemented it (and thus should not claim to be MPI 3.0).
> > 
> > -Nathan
> > 
> > On Wed, Jul 30, 2014 at 12:04:55AM +0900, KAWASHIMA Takahiro wrote:
> > > Hi,
> > > 
> > > I encountered the same SEGV reported on the users list when
> > > running varList program.
> > > 
> > >   http://www.open-mpi.org/community/lists/users/2014/07/24792.php
> > > 
> > > mpiexec -n 1 ./varList:
> > > 
> > > ... snip ...
> > > event U/D-2 CHAR   n/a  
> > > ALL
> > > event_base_verboseD/D-8 INTn/a  
> > > LOCAL0
> > > event_libevent2021_event_include  U/A-3 CHAR   n/a  
> > > LOCALpoll
> > > opal_event_includeU/A-3 CHAR   n/a  
> > > LOCALpoll
> > > event_libevent2021_major_version  D/A-9 INTn/a  
> > > UNKNOWN  1
> > > event_libevent2021_minor_version  D/A-9 INTn/a  
> > > UNKNOWN  9
> > > event_libevent2021_release_versionD/A-9 INTn/a  
> > > UNKNOWN  0
> > > shmem U/D-2 CHAR   n/a  
> > > ALL
> > > shmem_base_verboseD/D-8 INTn/a  
> > > LOCAL0
> > > shmem_base_RUNTIME_QUERY_hint D/A-9 CHAR   n/a  
> > > ALL-EQ
> > > shmem_mmap_priority   U/A-3 INTn/a  
> > > ALL  50
> > > shmem_mmap_enable_nfs_warning D/A-9 INTn/a  
> > > LOCALtrue
> > > shmem_mmap_relocate_backing_file  D/A-9 INTn/a  
> > > ALL  0
> > > shmem_mmap_backing_file_base_dir  D/A-9 CHAR   n/a  
> > > ALL  /dev/shm
> > > shmem_mmap_major_version  D/A-9 INTn/a  
> > > UNKNOWN  1
> > > shmem_mmap_minor_version  D/A-9 INTn/a  
> > > UNKNOWN  9
> > > shmem_mmap_release_versionD/A-9 INTn/a  
> > > UNKNOWN  0
> > > shmem_posix_major_version D/A-9 INTn/a  

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Ralph Castain
Umm... this really broke things now. I can't build the Fortran bindings at
all, and I don't have a PGI compiler. I also didn't specify a level of Fortran
support, but just had --enable-mpi-fortran

Maybe we need to revert this commit until we figure out a better solution?

On Jul 30, 2014, at 12:16 AM, Gilles Gouaillardet 
 wrote:

> Paul,
> 
> this is a fair point.
> 
> i commited r32354 in order to abort configure in this case
> 
> Cheers,
> 
> Gilles
> 
> On 2014/07/30 15:11, Paul Hargrove wrote:
>> On a related topic:
>> 
>> I configured with an explicit --enable-mpi-fortran=usempif08.
>> Then configure found PROCEDURE was missing/broken.
>> The result is that the build continued, but without the requested f08
>> support.
>> 
>> If the user has explicitly enabled a given level of Fortran support, but it
>> cannot be provided, shouldn't this be a configure-time error?
>> 
>> -Paul
>> 
> 



Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
I just fixed this one - all that was required was an ampersand as the name was 
being passed into the function instead of a pointer to the name

r32357
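
For anyone hitting the same crash, the shape of the fix (call site
reconstructed from Rolf's backtrace and Gilles' analysis -- a sketch, not
the literal r32357 diff; the ORTE_NS_CMP_ALL mask is assumed from fields=15
in the trace):

/* proc_name is an opal_process_name_t, i.e. a bare uint64_t.  Passing it
 * by value where orte_util_compare_name_fields() expects a pointer turns
 * the name's *value* into the pointer -- hence name1=0x192350001 in
 * Rolf's backtrace. */

/* before (broken): the 64-bit name value is used as an address */
cmp = orte_util_compare_name_fields(ORTE_NS_CMP_ALL,
          (orte_process_name_t *)proc_temp->super.proc_name,
          &item_ptr->peer);

/* after (the one-character fix): pass the address of the name */
cmp = orte_util_compare_name_fields(ORTE_NS_CMP_ALL,
          (orte_process_name_t *)&proc_temp->super.proc_name,
          &item_ptr->peer);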

On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
 wrote:

> ... snip ...

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles GOUAILLARDET
Rolf,

r32353 can be seen as a suspect...
Even if it is correct, it might have exposed the bug discussed in #4815 even 
more (e.g. we hit the bug 100% after the fix)

does the attached patch to #4815 fix the problem?

If yes, and if you see this issue as a showstopper, feel free to commit it and 
drop a note to #4815
( I am afk until tomorrow)

Cheers,

Gilles

Rolf vandeVaart  wrote:
>
>
>Just an FYI that my trunk version (r32355) does not work at all anymore if I 
>do not include "--mca coll ^ml".    Here is a stack trace from the 
>ibm/pt2pt/send test running on a single node.
>
> 
>
>... snip ...
>

Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Rolf vandeVaart
Just an FYI that my trunk version (r32355) does not work at all anymore if I do 
not include "--mca coll ^ml".Here is a stack trace from the ibm/pt2pt/send 
test running on a single node.



(gdb) where

#0  0x7f6c0d1321d0 in ?? ()

#1  

#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

#3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection 
(sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, 
back_files=0x7f6bf3ffd6c8,

comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", 
map_all=false) at 
../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237

#4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti 
(payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, 
reg_data=0xba28c0)

at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302

#5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:510

#6  0x7f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) 
at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558

#7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539

#8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, 
priority=0x7fffe7991b58) at 
../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963

#9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, 
priority=0x7fffe7991b58, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372

#10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, 
priority=0x7fffe7991b58, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355

#11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0, 
component=0x7f6c0cf50940, module=0x7fffe7991b90)

at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317

#12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0, 
comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281

#13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at 
../../../../ompi/mca/coll/base/coll_base_comm_select.c:117

#14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, 
requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918

#15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) 
at pinit.c:84

#16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32

(gdb) up

#1  

(gdb) up

#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

522   if (name1->jobid < name2->jobid) {

(gdb) print name1

$1 = (const orte_process_name_t *) 0x192350001

(gdb) print *name1

Cannot access memory at address 0x192350001

(gdb) print name2

$2 = (const orte_process_name_t *) 0xbaf76c

(gdb) print *name2

$3 = {jobid = 2452946945, vpid = 1}

(gdb)







>-Original Message-

>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles

>Gouaillardet

>Sent: Wednesday, July 30, 2014 2:16 AM

>To: Open MPI Developers

>Subject: Re: [OMPI devel] trunk compilation errors in jenkins

>

>George,

>

>#4815 is indirectly related to the move :

>

>in bcol/basesmuma, we used to compare ompi_process_name_t, and now

>we (try to) compare an ompi_process_name_t and an opal_process_name_t

>(which causes a glory SIGSEGV)

>

>i proposed a temporary patch which is both broken and unelegant, could you

>please advise a correct solution ?

>

>Cheers,

>

>Gilles

>

>On 2014/07/27 7:37, George Bosilca wrote:

>> If you have any issue with the move, I'll be happy to help and/or support

>you on your last move toward a completely generic BTL. To facilitate your

>work I exposed a minimalistic set of OMPI information at the OPAL level. Take

>a look at opal/util/proc.h for more info, but please try not to expose more.

>



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul,

this is a fair point.

i committed r32354 in order to abort configure in this case

Cheers,

Gilles

On 2014/07/30 15:11, Paul Hargrove wrote:
> On a related topic:
>
> I configured with an explicit --enable-mpi-fortran=usempif08.
> Then configure found PROCEDURE was missing/broken.
> The result is that the build continued, but without the requested f08
> support.
>
> If the user has explicitly enabled a given level of Fortran support, but it
> cannot be provided, shouldn't this be a configure-time error?
>
> -Paul
>



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Hi Paul, thank you for your comment.

I don't think my mpi_f08.mod is an older one, because the time stamp is
equal to the time when I rebuilt them today.

[mishima@manage openmpi-1.8.2rc2-pgi14.7]$ ll lib/mpi*
-rwxr-xr-x 1 mishima mishima315 Jul 30 12:27 lib/mpi_ext.mod
-rwxr-xr-x 1 mishima mishima327 Jul 30 12:27 lib/mpi_f08_ext.mod
-rwxr-xr-x 1 mishima mishima  11716 Jul 30 12:27
lib/mpi_f08_interfaces_callbacks.mod
-rwxr-xr-x 1 mishima mishima 374813 Jul 30 12:27 lib/mpi_f08_interfaces.mod
-rwxr-xr-x 1 mishima mishima 715615 Jul 30 12:27 lib/mpi_f08.mod
-rwxr-xr-x 1 mishima mishima  14730 Jul 30 12:27 lib/mpi_f08_sizeof.mod
-rwxr-xr-x 1 mishima mishima  77141 Jul 30 12:27 lib/mpi_f08_types.mod
-rwxr-xr-x 1 mishima mishima 878339 Jul 30 12:27 lib/mpi.mod

Regards,
Tetsuya

> On Tue, Jul 29, 2014 at 6:38 PM, Paul Hargrove wrote:
>
> On Tue, Jul 29, 2014 at 6:33 PM, Paul Hargrove wrote:
> I am trying again with an explicit --enable-mpi-fortran=usempi at
configure time to see what happens.
>
> Of course that should have said --enable-mpi-fortran=usempif08
>
> I've switched to using PG13.6 for my testing.
> I find that even when I pass that flag I see that use_mpi_f08 is NOT
enabled:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking
variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK...
no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
IGNORE_TKR
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
yes
> checking if Fortran compiler supports PROCEDURE... no
> checking if building Fortran 'use mpi_f08' bindings... no
>
> Contrast that to openmpi-1.8.1 and the same compiler:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking
variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK...
no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
IGNORE_TKR
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
yes
> checking if Fortran compiler supports optional arguments... yes
> checking if Fortran compiler supports PRIVATE... yes
> checking if Fortran compiler supports PROTECTED... yes
> checking if Fortran compiler supports ABSTRACT... yes
> checking if Fortran compiler supports ASYNCHRONOUS... yes
> checking if Fortran compiler supports PROCEDURE... no
> checking size of Fortran type(test_mpi_handle)... 4
> checking Fortran compiler F08 assumed rank syntax... not cached; checking
> checking for Fortran compiler support of TYPE(*), DIMENSION(..)... no
> checking Fortran compiler F08 assumed rank syntax... no
> checking which mpi_f08 implementation to build... "good" compiler, no
array subsections
> checking if building Fortran 'use mpi_f08' bindings... yes
>
> So, somewhere between 1.8.1 and 1.8.2rc2 something has happened in the
configure logic to disqualify the pgf90 compiler.
>
> I was also surprised to see 1.8.2rc2 performing *fewer* tests of FC than
1.8.1 did (unless they moved elsewhere?).
>
> In the end I cannot reproduce the originally reported problem for the
simple reason that I instead see:
>
> {hargrove@hopper04
openmpi-1.8.2rc2-linux-x86_64-pgi-14.4}$ ./INST/bin/mpif90 ../test.f
> PGF90-F-0004-Unable to open MODULE file mpi_f08.mod (../test.f: 2)
> PGF90/x86-64 Linux 14.4-0: compilation aborted
>
>
> Tetsuya Mishima,
>
> Is it possible that your installation of 1.8.2rc2 was to the same prefix
as an older build?
> If that is the case, you may have the mpi_f08.mod from the older build
even though no f08 support is in the new build.
>
>
> -Paul
>
>
> --
>
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


This is another one.

(See attached file: openmpi-1.8.2rc2-pgi14.7.tar.gz)

Tetsuya

> Tetsuya --
>
> I am unable to test with the PGI compiler -- I don't have a license.  I
was hoping that LANL would be able to test today, but I don't think they
got to it.
>
> Can you send more details?
>
> E.g., can you send the all the stuff listed on
http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7
compiler?
>
> I'm *guessing* that we've done something new in the changes since 1.8
that PGI doesn't support, and we need to disable that something (hopefully
while not needing to disable the entire mpi_f08
> bindings...).
>
>
>
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> > program. Then, it causes linking error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with previous version of PGI and
> > openmpi-1.8. The results are summarized as follows:
> >
> >                    PGI13.10                        PGI14.7
> > openmpi-1.8        OK                              OK
> > openmpi-1.8.2rc2   configure sets use_f08_mpi:no   link error
> >
> > Regards,
> > Tetsuya Mishima
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>

openmpi-1.8.2rc2-pgi14.7.tar.gz
Description: Binary data


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Hi Jeff,

Sorry for the poor information and the late reply. Today I attended a very,
very long meeting ...

Anyway, I attached compile-output and configure-log.
(due to file size limitation, I send them in twice)

I hope you could find the problem.

(See attached file: openmpi-1.8-pgi14.7.tar.gz)

Regards,
Tetsuya

> Tetsuya --
>
> I am unable to test with the PGI compiler -- I don't have a license.  I
was hoping that LANL would be able to test today, but I don't think they
got to it.
>
> Can you send more details?
>
> E.g., can you send the all the stuff listed on
http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7
compiler?
>
> I'm *guessing* that we've done something new in the changes since 1.8
that PGI doesn't support, and we need to disable that something (hopefully
while not needing to disable the entire mpi_f08
> bindings...).
>
>
>
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> > program. Then, it causes linking error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with previous version of PGI and
> > openmpi-1.8. The results are summarized as follows:
> >
> >                    PGI13.10                        PGI14.7
> > openmpi-1.8        OK                              OK
> > openmpi-1.8.2rc2   configure sets use_f08_mpi:no   link error
> >
> > Regards,
> > Tetsuya Mishima
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>

openmpi-1.8-pgi14.7.tar.gz
Description: Binary data


Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
George,

#4815 is indirectly related to the move :

in bcol/basesmuma, we used to compare two ompi_process_name_t, and now we
(try to) compare an ompi_process_name_t and an opal_process_name_t (which
causes a glorious SIGSEGV)

i proposed a temporary patch which is both broken and inelegant,
could you please advise a correct solution?

Cheers,

Gilles

On 2014/07/27 7:37, George Bosilca wrote:
> If you have any issue with the move, I’ll be happy to help and/or support you 
> on your last move toward a completely generic BTL. To facilitate your work I 
> exposed a minimalistic set of OMPI information at the OPAL level. Take a look 
> at opal/util/proc.h for more info, but please try not to expose more.



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Paul Hargrove
On a related topic:

I configured with an explicit --enable-mpi-fortran=usempif08.
Then configure found PROCEDURE was missing/broken.
The result is that the build continued, but without the requested f08
support.

If the user has explicitly enabled a given level of Fortran support, but it
cannot be provided, shouldn't this be a configure-time error?

-Paul


On Tue, Jul 29, 2014 at 9:41 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

>  Paul,
>
> i am sorry i missed that.
>
> and you are right, 1.8.1 and 1.8 from svn differs :
>
> from svn (config/ompi_setup_mpi_fortran.m4)
> # Per https://svn.open-mpi.org/trac/ompi/ticket/4590, if the
> # Fortran compiler doesn't support PROCEDURE in the way we
> # want/need, disable the mpi_f08 module.
> OMPI_FORTRAN_HAVE_PROCEDURE=0
> AS_IF([test $OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS -eq 1 -a \
>$OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS -eq 1],
>   [ # Does the compiler support "procedure"
>OMPI_FORTRAN_CHECK_PROCEDURE(
>[OMPI_FORTRAN_HAVE_PROCEDURE=1],
>[OMPI_FORTRAN_HAVE_PROCEDURE=0
> OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS=0])])
>
> 1.8.1 does not disqualify f08 bindings if PROCEDURE is not supported.
> /* for the sake of completion, in some cases, 1.8.1 *might* disqualify f08
> bindings if PROCEDURE *is* supported :
> # Per https://svn.open-mpi.org/trac/ompi/ticket/4157, temporarily
> # disqualify the fortran compiler if it exhibits the behavior
> # described in that ticket.  Short version: OMPI does something
> # non-Fortran that we don't have time to fix 1.7.4.  So we just
> # disqualify Fortran compilers who actually enforce this issue,
> # and we'll fix OMPI to be Fortran-compliant after 1.7.4
> AS_IF([test $OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS -eq 1 && \
>test $OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS -eq 1 && \
>test $OMPI_FORTRAN_HAVE_PROCEDURE -eq 1 && \
>test $OMPI_FORTRAN_HAVE_ABSTRACT -eq 1],
>   [ # Check for ticket 4157
>OMPI_FORTRAN_CHECK_TICKET_4157(
>[],
>[ # If we don't have this, don't build the mpi_f08 module
> OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS=0])])
>
>
> from the sources and #4590, f08 binding is intentionally disabled since
> PGI compilers does not support PROCEDURE.
> i agree this is really bad for PGI users :-(
>
> Jeff, can you comment on that ?
>
> Cheers,
>
> Gilles
>
> On 2014/07/30 13:25, Paul Hargrove wrote:
>
> Gilles,
>
> If you look more carefully at the output I provided you will see that 1.8.1
> *does* test for PROCEDURE support and finds it lacking.  BOTH outputs
> include:
>  checking if Fortran compiler supports PROCEDURE... no
>
> However in the 1.8.1 case that is apparently not sufficient to disqualify
> building the f08 module.
>
> The test does fail in both 1.8.1 and 1.8.2rc2.
> Here is the related portion of config.log from one of them:
>
> configure:57708: checking if Fortran compiler supports PROCEDURE
> configure:57735: pgf90 -c -g conftest.f90 >&5
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> 0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
> configure:57735: $? = 2
> configure: failed program was:
> | MODULE proc_mod
> | INTERFACE
> | SUBROUTINE MPI_User_function
> | END SUBROUTINE
> | END INTERFACE
> | END MODULE proc_mod
> |
> | PROGRAM test_proc
> | INTERFACE
> | SUBROUTINE binky(user_fn)
> | USE proc_mod
> | PROCEDURE(MPI_User_function) :: user_fn
> | END SUBROUTINE
> | END INTERFACE
> | END PROGRAM
> configure:57751: result: no
>
> Other than the line numbers the 1.8.1 and 1.8.2rc2 output are identical in
> this respect.
>
> The test also fails when run manually:
>
> {hargrove@hopper04 OMPI}$ pgf90 -c -g conftest.f90
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> 0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
> {hargrove@hopper04 OMPI}$ pgf90 -V
> pgf90 13.10-0 64-bit target on x86-64 Linux -tp shanghai
> The Portland Group - PGI Compilers and Tools
> Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
>
> -Paul
>
> On Tue, Jul 29, 2014 at 9:09 PM, Gilles Gouaillardet 
>  wrote:
>
>
>   Paul,
>
> from the logs, the only difference I see is about Fortran PROCEDURE.
>
> openmpi 1.8 (svn checkout) does not build the usempif08 bindings if
> PROCEDURE is not supported.
>
> from the logs, openmpi 1.8.1 does not check whether PROCEDURE is supported
> or not
>
> here is the sample program to check PROCEDURE (from
> config/ompi_fortran_check_procedure.m4)
>
> MODULE proc_mod
> INTERFACE
> SUBROUTINE MPI_User_function
> END SUBROUTINE
> END INTERFACE
> END MODULE proc_mod
>
> PROGRAM test_proc
> INTERFACE
> SUBROUTINE 

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul,

I am sorry I missed that.

and you are right, 1.8.1 and 1.8 from svn differ:

from svn (config/ompi_setup_mpi_fortran.m4)
# Per https://svn.open-mpi.org/trac/ompi/ticket/4590, if the
# Fortran compiler doesn't support PROCEDURE in the way we
# want/need, disable the mpi_f08 module.
OMPI_FORTRAN_HAVE_PROCEDURE=0
AS_IF([test $OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS -eq 1 -a \
   $OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS -eq 1],
  [ # Does the compiler support "procedure"
   OMPI_FORTRAN_CHECK_PROCEDURE(
   [OMPI_FORTRAN_HAVE_PROCEDURE=1],
   [OMPI_FORTRAN_HAVE_PROCEDURE=0
OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS=0])])

1.8.1 does not disqualify the f08 bindings if PROCEDURE is not supported.
For the sake of completeness: in some cases, 1.8.1 *might* disqualify the
f08 bindings even if PROCEDURE *is* supported:
# Per https://svn.open-mpi.org/trac/ompi/ticket/4157, temporarily
# disqualify the fortran compiler if it exhibits the behavior
# described in that ticket.  Short version: OMPI does something
# non-Fortran that we don't have time to fix 1.7.4.  So we just
# disqualify Fortran compilers who actually enforce this issue,
# and we'll fix OMPI to be Fortran-compliant after 1.7.4
AS_IF([test $OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS -eq 1 && \
   test $OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS -eq 1 && \
   test $OMPI_FORTRAN_HAVE_PROCEDURE -eq 1 && \
   test $OMPI_FORTRAN_HAVE_ABSTRACT -eq 1],
  [ # Check for ticket 4157
   OMPI_FORTRAN_CHECK_TICKET_4157(
   [],
   [ # If we don't have this, don't build the mpi_f08 module
OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS=0])])


From the sources and ticket #4590, the f08 bindings are intentionally
disabled because the PGI compilers do not support PROCEDURE.
I agree this is really bad for PGI users :-(
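
For context, the reason the f08 bindings need PROCEDURE at all: the mpi_f08
module declares user callbacks -- for example the reduction function passed
to MPI_Op_create -- as dummy arguments through a named interface. A rough
sketch of the shape involved (simplified names, and plain INTEGER in place
of the real handle types; an illustration, not the literal OMPI source):

MODULE callback_ifc_mod
  ! stand-in for OMPI's callback-interfaces module; real names differ
  ABSTRACT INTERFACE
    SUBROUTINE user_function_ifc(invec, inoutvec, len, datatype)
      INTEGER :: invec(*), inoutvec(*)      ! simplified argument types
      INTEGER, INTENT(IN) :: len, datatype  ! really TYPE(MPI_Datatype) etc.
    END SUBROUTINE
  END INTERFACE
END MODULE callback_ifc_mod

MODULE op_create_ifc_mod
  INTERFACE
    SUBROUTINE my_op_create(user_fn, commute, op, ierror)
      USE callback_ifc_mod
      PROCEDURE(user_function_ifc) :: user_fn  ! the construct the probe tests
      LOGICAL, INTENT(IN) :: commute
      INTEGER, INTENT(OUT) :: op, ierror
    END SUBROUTINE
  END INTERFACE
END MODULE op_create_ifc_mod

Without PROCEDURE there is no clean way to declare and type-check such
callback arguments in the module, hence the blanket disqualification.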

Jeff, can you comment on that?

Cheers,

Gilles

On 2014/07/30 13:25, Paul Hargrove wrote:
> Gilles,
>
> If you look more carefully at the output I provided you will see that 1.8.1
> *does* test for PROCEDURE support and finds it lacking.  BOTH outputs
> include:
>  checking if Fortran compiler supports PROCEDURE... no
>
> However in the 1.8.1 case that is apparently not sufficient to disqualify
> building the f08 module.
>
> The test does fail in both 1.8.1 and 1.8.2rc2.
> Here is the related portion of config.log from one of them:
>
> configure:57708: checking if Fortran compiler supports PROCEDURE
> configure:57735: pgf90 -c -g conftest.f90 >&5
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> 0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
> configure:57735: $? = 2
> configure: failed program was:
> | MODULE proc_mod
> | INTERFACE
> | SUBROUTINE MPI_User_function
> | END SUBROUTINE
> | END INTERFACE
> | END MODULE proc_mod
> |
> | PROGRAM test_proc
> | INTERFACE
> | SUBROUTINE binky(user_fn)
> |   USE proc_mod
> |   PROCEDURE(MPI_User_function) :: user_fn
> | END SUBROUTINE
> | END INTERFACE
> | END PROGRAM
> configure:57751: result: no
>
> Other than the line numbers, the 1.8.1 and 1.8.2rc2 outputs are identical
> in this respect.
>
> The test also fails when run manually:
>
> {hargrove@hopper04 OMPI}$ pgf90 -c -g conftest.f90
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> 0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
> {hargrove@hopper04 OMPI}$ pgf90 -V
> pgf90 13.10-0 64-bit target on x86-64 Linux -tp shanghai
> The Portland Group - PGI Compilers and Tools
> Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
>
> -Paul
>
> On Tue, Jul 29, 2014 at 9:09 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Paul,
>>
>> from the logs, the only difference I see is about Fortran PROCEDURE.
>>
>> openmpi 1.8 (svn checkout) does not build the usempif08 bindings if
>> PROCEDURE is not supported.
>>
>> from the logs, openmpi 1.8.1 does not check whether PROCEDURE is supported
>> or not
>>
>> here is the sample program to check PROCEDURE (from
>> config/ompi_fortran_check_procedure.m4)
>>
>> MODULE proc_mod
>> INTERFACE
>> SUBROUTINE MPI_User_function
>> END SUBROUTINE
>> END INTERFACE
>> END MODULE proc_mod
>>
>> PROGRAM test_proc
>> INTERFACE
>> SUBROUTINE binky(user_fn)
>>   USE proc_mod
>>   PROCEDURE(MPI_User_function) :: user_fn
>> END SUBROUTINE
>> END INTERFACE
>> END PROGRAM
>>
>> I do not have a PGI license; could you please confirm that the PGI
>> compiler fails to compile the test above?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/07/30 12:54, Paul Hargrove wrote:
>>
>> On Tue, Jul 29, 2014 at 6:38 PM, Paul Hargrove  
>>  wrote:
>>
>>
>>  On Tue, Jul 29, 2014 at 6:33 PM, Paul Hargrove  
>>  wrote:
>>
>>

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Paul Hargrove
Gilles,

If you look more carefully at the output I provided you will see that 1.8.1
*does* test for PROCEDURE support and finds it lacking.  BOTH outputs
include:
 checking if Fortran compiler supports PROCEDURE... no

However in the 1.8.1 case that is apparently not sufficient to disqualify
building the f08 module.

The test does fail in both 1.8.1 and 1.8.2rc2.
Here is the related portion of config.log from one of them:

configure:57708: checking if Fortran compiler supports PROCEDURE
configure:57735: pgf90 -c -g conftest.f90 >&5
PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
configure:57735: $? = 2
configure: failed program was:
| MODULE proc_mod
| INTERFACE
| SUBROUTINE MPI_User_function
| END SUBROUTINE
| END INTERFACE
| END MODULE proc_mod
|
| PROGRAM test_proc
| INTERFACE
| SUBROUTINE binky(user_fn)
|   USE proc_mod
|   PROCEDURE(MPI_User_function) :: user_fn
| END SUBROUTINE
| END INTERFACE
| END PROGRAM
configure:57751: result: no

Other than the line numbers, the 1.8.1 and 1.8.2rc2 outputs are identical
in this respect.

The test also fails when run manually:

{hargrove@hopper04 OMPI}$ pgf90 -c -g conftest.f90
PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
{hargrove@hopper04 OMPI}$ pgf90 -V
pgf90 13.10-0 64-bit target on x86-64 Linux -tp shanghai
The Portland Group - PGI Compilers and Tools
Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.

-Paul

On Tue, Jul 29, 2014 at 9:09 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

>  Paul,
>
> from the logs, the only difference I see is about Fortran PROCEDURE.
>
> openmpi 1.8 (svn checkout) does not build the usempif08 bindings if
> PROCEDURE is not supported.
>
> from the logs, openmpi 1.8.1 does not check whether PROCEDURE is supported
> or not
>
> here is the sample program to check PROCEDURE (from
> config/ompi_fortran_check_procedure.m4)
>
> MODULE proc_mod
> INTERFACE
> SUBROUTINE MPI_User_function
> END SUBROUTINE
> END INTERFACE
> END MODULE proc_mod
>
> PROGRAM test_proc
> INTERFACE
> SUBROUTINE binky(user_fn)
>   USE proc_mod
>   PROCEDURE(MPI_User_function) :: user_fn
> END SUBROUTINE
> END INTERFACE
> END PROGRAM
>
> I do not have a PGI license; could you please confirm that the PGI
> compiler fails to compile the test above?
>
> Cheers,
>
> Gilles
>
> On 2014/07/30 12:54, Paul Hargrove wrote:
>
> On Tue, Jul 29, 2014 at 6:38 PM, Paul Hargrove  
>  wrote:
>
>
>  On Tue, Jul 29, 2014 at 6:33 PM, Paul Hargrove  
>  wrote:
>
>
>  I am trying again with an explicit --enable-mpi-fortran=usempi at
> configure time to see what happens.
>
>
>  Of course that should have said --enable-mpi-fortran=usempif08
>
>
>  I've switched to using PG13.6 for my testing.
> I find that even when I pass that flag I see that use_mpi_f08 is NOT
> enabled:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
> IGNORE_TKR
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
> yes
> checking if Fortran compiler supports PROCEDURE... no
> *checking if building Fortran 'use mpi_f08' bindings... no*
>
> Contrast that to openmpi-1.8.1 and the same compiler:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
> IGNORE_TKR
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
> yes
> checking if Fortran compiler supports optional arguments... yes
> checking if 

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul,

from the logs, the only difference I see is about Fortran PROCEDURE.

openmpi 1.8 (svn checkout) does not build the usempif08 bindings if
PROCEDURE is not supported.

from the logs, openmpi 1.8.1 does not check whether PROCEDURE is
supported or not

here is the sample program to check PROCEDURE (from
config/ompi_fortran_check_procedure.m4)

MODULE proc_mod
INTERFACE
SUBROUTINE MPI_User_function
END SUBROUTINE
END INTERFACE
END MODULE proc_mod

PROGRAM test_proc
INTERFACE
SUBROUTINE binky(user_fn)
  USE proc_mod
  PROCEDURE(MPI_User_function) :: user_fn
END SUBROUTINE
END INTERFACE
END PROGRAM

I do not have a PGI license; could you please confirm that the PGI
compiler fails to compile the test above?
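
If it does fail, one more data point would help: the variant below replaces
the specific interface name with an ABSTRACT INTERFACE. I do not know
whether PGI accepts this form either, but comparing the two would show
whether the compiler objects to PROCEDURE itself or only to PROCEDURE
naming a specific procedure. (This variant is my own sketch, not part of
the configure test.)

MODULE abs_proc_mod
  ABSTRACT INTERFACE
    SUBROUTINE user_function_ifc
    END SUBROUTINE
  END INTERFACE
END MODULE abs_proc_mod

PROGRAM test_abs_proc
  INTERFACE
    SUBROUTINE binky(user_fn)
      USE abs_proc_mod
      PROCEDURE(user_function_ifc) :: user_fn
    END SUBROUTINE
  END INTERFACE
END PROGRAM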

Cheers,

Gilles

On 2014/07/30 12:54, Paul Hargrove wrote:
> On Tue, Jul 29, 2014 at 6:38 PM, Paul Hargrove  wrote:
>
>> On Tue, Jul 29, 2014 at 6:33 PM, Paul Hargrove  wrote:
>>
>>> I am trying again with an explicit --enable-mpi-fortran=usempi at
>>> configure time to see what happens.
>>>
>> Of course that should have said --enable-mpi-fortran=usempif08
>>
> I've switched to using PG13.6 for my testing.
> I find that even when I pass that flag I see that use_mpi_f08 is NOT
> enabled:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
> IGNORE_TKR
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
> yes
> checking if Fortran compiler supports PROCEDURE... no
> *checking if building Fortran 'use mpi_f08' bindings... no*
>
> Contrast that to openmpi-1.8.1 and the same compiler:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
> IGNORE_TKR
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
> yes
> checking if Fortran compiler supports optional arguments... yes
> checking if Fortran compiler supports PRIVATE... yes
> checking if Fortran compiler supports PROTECTED... yes
> checking if Fortran compiler supports ABSTRACT... yes
> checking if Fortran compiler supports ASYNCHRONOUS... yes
> checking if Fortran compiler supports PROCEDURE... no
> checking size of Fortran type(test_mpi_handle)... 4
> checking Fortran compiler F08 assumed rank syntax... not cached; checking
> checking for Fortran compiler support of TYPE(*), DIMENSION(..)... no
> checking Fortran compiler F08 assumed rank syntax... no
> checking which mpi_f08 implementation to build... "good" compiler, no array
> subsections
> *checking if building Fortran 'use mpi_f08' bindings... yes*
>
> So, somewhere between 1.8.1 and 1.8.2rc2 something has happened in the
> configure logic to disqualify the pgf90 compiler.
>
> I was also surprised to see 1.8.2rc2 performing *fewer* tests of FC than
> 1.8.1 did (unless they moved elsewhere?).
>
> In the end I cannot reproduce the originally reported problem for the
> simple reason that I instead see:
>
> {hargrove@hopper04 openmpi-1.8.2rc2-linux-x86_64-pgi-14.4}$
> ./INST/bin/mpif90 ../test.f
> PGF90-F-0004-Unable to open MODULE file mpi_f08.mod (../test.f: 2)
> PGF90/x86-64 Linux 14.4-0: compilation aborted
>
>
> Tetsuya Mishima,
>
> Is it possible that your installation of 1.8.2rc2 was to the same prefix as
> an older build?
> If that is the case, you may have the mpi_f08.mod from the older build even
> though no f08 support is in the new build.
>
>
> -Paul
>
>
>
>



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Sorry for the poor information. I have attached the compile output and
configure log; I hope they help you find the problem.

(See attached file: openmpi-pgi14.7.tar.gz)

Regards,
Tetsuya Mishima

> Tetsuya --
>
> I am unable to test with the PGI compiler -- I don't have a license.  I
> was hoping that LANL would be able to test today, but I don't think they
> got to it.
>
> Can you send more details?
>
> E.g., can you send all the stuff listed on
> http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7
> compiler?
>
> I'm *guessing* that we've done something new in the changes since 1.8
> that PGI doesn't support, and we need to disable that something (hopefully
> while not needing to disable the entire mpi_f08 bindings...).
>
>
>
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and run a sample
> > program, which caused a link error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with a previous version of PGI and with
> > openmpi-1.8. The results are summarized as follows:
> >
> >                     PGI13.10                        PGI14.7
> > openmpi-1.8         OK                              OK
> > openmpi-1.8.2rc2    configure sets use_f08_mpi:no   link error
> >
> > Regards,
> > Tetsuya Mishima
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Paul Hargrove
On Tue, Jul 29, 2014 at 6:38 PM, Paul Hargrove  wrote:

>
> On Tue, Jul 29, 2014 at 6:33 PM, Paul Hargrove  wrote:
>
>> I am trying again with an explicit --enable-mpi-fortran=usempi at
>> configure time to see what happens.
>>
>
> Of course that should have said --enable-mpi-fortran=usempif08
>

I've switched to using PG13.6 for my testing.
I find that even when I pass that flag I see that use_mpi_f08 is NOT
enabled:

checking Fortran compiler ignore TKR syntax... not cached; checking variants
checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
IGNORE_TKR
checking if Fortran compiler supports ISO_C_BINDING... yes
checking if building Fortran 'use mpi' bindings... yes
checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
checking if Fortran compiler supports TYPE, BIND(C)... yes
checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
yes
checking if Fortran compiler supports PROCEDURE... no
*checking if building Fortran 'use mpi_f08' bindings... no*

Contrast that to openmpi-1.8.1 and the same compiler:

checking Fortran compiler ignore TKR syntax... not cached; checking variants
checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
IGNORE_TKR
checking if building Fortran 'use mpi' bindings... yes
checking if Fortran compiler supports ISO_C_BINDING... yes
checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
checking if Fortran compiler supports TYPE, BIND(C)... yes
checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
yes
checking if Fortran compiler supports optional arguments... yes
checking if Fortran compiler supports PRIVATE... yes
checking if Fortran compiler supports PROTECTED... yes
checking if Fortran compiler supports ABSTRACT... yes
checking if Fortran compiler supports ASYNCHRONOUS... yes
checking if Fortran compiler supports PROCEDURE... no
checking size of Fortran type(test_mpi_handle)... 4
checking Fortran compiler F08 assumed rank syntax... not cached; checking
checking for Fortran compiler support of TYPE(*), DIMENSION(..)... no
checking Fortran compiler F08 assumed rank syntax... no
checking which mpi_f08 implementation to build... "good" compiler, no array
subsections
*checking if building Fortran 'use mpi_f08' bindings... yes*

So, somewhere between 1.8.1 and 1.8.2rc2 something has happened in the
configure logic to disqualify the pgf90 compiler.

I was also surprised to see 1.8.2rc2 performing *fewer* tests of FC than
1.8.1 did (unless they moved elsewhere?).

In the end I cannot reproduce the originally reported problem for the
simple reason that I instead see:

{hargrove@hopper04 openmpi-1.8.2rc2-linux-x86_64-pgi-14.4}$
./INST/bin/mpif90 ../test.f
PGF90-F-0004-Unable to open MODULE file mpi_f08.mod (../test.f: 2)
PGF90/x86-64 Linux 14.4-0: compilation aborted


Tetsuya Mishima,

Is it possible that your installation of 1.8.2rc2 was to the same prefix as
an older build?
If that is the case, you may have the mpi_f08.mod from the older build even
though no f08 support is in the new build.
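
The two failure modes line up with when each half of the f08 support gets
resolved: the mpi_f08.mod file is consumed at compile time, while the f08
support library (libmpi_usempif08, if I have the name right -- treat that
as an assumption) is only resolved at link time. A minimal sketch of a
probe for the stale-module case:

program stale_mod_probe
  ! Sketch only: if mpi_f08.mod is a leftover from an older install but
  ! the new build has no f08 support, this compiles (the stale .mod
  ! satisfies the USE) and then fails at link time with undefined
  ! mpi_f08_* references -- the exact symptom reported.
  use mpi_f08            ! resolved at COMPILE time from mpi_f08.mod
  implicit none
  integer :: ierr
  call MPI_Init(ierr)    ! resolved at LINK time from the f08 library
  call MPI_Finalize(ierr)
end program stale_mod_probe

Removing any stale mpi_f08.mod from the install prefix (or installing into
a clean prefix) should instead make the compile fail up front, which is the
honest error.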


-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900