Re: [OMPI users] HWLOC problem

2011-06-07 Thread Jeff Squyres
This should be addressed on the hwloc-users list; I'll reply over there.


On Jun 7, 2011, at 12:51 PM, vaibhav dutt wrote:

> Hi,
> 
> I have installed hwloc 1.2 on my cluster; each node has two quad-core Intel 
> Xeon E5450 processors.
> When I try to execute the command "lstopo" to determine the hardware topology 
> of my system,
> I get an error like:
> 
> ./lstopo: error while loading shared libraries: libhwloc.so.3: cannot open 
> shared object file: No such file or directory
> 
> 
> Can anyone please tell me what causes this error and where I can find this 
> shared library?
> 
> Thanks.
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] HWLOC problem

2011-06-07 Thread vaibhav dutt
Hi,

I have installed hwloc 1.2 on my cluster; each node has two quad-core Intel
Xeon E5450 processors.
When I try to execute the command "lstopo" to determine the hardware
topology of my system,
I get an error like:

./lstopo: error while loading shared libraries: libhwloc.so.3: cannot open
shared object file: No such file or directory


Can anyone please tell me what causes this error and where I can find this
shared library?

Thanks.


[OMPI users] Building OpenMPI v. 1.4.3 in VS2008

2011-06-07 Thread Alan Nichols
Hello,

I'm currently trying to build Open MPI v. 1.4.3 from source in VS2008.  
The platform is Win7 with SP1 installed. (I realize that this is possibly not an 
ideal approach, as v. 1.5.3 has installers for Windows binaries.  However, for 
compatibility with other programs I need to use v. 1.4.3 if at all possible;  
also, as I have many other libraries built under VS2008, I need to use the 
VS2008 compiler if at all possible.)

Following the README.WINDOWS file I found, I used CMake to build a Windows .sln 
file.  I accepted the default CMake settings, except that I only created a 
Release build of Open MPI.  On my first attempt to build the solution, I got an 
error about a missing file, stdint.h.  I was able to fix this by including the 
stdint.h from VS2010.  However, I now get new errors referencing

__attribute__((__always_inline__))

__asm__ __volatile__("": : :"memory")

These look to me like Linux-specific constructs -- is it even possible to do what 
I'm attempting, or are the code base and compiler fundamentally at odds here?  
If it is possible, can you explain where my error lies?
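From what I can tell, both constructs are GCC extensions (a forced-inline
attribute and a compiler-level memory barrier), so a guard along the following
lines would be the usual way to map them onto MSVC equivalents. This is only a
sketch with hypothetical macro names, not Open MPI's actual source:

#ifdef _MSC_VER
#  include <intrin.h>
   /* MSVC spellings of the two GCC extensions */
#  define MY_ALWAYS_INLINE     __forceinline
#  define MY_MEMORY_BARRIER()  _ReadWriteBarrier()
#else
   /* the GCC spellings that appear in the errors above */
#  define MY_ALWAYS_INLINE     __attribute__((__always_inline__))
#  define MY_MEMORY_BARRIER()  __asm__ __volatile__("" : : : "memory")
#endif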

Thanks for your help,

Alan Nichols



Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-07 Thread Edgar Gabriel


On 6/7/2011 10:23 AM, George Bosilca wrote:
> 
> On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:
> 
>> George,
>> 
>> I did not look over all the details of your test, but it looks to
>> me like you are violating one of the requirements of
>> intercomm_create, namely the requirement that the two groups have to be
>> disjoint. In your case the parent process(es) are part of both
>> local intra-communicators, aren't they?
> 
> The two groups of the two local communicators are disjoint. One
> contains A,B while the other only C. The bridge communicator contains
> A,C.
> 
> I'm confident my example is supposed to work. At least for Open MPI
> the error is under the hood, as the resulting inter-communicator is
> valid but contains NULL endpoints for the remote process.

I'll come back to that later; I am not yet convinced that your code is
correct :-) Your local groups might be disjoint, but I am worried about
the ranks of the remote leader in your example. They cannot be 0 from
both groups' perspectives.

> 
> Regarding the fact that the two leaders should be separate processes,
> you will not find any wording about this in the current version of
> the standard. In 1.1 there were two contradictory sentences about this:
> one stating that the two groups have to be disjoint, while the other
> claimed that the two leaders can be the same process. After
> discussion, the agreement was that the two groups have to be
> disjoint, and the standard has been amended to match the agreement.


I realized that this is a non-issue. If the two local groups are
disjoint, there is no way that the two local leaders are the same process.

Thanks
Edgar

> 
> george.
> 
> 
>> 
>> I just have MPI-1.1 at hand right now, but here is what it says:
>> 
>> 
>> Overlap of local and remote groups that are bound into an 
>> inter-communicator is prohibited. If there is overlap, then the
>> program is erroneous and is likely to deadlock.
>> 
>> So the bottom line is that the two local intra-communicators
>> being used have to be disjoint, and the bridge comm needs to be
>> a communicator where at least one process from each of the two
>> disjoint groups can talk to the other.
>> Interestingly, I did not find a sentence saying whether it is allowed to be
>> the same process, or whether the two local leaders need to be
>> separate processes...
>> 
>> 
>> Thanks Edgar
>> 
>> 
>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>> Frederic,
>>> 
>>> Attached you will find an example that is supposed to work. The
>>> main difference with your code is on T3, T4, where you have
>>> swapped the local and remote comms. As depicted in the picture
>>> attached below, during the 3rd step you will create the intercomm
>>> between ab and c (no overlap) using ac as a bridge communicator
>>> (here the two roots, a and c, can exchange messages).
>>> 
>>> Based on the MPI 2.2 standard, especially the paragraph quoted in
>>> the PS, the attached code should have worked. Unfortunately, I
>>> couldn't run it successfully with either the Open MPI trunk or
>>> MPICH2 1.4rc1.
>>> 
>>> george.
>>> 
>>> PS: Here is what the MPI standard states about the
>>> MPI_Intercomm_create:
>>>> The function MPI_INTERCOMM_CREATE can be used to create an
>>>> inter-communicator from two existing intra-communicators, in
>>>> the following situation: At least one selected member from each
>>>> group (the “group leader”) has the ability to communicate with
>>>> the selected member from the other group; that is, a “peer”
>>>> communicator exists to which both leaders belong, and each
>>>> leader knows the rank of the other leader in this peer
>>>> communicator. Furthermore, members of each group know the rank
>>>> of their leader.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I have a problem using MPI_Intercomm_create.
>>>> 
>>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two
>>>> spawn operations by T0.
>>>> 
>>>> So I have two intra-communicators:
>>>> 
>>>> intra0 contains: T0, T1, T2; intra1 contains: T0, T3, T4
>>>> 
>>>> My goal is to make a collective loop to build a single
>>>> intra-communicator containing T0, T1, T2, T3, T4.
>>>> 
>>>> I tried to do it using MPI_Intercomm_create and
>>>> MPI_Intercomm_merge calls, but without success (I always get MPI
>>>> internal errors).
>>>> 
>>>> What I am doing:
>>>> 
>>>> On T0: ***
>>>> 
>>>> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
>>>> 
>>>> On T1 and T2: **
>>>> 
>>>> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
>>>> 
>>>> On T3 and T4: **
>>>> 
>>>> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
>>>> 
>>>> 
>>>> I'm certainly missing something. Could anybody help me to solve
>>>> this problem?
>>>> 
>>>> Best regards,
>>>> 
>>>> Frédéric.
>>>> 
>>>> PS: of course I did an extensive web search without finding
>>>> anything useful on my problem.

Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-07 Thread George Bosilca

On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:

> George,
> 
> I did not look over all the details of your test, but it looks to me
> like you are violating one of the requirements of intercomm_create,
> namely the requirement that the two groups have to be disjoint. In your case
> the parent process(es) are part of both local intra-communicators, aren't they?

The two groups of the two local communicators are disjoint. One contains A,B 
while the other only C. The bridge communicator contains A,C.

I'm confident my example is supposed to work. At least for Open MPI the error 
is under the hood, as the resulting inter-communicator is valid but contains 
NULL endpoints for the remote process.

Regarding the fact that the two leaders should be separate processes, you will 
not find any wording about this in the current version of the standard. In 1.1 
there were two contradictory sentences about this: one stating that the two 
groups have to be disjoint, while the other claimed that the two leaders can be 
the same process. After discussion, the agreement was that the two groups have 
to be disjoint, and the standard has been amended to match the agreement.

  george.


> 
> I just have MPI-1.1 at hand right now, but here is what it says:
> 
> 
> Overlap of local and remote groups that are bound into an
> inter-communicator is prohibited. If there is overlap, then the program
> is erroneous and is likely to deadlock.
> 
> 
> So the bottom line is that the two local intra-communicators being
> used have to be disjoint, and the bridge comm needs to be a communicator
> where at least one process from each of the two disjoint groups can
> talk to the other. Interestingly, I did not find a sentence saying
> whether it is allowed to be the same process, or whether the two local
> leaders need to be separate processes...
> 
> 
> Thanks
> Edgar
> 
> 
> On 6/7/2011 12:57 AM, George Bosilca wrote:
>> Frederic,
>> 
>> Attached you will find an example that is supposed to work. The main 
>> difference with your code is on T3, T4, where you have swapped the local and 
>> remote comms. As depicted in the picture attached below, during the 3rd step 
>> you will create the intercomm between ab and c (no overlap) using ac as a 
>> bridge communicator (here the two roots, a and c, can exchange messages).
>> 
>> Based on the MPI 2.2 standard, especially the paragraph quoted in the PS, the 
>> attached code should have worked. Unfortunately, I couldn't run it 
>> successfully with either the Open MPI trunk or MPICH2 1.4rc1.
>> 
>> george.
>> 
>> PS: Here is what the MPI standard states about the MPI_Intercomm_create:
>>> The function MPI_INTERCOMM_CREATE can be used to create an 
>>> inter-communicator from two existing intra-communicators, in the following 
>>> situation: At least one selected member from each group (the “group 
>>> leader”) has the ability to communicate with the selected member from the 
>>> other group; that is, a “peer” communicator exists to which both leaders 
>>> belong, and each leader knows the rank of the other leader in this peer 
>>> communicator. Furthermore, members of each group know the rank of their 
>>> leader.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>> 
>>> Hello,
>>> 
>>> I have a problem using MPI_Intercomm_create.
>>> 
>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn
>>> operations by T0.
>>> 
>>> So I have two intra-communicators:
>>> 
>>> intra0 contains: T0, T1, T2
>>> intra1 contains: T0, T3, T4
>>> 
>>> My goal is to make a collective loop to build a single intra-communicator
>>> containing T0, T1, T2, T3, T4.
>>> 
>>> I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls,
>>> but without success (I always get MPI internal errors).
>>> 
>>> What I am doing:
>>> 
>>> On T0:
>>> ***
>>> 
>>> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
>>> 
>>> On T1 and T2:
>>> **
>>> 
>>> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
>>> 
>>> On T3 and T4:
>>> **
>>> 
>>> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
>>> 
>>> 
>>> I'm certainly missing something. Could anybody help me to solve this
>>> problem?
>>> 
>>> Best regards,
>>> 
>>> Frédéric.
>>> 
>>> PS: of course I did an extensive web search without finding anything
>>> useful on my problem.
>>> 
>> 
>> 
>> 
> 
> -- 
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab  http://pstl.cs.uh.edu
> Department of Computer Science  University of Houston
> Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
> Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335

Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-07 Thread Edgar Gabriel
George,

I did not look over all the details of your test, but it looks to me
like you are violating one of the requirements of intercomm_create,
namely the requirement that the two groups have to be disjoint. In your case
the parent process(es) are part of both local intra-communicators, aren't they?

I just have MPI-1.1 at hand right now, but here is what it says:


Overlap of local and remote groups that are bound into an
inter-communicator is prohibited. If there is overlap, then the program
is erroneous and is likely to deadlock.


So the bottom line is that the two local intra-communicators being
used have to be disjoint, and the bridge comm needs to be a communicator
where at least one process from each of the two disjoint groups can
talk to the other. Interestingly, I did not find a sentence saying
whether it is allowed to be the same process, or whether the two local
leaders need to be separate processes...


Thanks
Edgar


On 6/7/2011 12:57 AM, George Bosilca wrote:
> Frederic,
> 
> Attached you will find an example that is supposed to work. The main 
> difference with your code is on T3, T4, where you have swapped the local and 
> remote comms. As depicted in the picture attached below, during the 3rd step 
> you will create the intercomm between ab and c (no overlap) using ac as a 
> bridge communicator (here the two roots, a and c, can exchange messages).
> 
> Based on the MPI 2.2 standard, especially the paragraph quoted in the PS, the 
> attached code should have worked. Unfortunately, I couldn't run it 
> successfully with either the Open MPI trunk or MPICH2 1.4rc1.
> 
>  george.
> 
> PS: Here is what the MPI standard states about the MPI_Intercomm_create:
>> The function MPI_INTERCOMM_CREATE can be used to create an 
>> inter-communicator from two existing intra-communicators, in the following 
>> situation: At least one selected member from each group (the “group leader”) 
>> has the ability to communicate with the selected member from the other 
>> group; that is, a “peer” communicator exists to which both leaders belong, 
>> and each leader knows the rank of the other leader in this peer 
>> communicator. Furthermore, members of each group know the rank of their 
>> leader.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
> 
>> Hello,
>>
>> I have a problem using MPI_Intercomm_create.
>>
>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn
>> operations by T0.
>>
>> So I have two intra-communicators:
>>
>> intra0 contains: T0, T1, T2
>> intra1 contains: T0, T3, T4
>>
>> My goal is to make a collective loop to build a single intra-communicator
>> containing T0, T1, T2, T3, T4.
>>
>> I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls,
>> but without success (I always get MPI internal errors).
>>
>> What I am doing:
>>
>> On T0:
>> ***
>>
>> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
>>
>> On T1 and T2:
>> **
>>
>> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
>>
>> On T3 and T4:
>> **
>>
>> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
>>
>>
>> I'm certainly missing something. Could anybody help me to solve this
>> problem?
>>
>> Best regards,
>>
>> Frédéric.
>>
>> PS: of course I did an extensive web search without finding anything
>> useful on my problem.
>>
> 
> 
> 

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335





Re: [OMPI users] parallel I/O on 64-bit indexed arays

2011-06-07 Thread Jeff Squyres
On Jun 7, 2011, at 4:53 AM, Troels Haugboelle wrote:

> In principle yes, but the problem is that we have an unequal number of particles 
> on each node, so the length of each array is not guaranteed to be divisible 
> by 2, 4 or any other number. If I have understood the definition of 
> MPI_TYPE_CREATE_SUBARRAY correctly, the offset can be 64-bit, but not the 
> global array size. So, optimally, what I am looking for is a simple vector 
> with an unequal size on each task, and with 64-bit offsets and a 64-bit 
> global array size.

It's a bit awkward, but you can still make datatypes that give the offset that 
you want.  E.g., if you need an offset of 2B+31 bytes, you can make datatype A 
as a contig of N=(2B/sizeof(int)) ints.  Then make datatype B as a struct 
containing type A and 31 MPI_BYTEs.  Then use 1 instance of datatype B to get 
the offset that you want.

You could write a utility function that, given a specific (64-bit) offset, 
makes an MPI datatype that matches the offset, and then frees it (and all 
sub-datatypes) afterward.
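
A minimal sketch of such a helper (hypothetical function name; it assumes that
offset/sizeof(int) still fits in an int, i.e., offsets up to roughly 8 GB):

#include <mpi.h>

/* Build a datatype whose extent equals an arbitrary 64-bit byte offset:
   a contiguous run of ints plus a tail of bytes for the remainder. */
static void make_offset_type(MPI_Offset offset, MPI_Datatype *newtype)
{
    int int_size;
    MPI_Type_size(MPI_INT, &int_size);

    int n_ints  = (int)(offset / int_size);  /* assumed to fit in an int */
    int n_bytes = (int)(offset % int_size);

    MPI_Datatype contig;
    MPI_Type_contiguous(n_ints, MPI_INT, &contig);

    int          lens[2]  = { 1, n_bytes };
    MPI_Aint     disps[2] = { 0, (MPI_Aint)n_ints * int_size };
    MPI_Datatype types[2] = { contig, MPI_BYTE };
    MPI_Type_create_struct(2, lens, disps, types, newtype);

    MPI_Type_free(&contig);   /* newtype keeps its own reference */
    MPI_Type_commit(newtype);
}

The caller would MPI_Type_free(newtype) once the read/write completes.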

There is a bit of overhead in creating these datatypes, but it should be 
dwarfed by the amount of data that you're reading/writing, right?

It's awkward, but it should work.

> Another possible workaround would be to identify subsections that do not 
> exceed 2B elements, make sub-communicators, and then let each of them dump 
> their elements with proper offsets. It may work. The problematic architecture 
> is a BG/P. On other clusters, doing simple I/O (letting all tasks open the 
> file, seek to their position, and then write their chunk) works fine, but 
> somehow on BG/P performance drops dramatically. My guess is that there is 
> some file locking, or we are overwhelming the I/O nodes...
> 
>> This ticket for the MPI-3 standard is a first step in the right direction, 
>> but won't do everything you need (this is more FYI):
>> 
>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/265
>> 
>> See the PDF attached to the ticket; it's going up for a "first reading" in a 
>> month.  It'll hopefully be part of the MPI-3 standard by the end of the year 
>> (Fab Tillier, CC'ed, has been the chief proponent of this ticket for the 
>> past several months).
>> 
>> Quincey Koziol from the HDF group is going to propose a follow-on to this 
>> ticket, specifically about the case you're referring to -- large counts for 
>> file functions and datatype constructors.  Quincey -- can you expand on what 
>> you'll be proposing, perchance?
> 
> Interesting, I think something along the lines of the note would be very 
> useful and needed for large applications.
> 
> Thanks a lot for the pointers and your suggestions,
> 
> cheers,
> 
> Troels


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-07 Thread Jeff Squyres
George --

Do we need to file a bug about this?


On Jun 7, 2011, at 1:57 AM, George Bosilca wrote:

> Frederic,
> 
> Attached you will find an example that is supposed to work. The main 
> difference with your code is on T3, T4, where you have swapped the local and 
> remote comms. As depicted in the picture attached below, during the 3rd step 
> you will create the intercomm between ab and c (no overlap) using ac as a 
> bridge communicator (here the two roots, a and c, can exchange messages).
> 
> Based on the MPI 2.2 standard, especially the paragraph quoted in the PS, the 
> attached code should have worked. Unfortunately, I couldn't run it 
> successfully with either the Open MPI trunk or MPICH2 1.4rc1.
> 
> george.
> 
> PS: Here is what the MPI standard states about the MPI_Intercomm_create:
>> The function MPI_INTERCOMM_CREATE can be used to create an 
>> inter-communicator from two existing intra-communicators, in the following 
>> situation: At least one selected member from each group (the “group leader”) 
>> has the ability to communicate with the selected member from the other 
>> group; that is, a “peer” communicator exists to which both leaders belong, 
>> and each leader knows the rank of the other leader in this peer 
>> communicator. Furthermore, members of each group know the rank of their 
>> leader.
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] ifort 12.0.4 install problem

2011-06-07 Thread Virginie trinite
Thanks for your reply.
In fact, as you said, it was a strange problem with the .bashrc:
with "sudo make install" I get the error, but with "sudo bash" and then
"make install" the error disappears.
I didn't understand why.
Everything seems OK now; no more problems at runtime.

Thanks


2011/6/7 Jeff Squyres :
> On Jun 6, 2011, at 10:43 AM, Virginie trinite wrote:
>
>> I am trying to compile Open MPI with ifort 12.0.4. My system is Ubuntu
>> Lucid. A previous installation with ifort 11.1 was fine.
>>
>> configure and "make all" seem to work well, but "make install" reports an error:
>> libtool: line 7847: icc: command not found
>> libtool: install: error: relink `libopen-rte.la' with the above
>> command before installing it
>>
>> I want to underline that icc is a known command for bash.
>
> Somehow it became unknown.  Is your PATH being reset somehow?  Or perhaps 
> your .bashrc is resetting your PATH such that even if "which icc" finds it at 
> the shell prompt, when sub-shells invoke your .bashrc, the PATH gets 
> reset (or the icc settings don't get inherited properly), and therefore it 
> becomes unknown...?
>
>> I have checked the FAQ, and it seems to me that the problem is more like
>> the one reported for the IBM compiler. So I tried:
>
> I'm a little confused why you're mentioning the IBM compiler...?  This issue 
> is a shell/build issue (I assume...?  You only sent a few lines from the 
> output, so I can't tell exactly where the error is occurring).
>
>> configure CC=icc CXX=icpc F77=ifort FC=ifort --disable-shared --enable-static
>> Now the install finishes without error, but when I try to run MPI I get
>> error messages:
>
> Now I'm very confused.  :-\
>
> Can you please send all the information listed here:
>
>    http://www.open-mpi.org/community/help/
>
> This will help me understand what the problem is and what you tried to do to 
> fix it.
>
> Thanks.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>



Re: [OMPI users] parallel I/O on 64-bit indexed arays

2011-06-07 Thread Troels Haugboelle



> If I understand your question correctly, this is *exactly* one of the reasons that the 
> MPI Forum has been arguing about the use of a new type, "MPI_Count", for 
> certain parameters that can get very, very large.


Yes, that would help, but unfortunately only in the future.


> Sidenote: I believe that a workaround for you is to create some new MPI 
> datatypes (e.g., of type contiguous) that you can then use to multiply to get 
> to the offsets that you want.  I.e., if you make a contig datatype of 4 
> doubles, you can still only specify up to 2B of them, but that will now get you 
> up to an offset of (2B * 4 * sizeof(double)) rather than (2B * sizeof(double)). 
> Make sense?


In principle yes, but the problem is that we have an unequal number of 
particles on each node, so the length of each array is not guaranteed to 
be divisible by 2, 4 or any other number. If I have understood the 
definition of MPI_TYPE_CREATE_SUBARRAY correctly, the offset can be 
64-bit, but not the global array size. So, optimally, what I am looking 
for is a simple vector with an unequal size on each task, and with 
64-bit offsets and a 64-bit global array size.


Another possible workaround would be to identify subsections that do not 
exceed 2B elements, make sub-communicators, and then let each of them dump 
their elements with proper offsets. It may work. The problematic 
architecture is a BG/P. On other clusters, doing simple I/O (letting all 
tasks open the file, seek to their position, and then write their chunk) 
works fine, but somehow on BG/P performance drops dramatically. My guess 
is that there is some file locking, or we are overwhelming the I/O 
nodes...
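
For reference, the simple-I/O scheme I mean is roughly the following (a
sketch with assumed names, not our actual code): each task computes its
64-bit byte offset with a prefix sum and writes its own chunk.

#include <mpi.h>

void write_chunk(char *path, double *buf, long long nlocal)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* exclusive prefix sum of particle counts gives each task's start */
    long long start = 0;
    MPI_Exscan(&nlocal, &start, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) start = 0;  /* MPI_Exscan leaves rank 0 undefined */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, path,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* MPI_Offset is 64-bit, so the byte offset may exceed 2B even
       though the count argument is still a 32-bit int */
    MPI_File_write_at(fh, (MPI_Offset)start * (MPI_Offset)sizeof(double),
                      buf, (int)nlocal, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}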



> This ticket for the MPI-3 standard is a first step in the right direction, but 
> won't do everything you need (this is more FYI):
> 
>  https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/265
> 
> See the PDF attached to the ticket; it's going up for a "first reading" in a 
> month.  It'll hopefully be part of the MPI-3 standard by the end of the year (Fab 
> Tillier, CC'ed, has been the chief proponent of this ticket for the past several months).
> 
> Quincey Koziol from the HDF group is going to propose a follow-on to this 
> ticket, specifically about the case you're referring to -- large counts for 
> file functions and datatype constructors.  Quincey -- can you expand on what 
> you'll be proposing, perchance?


Interesting, I think something along the lines of the note would be very 
useful and needed for large applications.


Thanks a lot for the pointers and your suggestions,

cheers,

Troels


Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-07 Thread George Bosilca
Frederic,

Attached you will find an example that is supposed to work. The main difference 
with your code is on T3, T4, where you have swapped the local and remote comms. 
As depicted in the picture attached below, during the 3rd step you will create 
the intercomm between ab and c (no overlap) using ac as a bridge communicator 
(here the two roots, a and c, can exchange messages).

Based on the MPI 2.2 standard, especially the paragraph quoted in the PS, the 
attached code should have worked. Unfortunately, I couldn't run it successfully 
with either the Open MPI trunk or MPICH2 1.4rc1.

 george.

PS: Here is what the MPI standard states about the MPI_Intercomm_create:
> The function MPI_INTERCOMM_CREATE can be used to create an inter-communicator 
> from two existing intra-communicators, in the following situation: At least 
> one selected member from each group (the “group leader”) has the ability to 
> communicate with the selected member from the other group; that is, a “peer” 
> communicator exists to which both leaders belong, and each leader knows the 
> rank of the other leader in this peer communicator. Furthermore, members of 
> each group know the rank of their leader.
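
In code, the pattern described above boils down to the following shape (a
generic sketch with assumed names, not the attached example):

#include <mpi.h>

/* Join two disjoint intra-comms through a peer comm holding both
   leaders, then flatten the result into a single intra-comm. */
void join_groups(MPI_Comm local_comm, int local_leader,
                 MPI_Comm peer_comm, int remote_leader,
                 MPI_Comm *merged)
{
    MPI_Comm inter;
    /* peer_comm and remote_leader are significant only at the leaders */
    MPI_Intercomm_create(local_comm, local_leader,
                         peer_comm, remote_leader,
                         /* tag */ 1, &inter);
    MPI_Intercomm_merge(inter, /* high */ 0, merged);
    MPI_Comm_free(&inter);
}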



intercomm_create.c
Description: Binary data




PastedGraphic-2.pdf
Description: Adobe PDF document


On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:

> Hello,
> 
> I have a problem using MPI_Intercomm_create.
> 
> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn
> operations by T0.
> 
> So I have two intra-communicators:
> 
> intra0 contains: T0, T1, T2
> intra1 contains: T0, T3, T4
> 
> My goal is to make a collective loop to build a single intra-communicator
> containing T0, T1, T2, T3, T4.
> 
> I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls,
> but without success (I always get MPI internal errors).
> 
> What I am doing:
> 
> On T0:
> ***
> 
> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
> 
> On T1 and T2:
> **
> 
> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
> 
> On T3 and T4:
> **
> 
> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
> 
> 
> I'm certainly missing something. Could anybody help me to solve this
> problem?
> 
> Best regards,
> 
> Frédéric.
> 
> PS: of course I did an extensive web search without finding anything
> useful on my problem.
> 