Re: [OMPI users] HWLOC problem
This should be addressed on the hwloc-users list; I'll reply over there.

On Jun 7, 2011, at 12:51 PM, vaibhav dutt wrote:

> Hi,
>
> I have installed hwloc 1.2 on my cluster; each node has two quad-core Intel Xeon E5450s. When I try to execute the command "lstopo" to determine the hardware topology of my system, I get an error like:
>
> ./lstopo: error while loading shared libraries: libhwloc.so.3: cannot open shared object file: No such file or directory
>
> Can anyone please help me as to what is the reason for this error and where I can find this shared library?
>
> Thanks.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] HWLOC problem
Hi,

I have installed hwloc 1.2 on my cluster; each node has two quad-core Intel Xeon E5450s. When I try to execute the command "lstopo" to determine the hardware topology of my system, I get an error like:

./lstopo: error while loading shared libraries: libhwloc.so.3: cannot open shared object file: No such file or directory

Can anyone please help me as to what is the reason for this error and where I can find this shared library?

Thanks.
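For readers hitting the same error: it normally means libhwloc was installed into a directory the runtime linker does not search. A sketch of the usual fixes (the /usr/local prefix is an assumption; substitute whatever --prefix you configured hwloc with):

```shell
# Check where hwloc put its libraries (prefix is an assumption):
HWLOC_PREFIX=/usr/local
ls "$HWLOC_PREFIX/lib" 2>/dev/null | grep hwloc || true

# Per-session fix: point the runtime linker at that directory...
export LD_LIBRARY_PATH="$HWLOC_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# ...or register it system-wide (requires root):
# echo "$HWLOC_PREFIX/lib" | sudo tee /etc/ld.so.conf.d/hwloc.conf
# sudo ldconfig
```

The LD_LIBRARY_PATH variant only affects the current shell; the ldconfig variant fixes it for every user and survives logout.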
[OMPI users] Building OpenMPI v. 1.4.3 in VS2008
Hello,

I'm currently trying to build Open MPI v1.4.3 from source in VS2008. The platform is Win7 with SP1 installed. (I realize that this is possibly not an ideal approach, as v1.5.3 has installers for Windows binaries. However, for compatibility with other programs I need to use v1.4.3 if at all possible; also, as I have many other libraries built under VS2008, I need to use the VS2008 compiler if at all possible.)

Following the README.WINDOWS file I found, I used CMake to build a Windows .sln file. I accepted the default CMake settings, with the exception that I only created a Release build of Open MPI.

Upon my first attempt to build the solution, I got an error about a missing file, stdint.h. I was able to fix this by including the stdint.h from VS2010. However, I now get new errors referencing

__attribute__((__always_inline__))
__asm__ __volatile__("": : :"memory")

These look to me like Linux-specific problems -- is it even possible to do what I'm attempting, or are the code base and compiler fundamentally at odds here? If it is possible, can you explain where my error lies?

Thanks for your help,
Alan Nichols
Re: [OMPI users] Problem with MPI_Intercomm_create
On 6/7/2011 10:23 AM, George Bosilca wrote:
>
> On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:
>
>> George,
>>
>> I did not look over all the details of your test, but it looks to me like you are violating one of the requirements of intercomm_create, namely the requirement that the two groups have to be disjoint. In your case the parent process(es) are part of both local intra-communicators, aren't they?
>
> The two groups of the two local communicators are disjoint. One contains A,B while the other contains only C. The bridge communicator contains A,C.
>
> I'm confident my example is supposed to work. At least for Open MPI the error is under the hood, as the resulting inter-communicator is valid but contains NULL endpoints for the remote process.

I'll come back to that later; I am not yet convinced that your code is correct :-) Your local groups might be disjoint, but I am worried about the ranks of the remote leader in your example. They cannot be 0 from both groups' perspectives.

> Regarding the fact that the two leaders should be separate processes, you will not find any wording about this in the current version of the standard. In MPI 1.1 there were two contradictory sentences about this: one stating that the two groups can be disjoint, while the other claimed that the two leaders can be the same process. After discussion, the agreement was that the two groups have to be disjoint, and the standard has been amended to match the agreement.

I realized that this is a non-issue. If the two local groups are disjoint, there is no way that the two local leaders are the same process.

Thanks
Edgar

> george.
>
>> I just have MPI-1.1 at hand right now, but here is what it says:
>>
>> Overlap of local and remote groups that are bound into an inter-communicator is prohibited. If there is overlap, then the program is erroneous and is likely to deadlock.
>>
>> So the bottom line is that the two local intra-communicators being used have to be disjoint, and the bridgecomm needs to be a communicator where at least one process of each of the two disjoint groups is able to talk to the other. Interestingly, I did not find a sentence on whether it is allowed to be the same process, or whether the two local leaders need to be separate processes...
>>
>> Thanks
>> Edgar
>>
>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>> Frederic,
>>>
>>> Attached you will find an example that is supposed to work. The main difference with your code is on T3, T4, where you have inverted the local and remote comms. As depicted in the picture attached below, during the 3rd step you will create the intercomm between ab and c (no overlap) using ac as a bridge communicator (here the two roots, a and c, can exchange messages).
>>>
>>> Based on the MPI 2.2 standard, especially on the paragraph in the PS, the attached code should have been working. Unfortunately, I couldn't run it successfully with either the Open MPI trunk or MPICH2 1.4rc1.
>>>
>>> george.
>>>
>>> PS: Here is what the MPI standard states about MPI_Intercomm_create:
>>>> The function MPI_INTERCOMM_CREATE can be used to create an inter-communicator from two existing intra-communicators, in the following situation: At least one selected member from each group (the “group leader”) has the ability to communicate with the selected member from the other group; that is, a “peer” communicator exists to which both leaders belong, and each leader knows the rank of the other leader in this peer communicator. Furthermore, members of each group know the rank of their leader.
>>>
>>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a problem using MPI_Intercomm_create.
>>>>
>>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn operations by T0. So I have two intra-communicators:
>>>>
>>>> intra0 contains: T0, T1, T2
>>>> intra1 contains: T0, T3, T4
>>>>
>>>> My goal is to make a collective loop to build a single intra-communicator containing T0, T1, T2, T3, T4. I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls, but without success (I always get MPI internal errors).
>>>>
>>>> What I am doing:
>>>>
>>>> on T0:
>>>> MPI_Intercomm_create(intra0, 0, intra1, 0, 1, &new_com)
>>>>
>>>> on T1 and T2:
>>>> MPI_Intercomm_create(intra0, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>>>>
>>>> on T3 and T4:
>>>> MPI_Intercomm_create(intra1, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>>>>
>>>> I'm certainly missing something. Could anybody help me solve this problem?
>>>>
>>>> Best regards,
>>>>
>>>> Frédéric.
>>>>
>>>> PS: of course I did an extensive web search without finding anything useful on my problem.
Re: [OMPI users] Problem with MPI_Intercomm_create
On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:

> George,
>
> I did not look over all the details of your test, but it looks to me like you are violating one of the requirements of intercomm_create, namely the requirement that the two groups have to be disjoint. In your case the parent process(es) are part of both local intra-communicators, aren't they?

The two groups of the two local communicators are disjoint. One contains A,B while the other contains only C. The bridge communicator contains A,C.

I'm confident my example is supposed to work. At least for Open MPI the error is under the hood, as the resulting inter-communicator is valid but contains NULL endpoints for the remote process.

Regarding the fact that the two leaders should be separate processes, you will not find any wording about this in the current version of the standard. In MPI 1.1 there were two contradictory sentences about this: one stating that the two groups can be disjoint, while the other claimed that the two leaders can be the same process. After discussion, the agreement was that the two groups have to be disjoint, and the standard has been amended to match the agreement.

george.

> I just have MPI-1.1 at hand right now, but here is what it says:
>
> Overlap of local and remote groups that are bound into an inter-communicator is prohibited. If there is overlap, then the program is erroneous and is likely to deadlock.
>
> So the bottom line is that the two local intra-communicators being used have to be disjoint, and the bridgecomm needs to be a communicator where at least one process of each of the two disjoint groups is able to talk to the other. Interestingly, I did not find a sentence on whether it is allowed to be the same process, or whether the two local leaders need to be separate processes...
>
> Thanks
> Edgar
>
> On 6/7/2011 12:57 AM, George Bosilca wrote:
>> Frederic,
>>
>> Attached you will find an example that is supposed to work. The main difference with your code is on T3, T4, where you have inverted the local and remote comms. As depicted in the picture attached below, during the 3rd step you will create the intercomm between ab and c (no overlap) using ac as a bridge communicator (here the two roots, a and c, can exchange messages).
>>
>> Based on the MPI 2.2 standard, especially on the paragraph in the PS, the attached code should have been working. Unfortunately, I couldn't run it successfully with either the Open MPI trunk or MPICH2 1.4rc1.
>>
>> george.
>>
>> PS: Here is what the MPI standard states about MPI_Intercomm_create:
>>> The function MPI_INTERCOMM_CREATE can be used to create an inter-communicator from two existing intra-communicators, in the following situation: At least one selected member from each group (the “group leader”) has the ability to communicate with the selected member from the other group; that is, a “peer” communicator exists to which both leaders belong, and each leader knows the rank of the other leader in this peer communicator. Furthermore, members of each group know the rank of their leader.
>>
>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>
>>> Hello,
>>>
>>> I have a problem using MPI_Intercomm_create.
>>>
>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn operations by T0. So I have two intra-communicators:
>>>
>>> intra0 contains: T0, T1, T2
>>> intra1 contains: T0, T3, T4
>>>
>>> My goal is to make a collective loop to build a single intra-communicator containing T0, T1, T2, T3, T4. I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls, but without success (I always get MPI internal errors).
>>>
>>> What I am doing:
>>>
>>> on T0:
>>> MPI_Intercomm_create(intra0, 0, intra1, 0, 1, &new_com)
>>>
>>> on T1 and T2:
>>> MPI_Intercomm_create(intra0, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>>>
>>> on T3 and T4:
>>> MPI_Intercomm_create(intra1, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>>>
>>> I'm certainly missing something. Could anybody help me solve this problem?
>>>
>>> Best regards,
>>>
>>> Frédéric.
>>>
>>> PS: of course I did an extensive web search without finding anything useful on my problem.
>
> -- 
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
> Department of Computer Science          University of Houston
> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
Re: [OMPI users] Problem with MPI_Intercomm_create
George,

I did not look over all the details of your test, but it looks to me like you are violating one of the requirements of intercomm_create, namely the requirement that the two groups have to be disjoint. In your case the parent process(es) are part of both local intra-communicators, aren't they?

I just have MPI-1.1 at hand right now, but here is what it says:

Overlap of local and remote groups that are bound into an inter-communicator is prohibited. If there is overlap, then the program is erroneous and is likely to deadlock.

So the bottom line is that the two local intra-communicators being used have to be disjoint, and the bridgecomm needs to be a communicator where at least one process of each of the two disjoint groups is able to talk to the other. Interestingly, I did not find a sentence on whether it is allowed to be the same process, or whether the two local leaders need to be separate processes...

Thanks
Edgar

On 6/7/2011 12:57 AM, George Bosilca wrote:
> Frederic,
>
> Attached you will find an example that is supposed to work. The main difference with your code is on T3, T4, where you have inverted the local and remote comms. As depicted in the picture attached below, during the 3rd step you will create the intercomm between ab and c (no overlap) using ac as a bridge communicator (here the two roots, a and c, can exchange messages).
>
> Based on the MPI 2.2 standard, especially on the paragraph in the PS, the attached code should have been working. Unfortunately, I couldn't run it successfully with either the Open MPI trunk or MPICH2 1.4rc1.
>
> george.
>
> PS: Here is what the MPI standard states about MPI_Intercomm_create:
>> The function MPI_INTERCOMM_CREATE can be used to create an inter-communicator from two existing intra-communicators, in the following situation: At least one selected member from each group (the “group leader”) has the ability to communicate with the selected member from the other group; that is, a “peer” communicator exists to which both leaders belong, and each leader knows the rank of the other leader in this peer communicator. Furthermore, members of each group know the rank of their leader.
>
> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>
>> Hello,
>>
>> I have a problem using MPI_Intercomm_create.
>>
>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn operations by T0. So I have two intra-communicators:
>>
>> intra0 contains: T0, T1, T2
>> intra1 contains: T0, T3, T4
>>
>> My goal is to make a collective loop to build a single intra-communicator containing T0, T1, T2, T3, T4. I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls, but without success (I always get MPI internal errors).
>>
>> What I am doing:
>>
>> on T0:
>> MPI_Intercomm_create(intra0, 0, intra1, 0, 1, &new_com)
>>
>> on T1 and T2:
>> MPI_Intercomm_create(intra0, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>>
>> on T3 and T4:
>> MPI_Intercomm_create(intra1, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>>
>> I'm certainly missing something. Could anybody help me solve this problem?
>>
>> Best regards,
>>
>> Frédéric.
>>
>> PS: of course I did an extensive web search without finding anything useful on my problem.

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
Re: [OMPI users] parallel I/O on 64-bit indexed arays
On Jun 7, 2011, at 4:53 AM, Troels Haugboelle wrote:

> In principle yes, but the problem is we have an unequal number of particles on each node, so the length of each array is not guaranteed to be divisible by 2, 4, or any other number. If I have understood the definition of MPI_TYPE_CREATE_SUBARRAY correctly, the offset can be 64-bit, but not the global array size. So, optimally, what I am looking for is something that has an unequal size for each thread (a simple vector), with 64-bit offsets and a 64-bit global array size.

It's a bit awkward, but you can still make datatypes to give the offset that you want. E.g., if you need an offset of 2B+31 bytes, you can make datatype A with type contig of N=(2B/sizeof(int)) ints. Then make datatype B with type struct, containing type A and 31 MPI_BYTEs. Then use 1 instance of datatype B to get the offset that you want.

You could make utility functions that, given a specific (64-bit) offset, make an MPI datatype that matches the offset, and then free it (and all sub-datatypes). There is a bit of overhead in creating these datatypes, but it should be dwarfed by the amount of data that you're reading/writing, right?

It's awkward, but it should work.

> Another possible workaround would be to identify subsections that do not exceed 2B elements, make sub-communicators, and then let each of them dump their elements with proper offsets. It may work. The problematic architecture is a BG/P. On other clusters, doing simple I/O (letting all threads open the file, seek to their position, and then write their chunk) works fine, but somehow on BG/P performance drops dramatically. My guess is that there is some file locking, or we are overwhelming the I/O nodes.
>
>> This ticket for the MPI-3 standard is a first step in the right direction, but won't do everything you need (this is more FYI):
>>
>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/265
>>
>> See the PDF attached to the ticket; it's going up for a "first reading" in a month. It'll hopefully be part of the MPI-3 standard by the end of the year (Fab Tillier, CC'ed, has been the chief proponent of this ticket for the past several months).
>>
>> Quincey Koziol from the HDF group is going to propose a follow-on to this ticket, specifically about the case you're referring to -- large counts for file functions and datatype constructors. Quincey -- can you expand on what you'll be proposing, perchance?
>
> Interesting, I think something along the lines of the note would be very useful and needed for large applications.
>
> Thanks a lot for the pointers and your suggestions,
>
> cheers,
>
> Troels
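The utility function described above might be sketched as follows. This is my own illustration of the trick, not code from the thread: it builds the type out of 1 GiB byte-chunks plus a byte remainder rather than ints, but the idea (a struct of a big contiguous type plus leftover MPI_BYTEs, whose total extent equals the desired 64-bit offset) is the same, and the function name is made up:

```c
#include <mpi.h>

/* Build a datatype whose extent is `offset` bytes, even when offset
   exceeds the 2^31-1 limit of an int count.  Sketch only; the name
   make_offset_type is mine, not a real MPI or Open MPI function. */
static int make_offset_type(MPI_Offset offset, MPI_Datatype *newtype)
{
    const MPI_Offset chunk = 1 << 30;            /* 1 GiB inner block  */
    int nchunks   = (int)(offset / chunk);       /* full 1 GiB blocks  */
    int remainder = (int)(offset % chunk);       /* leftover bytes     */

    MPI_Datatype chunk_type;
    MPI_Type_contiguous((int)chunk, MPI_BYTE, &chunk_type);

    /* struct = nchunks big blocks, then the remaining bytes */
    int          blocklens[2] = { nchunks, remainder };
    MPI_Aint     displs[2]    = { 0, (MPI_Aint)nchunks * chunk };
    MPI_Datatype types[2]     = { chunk_type, MPI_BYTE };
    MPI_Type_create_struct(2, blocklens, displs, types, newtype);

    MPI_Type_free(&chunk_type);   /* newtype keeps its own reference */
    return MPI_Type_commit(newtype);
}
```

A caller would build one such type per distinct offset, use it (e.g. when constructing a file view), and free it afterwards; as noted above, the construction overhead is negligible next to multi-gigabyte reads and writes.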
Re: [OMPI users] Problem with MPI_Intercomm_create
George --

Do we need to file a bug about this?

On Jun 7, 2011, at 1:57 AM, George Bosilca wrote:

> Frederic,
>
> Attached you will find an example that is supposed to work. The main difference with your code is on T3, T4, where you have inverted the local and remote comms. As depicted in the picture attached below, during the 3rd step you will create the intercomm between ab and c (no overlap) using ac as a bridge communicator (here the two roots, a and c, can exchange messages).
>
> Based on the MPI 2.2 standard, especially on the paragraph in the PS, the attached code should have been working. Unfortunately, I couldn't run it successfully with either the Open MPI trunk or MPICH2 1.4rc1.
>
> george.
>
> PS: Here is what the MPI standard states about MPI_Intercomm_create:
>> The function MPI_INTERCOMM_CREATE can be used to create an inter-communicator from two existing intra-communicators, in the following situation: At least one selected member from each group (the “group leader”) has the ability to communicate with the selected member from the other group; that is, a “peer” communicator exists to which both leaders belong, and each leader knows the rank of the other leader in this peer communicator. Furthermore, members of each group know the rank of their leader.
Re: [OMPI users] ifort 12.0.4 install problem
Thanks for your reply.

In fact, as you said, it was a strange problem with the .bashrc: with "sudo make install" I get the error, but with "sudo bash" and then "make install" the error disappears. I didn't understand why. Everything seems OK now, no more problems at runtime.

Thanks

2011/6/7 Jeff Squyres :
> On Jun 6, 2011, at 10:43 AM, Virginie trinite wrote:
>
>> I try to compile Open MPI with ifort 12.0.4. My system is Ubuntu Lucid. A previous installation with ifort 11.1 was fine.
>>
>> configure and "make all" seem to work well, but "make install" reports an error:
>>
>> libtool: line 7847: icc: command not found
>> libtool: install: error: relink `libopen-rte.la' with the above command before installing it
>>
>> I want to underline that icc is a known command for bash.
>
> Somehow it became unknown. Is your PATH being reset somehow? Or perhaps your .bashrc is resetting your PATH such that even if "which icc" finds it at the shell prompt, if sub-shells have your .bashrc invoked, the PATH gets reset (or the icc settings don't get inherited properly), and therefore it becomes unknown...?
>
>> I have checked the FAQ, and it seems to me that the problem is more like the one reported for the IBM compiler. So I tried with
>
> I'm a little confused why you're mentioning the IBM compiler...? This issue is a shell/build issue (I assume...? You only sent a few lines from the output, so I can't tell exactly where the error is occurring).
>
>> configure CC=icc CXX=icpc F77=ifort FC=ifort --disable-shared --enable-static
>>
>> Now the install finishes without error, but when I try to run MPI I get an error message:
>
> Now I'm very confused. :-\
>
> Can you please send all the information listed here:
>
> http://www.open-mpi.org/community/help/
>
> This will help me understand what the problem is and what you tried to do to fix it.
>
> Thanks.
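For the record, the likely explanation for the "strange problem" above is sudo's environment sanitization: "sudo make install" typically runs with a reset PATH (the secure_path setting in /etc/sudoers), so a compiler visible in your login shell can vanish, while an interactive "sudo bash" that sources your startup files gets it back. A sketch (the /opt/intel path and the compilervars.sh location are assumptions about a typical Intel compiler install):

```shell
# A compiler dir added in your own shell...
export PATH="/opt/intel/bin:$PATH"        # assumption: icc lives here

# ...is gone in a scrubbed environment like the one sudo provides:
env -i sh -c 'echo "PATH under a sudo-like env: $PATH"'

# Workarounds: pass your PATH through explicitly...
# sudo env "PATH=$PATH" make install
# ...or become root and re-source the compiler environment first:
# sudo bash -c 'source /opt/intel/bin/compilervars.sh intel64 && make install'
```

Either variant makes the relink step that libtool performs at install time see the same icc that configure and make saw.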
Re: [OMPI users] parallel I/O on 64-bit indexed arays
> If I understand your question correctly, this is *exactly* one of the reasons that the MPI Forum has been arguing about the use of a new type, "MPI_Count", for certain parameters that can get very, very large.

Yes, that would help, but unfortunately only in the future.

> Sidenote: I believe that a workaround for you is to create some new MPI datatypes (e.g., of type contiguous) that you can then use to multiply to get to the offsets that you want. I.e., if you make a type contig datatype of 4 doubles, you can still only specify up to 2B of them, but that will now get you up to an offset of (2B * 4 * sizeof(double)) rather than (2B * sizeof(double)). Make sense?

In principle yes, but the problem is we have an unequal number of particles on each node, so the length of each array is not guaranteed to be divisible by 2, 4, or any other number. If I have understood the definition of MPI_TYPE_CREATE_SUBARRAY correctly, the offset can be 64-bit, but not the global array size. So, optimally, what I am looking for is something that has an unequal size for each thread (a simple vector), with 64-bit offsets and a 64-bit global array size.

Another possible workaround would be to identify subsections that do not exceed 2B elements, make sub-communicators, and then let each of them dump their elements with proper offsets. It may work. The problematic architecture is a BG/P. On other clusters, doing simple I/O (letting all threads open the file, seek to their position, and then write their chunk) works fine, but somehow on BG/P performance drops dramatically. My guess is that there is some file locking, or we are overwhelming the I/O nodes.

> This ticket for the MPI-3 standard is a first step in the right direction, but won't do everything you need (this is more FYI):
>
> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/265
>
> See the PDF attached to the ticket; it's going up for a "first reading" in a month. It'll hopefully be part of the MPI-3 standard by the end of the year (Fab Tillier, CC'ed, has been the chief proponent of this ticket for the past several months).
>
> Quincey Koziol from the HDF group is going to propose a follow-on to this ticket, specifically about the case you're referring to -- large counts for file functions and datatype constructors. Quincey -- can you expand on what you'll be proposing, perchance?

Interesting, I think something along the lines of the note would be very useful and needed for large applications.

Thanks a lot for the pointers and your suggestions,

cheers,

Troels
Re: [OMPI users] Problem with MPI_Intercomm_create
Frederic,

Attached you will find an example that is supposed to work. The main difference with your code is on T3, T4, where you have inverted the local and remote comms. As depicted in the picture attached below, during the 3rd step you will create the intercomm between ab and c (no overlap) using ac as a bridge communicator (here the two roots, a and c, can exchange messages).

Based on the MPI 2.2 standard, especially on the paragraph in the PS, the attached code should have been working. Unfortunately, I couldn't run it successfully with either the Open MPI trunk or MPICH2 1.4rc1.

george.

PS: Here is what the MPI standard states about MPI_Intercomm_create:
> The function MPI_INTERCOMM_CREATE can be used to create an inter-communicator from two existing intra-communicators, in the following situation: At least one selected member from each group (the “group leader”) has the ability to communicate with the selected member from the other group; that is, a “peer” communicator exists to which both leaders belong, and each leader knows the rank of the other leader in this peer communicator. Furthermore, members of each group know the rank of their leader.

intercomm_create.c
Description: Binary data

PastedGraphic-2.pdf
Description: Adobe PDF document

On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:

> Hello,
>
> I have a problem using MPI_Intercomm_create.
>
> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn operations by T0. So I have two intra-communicators:
>
> intra0 contains: T0, T1, T2
> intra1 contains: T0, T3, T4
>
> My goal is to make a collective loop to build a single intra-communicator containing T0, T1, T2, T3, T4. I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls, but without success (I always get MPI internal errors).
>
> What I am doing:
>
> on T0:
> MPI_Intercomm_create(intra0, 0, intra1, 0, 1, &new_com)
>
> on T1 and T2:
> MPI_Intercomm_create(intra0, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>
> on T3 and T4:
> MPI_Intercomm_create(intra1, 0, MPI_COMM_WORLD, 0, 1, &new_com)
>
> I'm certainly missing something. Could anybody help me solve this problem?
>
> Best regards,
>
> Frédéric.
>
> PS: of course I did an extensive web search without finding anything useful on my problem.
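The intercomm_create.c attachment was not preserved in this archive. The pattern George describes (disjoint local groups, a bridge communicator containing only the two leaders) can be sketched roughly as below. This is my own untested reconstruction, not the attachment: it uses MPI_Comm_split on three ranks instead of MPI_Comm_spawn to form group ab = {0,1} and group c = {2}, and all names are mine:

```c
#include <mpi.h>
#include <stdio.h>

/* Run with: mpirun -np 3 ./intercomm_sketch
   World ranks 0,1 form group "ab"; rank 2 forms group "c".
   A bridge communicator holding only the leaders (ranks 0 and 2)
   plays the role of MPI_Intercomm_create's peer communicator. */
int main(int argc, char **argv)
{
    int rank;
    MPI_Comm local, bridge, inter, merged;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Disjoint local groups: color 0 -> {0,1}, color 1 -> {2}. */
    MPI_Comm_split(MPI_COMM_WORLD, rank == 2 ? 1 : 0, rank, &local);

    /* Bridge contains only the two leaders; rank 1 gets
       MPI_COMM_NULL, which is fine because the peer communicator is
       significant only at the leaders. */
    MPI_Comm_split(MPI_COMM_WORLD, rank == 1 ? MPI_UNDEFINED : 0,
                   rank, &bridge);

    /* In the bridge, ab's leader is rank 0 and c's leader is rank 1,
       so each side names the *other* leader's bridge rank. */
    int remote_leader = (rank == 2) ? 0 : 1;
    MPI_Intercomm_create(local, 0, bridge, remote_leader, 42, &inter);

    /* Collapse the inter-communicator into one intra-communicator. */
    MPI_Intercomm_merge(inter, /* high = */ rank == 2, &merged);

    int msize;
    MPI_Comm_size(merged, &msize);
    if (rank == 0) printf("merged communicator size: %d\n", msize);

    MPI_Finalize();
    return 0;
}
```

Compared with Frédéric's calls above, the key differences are that no process appears in both local groups, every process in a group passes the same (local, peer) pair, and the non-leader simply passes a null peer communicator rather than MPI_COMM_WORLD.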