Re: [OMPI users] ORTE_ERROR: orte_ess_base_open failed
I have no further ideas, I'm afraid. The only thing I can see is that your directory tree doesn't look right - if /usr/local is your prefix, then there should be a /usr/local/lib/openmpi directory, and the .la's should be in there.

You might try reinstalling it to somewhere other than /usr/local - perhaps put it somewhere under your home directory instead so you don't need root permissions to do the install. See if the directory tree looks any different.

It would also help to see your configure line, and know something more about your system. It looks like you have slurm, so I assume this is some kind of Linux box?

On Aug 26, 2012, at 7:23 PM, Shanthini Kannan wrote:

> Hello Ralph,
> /usr/local/lib is in my LD_LIBRARY_PATH.
> I am running the right version of mpirun and I do have all permissions on
> them.
>
> Thanks!
> Shanthini
>
> On Fri, Aug 24, 2012 at 7:30 PM, Ralph Castain wrote:
> And just to be sure - /usr/local/lib is in your ld_lib_path, yes?
>
> You might also check the permissions to ensure you can read them. Also, check
> "which mpirun" - let's make sure you are running the one you think!
>
> On Aug 24, 2012, at 4:22 PM, Shanthini Kannan wrote:
>
>> Thanks Ralph.
>> My prefix is /usr/local and I see that mca_ess_env.la is present in
>> /usr/local/lib directory.
>>
>> -bash-4.2# pwd
>> /usr/local/lib
>> -bash-4.2# ls mca_ess*
>> mca_ess_env.la mca_ess_singleton.la mca_ess_slurm.la mca_ess_tool.la
>> mca_ess_env.so mca_ess_singleton.so mca_ess_slurm.so mca_ess_tool.so
>> mca_ess_hnp.la mca_ess_slave.la mca_ess_slurmd.la
>> mca_ess_hnp.so mca_ess_slave.so mca_ess_slurmd.so
>> -bash-4.2#
>>
>> On Fri, Aug 24, 2012 at 7:13 PM, Ralph Castain wrote:
>> Check your /lib directory - there should be an openmpi directory
>> under it, and that should have a bunch of libs in it. One of those should be
>> mca_ess_env.la
>>
>> Is it there?
>>
>> On Aug 24, 2012, at 3:27 PM, Shanthini Kannan wrote:
>>
>>> I had the OMPI lib directory added in /etc/ld.so.conf.
>>> I also added it in LD_LIBRARY_PATH, but it made no difference.
>>> When I call mpirun, should I specify the MCA in command-line?
>>> Thanks!
>>>
>>> On Fri, Aug 24, 2012 at 2:07 PM, Ralph Castain wrote:
>>> I suspect your LD_LIBRARY_PATH doesn't include the OMPI lib location
>>>
>>> On Aug 24, 2012, at 10:58 AM, Shanthini Kannan
>>> wrote:
>>>
>>> Hello,
>>> I am running mpptest over Open MPI (v1.5.4).
>>> I get the following error saying component "env" in Framework "ess" is
>>> not found. Am I missing something? I am new to MPI and any help you can
>>> offer is appreciated.
>>>
>>> A requested component was not found, or was unable to be opened. This
>>> means that this component is either not installed or is unable to be
>>> used on your system (e.g., sometimes this means that shared libraries
>>> that the component requires are unable to be found/loaded). Note that
>>> Open MPI stopped checking at the first component that it did not find.
>>>
>>> Host: AV8
>>> Framework: ess
>>> Component: env
>>>
>>> --
>>> [AV8:05354] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
>>> runtime/orte_init.c at line 120
>>>
>>> --
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_ess_base_open failed
>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>
>>> Thanks!
>>> Shanthini
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
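
The layout check suggested in the reply above (components under the prefix's lib/openmpi directory, readable by the user) could be scripted roughly as follows. This is only a sketch: `component_path` and `is_readable` are hypothetical helper names, not part of Open MPI, and the `<prefix>/lib/openmpi/mca_<framework>_<component>.<ext>` pattern is assumed from the directory and file names mentioned in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an Open MPI function): builds the path where
 * the reply above expects a component file to be installed, i.e.
 *   <prefix>/lib/openmpi/mca_<framework>_<component>.<ext>
 * Returns 0 on success, -1 if the result does not fit in `buf`.
 */
int component_path(char *buf, size_t len, const char *prefix,
                   const char *framework, const char *component,
                   const char *ext)
{
    assert(buf && prefix && framework && component && ext);
    int n = snprintf(buf, len, "%s/lib/openmpi/mca_%s_%s.%s",
                     prefix, framework, component, ext);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 if the file exists and the current user may read it, else 0. */
int is_readable(const char *path)
{
    return access(path, R_OK) == 0;
}
```

Calling `component_path(buf, sizeof buf, "/usr/local", "ess", "env", "so")` and passing the result to `is_readable` would report a missing file on an installation like the one listed above, where the `mca_ess_*` files sit directly in /usr/local/lib rather than in /usr/local/lib/openmpi.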
Re: [OMPI users] ORTE_ERROR: orte_ess_base_open failed
Hello Ralph,
/usr/local/lib is in my LD_LIBRARY_PATH.
I am running the right version of mpirun and I do have all permissions on them.

Thanks!
Shanthini

On Fri, Aug 24, 2012 at 7:30 PM, Ralph Castain wrote:

> And just to be sure - /usr/local/lib is in your ld_lib_path, yes?
>
> You might also check the permissions to ensure you can read them. Also,
> check "which mpirun" - let's make sure you are running the one you think!
>
> On Aug 24, 2012, at 4:22 PM, Shanthini Kannan
> wrote:
>
> Thanks Ralph.
> My prefix is /usr/local and I see that mca_ess_env.la is present in
> /usr/local/lib directory.
>
> -bash-4.2# pwd
> /usr/local/lib
> -bash-4.2# ls mca_ess*
> mca_ess_env.la mca_ess_singleton.la mca_ess_slurm.la mca_ess_tool.la
> mca_ess_env.so mca_ess_singleton.so mca_ess_slurm.so mca_ess_tool.so
> mca_ess_hnp.la mca_ess_slave.la mca_ess_slurmd.la
> mca_ess_hnp.so mca_ess_slave.so mca_ess_slurmd.so
> -bash-4.2#
>
> On Fri, Aug 24, 2012 at 7:13 PM, Ralph Castain wrote:
>
>> Check your /lib directory - there should be an openmpi directory
>> under it, and that should have a bunch of libs in it. One of those should
>> be mca_ess_env.la
>>
>> Is it there?
>>
>> On Aug 24, 2012, at 3:27 PM, Shanthini Kannan
>> wrote:
>>
>> I had the OMPI lib directory added in /etc/ld.so.conf.
>> I also added it in LD_LIBRARY_PATH, but it made no difference.
>> When I call mpirun, should I specify the MCA in command-line?
>> Thanks!
>>
>> On Fri, Aug 24, 2012 at 2:07 PM, Ralph Castain wrote:
>>
>>> I suspect your LD_LIBRARY_PATH doesn't include the OMPI lib location
>>>
>>> On Aug 24, 2012, at 10:58 AM, Shanthini Kannan
>>> wrote:
>>>
>>> Hello,
>>> I am running mpptest over Open MPI (v1.5.4).
>>> I get the following error saying component "env" in Framework "ess" is
>>> not found. Am I missing something? I am new to MPI and any help you can
>>> offer is appreciated.
>>>
>>> A requested component was not found, or was unable to be opened. This
>>> means that this component is either not installed or is unable to be
>>> used on your system (e.g., sometimes this means that shared libraries
>>> that the component requires are unable to be found/loaded). Note that
>>> Open MPI stopped checking at the first component that it did not find.
>>>
>>> Host: AV8
>>> Framework: ess
>>> Component: env
>>>
>>> --
>>> [AV8:05354] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
>>> runtime/orte_init.c at line 120
>>>
>>> --
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_ess_base_open failed
>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>
>>> Thanks!
>>> Shanthini
Re: [OMPI users] MPI_Irecv: Confusion with <> inputy parameter
Hello Hristo, Jeff,

Thanks a lot for your note. I understand the concept much better now. In fact, now I understand what the phrase "maximum number of elements in the receive buffer" in all of the documentation means.

However, I still think that the online documentation is confusing (and a little vague), and could be worded better. It is worsened by the fact that all other sites simply copy the description verbatim.

Thanks a lot anyway!

Devendra

From: "Iliev, Hristo"
To: devendra rai
Cc: Open MPI Users
Sent: Tuesday, 21 August 2012, 10:37
Subject: Re: [OMPI users] MPI_Irecv: Confusion with <> inputy parameter

Hello, Devendra,

Sending and receiving messages in MPI are atomic operations - they complete only when the whole message has been sent or received. MPI_Test only tells you if the operation has completed - there is no indication like "30% of the message was sent/received, stay tuned for more".

On the sender side, the message is constructed by taking bytes from various locations in memory, specified by the type map of the MPI datatype used. Then on the receiver side the message is deconstructed back into memory by placing the received bytes according to the type map of the MPI datatype provided. The combination of receive datatype and receive count gives you a certain number of bytes (that is, the type size obtainable by MPI_Type_size times "count"). If the message is shorter, that means that some elements of the receive buffer will not be filled, which is OK - you can test exactly how many elements were filled with MPI_Get_count on the status of the receive operation. If the message is longer, however, there won't be enough room to put all the data that the message is carrying and an overflow error will occur.

This works best by example. Imagine that in one process you issue:

    MPI_Send(data, 80, MPI_BYTE, ...);

This will send a message containing 80 elements of type byte. Now on the receiver side you issue:

    MPI_Irecv(data, 160, MPI_BYTE, ..., &request);

What will happen is that the message will be received in its entirety since 80 times the size of MPI_BYTE is less than or equal to 160 times the size of MPI_BYTE. Calling MPI_Test on "request" will produce true in the completion flag and you will get back a status variable (unless you provided MPI_STATUS_IGNORE) and then you can call:

    MPI_Get_count(&status, MPI_BYTE, &count);

Now "count" will contain 80 - the actual number of elements received.

But if the receive operation was instead:

    MPI_Irecv(data, 40, MPI_BYTE, ..., &request);

since 40 times the size of MPI_BYTE is less than the size of the message, there will not be enough space to receive the entire message and an overflow error will occur. The MPI_Irecv itself only initiates the receive operation and will not return an error. Rather you will obtain the overflow error in the MPI_ERROR field of the status argument, returned by MPI_Test (the test call itself will return MPI_SUCCESS).

Since MPI operations are atomic, you cannot send a message of 160 elements and then receive it with two separate receives of size 80 - this is very important and it is difficult to grasp initially for people who come to MPI from traditional Unix network programming.

I would recommend that you head to http://www.mpi-forum.org/ and download from there the PDF of the latest MPI 2.2 standard (or order the printed book). Unlike many other standard documents this one is actually readable by normal people and contains many useful explanations and examples. Read through the entire section 3.2 to get a better idea of how messaging works in MPI.

Hope that helps to clarify things,
Hristo

On 21.08.2012, at 10:01, devendra rai wrote:

>Hello Jeff and Hristo,
>
>Now I am completely confused:
>
>So, let's say, the complete reception requires 8192 bytes. And, I have:
>
>MPI_Irecv(
>    (void*)this->receivebuffer,  /* the receive buffer */
>    this->receive_packetsize,    /* 80 */
>    MPI_BYTE,                    /* The data type expected */
>    this->transmittingnode,      /* The node from which to receive */
>    this->uniquetag,             /* Tag */
>    MPI_COMM_WORLD,              /* Communicator */
>    &Irecv_request               /* request handle */
>    );
>
>That means the MPI_Test will tell me that the reception is complete when
>I have received the first 80 bytes. Correct?
>
>Next, let's say that I have a receive buffer with a capacity of 160 bytes,
>then, will overflow error occur here? Even if I have decided to receive a
>large payload in chunks of 80 bytes?
>
>I am sorry, the manual and the API reference were too vague for me.
>
>Thanks a lot
>
>Devendra
>
>From: "Iliev, Hristo"
>To: Open MPI Users
>Cc: devendra rai
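
The size rule in Hristo's explanation above comes down to a few lines of arithmetic. The following is a minimal sketch in plain C, not MPI code: `recv_outcome` and `model_receive` are hypothetical names made up for illustration, and the model assumes a contiguous datatype such as MPI_BYTE, where the receive capacity is simply the type size times the count.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of matching one incoming message against one posted receive. */
typedef struct {
    int truncated;    /* 1 if the message did not fit (overflow error) */
    size_t elements;  /* elements delivered, as MPI_Get_count would report */
} recv_outcome;

/*
 * Hypothetical model (not an MPI function) of the size check:
 *   msg_bytes  - size of the incoming message in bytes
 *   type_size  - size in bytes of one element of the receive datatype
 *   recv_count - the "count" argument given to the receive call
 */
recv_outcome model_receive(size_t msg_bytes, size_t type_size,
                           size_t recv_count)
{
    recv_outcome out = {0, 0};
    assert(type_size > 0);

    size_t capacity = type_size * recv_count;  /* bytes the receive can hold */
    if (msg_bytes > capacity) {
        out.truncated = 1;                     /* longer message: overflow */
        return out;
    }
    out.elements = msg_bytes / type_size;      /* shorter or equal: fits */
    return out;
}
```

In this model, `model_receive(80, 1, 160)` reports 80 elements and no truncation, while `model_receive(80, 1, 40)` reports truncation, matching the two MPI_Irecv examples above. The case from the original question, an 8192-byte message against a count of 80 bytes, also comes out as truncated: the receive does not complete with "the first 80 bytes".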
Re: [OMPI users] openmpi 1.6.1 Questions
Thanks and super cool.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Aug 25, 2012, at 7:06 AM, Jeff Squyres wrote:

> On Aug 24, 2012, at 10:45 AM, Brock Palen wrote:
>
>>> Right now we should be just warning if we can't register 3/4 of your
>>> physical memory (we can't really test for anything more than that). But it
>>> doesn't abort.
>> Ok
>>
>>> We could add a tunable that makes it abort in this case, if you think that
>>> would be useful.
>> I think so, in my case that would mean a node is misconfigured, and rather
>> than running slowly I want it brought to my attention.
>
>
> Ok, this is easy enough to add. Due to a PGI compilation issue, it looks
> like we're going to probably roll a 1.6.2 in the immediate future; we can
> include this in there.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/