Re: [OMPI devel] Locality info
On Oct 19, 2011, at 5:05 PM, George Bosilca wrote:

> Wonderful!!! We've been waiting for such functionality for a while.

My pleasure :-)

> I do have some questions/remarks related to this patch.
>
> What is the my_node_rank in the orte_proc_info_t structure?

The node rank is a local ranking of procs on a node, starting with 0 for the lowest vpid on the node and going up from there. It was normally passed in the environment and picked up in the ess components so it could be used to select a static port during oob init, if static ports were specified. I moved it to a more general place solely because I wanted to move a bunch of replicated code into ess/base instead of having it in nearly every module. I debated putting it in ess/base.h instead, but since other places in the code might also want it, I figured I'd make it more globally available. If it turns out nobody needs it, we can move it back into the ess.

> Is there any difference between using the field my_node_rank or the vpid
> part of my_daemon?

Yes - my_daemon refers to the local daemon. The node rank refers solely to the relative ranking of application procs on the node.

> What is the correct way of finding that two processes are on the same
> remote location: comparing their daemon vpid or their node_rank?

Daemon vpid.

> How does the node_rank change with respect to dynamic process management
> when new daemons are joining?

This is where node_rank comes into play. The mapper sees across jobs that are sharing nodes, so the mapper is currently responsible for computing the node_rank of a proc. This info gets transmitted to all daemons, including newly started dynamic ones, in the launch msg. So everyone always has a picture of the node_rank for every proc.

> Is the OPAL_PROC_ON_L*CACHE flag only set for local processes, if I
> understand your last email correctly?

Yes - all the locality flags refer only to the location of another process relative to you, "you" being an app process. As I said, though, this can easily be extended to return the relative locality of two procs on a remote node, if that would be of use.

> I guess proc_flags in proc.h should be opal_paffinity_locality_t to match
> the flags on the ORTE level?

My bad - I thought I had changed it. If not, it certainly needs to be...

> A more high-level remark. The fact that the locality information is
> automatically packed and exchanged during the grpcomm modex call seems a
> little bit weird (do the upper levels have a say in it?). I would not have
> thought that grpcomm (which, based on the grpcomm.h header file, is a
> framework providing communication services that span entire jobs or
> collections of processes) is the place to put it.

I agree - I wasn't entirely sure where to put it, frankly. It needs to be somewhere that both direct-launched and mpirun-launched apps can see it. It could go in the MPI layer, I suppose. Suggestions welcome!
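A minimal C sketch of the two rules Ralph states above, under simplifying assumptions: a proc's node rank is the number of procs on the same node with a lower vpid, and two procs are co-located when their daemon vpids match. The `proc_info` struct and the helper names are hypothetical, not ORTE's real types, and the sketch covers a single job only; per Ralph, the real mapper also accounts for other jobs sharing the node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified process record (not ORTE's real types). */
typedef struct {
    uint32_t vpid;        /* rank of the process within its job */
    uint32_t daemon_vpid; /* vpid of the daemon hosting the process */
} proc_info;

/* Co-location test: the same daemon vpid means the same node. */
static bool same_node(const proc_info *a, const proc_info *b)
{
    return a->daemon_vpid == b->daemon_vpid;
}

/* Node rank of procs[i]: how many processes on the same node have a lower
 * vpid, so the lowest vpid on each node gets 0. Single job only. */
static uint32_t node_rank(const proc_info *procs, size_t n, size_t i)
{
    assert(i < n);
    uint32_t rank = 0;
    for (size_t j = 0; j < n; j++) {
        if (j != i && same_node(&procs[j], &procs[i]) &&
            procs[j].vpid < procs[i].vpid) {
            rank++;
        }
    }
    return rank;
}
```

For example, with vpids 0-4 hosted by daemons {0, 1, 0, 1, 0}, the procs on daemon 0 get node ranks 0, 1, 2 and those on daemon 1 get 0, 1.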
Re: [OMPI devel] Locality info
Wonderful!!! We've been waiting for such functionality for a while.

I do have some questions/remarks related to this patch.

What is the my_node_rank in the orte_proc_info_t structure?

Is there any difference between using the field my_node_rank or the vpid part of my_daemon?

What is the correct way of finding that two processes are on the same remote location: comparing their daemon vpid or their node_rank?

How does the node_rank change with respect to dynamic process management when new daemons are joining?

Is the OPAL_PROC_ON_L*CACHE flag only set for local processes, if I understand your last email correctly?

I guess proc_flags in proc.h should be opal_paffinity_locality_t to match the flags on the ORTE level?

A more high-level remark. The fact that the locality information is automatically packed and exchanged during the grpcomm modex call seems a little bit weird (do the upper levels have a say in it?). I would not have thought that grpcomm (which, based on the grpcomm.h header file, is a framework providing communication services that span entire jobs or collections of processes) is the place to put it.

Thanks,
george.

On Oct 19, 2011, at 16:28 , Ralph Castain wrote:

> Hi folks
>
> For those of you who don't follow the commits...
>
> I just committed (r25323) an extension of the orte_ess.proc_get_locality
> function that allows a process to get its relative resource usage with any
> other proc in the job. In other words, you can provide a process name to
> the function, and the returned bitmask tells you if you share a node, numa,
> socket, caches (by level), core, and hyperthread with that process.
>
> If you are on the same node and unbound, of course, you share all of those.
> However, if you are bound, then this can help tell you if you are on a
> common numa node, sharing an L1 cache, etc. Might be handy.
>
> I implemented the underlying functionality so that we can further extend it
> to tell you the relative resource location of two procs on a remote node.
> If that someday becomes of interest, it would be relatively easy to do - but
> would require passing more info around. Hence, I've allowed for it, but not
> implemented it until there is some identified need.
>
> Locality info is available anytime after the modex is completed during
> MPI_Init, and is supported regardless of launch environment (minus cnos, for
> now), launch by mpirun, or direct-launch - in other words, pretty much
> always.
>
> Hope it proves of help in your work
> Ralph
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
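To make the returned bitmask concrete, here is a hypothetical C sketch of how such a locality mask could be computed from two processes' placements. The flag names, the `placement` struct, and `relative_locality` are illustrative assumptions, not the real OPAL/ORTE API; the only rules taken from the announcement are that processes on different nodes share nothing and that unbound processes on the same node share everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical locality bits; the real OPAL names and values may differ. */
enum {
    LOC_NODE     = 1u << 0,
    LOC_NUMA     = 1u << 1,
    LOC_SOCKET   = 1u << 2,
    LOC_L3CACHE  = 1u << 3,
    LOC_L2CACHE  = 1u << 4,
    LOC_L1CACHE  = 1u << 5,
    LOC_CORE     = 1u << 6,
    LOC_HWTHREAD = 1u << 7,
    LOC_ALL      = (1u << 8) - 1u
};

/* Hypothetical placement: one id per resource level, unique within a node. */
typedef struct {
    int  node, numa, socket, l3, l2, l1, core, hwthread;
    bool bound;
} placement;

/* Bitmask of the resources `me` shares with `peer`. */
static uint32_t relative_locality(const placement *me, const placement *peer)
{
    assert(me && peer);
    if (me->node != peer->node) {
        return 0;       /* different nodes share nothing */
    }
    if (!me->bound || !peer->bound) {
        return LOC_ALL; /* unbound: may run anywhere on the node */
    }
    uint32_t loc = LOC_NODE;
    if (me->numa == peer->numa)         loc |= LOC_NUMA;
    if (me->socket == peer->socket)     loc |= LOC_SOCKET;
    if (me->l3 == peer->l3)             loc |= LOC_L3CACHE;
    if (me->l2 == peer->l2)             loc |= LOC_L2CACHE;
    if (me->l1 == peer->l1)             loc |= LOC_L1CACHE;
    if (me->core == peer->core)         loc |= LOC_CORE;
    if (me->hwthread == peer->hwthread) loc |= LOC_HWTHREAD;
    return loc;
}
```

Two bound procs on different cores of one socket would, in this sketch, report node, NUMA, socket, and L3 sharing but not L2, L1, core, or hyperthread.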
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
There are several OPAL-level error codes not used in the current code:

OPAL_ERR_TOPO_SLOT_LIST_NOT_SUPPORTED
OPAL_ERR_TOPO_SOCKET_NOT_SUPPORTED
OPAL_ERR_TOPO_CORE_NOT_SUPPORTED
OPAL_ERR_NOT_ENOUGH_SOCKETS
OPAL_ERR_NOT_ENOUGH_CORES
OPAL_ERR_INVALID_PHYS_CPU
OPAL_ERR_MULTIPLE_AFFINITIES

If somebody feels like filing an RFC to remove them, please feel free to go ahead.

george.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
A careful reading of the committed patch would have pointed out that none of the concerns raised so far were true: the "old-way" behavior of the OMPI code was preserved. Moreover, every single one of the error codes removed had not been used in ages.

What Brian pointed out as evil (evil being a subjective notion in itself) didn't prevent the code from behaving correctly, nor affect its correctness in any way. Anyway, to address his concern I pushed a patch (25333) putting the OMPI error codes back where they were originally.

In other words, we spent a very unproductive day arguing over unfounded arguments and "thought-to-be" behaviors.

george.
[OMPI devel] RFC: upgrade to libevent 2.0.13 (removing 2.0.7)
WHAT: upgrade to libevent 2.0.13

WHY: libevent bug fixes

WHEN: Nov 2, 2011

TIMEOUT: 2 weeks

***

Jeff, Ralph, and I have been using the libevent2013 component for the last month without issue. In 2 weeks I will:

- remove opal/mca/event/libevent207
- remove opal/mca/event/libevent2013/.ompi_ignore
- remove opal/mca/event/libevent2013/.ompi_unignore

-Nathan
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
George -

I wrote the error code gorp; I'm pretty sure I know exactly how it was supposed to work.

There are 58 codes unused between OPAL_NETWORK_NOT_PARSEABLE and OPAL_ERR_MAX. I now see what you did with ERR_REQUEST, and it's evil. That's not the intent of the error code logic at all. If you want to change that, I'm not necessarily opposed to it, but that's something that should be discussed in an RFC. What the current code does is not consistent with the original intent.

I don't agree that you shouldn't propagate error codes through OMPI; in fact, the original intent of the design was to allow such propagation. Again, such a change should be discussed as part of an RFC.

Brian
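A small C sketch of the layout being argued over, under illustrative assumptions drawn from this thread: each layer owns a block of negative codes counting down from a base (Brian's earlier message notes that codes are all negative, so the first code is base - 1), a block is only well formed if it actually extends below its base (a max computed with the wrong sign fails this), and two layers' blocks must not overlap. The `err_range` type, the helper names, and the numbers in the usage are hypothetical, not Open MPI's actual constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one layer's block of error codes. Codes are
 * negative and count down from `base`: the first code is base - 1, and
 * `max` is the first value past the end of the block. */
typedef struct {
    const char *layer;
    int base;
    int max;
} err_range;

/* Well formed only if the block holds at least one code below its base. */
static bool range_is_valid(const err_range *r)
{
    return r->max < r->base - 1;
}

/* The codes owned by r lie in [r->max + 1, r->base - 1]. */
static bool range_contains(const err_range *r, int code)
{
    return code > r->max && code < r->base;
}

/* Two well-formed blocks overlap if their closed intervals intersect. */
static bool ranges_overlap(const err_range *a, const err_range *b)
{
    assert(range_is_valid(a) && range_is_valid(b));
    return a->max + 1 <= b->base - 1 && b->max + 1 <= a->base - 1;
}
```

With hypothetical blocks OPAL = (0, -100), OMPI = (-100, -200), and ORTE = (-200, -300), neighbors do not overlap, while a block declared as (-100, -98) fails `range_is_valid` because its max was computed with the wrong sign.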
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
On Oct 19, 2011, at 2:50 PM, George Bosilca wrote:

> Now, as a result of my patch, there is a difference. One should not simply
> forward an ORTE code into the stack of OMPI and hope it just works. Errors
> should be dealt with where they happen, and if that is not possible they
> should be translated into the actual layer's error code. Error propagation
> should be compartmentalized: an error has to be translated into a code that
> has a meaning at the OMPI level. The current patch should not prevent code
> that mixes error codes from working, as opal_strerror retains the same
> behavior as before. However, this coding practice should be avoided. I
> tried to clean such instances out of the current code a few days ago in
> r25230.
>
> Moreover, this is similar to how we deal with the error codes between the
> OMPI and MPI layers, and seems like a sane way to compose libraries. You
> deal with a specific layer's error code when you get it (usually after the
> call to a function from that specific layer), not later on when you don't
> even know exactly what the execution path was.

I'll have to ponder your logic. Not saying I disagree, but it would have been much nicer if you had explained your intended purpose in an RFC before imposing such a philosophy.

We were passing error codes up the ladder to allow higher levels to take corrective action that might extend beyond the scope of the immediate code - e.g., it might lead someone to use a specific different component in the framework if they knew that the RML was no longer working. We have lost that ability now, though we can regain it by defining OMPI error codes that don't equate to ORTE values but retain the same meaning - and then translating as required. Not sure what that buys us, but maybe it will make some people feel better.
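Ralph's fallback (distinct OMPI codes that keep the ORTE meaning, translated at the layer boundary) might look like the following C sketch. Everything here is hypothetical: the `ORTE_LIKE_*` and `OMPI_LIKE_*` constants are stand-ins with invented values, and `function_b` plays the role of B from Ralph's original question.

```c
#include <assert.h>

/* Hypothetical runtime-layer (ORTE-like) codes; values are invented. */
enum {
    ORTE_LIKE_SUCCESS       = 0,
    ORTE_LIKE_ERR_UNREACH   = -201,
    ORTE_LIKE_ERR_TIMEOUT   = -202,
    ORTE_LIKE_ERR_NOT_FOUND = -203
};

/* Hypothetical MPI-layer (OMPI-like) codes: distinct values, same meaning. */
enum {
    OMPI_LIKE_SUCCESS     = 0,
    OMPI_LIKE_ERROR       = -1,   /* generic fallback */
    OMPI_LIKE_ERR_UNREACH = -101, /* counterpart of ORTE_LIKE_ERR_UNREACH */
    OMPI_LIKE_ERR_TIMEOUT = -102  /* counterpart of ORTE_LIKE_ERR_TIMEOUT */
};

/* Map a runtime code onto the MPI layer's own vocabulary. */
static int translate_rte_error(int rc)
{
    assert(rc <= 0); /* codes in this sketch are never positive */
    switch (rc) {
    case ORTE_LIKE_SUCCESS:     return OMPI_LIKE_SUCCESS;
    case ORTE_LIKE_ERR_UNREACH: return OMPI_LIKE_ERR_UNREACH;
    case ORTE_LIKE_ERR_TIMEOUT: return OMPI_LIKE_ERR_TIMEOUT;
    default:                    return OMPI_LIKE_ERROR;
    }
}

/* Hypothetical runtime call that simply reports the given result. */
static int fake_rte_call(int simulated_result)
{
    return simulated_result;
}

/* Function "B": calls the runtime layer, returns only MPI-layer codes. */
static int function_b(int simulated_result)
{
    return translate_rte_error(fake_rte_call(simulated_result));
}
```

The `default` branch shows the cost Ralph points out: any runtime code without an MPI-level counterpart reaches A only as the generic error.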
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
Can I have an example of how the current trunk is broken due to this change?

Thanks,
george.

On Oct 19, 2011, at 16:32 , Ralph Castain wrote:

> I propose that we retain the rest of the changeset, but revert the OMPI
> constants to bring back their ORTE equivalents. We clearly should scrub
> those and update them to ensure they are both used and current, but it
> seems to me we lose more than we gain by removing them.
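For contrast, the proposal to bring back OMPI-prefixed equivalents (constants that alias their ORTE counterparts) might look like this hypothetical C sketch. B passes the runtime code through untouched, yet A can still test it against an OMPI-prefixed name and choose an alternative path. The macro names, their values, and the `a_action` enum are invented stand-ins, not the real constants.

```c
#include <assert.h>

/* Hypothetical runtime-layer codes; values are invented. */
#define ORTE_LIKE_SUCCESS     0
#define ORTE_LIKE_ERR_UNREACH (-201)

/* MPI-layer names that alias the runtime values one for one. */
#define OMPI_LIKE_SUCCESS     ORTE_LIKE_SUCCESS
#define OMPI_LIKE_ERR_UNREACH ORTE_LIKE_ERR_UNREACH

/* What caller "A" decides to do with B's result. */
enum a_action { A_CONTINUE, A_SWITCH_COMPONENT, A_UNRECOGNIZED };

/* Hypothetical runtime call that simply reports the given result. */
static int fake_rte_call(int simulated_result)
{
    return simulated_result;
}

/* Function "B" passes the runtime code through unchanged. */
static int function_b(int simulated_result)
{
    int rc = fake_rte_call(simulated_result);
    assert(rc <= 0); /* codes in this sketch are never positive */
    return rc;
}

/* Function "A" can still recognize the code by its OMPI-prefixed name. */
static enum a_action function_a(int simulated_result)
{
    int rc = function_b(simulated_result);
    if (rc == OMPI_LIKE_SUCCESS) {
        return A_CONTINUE;
    }
    if (rc == OMPI_LIKE_ERR_UNREACH) {
        return A_SWITCH_COMPONENT; /* e.g. try a different component */
    }
    return A_UNRECOGNIZED;
}
```

The trade-off is the one George raises: the aliases only work as long as the OMPI and ORTE values stay identical, so the two layers' code spaces cannot be separated.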
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
I don't know how you think that the error codes work in Open MPI, so I'll take the liberty to depict it here so we all agree we're talking about the same thing. The opal_strerror is a nice feature, it allow to register a range of error codes with a particular error converter. Every time you look for the meaning of a particular error code, the first convertor with a range enveloping the looked at value, will translate it into an error string. This is only currently used by OPAL and ORTE directly. It worked at the OMPI level only because we mapped __all__ OMPI errors to OPAL or ORTE ones. This behavior didn't change after my patch, you can still use opal_strerror to get the error string for all OPAL/ORTE/OMPI errors. There is a small "variation" for OMPI_ERR_REQUEST, the only really OMPI specific error code today. The OMPI error codes are actually inserted between the OPAL and the ORTE ones (there is a gap of 100 elements), so there is __no__ possible overlap right now. If at one point we add more than 100 OMPI level, we should certainly revisit this. Now, resulting from my patch, there is a difference. One should not simply forward an ORTE code into the stack of OMPI, and hope it just works. Errors should be dealt with where they happens, and if not possible they should be translated into the actual layer error code. The error propagation should be compartmentalized, and has to be translated into an error code that has a meaning at the OMPI level. The current patch should not prevent the mixed error-code code to work, as opal_strerror retains the same behavior as before. However, this coding practice should be avoided. I tried to clean the current code of such instances few days ago in r25230. Moreover, this is similar to how we deal with the error codes between OMPI and MPI layers, and seems like a sane way to compose libraries. 
You deal with a specific layer's error code when you get it (usually right after calling a function from that layer), not later on, when you no longer know exactly what the execution path was.

george.

PS: I'll fix the +/- issue.

On Oct 19, 2011, at 14:09, Jeff Squyres wrote:
> Oy, yes, that is bad -- we cannot have overlapping ORTE and OMPI error codes.
> That seems like a very bad idea (in addition to the mixing of + and -).
>
> For one thing, that breaks opal_strerror(). That, in itself, seems like a
> dealbreaker.
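George's rule (handle the error where it happens, or translate it into the current layer's code) can be sketched as a translation step at the boundary of Ralph's function B. This is a minimal sketch under stated assumptions: all `ex_` codes are hypothetical stand-ins, not the real ORTE or OMPI constants, and the lower-layer call is simulated.

```c
#include <assert.h>

/* Hypothetical lower-layer (ORTE-like) codes. */
enum {
    EX_LOW_SUCCESS             = 0,
    EX_LOW_ERR_OUT_OF_RESOURCE = -201,
    EX_LOW_ERR_UNREACH         = -202,
    EX_LOW_ERR_DAEMON_DIED     = -203
};

/* Hypothetical upper-layer (OMPI-like) codes. */
enum {
    EX_UP_SUCCESS             = 0,
    EX_UP_ERROR               = -101,
    EX_UP_ERR_OUT_OF_RESOURCE = -102,
    EX_UP_ERR_UNREACH         = -103
};

/* Stand-in for an ORTE-level call; returns whatever code we ask for. */
static int ex_lower_call(int simulated_rc)
{
    return simulated_rc;
}

/* Translate at the boundary: every value leaving B is an upper-layer code. */
static int ex_translate(int low_rc)
{
    switch (low_rc) {
    case EX_LOW_SUCCESS:             return EX_UP_SUCCESS;
    case EX_LOW_ERR_OUT_OF_RESOURCE: return EX_UP_ERR_OUT_OF_RESOURCE;
    case EX_LOW_ERR_UNREACH:         return EX_UP_ERR_UNREACH;
    default:                         return EX_UP_ERROR; /* detail is lost here */
    }
}

/* Function "B" from the thread: calls the lower layer, returns an upper code. */
static int ex_function_b(int simulated_rc)
{
    return ex_translate(ex_lower_call(simulated_rc));
}
```

The `default` branch is exactly where Ralph's concern applies: any lower-layer code without an upper-layer counterpart collapses into the generic error, so A can no longer tell, for example, a dead daemon from any other failure.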
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
I propose that we retain the rest of the changeset, but revert the OMPI constants to bring back their ORTE equivalents. We clearly should scrub those and update them to ensure they are both used and current, but it seems to me we lose more than we gain by removing them.

On Oct 19, 2011, at 12:09 PM, Jeff Squyres wrote:
> Oy, yes, that is bad -- we cannot have overlapping ORTE and OMPI error codes.
> That seems like a very bad idea (in addition to the mixing of + and -).
>
> For one thing, that breaks opal_strerror(). That, in itself, seems like a
> dealbreaker.
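The alternative Ralph proposes (restoring OMPI-prefixed equivalents of the ORTE constants) can be sketched as simple aliases: the upper layer gets its own names for the same integer values, so B can pass the lower-layer code through unchanged and A can still test for the specific cause. All `ex_` names and values below are hypothetical placeholders, not the real constants.

```c
#include <assert.h>

/* Hypothetical lower-layer (ORTE-like) constants. */
#define EX_LOW_SUCCESS        0
#define EX_LOW_ERR_UNREACH   (-201)
#define EX_LOW_ERR_TIMEOUT   (-202)

/* Upper-layer aliases: same values, upper-layer names. Every lower-layer
 * code that may escape needs an alias, and the list must be kept current. */
#define EX_UP_SUCCESS        EX_LOW_SUCCESS
#define EX_UP_ERR_UNREACH    EX_LOW_ERR_UNREACH
#define EX_UP_ERR_TIMEOUT    EX_LOW_ERR_TIMEOUT

/* Stand-in for an ORTE-level call. */
static int ex_lower_call(int simulated_rc)
{
    return simulated_rc;
}

/* "B" propagates the lower-layer code unchanged. */
static int ex_function_b(int simulated_rc)
{
    return ex_lower_call(simulated_rc);
}

/* "A" can still react to the specific cause using only upper-layer names. */
static int ex_a_should_retry(int simulated_rc)
{
    return ex_function_b(simulated_rc) == EX_UP_ERR_TIMEOUT;
}
```

The trade-off is the maintenance burden Ralph mentions: the alias list has to be scrubbed so that it stays both used and current.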
Re: [OMPI devel] Locality info
Sorry - referenced the wrong commit. It was r25331.

On Oct 19, 2011, at 2:28 PM, Ralph Castain wrote:
> Hi folks
>
> For those of you who don't follow the commits...
>
> I just committed (r25323) an extension of the orte_ess.proc_get_locality
> function that allows a process to get its relative resource usage with any
> other proc in the job.
[OMPI devel] Locality info
Hi folks

For those of you who don't follow the commits...

I just committed (r25323) an extension of the orte_ess.proc_get_locality function that allows a process to get its resource locality relative to any other proc in the job. In other words, you can provide a process name to the function, and the returned bitmask tells you whether you share a node, NUMA node, socket, caches (by level), core, and hyperthread with that process.

If you are on the same node and unbound, of course, you share all of those. However, if you are bound, this can tell you whether you are on a common NUMA node, share an L1 cache, etc. Might be handy.

I implemented the underlying functionality so that we can further extend it to tell you the relative resource location of two procs on a remote node. If that someday becomes of interest, it would be relatively easy to do, but would require passing more info around. Hence, I've allowed for it, but won't implement it until there is an identified need.

Locality info is available any time after the modex completes during MPI_Init, and is supported regardless of launch environment (except cnos, for now) and whether procs are launched by mpirun or direct-launched - in other words, pretty much always.

Hope it proves helpful in your work
Ralph
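A caller would typically interpret the returned bitmask by testing bits from the most local level outward. The sketch below is a minimal illustration under stated assumptions: the `EX_ON_*` bit names and values are hypothetical (Open MPI defines its own flags, such as the OPAL_PROC_ON_L*CACHE ones mentioned in this thread), and the call to orte_ess.proc_get_locality itself is not shown because its signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit assignments for a locality mask; Open MPI defines its own
 * flag names and values, which this sketch does not reproduce. */
enum {
    EX_ON_NODE     = 1 << 0,
    EX_ON_NUMA     = 1 << 1,
    EX_ON_SOCKET   = 1 << 2,
    EX_ON_L3CACHE  = 1 << 3,
    EX_ON_L2CACHE  = 1 << 4,
    EX_ON_L1CACHE  = 1 << 5,
    EX_ON_CORE     = 1 << 6,
    EX_ON_HWTHREAD = 1 << 7
};

/* Same node and unbound: every level is shared. */
#define EX_ON_ALL (EX_ON_NODE | EX_ON_NUMA | EX_ON_SOCKET | EX_ON_L3CACHE | \
                   EX_ON_L2CACHE | EX_ON_L1CACHE | EX_ON_CORE | EX_ON_HWTHREAD)

/* Name of the closest (most local) resource shared with the peer. */
static const char *ex_closest_shared(unsigned int mask)
{
    static const struct { unsigned int flag; const char *name; } levels[] = {
        { EX_ON_HWTHREAD, "hwthread" }, { EX_ON_CORE, "core" },
        { EX_ON_L1CACHE, "L1" },        { EX_ON_L2CACHE, "L2" },
        { EX_ON_L3CACHE, "L3" },        { EX_ON_SOCKET, "socket" },
        { EX_ON_NUMA, "numa" },         { EX_ON_NODE, "node" },
    };
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        if (mask & levels[i].flag) {
            return levels[i].name;
        }
    }
    return "none"; /* e.g. the peer is on another node */
}
```

For an unbound peer on the same node, the mask is `EX_ON_ALL` and the closest shared level is the hardware thread. For two procs bound to different cores of one socket, only the node, NUMA, socket, and perhaps L3 bits would be set.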
Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)
I posted my findings about the bad version no. macros to the same thread that described the Intel V12.1 optimizer bug (http://software.intel.com/en-us/forums/showthread.php?t=87132). The response I got is:

Posted By: Hubert Haberstock (Intel)
__
The build date is currently the only suitable macro. This allows to check for the Intel Compiler and for specific compiler versions. Makes sense?

Regards, Hubert.
__

That is contrary to what the online V12.1 documentation says. I'm going to find out what the previous versions do, then report this through my normal support channels. If the documentation is wrong, they should fix it; if the documentation is right, they should fix the compiler. (However, there will still be an errant V12.1.0 that reports itself as , so use of the version no. macros will never be reliable without a hack to handle this errant case.)

I'll report here what I find about the values of the version no. macros. It is probably better, though, that automake/libtool rely on the output of icc -v, since that seems to always result in a value that matches the version of the product (as opposed to #define __INTEL_COMPILER and #define __ICC from within the V12.1.0 compiler).

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 19 Oct 2011, at 10:47 AM, Jeff Squyres wrote:
> Did this get reported to the Intel compiler support people?
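One way to make a version check survive the empty `#define __INTEL_COMPILER` reported for 12.1.0 is to append `+ 0`, so that an empty macro still yields a valid (zero) expression, and then fall back to `__INTEL_COMPILER_BUILD_DATE`, as the Intel reply suggests. This is a hedged sketch: the `EX_` macros and the `ex_` helpers are hypothetical, and any cut-off version or build date you pass in is your own assumption, not a value established in this thread.

```c
#include <assert.h>

/* Normalize the Intel predefined macros into plain integers.
 * Appending "+ 0" keeps the expression valid even when a macro is defined
 * with an empty body, as reported above for icc 12.1.0. */
#if defined(__INTEL_COMPILER)
#  define EX_ICC_VERSION (__INTEL_COMPILER + 0)   /* 0 if defined empty */
#else
#  define EX_ICC_VERSION (-1)                     /* not icc */
#endif

#if defined(__INTEL_COMPILER_BUILD_DATE)
#  define EX_ICC_BUILD_DATE (__INTEL_COMPILER_BUILD_DATE + 0L)
#else
#  define EX_ICC_BUILD_DATE (-1L)
#endif

/* Policy: trust the version macro when it carries a value, otherwise fall
 * back to the build date. */
static int ex_icc_at_least(int version, long build_date,
                           int want_version, long want_build_date)
{
    if (version > 0) {
        return version >= want_version;
    }
    return build_date >= want_build_date;
}

/* Applies the policy to the compiler building this file; 0 when not icc. */
static int ex_this_icc_at_least(int want_version, long want_build_date)
{
    if (EX_ICC_VERSION == -1) {
        return 0;
    }
    return ex_icc_at_least(EX_ICC_VERSION, EX_ICC_BUILD_DATE,
                           want_version, want_build_date);
}
```

The pure helper `ex_icc_at_least` keeps the decision testable on any compiler. The same `+ 0` trick also works inside `#if` expressions, for example to guard the optimization-disabling `#pragma` discussed in this thread.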
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
Oy, yes, that is bad -- we cannot have overlapping ORTE and OMPI error codes. That seems like a very bad idea (in addition to the mixing of + and -).

For one thing, that breaks opal_strerror(). That, in itself, seems like a dealbreaker.

On Oct 19, 2011, at 1:51 PM, Barrett, Brian W wrote:
> I actually think it's worse than that. An ORTE error code can now have
> the same error code as an OMPI error. OMPI_ERR_REQUEST and
> ORTE_ERR_RECV_LESS_THAN_POSTED now share the same integer return code.
> Or, they should, if George hadn't made a mistake (see below). The sharing
> of return codes seems... bad.
>
> Also, there's a bug in George's patch. Error codes are all negative, so
> OMPI_ERR_REQUEST should be OMPI_ERR_BASE - 1 and OMPI_ERR_MAX should be
> OMPI_ERR_BASE - 1, not plus 2.
>
> Brian

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
I actually think it's worse than that. An ORTE error code can now have the same error code as an OMPI error. OMPI_ERR_REQUEST and ORTE_ERR_RECV_LESS_THAN_POSTED now share the same integer return code. Or, they should, if George hadn't made a mistake (see below). The sharing of return codes seems... bad.

Also, there's a bug in George's patch. Error codes are all negative, so OMPI_ERR_REQUEST should be OMPI_ERR_BASE - 1 and OMPI_ERR_MAX should be OMPI_ERR_BASE - 1, not plus 2.

Brian

On 10/19/11 1:32 PM, "Ralph Castain" wrote:
>I've been wrestling with something from this commit, and I'm unsure of
>the right answer. So please consider this a general design question for
>the community.
>
>This commit removes all the OMPI <-> ORTE equivalent constants - i.e., we
>used to declare OMPI-prefixed equivalents to every ORTE-prefixed
>constant. I understand the thinking (or at least, what I suspect was the
>thought), but it creates an issue.
>
>Suppose I have an ompi-level function (A) that calls another ompi-level
>function (B). Invisible to A is that B calls an orte-level function. B
>dutifully checks the error return from the orte-level function against an
>ORTE-prefixed constant.
>
>However, if that return isn't "success", what does B return up to A? It
>cannot return the OMPI equivalent to the orte error constant because it
>no longer exists. It could return the orte error code, but A has no way
>of knowing it is going to get a non-OMPI constant, and therefore won't be
>able to understand it - it will be an "unrecognized error".
>
>I guess one option is to require that B "translate" the return code and
>pass some OMPI error up the chain, but this prevents anything upwards
>from understanding the nature of the problem and potentially taking
>corrective and/or alternative action. Seems awfully limiting, as most of
>the time the only option will be the vanilla "OMPI_ERROR".
>
>Thoughts?

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
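Brian's sign point can be made concrete with a sketch of stacked, downward-growing error ranges. This assumes, as George describes, that the OMPI range sits between OPAL's and ORTE's with 100 slots reserved; every `EX_` value is a hypothetical placeholder, and the sketch does not try to reproduce the exact OMPI_ERR_MAX value either message intended.

```c
#include <assert.h>

/* Example values only (EX_ prefix); the real bases are defined in the
 * OPAL/OMPI/ORTE constants headers. Codes grow more negative. */
#define EX_OPAL_ERR_BASE     0
#define EX_OPAL_ERR_MAX      (EX_OPAL_ERR_BASE - 100)
#define EX_OMPI_ERR_BASE     EX_OPAL_ERR_MAX
#define EX_OMPI_ERR_REQUEST  (EX_OMPI_ERR_BASE - 1)   /* first OMPI code: step down */
#define EX_OMPI_ERR_MAX      (EX_OMPI_ERR_BASE - 100)
#define EX_ORTE_ERR_BASE     EX_OMPI_ERR_MAX
#define EX_ORTE_ERR_MAX      (EX_ORTE_ERR_BASE - 100)

enum ex_layer { EX_LAYER_NONE, EX_LAYER_OPAL, EX_LAYER_OMPI, EX_LAYER_ORTE };

/* A layer owns the codes strictly between its MAX and its BASE. */
static enum ex_layer ex_layer_of(int code)
{
    if (code < EX_OPAL_ERR_BASE && code > EX_OPAL_ERR_MAX) return EX_LAYER_OPAL;
    if (code < EX_OMPI_ERR_BASE && code > EX_OMPI_ERR_MAX) return EX_LAYER_OMPI;
    if (code < EX_ORTE_ERR_BASE && code > EX_ORTE_ERR_MAX) return EX_LAYER_ORTE;
    return EX_LAYER_NONE;
}

/* Compile-time guard (C11): the OMPI code must sit inside the OMPI range. */
_Static_assert(EX_OMPI_ERR_REQUEST < EX_OMPI_ERR_BASE &&
               EX_OMPI_ERR_REQUEST > EX_OMPI_ERR_MAX,
               "OMPI error code escaped its range");
```

With these values, `EX_OMPI_ERR_BASE - 1` (-101) falls in the OMPI range, while `EX_OMPI_ERR_BASE + 1` (-99) lands in the neighbouring OPAL range. This is the kind of collision a `+` instead of `-` produces. A `_Static_assert` like the one above turns such a slip into a build error.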
Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)
Did this get reported to the Intel compiler support people?

On Oct 19, 2011, at 8:24 AM, George Bosilca wrote:
> Thanks Larry,
>
> Will forward this info upstream.
>
> george.
>
> On Oct 18, 2011, at 21:56, Larry Baker wrote:
>
>> George,
>>
>> Thanks for the update. FYI, here's all the version numbers reported by the
>> compiler releases I have installed:
>>
>>> [baker@hydra ~]$ module load compilers/intel/11.1.080
>>> [baker@hydra ~]$ icc -v
>>> Version 11.1
>>> [baker@hydra ~]$ module unload compilers/intel/11.1.080
>>
>>> [baker@hydra ~]$ module load compilers/intel/2011.3.174
>>> [baker@hydra ~]$ icc -v
>>> Version 12.0.3
>>> [baker@hydra ~]$ module unload compilers/intel/2011.3.174
>>
>>> [baker@hydra ~]$ module load compilers/intel/2011.4.191
>>> [baker@hydra ~]$ icc -v
>>> Version 12.0.4
>>> [baker@hydra ~]$ module unload compilers/intel/2011.4.191
>>
>>> [baker@hydra ~]$ module load compilers/intel/2011.5.220
>>> [baker@hydra ~]$ icc -v
>>> Version 12.0.5
>>> [baker@hydra ~]$ module unload compilers/intel/2011.5.220
>>
>>> [baker@hydra ~]$ module load compilers/intel/2011.6.233
>>> [baker@hydra ~]$ icc -v
>>> icc version 12.1.0 (gcc version 4.1.2 compatibility)
>>> [baker@hydra ~]$ module unload compilers/intel/2011.6.233
>>
>> Another problem I found with the Intel 12.1.0 compiler: I started to look at
>> adding a test for the Intel compiler version around the #pragma that
>> disables optimization for OpenMPI and I found the __ICC and __INTEL_COMPILER
>> predefined macros (compiler version no.) are not properly defined:
>>
>> $ icc -E -dD hello.c | grep __INTEL_COMPILER
>> #define __INTEL_COMPILER
>> #define __INTEL_COMPILER_BUILD_DATE 20110811
>>
>> $ icc -E -dD hello.c | grep __ICC
>> #define __ICC
>>
>> $ icc -v
>> icc version 12.1.0 (gcc version 4.1.2 compatibility)
>>
>> I do not know if there is code in OpenMPI that looks at __ICC and
>> __INTEL_COMPILER, but that could cause problems. (Pass this on upstream to
>> the libtool people?)
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> ba...@usgs.gov
>>
>> On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:
>>
>>> Larry,
>>>
>>> Sorry for not updating this thread. The issue was identified and fixed by
>>> Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290).
>>> Please read the comments and the linked thread on the Intel forum for more
>>> info about.
>>>
>>> I couldn't find a trace of this being fixed in the 1.4 series, so I would
>>> wait upgrading until this issue gets resolved.
>>>
>>> Thanks,
>>> george.
>>>
>>> On Oct 17, 2011, at 23:00, Larry Baker wrote:
>>>
George,

I have not had time to look over the 1.4.3 make check failure for Intel 2011.6.233 compilers. Have you?

I had planned to get 1.4.3 compiled on all six of our compilers using the latest compiler releases. I was putting off upgrading to 1.4.4 or 1.5.x until after that to minimize the number of things that could go wrong. Do you recommend otherwise?

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:

> The may_alias attribute was part of a forward-looking attribute checking,
> at a time where few compiler supported them. This explains why they are
> not widely used in the library itself. Moreover, as they do not affect
> the compilation itself (as your test highlights this is not the issue
> with the icc 2011.6.233 compiler), there is no urge to remove the
> may_alias support.
>
> I just got that particular version of the compiler installed on one of
> our machines. I'll give it a try over the weekend.
>
> george.
>
> On Oct 7, 2011, at 20:21, Larry Baker wrote:
>
>> The test for the __may_alias__ attribute uses the following short code
>> snippet:
>>
>>> int * p_value __attribute__ ((__may_alias__));
>>> int
>>> main ()
>>> {
>>>
>>> ;
>>> return 0;
>>> }
>>
>> Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a
>> warning:
>>
>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
>>> may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
>>> int * p_value __attribute__ ((__may_alias__));
>>> ^
>>>
>>> [root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220
>>
>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
>>
>> I modified ./configure to force
>>
>>> ompi_cv___attribute__may_alias=0
>>
>> Then I compiled and tested the library. Unfortunately, the results were
>> exactly the same:
>>
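For context on what the configure probe above is checking, here is a minimal sketch of how `__may_alias__` is normally used: applied to a type, it tells GCC-compatible compilers that lvalues of that type may alias objects of any other type, much like `char`. The `ex_` names are hypothetical. The sketch assumes a GCC- or Clang-compatible compiler and a 32-bit IEEE-754 `float`, and shows the portable `memcpy` route alongside it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* GCC/Clang extension: lvalues of this type may alias objects of any type,
 * like char does, so the optimizer must not assume they are unrelated. */
typedef uint32_t __attribute__((__may_alias__)) ex_aliased_u32;

/* Read a float's bit pattern through the may_alias type. */
static uint32_t ex_float_bits_may_alias(float f)
{
    return *(const ex_aliased_u32 *)&f;
}

/* Attribute-free, portable equivalent. */
static uint32_t ex_float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE-754 target, both functions return 0x3F800000 for `1.0f`. Where the attribute is ignored, as in the warning quoted above for the older Intel releases, the `memcpy` form is the safer choice.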
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
I've been wrestling with something from this commit, and I'm unsure of the right answer. So please consider this a general design question for the community. This commit removes all the OMPI <-> ORTE equivalent constants - i.e., we used to declare OMPI-prefixed equivalents to every ORTE-prefixed constant. I understand the thinking (or at least, what I suspect was the thought), but it creates an issue. Suppose I have an ompi-level function (A) that calls another ompi-level function (B). Invisible to A is that B calls an orte-level function. B dutifully checks the error return from the orte-level function against an ORTE-prefixed constant. However, if that return isn't "success", what does B return up to A? It cannot return the OMPI equivalent to the orte error constant because it no longer exists. It could return the orte error code, but A has no way of knowing it is going to get a non-OMPI constant, and therefore won't be able to understand it - it will be an "unrecognized error". I guess one option is to require that B "translate" the return code and pass some OMPI error up the chain, but this prevents anything upwards from understanding the nature of the problem and potentially taking corrective and/or alternative action. Seems awfully limiting, as most of the time the only option will be the vanilla "OMPI_ERROR". Thoughts? On Oct 18, 2011, at 9:51 PM, bosi...@osl.iu.edu wrote: > Author: bosilca > Date: 2011-10-18 23:51:53 EDT (Tue, 18 Oct 2011) > New Revision: 25323 > URL: https://svn.open-mpi.org/trac/ompi/changeset/25323 > > Log: > Cleanup the error codes. Get rid of all the useless ones, and > mark the distinction between ORTE and OMPI errors. 
> > Text files modified: > trunk/ompi/errhandler/errcode-internal.c |32 --- > > trunk/ompi/include/ompi/constants.h |80 > +--- > trunk/ompi/mca/common/sm/common_sm_rml.c | 6 +- > > trunk/ompi/mca/pml/dr/pml_dr_sendreq.c | 5 -- > > trunk/ompi/mpiext/cr/c/quiesce_start.c | 5 ++ > > 5 files changed, 43 insertions(+), 85 deletions(-) > > Modified: trunk/ompi/errhandler/errcode-internal.c > == > --- trunk/ompi/errhandler/errcode-internal.c (original) > +++ trunk/ompi/errhandler/errcode-internal.c 2011-10-18 23:51:53 EDT (Tue, > 18 Oct 2011) > @@ -3,7 +3,7 @@ > * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana > * University Research and Technology > * Corporation. All rights reserved. > - * Copyright (c) 2004-2007 The University of Tennessee and The University > + * Copyright (c) 2004-2011 The University of Tennessee and The University > * of Tennessee Research Foundation. All rights > * reserved. > * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, > @@ -35,9 +35,6 @@ > static ompi_errcode_intern_t ompi_err_temp_out_of_resource; > static ompi_errcode_intern_t ompi_err_resource_busy; > static ompi_errcode_intern_t ompi_err_bad_param; > -static ompi_errcode_intern_t ompi_err_recv_less_than_posted; > -static ompi_errcode_intern_t ompi_err_recv_more_than_posted; > -static ompi_errcode_intern_t ompi_err_no_match_yet; > static ompi_errcode_intern_t ompi_err_fatal; > static ompi_errcode_intern_t ompi_err_not_implemented; > static ompi_errcode_intern_t ompi_err_not_supported; > @@ -115,30 +112,6 @@ > opal_pointer_array_set_item(_errcodes_intern, > ompi_err_bad_param.index, > _err_bad_param); > > -OBJ_CONSTRUCT(_err_recv_less_than_posted, ompi_errcode_intern_t); > -ompi_err_recv_less_than_posted.code = OMPI_ERR_RECV_LESS_THAN_POSTED; > -ompi_err_recv_less_than_posted.mpi_code = MPI_SUCCESS; > -ompi_err_recv_less_than_posted.index = pos++; > -strncpy(ompi_err_recv_less_than_posted.errstring, > "OMPI_ERR_RECV_LESS_THAN_POSTED", 
OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_recv_less_than_posted.index, > -_err_recv_less_than_posted); > - > -OBJ_CONSTRUCT(_err_recv_more_than_posted, ompi_errcode_intern_t); > -ompi_err_recv_more_than_posted.code = OMPI_ERR_RECV_MORE_THAN_POSTED; > -ompi_err_recv_more_than_posted.mpi_code = MPI_ERR_TRUNCATE; > -ompi_err_recv_more_than_posted.index = pos++; > -strncpy(ompi_err_recv_more_than_posted.errstring, > "OMPI_ERR_RECV_MORE_THAN_POSTED", OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_recv_more_than_posted.index, > -_err_recv_more_than_posted); > - > -OBJ_CONSTRUCT(_err_no_match_yet, ompi_errcode_intern_t); > -
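The quoted diff shows the pattern errcode-internal.c follows: each internal code is constructed with its own `code`, the MPI class it maps to (`mpi_code`), and a name string, and is then registered in a table. For example, the removed entries mapped OMPI_ERR_RECV_MORE_THAN_POSTED to MPI_ERR_TRUNCATE and OMPI_ERR_NO_MATCH_YET to MPI_ERR_PENDING. The sketch below condenses that idea into a static lookup table instead of OBJ_CONSTRUCT and opal_pointer_array. All `ex_`/`EX_` names, codes, and class values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI error classes (values are arbitrary). */
enum {
    EX_MPI_SUCCESS = 0,
    EX_MPI_ERR_TRUNCATE,
    EX_MPI_ERR_PENDING,
    EX_MPI_ERR_INTERN,
    EX_MPI_ERR_UNKNOWN
};

/* One row per internal code, mirroring the code/mpi_code/errstring fields
 * filled in for each ompi_errcode_intern_t in the diff. */
typedef struct {
    int code;              /* internal (negative) error code */
    int mpi_code;          /* MPI error class handed to the user */
    const char *errstring; /* name used in diagnostics */
} ex_errcode_entry_t;

static const ex_errcode_entry_t ex_errcodes[] = {
    { -101, EX_MPI_ERR_TRUNCATE, "EX_ERR_RECV_MORE_THAN_POSTED" },
    { -102, EX_MPI_ERR_PENDING,  "EX_ERR_NO_MATCH_YET" },
    { -103, EX_MPI_ERR_INTERN,   "EX_ERR_FATAL" },
};

/* Map an internal code to its MPI class; unregistered codes fall through. */
static int ex_errcode_get_mpi_code(int code)
{
    if (code == 0) {
        return EX_MPI_SUCCESS;
    }
    for (size_t i = 0; i < sizeof ex_errcodes / sizeof ex_errcodes[0]; i++) {
        if (ex_errcodes[i].code == code) {
            return ex_errcodes[i].mpi_code;
        }
    }
    return EX_MPI_ERR_UNKNOWN;
}
```

A code that never reaches this table, such as an ORTE code forwarded unchanged, ends up in the "unknown" class. That is the situation the thread is debating.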
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25323
Indeed, I removed some of the OMPI-level error codes. As you can see in the patch, they were defined but never used. I don't think they were worth an RFC, as they are used neither in the trunk nor in the 1.5 and 1.4 branches. And I did check, because I was wondering why they existed in the first place. If [by some miracle] they are used by people working on non-trunk branches, I do apologize for the inconvenience to them.

george.

On Oct 19, 2011, at 10:37, Jeff Squyres wrote:
> George --
>
> Did you actually remove some of the error codes?
>
> I think that should have been worthy of a (quick) RFC first, just to let
> people know who are working in non-trunk branches who might have been using
> them.
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25323
George -- Did you actually remove some of the error codes? I think that should have been worthy of a (quick) RFC first, just to let people know who are working in non-trunk branches who might have been using them. On Oct 18, 2011, at 11:51 PM, bosi...@osl.iu.edu wrote: > Author: bosilca > Date: 2011-10-18 23:51:53 EDT (Tue, 18 Oct 2011) > New Revision: 25323 > URL: https://svn.open-mpi.org/trac/ompi/changeset/25323 > > Log: > Cleanup the error codes. Get rid of all the useless ones, and > mark the distinction between ORTE and OMPI errors. > > Text files modified: > trunk/ompi/errhandler/errcode-internal.c |32 --- > > trunk/ompi/include/ompi/constants.h |80 > +--- > trunk/ompi/mca/common/sm/common_sm_rml.c | 6 +- > > trunk/ompi/mca/pml/dr/pml_dr_sendreq.c | 5 -- > > trunk/ompi/mpiext/cr/c/quiesce_start.c | 5 ++ > > 5 files changed, 43 insertions(+), 85 deletions(-) > > Modified: trunk/ompi/errhandler/errcode-internal.c > == > --- trunk/ompi/errhandler/errcode-internal.c (original) > +++ trunk/ompi/errhandler/errcode-internal.c 2011-10-18 23:51:53 EDT (Tue, > 18 Oct 2011) > @@ -3,7 +3,7 @@ > * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana > * University Research and Technology > * Corporation. All rights reserved. > - * Copyright (c) 2004-2007 The University of Tennessee and The University > + * Copyright (c) 2004-2011 The University of Tennessee and The University > * of Tennessee Research Foundation. All rights > * reserved. 
> * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, > @@ -35,9 +35,6 @@ > static ompi_errcode_intern_t ompi_err_temp_out_of_resource; > static ompi_errcode_intern_t ompi_err_resource_busy; > static ompi_errcode_intern_t ompi_err_bad_param; > -static ompi_errcode_intern_t ompi_err_recv_less_than_posted; > -static ompi_errcode_intern_t ompi_err_recv_more_than_posted; > -static ompi_errcode_intern_t ompi_err_no_match_yet; > static ompi_errcode_intern_t ompi_err_fatal; > static ompi_errcode_intern_t ompi_err_not_implemented; > static ompi_errcode_intern_t ompi_err_not_supported; > @@ -115,30 +112,6 @@ > opal_pointer_array_set_item(_errcodes_intern, > ompi_err_bad_param.index, > _err_bad_param); > > -OBJ_CONSTRUCT(_err_recv_less_than_posted, ompi_errcode_intern_t); > -ompi_err_recv_less_than_posted.code = OMPI_ERR_RECV_LESS_THAN_POSTED; > -ompi_err_recv_less_than_posted.mpi_code = MPI_SUCCESS; > -ompi_err_recv_less_than_posted.index = pos++; > -strncpy(ompi_err_recv_less_than_posted.errstring, > "OMPI_ERR_RECV_LESS_THAN_POSTED", OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_recv_less_than_posted.index, > -_err_recv_less_than_posted); > - > -OBJ_CONSTRUCT(_err_recv_more_than_posted, ompi_errcode_intern_t); > -ompi_err_recv_more_than_posted.code = OMPI_ERR_RECV_MORE_THAN_POSTED; > -ompi_err_recv_more_than_posted.mpi_code = MPI_ERR_TRUNCATE; > -ompi_err_recv_more_than_posted.index = pos++; > -strncpy(ompi_err_recv_more_than_posted.errstring, > "OMPI_ERR_RECV_MORE_THAN_POSTED", OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_recv_more_than_posted.index, > -_err_recv_more_than_posted); > - > -OBJ_CONSTRUCT(_err_no_match_yet, ompi_errcode_intern_t); > -ompi_err_no_match_yet.code = OMPI_ERR_NO_MATCH_YET; > -ompi_err_no_match_yet.mpi_code = MPI_ERR_PENDING; > -ompi_err_no_match_yet.index = pos++; > -strncpy(ompi_err_no_match_yet.errstring, "OMPI_ERR_NO_MATCH_YET", > 
OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_no_match_yet.index, > -_err_no_match_yet); > - > OBJ_CONSTRUCT(_err_fatal, ompi_errcode_intern_t); > ompi_err_fatal.code = OMPI_ERR_FATAL; > ompi_err_fatal.mpi_code = MPI_ERR_INTERN; > @@ -232,9 +205,6 @@ > OBJ_DESTRUCT(_err_temp_out_of_resource); > OBJ_DESTRUCT(_err_resource_busy); > OBJ_DESTRUCT(_err_bad_param); > -OBJ_DESTRUCT(_err_recv_less_than_posted); > -OBJ_DESTRUCT(_err_recv_more_than_posted); > -OBJ_DESTRUCT(_err_no_match_yet); > OBJ_DESTRUCT(_err_fatal); > OBJ_DESTRUCT(_err_not_implemented); > OBJ_DESTRUCT(_err_not_supported); > > Modified: trunk/ompi/include/ompi/constants.h > == > --- trunk/ompi/include/ompi/constants.h (original) > +++
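The block removed from errcode-internal.c repeats one pattern per code: fill an `ompi_errcode_intern_t` with the internal `code`, the `mpi_code` handed to the user, a printable `errstring` and an `index` (`pos++`), then store it in a pointer array. The C below is a minimal, self-contained sketch of that pattern only. The struct layout, the fixed-size table, the fallback class and every numeric value are hypothetical simplifications, not the real OMPI definitions from constants.h or mpi.h.

```c
#include <string.h>

/* Stand-ins; the real OMPI_MAX_ERROR_STRING value is not shown in the diff. */
#define MAX_ERROR_STRING 64
#define TABLE_SIZE 16

/* Placeholder values; NOT the real numbers from constants.h or mpi.h. */
enum { INT_ERR_BAD_PARAM = -5, INT_ERR_FATAL = -6, INT_ERR_RECV_MORE_THAN_POSTED = -17 };
enum { CLASS_ERR_ARG = 1, CLASS_ERR_TRUNCATE = 2, CLASS_ERR_INTERN = 3 };

/* Simplified analogue of ompi_errcode_intern_t. */
typedef struct {
    int code;                         /* internal error code */
    int mpi_code;                     /* MPI error class handed to the user */
    int index;                        /* slot in the table, like pos++ */
    char errstring[MAX_ERROR_STRING]; /* printable name of the code */
} errcode_entry;

static errcode_entry table[TABLE_SIZE];
static int pos = 0;

/* Register one mapping; returns its index, or -1 if the table is full. */
static int register_code(int code, int mpi_code, const char *name)
{
    if (pos >= TABLE_SIZE) {
        return -1;
    }
    errcode_entry *e = &table[pos];
    e->code = code;
    e->mpi_code = mpi_code;
    e->index = pos;
    strncpy(e->errstring, name, MAX_ERROR_STRING - 1);
    e->errstring[MAX_ERROR_STRING - 1] = '\0'; /* strncpy may not terminate */
    return pos++;
}

/* Find the entry for an internal code, or NULL if it was never registered. */
static const errcode_entry *find_code(int code)
{
    for (int i = 0; i < pos; i++) {
        if (table[i].code == code) {
            return &table[i];
        }
    }
    return NULL;
}

/* Translate an internal code; in this sketch unregistered codes fall back
 * to the generic "internal error" class. */
static int to_mpi_code(int code)
{
    const errcode_entry *e = find_code(code);
    return e != NULL ? e->mpi_code : CLASS_ERR_INTERN;
}
```

In this sketch a code with no registered entry silently gets the generic class. That is one reason removing entries deserves a heads-up: out-of-tree code that still returns a removed code would see a different translation, or fail to build if the constant itself is gone.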
[OMPI devel] Removing error message
George -- Can you put this back? I don't think the error message is meaningless. It's there because people typically copy-n-paste the error message to the users list (or whatever their support channel is), and the error name in it will mean something to an OMPI developer. I'm guessing/assuming that's why it was there. On Oct 19, 2011, at 9:04 AM, bosi...@osl.iu.edu wrote: > Author: bosilca > Date: 2011-10-19 09:04:46 EDT (Wed, 19 Oct 2011) > New Revision: 25324 > URL: https://svn.open-mpi.org/trac/ompi/changeset/25324 > > Log: > The error here is meaningless. > > Text files modified: > trunk/ompi/debuggers/ompi_debuggers.c | 4 ++-- > > 1 files changed, 2 insertions(+), 2 deletions(-) > > Modified: trunk/ompi/debuggers/ompi_debuggers.c > == > --- trunk/ompi/debuggers/ompi_debuggers.c (original) > +++ trunk/ompi/debuggers/ompi_debuggers.c 2011-10-19 09:04:46 EDT (Wed, > 19 Oct 2011) > @@ -260,8 +260,8 @@ > /* if it failed for some reason, then we are in trouble - > * for now, just report the problem and give up waiting > */ > -opal_output(0, "Debugger_attach[rank=%ld]: could not wait for > debugger - error %s!", > -(long)ORTE_PROC_MY_NAME->vpid, ORTE_ERROR_NAME(rc)); > +opal_output(0, "Debugger_attach[rank=%ld]: could not wait for > debugger!", > +(long)ORTE_PROC_MY_NAME->vpid); > } > } > #endif > ___ > svn-full mailing list > svn-f...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/svn-full -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
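r25324 drops `ORTE_ERROR_NAME(rc)` from the `opal_output` call, so the printed line no longer says why the wait failed. Below is a minimal C sketch of keeping the name in the text. It assumes a hypothetical `error_name()` lookup in place of `ORTE_ERROR_NAME` and uses placeholder error values; `snprintf` into a caller-supplied buffer stands in for `opal_output(0, ...)` so the result can be checked.

```c
#include <stdio.h>
#include <string.h>

/* Placeholder error values; NOT the real ORTE_ERR_* numbers. */
enum { SK_ERR_NOT_FOUND = -13, SK_ERR_TIMEOUT = -15 };

/* Hypothetical stand-in for ORTE_ERROR_NAME(rc): code -> printable name. */
static const char *error_name(int rc)
{
    switch (rc) {
    case SK_ERR_TIMEOUT:   return "TIMEOUT";
    case SK_ERR_NOT_FOUND: return "NOT_FOUND";
    default:               return "UNKNOWN_ERROR";
    }
}

/* Build the attach-failure line with the error name kept in the text,
 * using the same format string the patch removed. Returns snprintf's
 * result (the length the full message would have). */
static int format_attach_failure(char *buf, size_t len, long rank, int rc)
{
    return snprintf(buf, len,
                    "Debugger_attach[rank=%ld]: could not wait for debugger - error %s!",
                    rank, error_name(rc));
}
```

This runs only on the failure path, so keeping the name costs nothing in normal operation, and a log pasted to the list still carries the reason for the failure.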
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
It's not just my components, George - there are people with branches out there that have OMPI components and changes in them. If you are going to gripe when others make changes without warning, then you should abide by your own rules. :-) On Oct 19, 2011, at 8:16 AM, George Bosilca wrote: > OK, just saw your commit. It makes sense: an OMPI component should return OMPI > error codes. Thanks for the fix. > > george. > > On Oct 19, 2011, at 10:12 , George Bosilca wrote: > >> I ran an entire battery of tests on these without any issues. Moreover, it is >> an OMPI-related thing, and these error messages were never used. Anyway, >> please let me know what exactly failed and I'll fix it asap. >> >> Thanks, >> george. >> >> On Oct 19, 2011, at 10:06 , Ralph Castain wrote: >> >>> If you are going to make such sweeping changes, could you please provide a >>> little warning as per our usual methods? This broke several things, which >>> can be repaired, but it would have been nice to know that we were going to >>> make such a change. >>> >>> Thx >>> >>> >>> On Oct 18, 2011, at 9:51 PM, bosi...@osl.iu.edu wrote: >>> Author: bosilca Date: 2011-10-18 23:51:53 EDT (Tue, 18 Oct 2011) New Revision: 25323 URL: https://svn.open-mpi.org/trac/ompi/changeset/25323 Log: Cleanup the error codes. Get rid of all the useless ones, and mark the distinction between ORTE and OMPI errors.
Text files modified: trunk/ompi/errhandler/errcode-internal.c |32 --- trunk/ompi/include/ompi/constants.h |80 +--- trunk/ompi/mca/common/sm/common_sm_rml.c | 6 +- trunk/ompi/mca/pml/dr/pml_dr_sendreq.c | 5 -- trunk/ompi/mpiext/cr/c/quiesce_start.c | 5 ++ 5 files changed, 43 insertions(+), 85 deletions(-) Modified: trunk/ompi/errhandler/errcode-internal.c == --- trunk/ompi/errhandler/errcode-internal.c (original) +++ trunk/ompi/errhandler/errcode-internal.c 2011-10-18 23:51:53 EDT (Tue, 18 Oct 2011) @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2011 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -35,9 +35,6 @@ static ompi_errcode_intern_t ompi_err_temp_out_of_resource; static ompi_errcode_intern_t ompi_err_resource_busy; static ompi_errcode_intern_t ompi_err_bad_param; -static ompi_errcode_intern_t ompi_err_recv_less_than_posted; -static ompi_errcode_intern_t ompi_err_recv_more_than_posted; -static ompi_errcode_intern_t ompi_err_no_match_yet; static ompi_errcode_intern_t ompi_err_fatal; static ompi_errcode_intern_t ompi_err_not_implemented; static ompi_errcode_intern_t ompi_err_not_supported; @@ -115,30 +112,6 @@ opal_pointer_array_set_item(_errcodes_intern, ompi_err_bad_param.index, _err_bad_param); -OBJ_CONSTRUCT(_err_recv_less_than_posted, ompi_errcode_intern_t); -ompi_err_recv_less_than_posted.code = OMPI_ERR_RECV_LESS_THAN_POSTED; -ompi_err_recv_less_than_posted.mpi_code = MPI_SUCCESS; -ompi_err_recv_less_than_posted.index = pos++; -strncpy(ompi_err_recv_less_than_posted.errstring, "OMPI_ERR_RECV_LESS_THAN_POSTED", OMPI_MAX_ERROR_STRING); -opal_pointer_array_set_item(_errcodes_intern, 
ompi_err_recv_less_than_posted.index, -_err_recv_less_than_posted); - -OBJ_CONSTRUCT(_err_recv_more_than_posted, ompi_errcode_intern_t); -ompi_err_recv_more_than_posted.code = OMPI_ERR_RECV_MORE_THAN_POSTED; -ompi_err_recv_more_than_posted.mpi_code = MPI_ERR_TRUNCATE; -ompi_err_recv_more_than_posted.index = pos++; -strncpy(ompi_err_recv_more_than_posted.errstring, "OMPI_ERR_RECV_MORE_THAN_POSTED", OMPI_MAX_ERROR_STRING); -opal_pointer_array_set_item(_errcodes_intern, ompi_err_recv_more_than_posted.index, -_err_recv_more_than_posted); - -OBJ_CONSTRUCT(_err_no_match_yet, ompi_errcode_intern_t); -ompi_err_no_match_yet.code = OMPI_ERR_NO_MATCH_YET; -ompi_err_no_match_yet.mpi_code =
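George's point quoted above, that an OMPI component should return OMPI error codes, usually means translating lower-layer return codes at the component boundary instead of passing them through. Below is a minimal C sketch of that idea. The `LOWER_*` and `UPPER_*` codes are hypothetical placeholders standing in for ORTE and OMPI constants, and `lower_layer_send()` is a hypothetical stub. The real values, and which codes the two layers share, are not shown in this thread.

```c
/* Hypothetical placeholder codes; NOT the real ORTE_* or OMPI_* values. */
enum {                              /* stands in for the runtime (ORTE) layer */
    LOWER_SUCCESS = 0,
    LOWER_ERR_OUT_OF_RESOURCE = -2,
    LOWER_ERR_NOT_FOUND = -13,
    LOWER_ERR_DAEMON_DIED = -100    /* runtime-only condition */
};

enum {                              /* stands in for the MPI (OMPI) layer */
    UPPER_SUCCESS = 0,
    UPPER_ERR_OUT_OF_RESOURCE = -2,
    UPPER_ERR_NOT_FOUND = -13,
    UPPER_ERR_FATAL = -6
};

/* Map a lower-layer code into the upper layer's vocabulary. Codes with no
 * upper-layer equivalent collapse to a generic fatal error. */
static int to_upper_rc(int rc)
{
    switch (rc) {
    case LOWER_SUCCESS:             return UPPER_SUCCESS;
    case LOWER_ERR_OUT_OF_RESOURCE: return UPPER_ERR_OUT_OF_RESOURCE;
    case LOWER_ERR_NOT_FOUND:       return UPPER_ERR_NOT_FOUND;
    default:                        return UPPER_ERR_FATAL;
    }
}

/* Hypothetical lower-layer call; here it just returns the code it is given. */
static int lower_layer_send(int simulated_rc)
{
    return simulated_rc;
}

/* Component entry point: never leaks a lower-layer code to its caller. */
static int component_send(int simulated_rc)
{
    int rc = lower_layer_send(simulated_rc);
    return to_upper_rc(rc);
}
```

Collapsing unknown codes to a single fatal value is a design choice made for this sketch. A component could instead log the original code before translating it, so the detail is not lost.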
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
I ran an entire battery of tests on these without any issues. Moreover, it is an OMPI-related thing, and these error messages were never used. Anyway, please let me know what exactly failed and I'll fix it asap. Thanks, george. On Oct 19, 2011, at 10:06 , Ralph Castain wrote: > If you are going to make such sweeping changes, could you please provide a > little warning as per our usual methods? This broke several things, which can > be repaired, but it would have been nice to know that we were going to make such > a change. > > Thx > > > On Oct 18, 2011, at 9:51 PM, bosi...@osl.iu.edu wrote: > >> Author: bosilca >> Date: 2011-10-18 23:51:53 EDT (Tue, 18 Oct 2011) >> New Revision: 25323 >> URL: https://svn.open-mpi.org/trac/ompi/changeset/25323 >> >> Log: >> Cleanup the error codes. Get rid of all the useless ones, and >> mark the distinction between ORTE and OMPI errors. >> >> Text files modified: >> trunk/ompi/errhandler/errcode-internal.c |32 --- >> >> trunk/ompi/include/ompi/constants.h |80 >> +--- >> trunk/ompi/mca/common/sm/common_sm_rml.c | 6 +- >> >> trunk/ompi/mca/pml/dr/pml_dr_sendreq.c | 5 -- >> >> trunk/ompi/mpiext/cr/c/quiesce_start.c | 5 ++ >> >> 5 files changed, 43 insertions(+), 85 deletions(-) >> >> Modified: trunk/ompi/errhandler/errcode-internal.c >> == >> --- trunk/ompi/errhandler/errcode-internal.c (original) >> +++ trunk/ompi/errhandler/errcode-internal.c 2011-10-18 23:51:53 EDT (Tue, >> 18 Oct 2011) >> @@ -3,7 +3,7 @@ >> * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana >> * University Research and Technology >> * Corporation. All rights reserved. >> - * Copyright (c) 2004-2007 The University of Tennessee and The University >> + * Copyright (c) 2004-2011 The University of Tennessee and The University >> * of Tennessee Research Foundation. All rights >> * reserved. 
>> * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, >> @@ -35,9 +35,6 @@ >> static ompi_errcode_intern_t ompi_err_temp_out_of_resource; >> static ompi_errcode_intern_t ompi_err_resource_busy; >> static ompi_errcode_intern_t ompi_err_bad_param; >> -static ompi_errcode_intern_t ompi_err_recv_less_than_posted; >> -static ompi_errcode_intern_t ompi_err_recv_more_than_posted; >> -static ompi_errcode_intern_t ompi_err_no_match_yet; >> static ompi_errcode_intern_t ompi_err_fatal; >> static ompi_errcode_intern_t ompi_err_not_implemented; >> static ompi_errcode_intern_t ompi_err_not_supported; >> @@ -115,30 +112,6 @@ >>opal_pointer_array_set_item(_errcodes_intern, >> ompi_err_bad_param.index, >>_err_bad_param); >> >> -OBJ_CONSTRUCT(_err_recv_less_than_posted, ompi_errcode_intern_t); >> -ompi_err_recv_less_than_posted.code = OMPI_ERR_RECV_LESS_THAN_POSTED; >> -ompi_err_recv_less_than_posted.mpi_code = MPI_SUCCESS; >> -ompi_err_recv_less_than_posted.index = pos++; >> -strncpy(ompi_err_recv_less_than_posted.errstring, >> "OMPI_ERR_RECV_LESS_THAN_POSTED", OMPI_MAX_ERROR_STRING); >> -opal_pointer_array_set_item(_errcodes_intern, >> ompi_err_recv_less_than_posted.index, >> -_err_recv_less_than_posted); >> - >> -OBJ_CONSTRUCT(_err_recv_more_than_posted, ompi_errcode_intern_t); >> -ompi_err_recv_more_than_posted.code = OMPI_ERR_RECV_MORE_THAN_POSTED; >> -ompi_err_recv_more_than_posted.mpi_code = MPI_ERR_TRUNCATE; >> -ompi_err_recv_more_than_posted.index = pos++; >> -strncpy(ompi_err_recv_more_than_posted.errstring, >> "OMPI_ERR_RECV_MORE_THAN_POSTED", OMPI_MAX_ERROR_STRING); >> -opal_pointer_array_set_item(_errcodes_intern, >> ompi_err_recv_more_than_posted.index, >> -_err_recv_more_than_posted); >> - >> -OBJ_CONSTRUCT(_err_no_match_yet, ompi_errcode_intern_t); >> -ompi_err_no_match_yet.code = OMPI_ERR_NO_MATCH_YET; >> -ompi_err_no_match_yet.mpi_code = MPI_ERR_PENDING; >> -ompi_err_no_match_yet.index = pos++; >> -strncpy(ompi_err_no_match_yet.errstring, 
"OMPI_ERR_NO_MATCH_YET", >> OMPI_MAX_ERROR_STRING); >> -opal_pointer_array_set_item(_errcodes_intern, >> ompi_err_no_match_yet.index, >> -_err_no_match_yet); >> - >>OBJ_CONSTRUCT(_err_fatal, ompi_errcode_intern_t); >>ompi_err_fatal.code = OMPI_ERR_FATAL; >>ompi_err_fatal.mpi_code = MPI_ERR_INTERN; >> @@ -232,9 +205,6 @@ >>OBJ_DESTRUCT(_err_temp_out_of_resource); >>OBJ_DESTRUCT(_err_resource_busy); >>OBJ_DESTRUCT(_err_bad_param); >> -OBJ_DESTRUCT(_err_recv_less_than_posted); >> -
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323
If you are going to make such sweeping changes, could you please provide a little warning as per our usual methods? This broke several things, which can be repaired, but it would have been nice to know that we were going to make such a change. Thx On Oct 18, 2011, at 9:51 PM, bosi...@osl.iu.edu wrote: > Author: bosilca > Date: 2011-10-18 23:51:53 EDT (Tue, 18 Oct 2011) > New Revision: 25323 > URL: https://svn.open-mpi.org/trac/ompi/changeset/25323 > > Log: > Cleanup the error codes. Get rid of all the useless ones, and > mark the distinction between ORTE and OMPI errors. > > Text files modified: > trunk/ompi/errhandler/errcode-internal.c |32 --- > > trunk/ompi/include/ompi/constants.h |80 > +--- > trunk/ompi/mca/common/sm/common_sm_rml.c | 6 +- > > trunk/ompi/mca/pml/dr/pml_dr_sendreq.c | 5 -- > > trunk/ompi/mpiext/cr/c/quiesce_start.c | 5 ++ > > 5 files changed, 43 insertions(+), 85 deletions(-) > > Modified: trunk/ompi/errhandler/errcode-internal.c > == > --- trunk/ompi/errhandler/errcode-internal.c (original) > +++ trunk/ompi/errhandler/errcode-internal.c 2011-10-18 23:51:53 EDT (Tue, > 18 Oct 2011) > @@ -3,7 +3,7 @@ > * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana > * University Research and Technology > * Corporation. All rights reserved. > - * Copyright (c) 2004-2007 The University of Tennessee and The University > + * Copyright (c) 2004-2011 The University of Tennessee and The University > * of Tennessee Research Foundation. All rights > * reserved. 
> * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, > @@ -35,9 +35,6 @@ > static ompi_errcode_intern_t ompi_err_temp_out_of_resource; > static ompi_errcode_intern_t ompi_err_resource_busy; > static ompi_errcode_intern_t ompi_err_bad_param; > -static ompi_errcode_intern_t ompi_err_recv_less_than_posted; > -static ompi_errcode_intern_t ompi_err_recv_more_than_posted; > -static ompi_errcode_intern_t ompi_err_no_match_yet; > static ompi_errcode_intern_t ompi_err_fatal; > static ompi_errcode_intern_t ompi_err_not_implemented; > static ompi_errcode_intern_t ompi_err_not_supported; > @@ -115,30 +112,6 @@ > opal_pointer_array_set_item(_errcodes_intern, > ompi_err_bad_param.index, > _err_bad_param); > > -OBJ_CONSTRUCT(_err_recv_less_than_posted, ompi_errcode_intern_t); > -ompi_err_recv_less_than_posted.code = OMPI_ERR_RECV_LESS_THAN_POSTED; > -ompi_err_recv_less_than_posted.mpi_code = MPI_SUCCESS; > -ompi_err_recv_less_than_posted.index = pos++; > -strncpy(ompi_err_recv_less_than_posted.errstring, > "OMPI_ERR_RECV_LESS_THAN_POSTED", OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_recv_less_than_posted.index, > -_err_recv_less_than_posted); > - > -OBJ_CONSTRUCT(_err_recv_more_than_posted, ompi_errcode_intern_t); > -ompi_err_recv_more_than_posted.code = OMPI_ERR_RECV_MORE_THAN_POSTED; > -ompi_err_recv_more_than_posted.mpi_code = MPI_ERR_TRUNCATE; > -ompi_err_recv_more_than_posted.index = pos++; > -strncpy(ompi_err_recv_more_than_posted.errstring, > "OMPI_ERR_RECV_MORE_THAN_POSTED", OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_recv_more_than_posted.index, > -_err_recv_more_than_posted); > - > -OBJ_CONSTRUCT(_err_no_match_yet, ompi_errcode_intern_t); > -ompi_err_no_match_yet.code = OMPI_ERR_NO_MATCH_YET; > -ompi_err_no_match_yet.mpi_code = MPI_ERR_PENDING; > -ompi_err_no_match_yet.index = pos++; > -strncpy(ompi_err_no_match_yet.errstring, "OMPI_ERR_NO_MATCH_YET", > 
OMPI_MAX_ERROR_STRING); > -opal_pointer_array_set_item(_errcodes_intern, > ompi_err_no_match_yet.index, > -_err_no_match_yet); > - > OBJ_CONSTRUCT(_err_fatal, ompi_errcode_intern_t); > ompi_err_fatal.code = OMPI_ERR_FATAL; > ompi_err_fatal.mpi_code = MPI_ERR_INTERN; > @@ -232,9 +205,6 @@ > OBJ_DESTRUCT(_err_temp_out_of_resource); > OBJ_DESTRUCT(_err_resource_busy); > OBJ_DESTRUCT(_err_bad_param); > -OBJ_DESTRUCT(_err_recv_less_than_posted); > -OBJ_DESTRUCT(_err_recv_more_than_posted); > -OBJ_DESTRUCT(_err_no_match_yet); > OBJ_DESTRUCT(_err_fatal); > OBJ_DESTRUCT(_err_not_implemented); > OBJ_DESTRUCT(_err_not_supported); > > Modified: trunk/ompi/include/ompi/constants.h > == > --- trunk/ompi/include/ompi/constants.h (original) >
Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)
Thanks Larry. Will forward this info upstream. george. On Oct 18, 2011, at 21:56 , Larry Baker wrote: > George, > > Thanks for the update. FYI, here are all the version numbers reported by the > compiler releases I have installed: > >> [baker@hydra ~]$ module load compilers/intel/11.1.080 >> [baker@hydra ~]$ icc -v >> Version 11.1 >> [baker@hydra ~]$ module unload compilers/intel/11.1.080 > >> [baker@hydra ~]$ module load compilers/intel/2011.3.174 >> [baker@hydra ~]$ icc -v >> Version 12.0.3 >> [baker@hydra ~]$ module unload compilers/intel/2011.3.174 > >> [baker@hydra ~]$ module load compilers/intel/2011.4.191 >> [baker@hydra ~]$ icc -v >> Version 12.0.4 >> [baker@hydra ~]$ module unload compilers/intel/2011.4.191 > >> [baker@hydra ~]$ module load compilers/intel/2011.5.220 >> [baker@hydra ~]$ icc -v >> Version 12.0.5 >> [baker@hydra ~]$ module unload compilers/intel/2011.5.220 > >> [baker@hydra ~]$ module load compilers/intel/2011.6.233 >> [baker@hydra ~]$ icc -v >> icc version 12.1.0 (gcc version 4.1.2 compatibility) >> [baker@hydra ~]$ module unload compilers/intel/2011.6.233 > > > Another problem I found with the Intel 12.1.0 compiler: I started to look at > adding a test for the Intel compiler version around the #pragma that disables > optimization for OpenMPI, and I found that the __ICC and __INTEL_COMPILER > predefined macros (compiler version no.) are not properly defined: > > $ icc -E -dD hello.c | grep __INTEL_COMPILER > #define __INTEL_COMPILER > #define __INTEL_COMPILER_BUILD_DATE 20110811 > > $ icc -E -dD hello.c | grep __ICC > #define __ICC > > $ icc -v > icc version 12.1.0 (gcc version 4.1.2 compatibility) > > I do not know if there is code in OpenMPI that looks at __ICC and > __INTEL_COMPILER, but if there is, that could cause problems. (Pass this on upstream to > the libtool people?) > > Larry Baker > US Geological Survey > 650-329-5608 > ba...@usgs.gov > > On 17 Oct 2011, at 8:18 PM, George Bosilca wrote: > >> Larry, >> >> Sorry for not updating this thread.
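Larry's report above is that icc 12.1.0 shows `__INTEL_COMPILER` and `__ICC` with no value in `icc -E -dD` output. If the macros really are defined that way, a version test such as `#if __INTEL_COMPILER >= 1210` expands to `#if >= 1210`, which is a preprocessor syntax error. The C below is a minimal sketch of two guard idioms that stay valid when a macro expands to nothing. It is not code from Open MPI or libtool. `FAKE_ICC_EMPTY` and `FAKE_ICC_VALUED` are hypothetical stand-ins that let the example compile on any compiler, and `1210` is only an illustrative version number.

```c
/* Hypothetical stand-ins, NOT real compiler macros:
 * FAKE_ICC_EMPTY mimics the empty definition reported for icc 12.1.0,
 * FAKE_ICC_VALUED mimics a normal numeric definition. */
#define FAKE_ICC_EMPTY
#define FAKE_ICC_VALUED 1210

/* Idiom 1: (1 - X - 1) == 2 holds only when X expands to nothing.
 *   empty -> (1 - - 1) == 2    true
 *   1210  -> (1 - 1210 - 1)    == -1210, false */
#if defined(FAKE_ICC_EMPTY) && ((1 - FAKE_ICC_EMPTY - 1) == 2)
static const int empty_macro_detected = 1;
#else
static const int empty_macro_detected = 0;
#endif

#if defined(FAKE_ICC_VALUED) && ((1 - FAKE_ICC_VALUED - 1) == 2)
static const int valued_macro_detected_as_empty = 1;
#else
static const int valued_macro_detected_as_empty = 0;
#endif

/* Idiom 2: (X + 0) is still a valid #if expression when X is empty
 * (it becomes "( + 0)", i.e. 0), so the version test cannot break the build. */
#if defined(FAKE_ICC_VALUED) && ((FAKE_ICC_VALUED + 0) >= 1210)
static const int valued_is_at_least_1210 = 1;
#else
static const int valued_is_at_least_1210 = 0;
#endif

#if defined(FAKE_ICC_EMPTY) && ((FAKE_ICC_EMPTY + 0) >= 1210)
static const int empty_is_at_least_1210 = 1;
#else
static const int empty_is_at_least_1210 = 0;
#endif
```

With an empty macro the version guard evaluates to false instead of stopping the build. This thread does not establish whether the empty definition is real or only an artifact of the `-dD` listing.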
The issue was identified and fixed by >> Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290). >> Please read the comments and the linked thread on the Intel forum for more >> info about it. >> >> I couldn't find a trace of this being fixed in the 1.4 series, so I would >> hold off on upgrading until this issue gets resolved. >> >> Thanks, >> george. >> >> On Oct 17, 2011, at 23:00 , Larry Baker wrote: >> >>> George, >>> >>> I have not had time to look over the 1.4.3 make check failure for Intel >>> 2011.6.233 compilers. Have you? >>> >>> I had planned to get 1.4.3 compiled on all six of our compilers using the >>> latest compiler releases. I was putting off upgrading to 1.4.4 or 1.5.x >>> until after that to minimize the number of things that could go wrong. Do >>> you recommend otherwise? >>> >>> Larry Baker >>> US Geological Survey >>> 650-329-5608 >>> ba...@usgs.gov >>> >>> On 7 Oct 2011, at 6:46 PM, George Bosilca wrote: >>> The may_alias attribute was part of some forward-looking attribute checking, added at a time when few compilers supported such attributes. This explains why they are not widely used in the library itself. Moreover, as they do not affect the compilation itself (as your test highlights, this is not the issue with the icc 2011.6.233 compiler), there is no urgency to remove the may_alias support. I just got that particular version of the compiler installed on one of our machines. I'll give it a try over the weekend. george.
On Oct 7, 2011, at 20:21 , Larry Baker wrote: > The test for the __may_alias__ attribute uses the following short code > snippet: > >> int * p_value __attribute__ ((__may_alias__)); >> int >> main () >> { >> >> ; >> return 0; >> } > > Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a > warning: > >> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220 >> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c >> may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored >> int * p_value __attribute__ ((__may_alias__)); >> ^ >> >> [root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220 > >> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233 >> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c > > > I modified ./configure to force > >> ompi_cv___attribute__may_alias=0 > > > Then I compiled and tested the library. Unfortunately, the results were > exactly the same: > >> make check-TESTS >> make[3]: Entering directory >> `/state/partition1/root/src/openmpi-1.4.3/test/datatype' >> /bin/sh: line 4: 26326 Segmentation fault ${dir}$tst >> FAIL: checksum >> /bin/sh: line 4: 26359