Re: [OMPI devel] FreeBSD timer_base_open error?
I was working off-list with Brad on this. Brian is right, the logic in configure.m4 is wrong. It overwrites timer_linux_happy to "yes" if the host matches "i?86-*|x86_64*|ia64-*|powerpc-*|powerpc64-*|sparc*-*". On FreeBSD the host is i386-unknown-freebsd6.2, which matches the first pattern.

Here is a quick and dirty patch. I just moved the selection logic around a little, without any major modifications.

  george.

Index: configure.m4
===
--- configure.m4  (revision 17970)
+++ configure.m4  (working copy)
@@ -40,14 +40,12 @@
                      [timer_linux_happy="yes"],
                      [timer_linux_happy="no"])])
 
-AS_IF([test "$timer_linux_happy" = "yes"],
-      [AS_IF([test -r "/proc/cpuinfo"],
-             [timer_linux_happy="yes"],
-             [timer_linux_happy="no"])])
-
 case "${host}" in
 i?86-*|x86_64*|ia64-*|powerpc-*|powerpc64-*|sparc*-*)
-    timer_linux_happy="yes"
+    AS_IF([test "$timer_linux_happy" = "yes"],
+          [AS_IF([test -r "/proc/cpuinfo"],
+                 [timer_linux_happy="yes"],
+                 [timer_linux_happy="no"])])
     ;;
 *)
     timer_linux_happy="no"

On Mar 25, 2008, at 10:31 PM, Brian Barrett wrote:

> On Mar 25, 2008, at 6:16 PM, Jeff Squyres wrote:
>
>> "linux" is the name of the component. It looks like
>> opal/mca/timer/linux/timer_linux_component.c is doing some checks
>> during component open() and returning an error if it can't be used
>> (e.g., if it's not on linux).
>>
>> The timer components are a little different than normal MCA
>> frameworks; they *must* be compiled in libopen-pal statically, and
>> there will only be one of them built. In this case, I'm guessing
>> that linux was built simply because nothing else was selected to be
>> built, but then its component_open() function failed because it
>> didn't find /proc/cpuinfo.
>
> This is actually incorrect. The linux component looks for
> /proc/cpuinfo and builds if it finds that file. There's a base
> component that's built if nothing else is found.
>
> The configure logic for the linux component is probably not the right
> thing to do -- it should probably be modified to check both that the
> file is readable (there are systems that call themselves "linux" but
> don't have a /proc/cpuinfo) and that we're actually on Linux.
>
> Brian
>
> --
>   Brian Barrett
>
>   There is an art . . . to flying. The knack lies in learning how to
>   throw yourself at the ground and miss.
>       Douglas Adams, 'The Hitchhikers Guide to the Galaxy'

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] RMAPS rank_file component patch and modifications for review
Hi, all

Attached is a patch for the modified Rank_File RMAPS component.

1. Introduced new general-purpose debug flags:
     mpi_debug
     opal_debug
2. Introduced a new MCA parameter, opal_paffinity_slot_list.
3. Cleaned the opal paffinity functions out of ompi_mpi_init.
4. Moved the opal paffinity functions to a new file, opal/mca/paffinity/base/paffinity_base_service.c.
5. Renamed the rank_file component files according to the prefix policy.
6. Renamed the global variables as well.
7. A few bug fixes that were brought up during previous discussions.
8. If the user defines opal_paffinity_alone together with rmaps_rank_file_path or opal_paffinity_slot_list, a warning is printed that only opal_paffinity_alone will be used.

Best Regards,
Lenny.

Attachment: rank_file.patch
Re: [OMPI devel] RMAPS rank_file component patch and modifications for review
Hi Lenny,

This looks good. But I have a couple of suggestions (which others may disagree with):

1. You register an opal MCA parameter, but look it up in ompi, then call an opal function with the result. What if you had a function opal_paffinity_base_set_slots(long rank) (or some other name, I don't care) which looked up the MCA parameter and then set up the slots as you are doing if it is found? This would make things a bit cleaner IMHO.

2. The functions in the paffinity base should be prefixed with 'opal_paffinity_base_'.

3. Why was the ompi_debug_flag added? It is not used anywhere.

4. You probably do not need to add the opal debug flag. There is already a 'paffinity_base_verbose' flag which should suit your purposes fine. So you should just be able to replace all of the conditional output statements in paffinity with something like
     opal_output_verbose(10, opal_paffinity_base_output, ...),
   where 10 is the verbosity level number.

Tim
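A minimal sketch of suggestions 1 and 4 above, assuming nothing about the real OPAL internals. Every demo_* name is a hypothetical stand-in: demo_slot_list_param plays the role of the registered MCA parameter value, and demo_output_verbose only mimics the opal_output_verbose(level, output id, ...) call shape Tim quotes (without the output id). The idea illustrated is that one base-level function both looks up the parameter and applies it, so upper layers never need to know the parameter name.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the framework's verbosity setting. */
int demo_verbosity = 0;

/* Hypothetical stand-in for the value of the registered slot-list
 * parameter; NULL means the user did not set it. */
const char *demo_slot_list_param = NULL;

/* Prints only when the requested level is within the configured
 * verbosity. Returns 1 if the message was printed, 0 otherwise. */
int demo_output_verbose(int level, const char *fmt, ...)
{
    va_list ap;

    if (level > demo_verbosity) {
        return 0;
    }
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}

/* Single entry point: looks up the parameter itself and, if it is set,
 * applies it for the given rank. Returns 1 if a slot list was found,
 * 0 if there was nothing to do. The actual binding is omitted. */
int demo_paffinity_base_set_slots(long rank)
{
    const char *slots = demo_slot_list_param;

    if (NULL == slots || '\0' == slots[0]) {
        demo_output_verbose(10, "rank %ld: no slot list set, nothing to do\n", rank);
        return 0;
    }
    demo_output_verbose(10, "rank %ld: would bind to slots %s\n", rank, slots);
    return 1;
}
```

With this shape a caller would only invoke demo_paffinity_base_set_slots(rank); the parameter lookup and the verbosity check both stay inside the base.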
[OMPI devel] Debug output
My apologies - some timing debug output of mine inadvertently was included in a commit. I am working now to correct that, but need to rebuild the system to ensure the fix is correct. Will remove that detailed output shortly. Ralph
Re: [OMPI devel] RMAPS rank_file component patch and modifications for review
I would tend to echo Tim's suggestions. I note that you do look up that opal MCA param in orte as well. I know you sent me a note about that off-list - I apologize for not getting to it yet, but was swamped yesterday.

I think the solution suggested in #1 of Tim's message is the right approach. Looking up opal params in orte or ompi is probably not a good idea. We have had problems in the past where params were looked up in multiple places, as people -do- sometimes change the names (ahem...).

Also, I would suggest using the macro version of verbose, OPAL_OUTPUT_VERBOSE, so that it compiles out for non-debug builds - up to you. Many of us use it as we don't need the output from optimized builds.

Other than that, I think this looks fine. I do truly appreciate the cleanup of ompi_mpi_init.

Ralph
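A small sketch of what "compiles out for non-debug builds" means for a macro form of verbose output. The DEMO_* and demo_* names are hypothetical, and the double-parenthesis calling convention is an assumption chosen because it works without variadic macros; this is not a copy of the OPAL definition.

```c
#include <stdarg.h>
#include <stdio.h>

/* Counts calls so the effect of compiling the macro out is observable. */
int demo_verbose_calls = 0;

/* Function form: always compiled in, so the call (and the evaluation of
 * its arguments) is paid for even in optimized builds. */
void demo_output_verbose(int level, const char *fmt, ...)
{
    va_list ap;

    (void)level;
    demo_verbose_calls++;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* Macro form: expands to the function call only when DEMO_ENABLE_DEBUG
 * is defined; otherwise the whole statement disappears at compile time. */
#ifdef DEMO_ENABLE_DEBUG
#define DEMO_OUTPUT_VERBOSE(args) demo_output_verbose args
#else
#define DEMO_OUTPUT_VERBOSE(args) do { } while (0)
#endif
```

A call site would read DEMO_OUTPUT_VERBOSE((10, "rank %d bound\n", rank)); and in a build without DEMO_ENABLE_DEBUG neither the call nor its arguments are compiled.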
Re: [OMPI devel] Debug output
Fixed with r17977 - again, my apologies.

Ralph
[OMPI devel] trunk segfault
Hi, all

I compiled and built the source from trunk, and it segfaults:

/home/USERS/lenny/OMPI_ORTE_NEW/bin/mpirun -np 1 -H witch17 /home/USERS/lenny/TESTS/ORTE/mpi_p01_NEW -t lt
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  mca_mpi_register_params() failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
[witch17:01220] *** Process received signal ***
[witch17:01220] Signal: Segmentation fault (11)
[witch17:01220] Signal code: (128)
[witch17:01220] Failing at address: (nil)
[witch17:01220] [ 0] /lib64/libpthread.so.0 [0x2aadf7072c10]
[witch17:01220] [ 1] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libopen-pal.so.0(free+0x56) [0x2aadf6acb6d6]
[witch17:01220] [ 2] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libopen-pal.so.0(opal_argv_free+0x25) [0x2aadf6ab9635]
[witch17:01220] [ 3] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libmpi.so.0 [0x2aadf67f4206]
[witch17:01220] [ 4] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libmpi.so.0(MPI_Init+0xf0) [0x2aadf68117c0]
[witch17:01220] [ 5] /home/USERS/lenny/TESTS/ORTE/mpi_p01_NEW(main+0xef) [0x40109f]
[witch17:01220] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aadf7199154]
[witch17:01220] [ 7] /home/USERS/lenny/TESTS/ORTE/mpi_p01_NEW [0x400ee9]
[witch17:01220] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 1220 on node witch17 exited on signal 11 (Segmentation fault).
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17956
Jeff,

I think this commit is not quite correct. I'm working on a patch to fix it at the moment, but just wanted to give a heads up for anyone who is experiencing the same problem I am.

Before this commit I could set "opal_event_include=select" in my .openmpi/mca-params.conf file and the event engine would only use 'select' for all OMPI/ORTE processes. This commit overrides this selection by forcing all MPI apps to use "all".

I noticed the break since the FT builds (which require 'select' at the moment) were failing.

The fix might be as easy as checking whether the user specified anything other than the default, and forcing "all" only if the user did not define anything.

Thoughts?

-- Josh

On Mar 25, 2008, at 1:18 PM, jsquy...@osl.iu.edu wrote:

> Author: jsquyres
> Date: 2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
> New Revision: 17956
> URL: https://svn.open-mpi.org/trac/ompi/changeset/17956
>
> Log:
> Fix #1253: default libevent to use select/poll and only use the other
> mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
> otherwise. See big comment in opal/event/event.c for a full
> explanation.
>
> Text files modified:
>    trunk/ompi/runtime/ompi_mpi_init.c |    32 +++-
>    trunk/opal/event/event.c           |    37 -
>    trunk/orte/orted/orted_main.c      |    31 ---
>    3 files changed, 59 insertions(+), 41 deletions(-)
>
> Modified: trunk/ompi/runtime/ompi_mpi_init.c
> ==============================================================================
> --- trunk/ompi/runtime/ompi_mpi_init.c  (original)
> +++ trunk/ompi/runtime/ompi_mpi_init.c  2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
> @@ -234,15 +234,37 @@
>      /* see comment below about sched_yield */
>      int num_processors;
>  #endif
> -
> -    /* Join the run-time environment - do the things that don't hit
> -       the registry */
>
> -    if (ORTE_SUCCESS != (ret = opal_init())) {
> -        error = "ompi_mpi_init: opal_init failed";
> +    /* Setup enough to check get/set MCA params */
> +
> +    if (ORTE_SUCCESS != (ret = opal_init_util())) {
> +        error = "ompi_mpi_init: opal_init_util failed";
>          goto error;
>      }
>
> +    /* _After_ opal_init_util() but _before_ orte_init(), we need to
> +       set an MCA param that tells libevent that it's ok to use any
> +       mechanism in libevent that si available on this platform (e.g.,
> +       epoll and friends). Per opal/event/event.s, we default to
> +       select/poll -- but we know that MPI processes won't be using
> +       pty's with the event engine, so it's ok to relax this
> +       constraint and let any fd-monitoring mechanism be used. */
> +    ret = mca_base_param_reg_string_name("opal", "event_include",
> +                                         "Internal orted MCA param: tell opal_init() to use a specific mechanism in libevent",
> +                                         false, false, "all", NULL);
> +    if (ret >= 0) {
> +        /* We have to explicitly "set" the MCA param value here
> +           because libevent initialization will re-register the MCA
> +           param and therefore override the default. Setting the value
> +           here puts the desired value ("all") in different storage
> +           that is not overwritten if/when the MCA param is
> +           re-registered. Note that we do *NOT* set this value as an
> +           environment variable, just so that it won't be inherited by
> +           any spawned processes and potentially cause unintented
> +           side-effects with launching ORTE tools... */
> +        mca_base_param_set_string(ret, "all");
> +    }
> +
>      /* check to see if we want timing information */
>      param = mca_base_param_reg_int_name("ompi", "timing",
>                                          "Request that critical timing loops be measured",
>
> Modified: trunk/opal/event/event.c
> ==============================================================================
> --- trunk/opal/event/event.c  (original)
> +++ trunk/opal/event/event.c  2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
> @@ -256,15 +256,42 @@
>
>  #if OPAL_HAVE_WORKING_EVENTOPS
>
> -/**
> - * Retrieve the upper level specified event system, if any.
> +/* Retrieve the upper level specified event system, if any.
> + * Default to select() on OS X and poll() everywhere else because
> + * various parts of OMPI / ORTE use libevent with pty's. pty's
> + * *only* work with select on OS X (tested on Tiger and Leopard);
> + * we *know* that both select and poll works with pty's everywhere
> + * else we care about (other mechansisms such as epoll *may* work
> + * with pty's -- we have not tested comprehensively with newer
> + * versions of Linux, etc.). So the safe thing to do is:
> + *
> + * - On OS X, default to using "select" only
> + * - Everywhere else, default to using "poll" only (because poll
> + *   is more scalable than select)
> + *
> + * An upper layer may override
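A minimal sketch of the fix Josh proposes in his message (honor a user-specified value, force "all" only when the user left the parameter alone). This is not the code from any actual commit: demo_choose_event_include is a hypothetical helper, and both the "current value" and the "registered default" are passed in as plain strings rather than read through the MCA parameter API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Open MPI function.
 *
 * current_value:      what the parameter currently holds (NULL or ""
 *                     if nothing is set).
 * registered_default: the framework's own default (e.g. "poll").
 *
 * Returns the user's value if it differs from the default; otherwise
 * returns "all", which is what r17956 forces for MPI processes. */
const char *demo_choose_event_include(const char *current_value,
                                      const char *registered_default)
{
    if (NULL != current_value && '\0' != current_value[0] &&
        0 != strcmp(current_value, registered_default)) {
        return current_value;   /* user asked for something specific */
    }
    return "all";               /* nothing specified: relax the default */
}
```

One limitation of comparing against the default, as this sketch does, is that a user who explicitly sets the parameter to the default value cannot be told apart from a user who set nothing.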
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17956
My fix is in r17980. I did some limited testing with and without C/R and things look fine. Wider testing may be in order, but I think MTT should take care of that this evening.

Cheers,
Josh
Re: [OMPI devel] FreeBSD timer_base_open error?
George -

Good catch -- that's going to cause a problem :). But I think we should add yet another check to also make sure that we're on Linux. So the three tests would be:

1) Am I on a platform that we have timer assembly support for? (That's the long list of architectures that we recently, and incorrectly, added.)
2) Am I on Linux? (We really only know how to parse /proc/cpuinfo on Linux.)
3) Is /proc/cpuinfo readable? (We have a couple of architectures that are reported by config.guess as Linux, but don't have /proc/cpuinfo.)

Make sense?

Brian
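The three tests Brian lists can be modelled outside of configure as a plain C decision function. This is only an illustration of the logic, not the m4 that would go into configure.m4: all demo_* names are hypothetical, the Linux test is approximated by looking for "linux" in the host triplet, and POSIX fnmatch() stands in for the shell's case-pattern matching.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Test 1: is the host one of the architectures with timer assembly
 * support? The patterns are the ones from the case statement. */
int demo_arch_supported(const char *host)
{
    static const char *patterns[] = {
        "i?86-*", "x86_64*", "ia64-*", "powerpc-*", "powerpc64-*", "sparc*-*"
    };
    size_t i;

    for (i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
        if (0 == fnmatch(patterns[i], host, 0)) {
            return 1;
        }
    }
    return 0;
}

/* Test 2 (approximation): does the host triplet claim to be Linux? */
int demo_host_is_linux(const char *host)
{
    return NULL != strstr(host, "linux");
}

/* Test 3: is the cpuinfo file readable? The path is a parameter so the
 * check can be exercised without depending on the local machine. */
int demo_cpuinfo_readable(const char *path)
{
    return 0 == access(path, R_OK);
}

/* All three tests must pass for the linux timer component to be usable. */
int demo_timer_linux_happy(const char *host, const char *cpuinfo_path)
{
    return demo_arch_supported(host) &&
           demo_host_is_linux(host) &&
           demo_cpuinfo_readable(cpuinfo_path);
}
```

For the FreeBSD host from this thread, i386-unknown-freebsd6.2, the architecture test passes but the Linux test fails, so the overall answer is "no" regardless of whether a /proc/cpuinfo happens to exist.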
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17956
Sorry about that, Josh -- thanks for fixing it. I added one more very minor change on top of r17980.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17956
No worries. Thanks for the adjustment.

Cheers,
Josh
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17983
What's Interix?

On Mar 26, 2008, at 7:20 PM, bosi...@osl.iu.edu wrote:

Author: bosilca
Date: 2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
New Revision: 17983
URL: https://svn.open-mpi.org/trac/ompi/changeset/17983

Log:
Add support for Interix.

Added:
   trunk/config/ompi_interix.m4 (contents, props changed)
Text files modified:
   trunk/acinclude.m4 | 1 +
   trunk/configure.ac | 3 +++
   2 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/acinclude.m4
==============================================================================
--- trunk/acinclude.m4 (original)
+++ trunk/acinclude.m4 2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -108,6 +108,7 @@
 # Include the macros for Windows checking
 #
 m4_include(config/ompi_microsoft.m4)
+m4_include(config/ompi_interix.m4)

 #
 # The config/mca_no_configure_components.m4 file is generated by

Added: trunk/config/ompi_interix.m4
==============================================================================
--- (empty file)
+++ trunk/config/ompi_interix.m4 2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -0,0 +1,56 @@
+dnl -*- shell-script -*-
+dnl
+dnl Copyright (c) 2008 The University of Tennessee and The University
+dnl of Tennessee Research Foundation. All rights
+dnl reserved.
+dnl $COPYRIGHT$
+dnl
+dnl Additional copyrights may follow
+dnl
+dnl $HEADER$
+dnl
+
+##
+#
+# OMPI_INTERIX
+#
+# Detect if the environment is SUA/SFU (i.e. Interix) and modify
+# the compiling environment accordingly.
+#
+# USAGE:
+# OMPI_INTERIX()
+#
+##
+AC_DEFUN([OMPI_INTERIX],[
+
+AC_MSG_CHECKING(for Interix environment)
+AC_TRY_COMPILE([],
+ [#if !defined(__INTERIX)
+#error Normal Unix environment
+#endif],
+ is_interix=yes,
+ is_interix=no)
+AC_MSG_RESULT([$is_interix])
+if test "$is_interix" = "yes"; then
+
+ompi_show_subtitle "Interix detection"
+
+if ! test -d /usr/include/port; then
+AC_MSG_WARN([Compiling Open MPI under Interix require an up-to-date])
+AC_MSG_WARN([version of libport. Please ask your system administrator])
+AC_MSG_WARN([to install it (pkg_update -L libport).])
+AC_MSG_ERROR([*** Cannot continue])
+fi
+#
+# These are the minimum requirements for Interix ...
+#
+AC_MSG_WARN([-lport was added to the linking flags])
+LDFLAGS="-lport $LDFLAGS"
+AC_MSG_WARN([-D_ALL_SOURCE -D_USE_LIBPORT was added to the compilation flags])
+CFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CFLAGS"
+CPPFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CPPFLAGS"
+CXXFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CXXFLAGS"
+
+fi
+
+])

Modified: trunk/configure.ac
==============================================================================
--- trunk/configure.ac (original)
+++ trunk/configure.ac 2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -192,6 +192,9 @@
 AM_CONDITIONAL(OMPI_NEED_WINDOWS_REPLACEMENTS,
 test "$ompi_cv_c_compiler_vendor" = "microsoft" )

+# Do all Interix detections if necessary
+OMPI_INTERIX
+
 # Does the compiler support "ident"-like constructs?
 OMPI_CHECK_IDENT([CC], [CFLAGS], [c], [C])

_______________________________________________
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full

--
Jeff Squyres
Cisco Systems
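For readers who do not read m4, the effect of OMPI_INTERIX once the __INTERIX test has succeeded can be sketched in plain shell. This is only an illustration under stated assumptions: interix_adjust_flags and its port_dir parameter are hypothetical names that do not appear in r17983. The commit itself hard-codes /usr/include/port and stops configure through AC_MSG_ERROR instead of returning a status.

```shell
#!/bin/sh
# Hypothetical helper (not part of r17983) that mirrors what the macro does
# to the build flags after Interix has been detected.
# port_dir is a parameter only so that the sketch can be tried on any machine;
# the commit always uses /usr/include/port.
interix_adjust_flags() {
    port_dir=$1

    # The commit refuses to continue when the libport headers are missing.
    if ! test -d "$port_dir"; then
        echo "libport headers not found in $port_dir" >&2
        return 1
    fi

    extra="-D_ALL_SOURCE -D_USE_LIBPORT -I$port_dir"

    # Prepend the new options, so that options already present in the
    # environment stay at the end of each variable.
    LDFLAGS="-lport $LDFLAGS"
    CFLAGS="$extra $CFLAGS"
    CPPFLAGS="$extra $CPPFLAGS"
    CXXFLAGS="$extra $CXXFLAGS"
}
```

Calling interix_adjust_flags /usr/include/port on a machine without libport prints the error and leaves all four variables untouched. Note that Autoconf convention normally places -l options in LIBS and keeps LDFLAGS for options such as -L; the sketch follows the commit and uses LDFLAGS.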
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17983
Interix (also known as SUA or SFU) is the POSIX layer integrated with the latest versions of Windows (such as Vista and Server 2003). It provides fork, rsh, and basically most of the tools we need.

george.

Jeff Squyres wrote:
What's Interix?
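The commit recognizes Interix by asking the compiler whether __INTERIX is defined. A different approach, which r17983 does not use, is to match the canonical host string in a case statement, as configure scripts commonly do. The following is a minimal sketch, assuming that config.guess reports Interix hosts with "interix" in the operating-system field (a string of the form i586-pc-interix6.0); is_interix_host is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper, not part of r17983: succeed when the host triplet
# passed as $1 names Interix in its operating-system field.
is_interix_host() {
    case "$1" in
        *-*-interix*) return 0 ;;
        *)            return 1 ;;
    esac
}
```

In a configure script the argument would typically be "$host". The compiler check in the commit depends on what the compiler itself defines rather than on how the host is named, so the two approaches are not interchangeable in every setup.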