Re: [OMPI devel] orte-dvm / orte-submit race condition
Okay, please try the attached patch. It will cause two messages to be output for each job: one indicating the job has been marked terminated, and the other reporting that the completion message was sent to the requestor. Let's see what that tells us.

Thanks
Ralph

On Wed, Oct 14, 2015 at 3:44 PM, Mark Santcroos wrote:
> Hi Ralph,
>
> > On 15 Oct 2015, at 0:26 , Ralph Castain wrote:
> > Okay, so each orte-submit is reporting job has launched, which means the hang is coming while waiting to hear the job completed. Are you sure that orte-dvm believes the job has completed?
>
> No, I'm not.
>
> > In other words, when you say that you observe the job as completing, are you basing that on some output from orte-dvm, or because the procs have exited, or...?
>
> ... because the tasks have created their output.
>
> > I can send you a patch tonight that would cause orte-dvm to emit a "job completed" message when it determines each job has terminated - might help us take the next step.
>
> Great.
>
> > I'm wondering if orte-dvm thinks the job is still running, and the race condition is in that area (as opposed to being in orte-submit itself)
>
> Do some counts from the output of orte-dvm provide some hints?
>
> $ grep "Releasing job data.*INVALID" dvm_output.txt |wc -l
> 42
>
> $ grep "ORTE_DAEMON_SPAWN_JOB_CMD" dvm_output.txt |wc -l
> 42
>
> $ grep "ORTE_DAEMON_ADD_LOCAL_PROCS" dvm_output.txt |wc -l
> 42
>
> $ grep "sess_dir_finalize" dvm_output.txt |wc -l
> 35
>
> In other words, the "[netbook:] sess_dir_finalize: proc session dir does not exist" message doesn't show up for the hanging ones, which could support your question that the orte-dvm is at fault.
>
> Gr,
>
> Mark
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/10/18171.php

diff --git a/orte/mca/state/dvm/state_dvm.c b/orte/mca/state/dvm/state_dvm.c
index 0e7309c..5b1a841 100644
--- a/orte/mca/state/dvm/state_dvm.c
+++ b/orte/mca/state/dvm/state_dvm.c
@@ -267,6 +267,7 @@ void check_complete(int fd, short args, void *cbdata)
         if (jdata->state < ORTE_JOB_STATE_UNTERMINATED) {
             jdata->state = ORTE_JOB_STATE_TERMINATED;
         }
+        opal_output(0, "%s JOB %s HAS TERMINATED", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_JOBID_PRINT(jdata->jobid));
     }

     /* tell the IOF that the job is complete */
diff --git a/orte/tools/orte-dvm/orte-dvm.c b/orte/tools/orte-dvm/orte-dvm.c
index 3cdf585..003f93a 100644
--- a/orte/tools/orte-dvm/orte-dvm.c
+++ b/orte/tools/orte-dvm/orte-dvm.c
@@ -442,18 +442,6 @@ int main(int argc, char *argv[])
     exit(orte_exit_status);
 }

-static void send_callback(int status, orte_process_name_t *peer,
-                          opal_buffer_t* buffer, orte_rml_tag_t tag,
-                          void* cbdata)
-
-{
-    orte_job_t *jdata = (orte_job_t*)cbdata;
-
-    OBJ_RELEASE(buffer);
-    /* cleanup the job object */
-    opal_pointer_array_set_item(orte_job_data, ORTE_LOCAL_JOBID(jdata->jobid), NULL);
-    OBJ_RELEASE(jdata);
-}
 static void notify_requestor(int sd, short args, void *cbdata)
 {
     orte_state_caddy_t *caddy = (orte_state_caddy_t*)cbdata;
@@ -462,6 +450,11 @@ static void notify_requestor(int sd, short args, void *cbdata)
     int ret;
     opal_buffer_t *reply;

+    opal_output(0, "%s NOTIFYING %s OF JOB %s COMPLETION",
+                ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
+                ORTE_NAME_PRINT(&jdata->originator),
+                ORTE_JOBID_PRINT(jdata->jobid));
+
     /* notify the requestor */
     reply = OBJ_NEW(opal_buffer_t);
     /* see if there was any problem */
@@ -471,11 +464,13 @@ static void notify_requestor(int sd, short args, void *cbdata)
         ret = 0;
     }
     opal_dss.pack(reply, &ret, 1, OPAL_INT);
-    orte_rml.send_buffer_nb(&jdata->originator, reply, ORTE_RML_TAG_TOOL, send_callback, jdata);
+    orte_rml.send_buffer_nb(&jdata->originator, reply, ORTE_RML_TAG_TOOL, orte_rml_send_callback, NULL);

-    /* we cannot cleanup the job object as we might
-     * hit an error during transmission, so clean it
-     * up in the send callback */
+    /* flag that we were notified */
+    jdata->state = ORTE_JOB_STATE_NOTIFIED;
+    /* send us back thru job complete */
+    ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_TERMINATED);
+
     OBJ_RELEASE(caddy);
 }
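A side note on the patch above: the job-specific send_callback it removes is replaced by the generic orte_rml_send_callback, and job cleanup is instead triggered by marking the job NOTIFIED and re-activating the TERMINATED state. As a rough, hypothetical sketch (not the actual Open MPI source), a generic send-completion callback of this kind only has to release the outgoing buffer:

    /* illustration only: a minimal RML send-completion callback that frees
     * the buffer and leaves all job cleanup to the ORTE state machine */
    static void generic_send_callback(int status, orte_process_name_t *peer,
                                      opal_buffer_t *buffer, orte_rml_tag_t tag,
                                      void *cbdata)
    {
        (void)status; (void)peer; (void)tag; (void)cbdata;  /* unused here */
        OBJ_RELEASE(buffer);
    }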
Re: [OMPI devel] [OMPI users] fatal error: openmpi-v2.x-dev-415-g5c9b192 and openmpi-dev-2696-gd579a07
Folks,

I made PR #1028 (https://github.com/open-mpi/ompi/pull/1028). It is not 100% clean (so I will not commit it before a review), since opal/mca/pmix/pmix1xx/pmix/configure is now invoked with two CPPFLAGS=... settings on the command line:
- the first one comes from the ompi configure command line
- the second one (the one actually used) is set by opal/mca/pmix/pmix1xx/configure.m4

Cheers,
Gilles

On 10/14/2015 3:37 PM, Gilles Gouaillardet wrote:
Folks, I was able to reproduce the issue by adding CPPFLAGS=-I/tmp to my configure command line. Here is what happens: opal/mca/pmix/pmix1xx/configure.m4 sets the CPPFLAGS environment variable with -I/tmp and the include paths for hwloc and libevent; then opal/mca/pmix/pmix1xx/pmix/configure is invoked with CPPFLAGS=-I/tmp on the command line. The CPPFLAGS environment variable is simply ignored, and only -I/tmp is used, which causes the compilation failure reported by Siegmar. At this stage, I do not know the best way to solve this issue: one option is not to pass CPPFLAGS=-I/tmp to the sub-configure; another option is not to set the CPPFLAGS environment variable but to invoke the sub-configure with "CPPFLAGS=$CPPFLAGS". Note this issue might not be limited to CPPFLAGS handling. Could you please advise on how to move forward? Cheers, Gilles

On Wed, Oct 7, 2015 at 4:42 PM, Siegmar Gross wrote:
Hi,

I tried to build openmpi-v2.x-dev-415-g5c9b192 and openmpi-dev-2696-gd579a07 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. I got the following error on all platforms with gcc, and with Sun C only on my Linux machine. I've already reported the problem September 8th for the master trunk (at that time I didn't have the problem for the v2.x trunk). I use the following configure command.

../openmpi-dev-2696-gd579a07/configure \
--prefix=/usr/local/openmpi-master_64_gcc \
--libdir=/usr/local/openmpi-master_64_gcc/lib64 \
--with-jdk-bindir=/usr/local/jdk1.8.0/bin \
--with-jdk-headers=/usr/local/jdk1.8.0/include \
JAVA_HOME=/usr/local/jdk1.8.0 \
LDFLAGS="-m64" CC="gcc" CXX="g++" FC="gfortran" \
CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \
CPP="cpp" CXXCPP="cpp" \
CPPFLAGS="" CXXCPPFLAGS="" \
--enable-mpi-cxx \
--enable-cxx-exceptions \
--enable-mpi-java \
--enable-heterogeneous \
--enable-mpi-thread-multiple \
--with-hwloc=internal \
--without-verbs \
--with-wrapper-cflags="-std=c11 -m64" \
--with-wrapper-cxxflags="-m64" \
--with-wrapper-fcflags="-m64" \
--enable-debug \
|& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc

openmpi-v2.x-dev-415-g5c9b192:
==

linpc1 openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc 135 tail -15 log.make.Linux.x86_64.64_gcc
  CC       src/class/pmix_pointer_array.lo
  CC       src/class/pmix_hash_table.lo
  CC       src/include/pmix_globals.lo
In file included from ../../../../../../openmpi-v2.x-dev-415-g5c9b192/opal/mca/pmix/pmix1xx/pmix/src/include/pmix_globals.c:19:0:
/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192/opal/mca/pmix/pmix1xx/pmix/include/private/types.h:43:27: fatal error: opal/mca/event/libevent2022/libevent2022.h: No such file or directory
compilation terminated.
make[4]: *** [src/include/pmix_globals.lo] Error 1 make[4]: Leaving directory `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx/pmix' make[3]: *** [all-recursive] Error 1 make[3]: Leaving directory `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx/pmix' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal' make: *** [all-recursive] Error 1 linpc1 openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc 135 openmpi-dev-2696-gd579a07: == linpc1 openmpi-dev-2696-gd579a07-Linux.x86_64.64_gcc 158 tail -15 log.make.Linux.x86_64.64_gcc CC src/class/pmix_pointer_array.lo CC src/class/pmix_hash_table.lo CC src/include/pmix_globals.lo In file included from ../../../../../../openmpi-dev-2696-gd579a07/opal/mca/pmix/pmix1xx/pmix/src/include/pmix_globals.c:19:0: /export2/src/openmpi-master/openmpi-dev-2696-gd579a07/opal/mca/pmix/pmix1xx/pmix/include/private/types.h:43:27: fatal error: opal/mca/event/libevent2022/libevent2022.h: No such file or directory compilation terminated. make[4]: *** [src/include/pmix_globals.lo] Error 1 make[4]: Leaving directory `/export2/src/openmpi-master/openmpi-dev-2696-gd579a07-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx/pmix' make[3]: *** [all-recursive] Error 1 make[3]: Leaving directory
[hwloc-devel] Create success (hwloc git 1.11.0-91-g010b4b6)
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.11.0-91-g010b4b6 Start time: Wed Oct 14 21:06:24 EDT 2015 End time: Wed Oct 14 21:08:02 EDT 2015 Your friendly daemon, Cyrador
[hwloc-devel] Create success (hwloc git 1.10.1-71-g48f9ddd)
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.10.1-71-g48f9ddd Start time: Wed Oct 14 21:04:51 EDT 2015 End time: Wed Oct 14 21:06:23 EDT 2015 Your friendly daemon, Cyrador
[hwloc-devel] Create success (hwloc git 1.9.1-66-ga20252d)
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.9.1-66-ga20252d Start time: Wed Oct 14 21:03:05 EDT 2015 End time: Wed Oct 14 21:04:51 EDT 2015 Your friendly daemon, Cyrador
[hwloc-devel] Create success (hwloc git dev-811-gdaaf59f)
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-811-gdaaf59f Start time: Wed Oct 14 21:01:02 EDT 2015 End time: Wed Oct 14 21:02:55 EDT 2015 Your friendly daemon, Cyrador
Re: [OMPI devel] 16 byte real in Fortran
The INTEGER*n, LOGICAL*n, REAL*n, etc., syntax has never been legal Fortran. Fortran originally had only INTEGER, REAL, DOUBLE PRECISION, and COMPLEX numeric types. Fortran 90 added the notion of a KIND of numeric, but left unspecified the mapping of numeric KINDs to processor-specific storage. KIND can be thought of as an opaque identifier. There is no requirement, for example that KIND n means a variable occupies n bytes of storage, though this is commonly done. (As is the association of KIND=1 to REAL and KIND=2 to DOUBLE PRECISION.) Instead, the language provides portable means of specifying the desired behavior of an available KIND, such as digits of precision. Unfortunately, when marshalling data for interchange, bits matter—the number and their meaning. High-level languages don't support such concepts very well. Starting with C99 (Section 7.18.1), C forces the compiler implementation to define macros for supported integer widths (in bits). However, like Fortran, there is no requirement that any exact number of bits be supported (Section 7.18.1.1); the standard only requires integer types with a minimum of 8, 16, 32, and 64 bits (Section 7.18.1.2). Nothing is said at all about floating-point data types and the correspondence with the integer types. This is what APIs like OpenMPI have to struggle with in the real world. Larry Baker US Geological Survey 650-329-5608 ba...@usgs.gov On 14 Oct 2015, at 3:38 PM, Jeff Squyres (jsquyres) wrote: > On Oct 14, 2015, at 5:53 PM, Vladimír Fukawrote: >> >>> As that ticket notes if REAL*16 <> long double Open MPI should be >>> disabling redutions on MPI_REAL16. I can take a look and see if I can >>> determine why that is not working as expected. >> >> Does it really need to be just disabled when the `real(real128)` is >> actually equivalent to c_long_double? Wouldn't making the explicit >> interfaces to MPI_Send and others to accept `real(real128)` make more >> sense? As I wrote in the stackoverflow post, the MPI standard (3.1, >> pages 628 and 674) is not very clear if MPI_REAL16 corresponds to >> real*16 or real(real128) if these differ, but making it correspond to >> real(real128) might be reasonable. > > As I understand it, real*16 is not a real type -- it's a commonly-used type > and supported by many (all?) compilers, but it's not actually defined in the > Fortran spec. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2015/10/18170.php
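To make the C99 part of the argument concrete, here is a small, self-contained example (an illustration added here, not taken from the thread) of the exact-width integer types and macros an implementation must provide when such widths exist; C99 mandates nothing comparable for floating-point widths, which is why mapping Fortran KINDs such as REAL*16 onto C storage remains implementation dependent:

    #include <stdint.h>    /* exact-width integer types (C99) */
    #include <inttypes.h>  /* PRId32 / PRId64 print macros */
    #include <stdio.h>

    int main(void)
    {
        int32_t a = INT32_MAX;          /* exactly 32 bits, when the type exists */
        int64_t b = INT64_C(1) << 40;   /* exactly 64 bits */
        printf("a = %" PRId32 ", b = %" PRId64 "\n", a, b);
        /* there is no required float32_t / float128_t counterpart */
        return 0;
    }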
Re: [OMPI devel] orte-dvm / orte-submit race condition
Hi Ralph, > On 15 Oct 2015, at 0:26 , Ralph Castainwrote: > Okay, so each orte-submit is reporting job has launched, which means the hang > is coming while waiting to hear the job completed. Are you sure that orte-dvm > believes the job has completed? No, I'm not. > In other words, when you say that you observe the job as completing, are you > basing that on some output from orte-dvm, or because the procs have exited, > or...? ... because the tasks have created their output. > I can send you a patch tonight that would cause orte-dvm to emit a "job > completed" message when it determines each job has terminated - might help us > take the next step. Great. > I'm wondering if orte-dvm thinks the job is still running, and the race > condition is in that area (as opposed to being in orte-submit itself) Do some counts from the output of orte-dvm provide some hints? $ grep "Releasing job data.*INVALID" dvm_output.txt |wc -l 42 $ grep "ORTE_DAEMON_SPAWN_JOB_CMD" dvm_output.txt |wc -l 42 $ grep "ORTE_DAEMON_ADD_LOCAL_PROCS" dvm_output.txt |wc -l 42 $ grep "sess_dir_finalize" dvm_output.txt |wc -l 35 In other words, the "[netbook:] sess_dir_finalize: proc session dir does not exist" message doesn't show up for the hanging ones, which could support your question that the orte-dvm is at fault. Gr, Mark
Re: [OMPI devel] 16 byte real in Fortran
On Oct 14, 2015, at 5:53 PM, Vladimír Fukawrote: > >> As that ticket notes if REAL*16 <> long double Open MPI should be >> disabling redutions on MPI_REAL16. I can take a look and see if I can >> determine why that is not working as expected. > > Does it really need to be just disabled when the `real(real128)` is > actually equivalent to c_long_double? Wouldn't making the explicit > interfaces to MPI_Send and others to accept `real(real128)` make more > sense? As I wrote in the stackoverflow post, the MPI standard (3.1, > pages 628 and 674) is not very clear if MPI_REAL16 corresponds to > real*16 or real(real128) if these differ, but making it correspond to > real(real128) might be reasonable. As I understand it, real*16 is not a real type -- it's a commonly-used type and supported by many (all?) compilers, but it's not actually defined in the Fortran spec. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] orte-dvm / orte-submit race condition
Okay, so each orte-submit is reporting job has launched, which means the hang is coming while waiting to hear the job completed. Are you sure that orte-dvm believes the job has completed? In other words, when you say that you observe the job as completing, are you basing that on some output from orte-dvm, or because the procs have exited, or...? I can send you a patch tonight that would cause orte-dvm to emit a "job completed" message when it determines each job has terminated - might help us take the next step. I'm wondering if orte-dvm thinks the job is still running, and the race condition is in that area (as opposed to being in orte-submit itself) On Wed, Oct 14, 2015 at 1:01 PM, Mark Santcrooswrote: > Hi Ralph, > > On 14 Oct 2015, at 21:50 , Ralph Castain wrote: > > I wonder if they might be getting duplicate process names if started > quickly enough. Do you get the "job has launched" message (orte-submit > outputs a message after orte-dvm responds that the job launched)? > > Based on the output below I would say that both columns with IDs are > unique. > > Thanks > > Mark > > $ head orte-log.txt > [netbook:90327] Job [24532,1] has launched > [netbook:90326] Job [24532,2] has launched > [netbook:90331] Job [24532,3] has launched > [netbook:90330] Job [24532,4] has launched > [netbook:90332] Job [24532,5] has launched > [netbook:90328] Job [24532,6] has launched > [netbook:90329] Job [24532,7] has launched > [netbook:90325] Job [24532,8] has launched > [netbook:90335] Job [24532,9] has launched > [netbook:90333] Job [24532,10] has launched > > $ cat orte-log.txt | cut -f1 -d" "| sort | uniq -c | wc -l > 42 > $ cat orte-log.txt | cut -f3 -d" "| sort | uniq -c | wc -l > 42 > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2015/10/18167.php >
Re: [OMPI devel] 16 byte real in Fortran
> As that ticket notes if REAL*16 <> long double Open MPI should be > disabling redutions on MPI_REAL16. I can take a look and see if I can > determine why that is not working as expected. Does it really need to be just disabled when the `real(real128)` is actually equivalent to c_long_double? Wouldn't making the explicit interfaces to MPI_Send and others to accept `real(real128)` make more sense? As I wrote in the stackoverflow post, the MPI standard (3.1, pages 628 and 674) is not very clear if MPI_REAL16 corresponds to real*16 or real(real128) if these differ, but making it correspond to real(real128) might be reasonable. Vladimir 2015-10-14 14:40 GMT+01:00 Vladimír Fuka: > Hello, > > I have a problem with using the quadruple (128bit) or extended > (80bit) precision reals in Fortran. I did my tests with gfortran-4.8.5 > and OpenMPI-1.7.2 (preinstalled OpenSuSE 13.2), but others confirmed > this behaviour for more recent versions at > http://stackoverflow.com/questions/33109040/strange-result-of-mpi-allreduce-for-16-byte-real?noredirect=1#comment54060649_33109040 > . > > When I try to use REAL*16 variables (or equivalent kind-based > definition) and MPI_REAL16 the reductions don't give correct results > (see the link for the exact code). I was pointed to this issue ticket > https://github.com/open-mpi/ompi/issues/63. > > I thought, maybe the underlying long double is 80-bit extended > precision then and I tried to use REAL*10 variables and MPI_REAL16. I > actually received a correct answer from the reduction, but when I > tried to use REAL*10 or REAL(10) I am getting > > Error: There is no specific subroutine for the generic 'mpi_recv' at (1) > Error: There is no specific subroutine for the generic 'mpi_ssend' at (1) > > That is strange, because I should be able to use even types and array > ranks which I construct myself in point to point send/receives and > which are unknown to the MPI library, so the explicit interface should > not be required. > > Is there a correct way how to use the extended or quadruple precision > in OpenMPI? My intended usage is mainly checking if differences seen > numerical computations are getting smaller with increasing precision > and can therefore be attributed to rounding errors. If not they could > be a sign of a bug. > >Best regards, > > Vladimir
Re: [OMPI devel] orte-dvm / orte-submit race condition
Hi Ralph, > On 14 Oct 2015, at 21:50 , Ralph Castainwrote: > I wonder if they might be getting duplicate process names if started quickly > enough. Do you get the "job has launched" message (orte-submit outputs a > message after orte-dvm responds that the job launched)? Based on the output below I would say that both columns with IDs are unique. Thanks Mark $ head orte-log.txt [netbook:90327] Job [24532,1] has launched [netbook:90326] Job [24532,2] has launched [netbook:90331] Job [24532,3] has launched [netbook:90330] Job [24532,4] has launched [netbook:90332] Job [24532,5] has launched [netbook:90328] Job [24532,6] has launched [netbook:90329] Job [24532,7] has launched [netbook:90325] Job [24532,8] has launched [netbook:90335] Job [24532,9] has launched [netbook:90333] Job [24532,10] has launched $ cat orte-log.txt | cut -f1 -d" "| sort | uniq -c | wc -l 42 $ cat orte-log.txt | cut -f3 -d" "| sort | uniq -c | wc -l 42
Re: [OMPI devel] orte-dvm / orte-submit race condition
I wonder if they might be getting duplicate process names if started quickly enough. Do you get the "job has launched" message (orte-submit outputs a message after orte-dvm responds that the job launched)? On Wed, Oct 14, 2015 at 12:04 PM, Mark Santcrooswrote: > Hi, > > By hammering on a DVM with orte-submit I can reproducibly make orte-submit > not return, but hang instead. > The task is executed correctly though. > > It can be reproduced using the small snippet below. > Switching from sequential to "concurrent" execution of the orte-submit's > triggers the effect. > > Note that when I ctrl-c the orte-submit, I can re-use the DVM, so my hunch > would be that its a client-side issue. > > What MCA logging parameters might give more insight of whats happening? > > Thanks! > > Mark > > > > $ cat > orte_test.sh > #!/bin/sh > > for i in $(seq 42): > do > # GOOD > #orte-submit --hnp file:dvm_uri -np 1 /bin/date > > # BAD > orte-submit --hnp file:dvm_uri -np 1 /bin/date & > done > wait > ^D > $ chmod +x orte_test.sh > $ orte-dvm --report-uri dvm_uri & > DVM ready > $ ./orte_test.sh > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2015/10/18165.php >
[OMPI devel] orte-dvm / orte-submit race condition
Hi,

By hammering on a DVM with orte-submit I can reproducibly make orte-submit not return, but hang instead. The task is executed correctly though.

It can be reproduced using the small snippet below. Switching from sequential to "concurrent" execution of the orte-submits triggers the effect.

Note that when I ctrl-c the orte-submit, I can re-use the DVM, so my hunch would be that it's a client-side issue.

What MCA logging parameters might give more insight into what's happening?

Thanks!

Mark

$ cat > orte_test.sh
#!/bin/sh

for i in $(seq 42):
do
# GOOD
#orte-submit --hnp file:dvm_uri -np 1 /bin/date

# BAD
orte-submit --hnp file:dvm_uri -np 1 /bin/date &
done
wait
^D
$ chmod +x orte_test.sh
$ orte-dvm --report-uri dvm_uri &
DVM ready
$ ./orte_test.sh
Re: [OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x
On Oct 14, 2015, at 12:48 PM, Nathan Hjelmwrote: > > I think this is from a known issue. Try applying this and run again: > > https://github.com/open-mpi/ompi/commit/952d01db70eab4cbe11ff4557434acaa928685a4.patch The good news is that if this fixes your problem, the fix is already included in the upcoming v1.10.1 release. > -Nathan > > On Wed, Oct 14, 2015 at 06:33:07PM +0200, Paul Kapinos wrote: >> Dear Open MPI developer, >> >> We're puzzled by reproducible performance (bandwidth) penalty observed when >> comparing measurements via InfibiBand between two nodes, OpenMPI/1.10.0 >> compiled with *GCC/5.2* instead of GCC 4.8 and Intel compiler. >> >> Take a look at the attached picture of two measurements of NetPIPE >> http://bitspjoule.org/netpipe/ benchmark done with one MPI rank per node, >> communicating via QDR InfiniBand (y axis: Mbps, y axis: sample number) >> >> Up to sample 64 (8195 bytes message size) the achieved performance is >> virtually the same; from sample 65 (12285 bytes, *less* than 12k) the >> version of GCC compiled using GCC 5.2 suffer form 20%+ penalty in bandwidth. >> >> The result is reproducible and independent from nodes and ever linux >> distribution (both Scientific Linux 6 and CentOS 7 have the same results). >> Both C and Fortran benchmarks offer the very same behaviour so it is *not* >> an f08 issue. >> >> The acchieved bandwidth is definitely IB-range (gigabytes per second), the >> communication is running via InfinfiBand in all cases (no failback to IP, >> huh). >> >> The compile line is the same; the output of ompi_info --all and --params is >> the very same (cf. attachments) up to added support for fortran-08 in /5.2 >> version. >> >> We know about existence of 'eager_limit' parameter, which is *not* changed >> and is 12288 in both versions (this is *less* that the first distinguishing >> sample). >> >> Again, for us the *only* difference is usage of other (new) GCC release. >> >> Any idea about this 20%+ bandwidth loss? >> >> Best >> >> Paul Kapinos >> -- >> Dipl.-Inform. 
Paul Kapinos - High Performance Computing, >> RWTH Aachen University, IT Center >> Seffenter Weg 23, D 52074 Aachen (Germany) >> Tel: +49 241/80-24915 > > >> MCA btl: parameter "btl_openib_verbose" (current value: >> "false", data source: default, level: 9 dev/all, type: bool) >> Output some verbose OpenIB BTL information (0 = no >> output, nonzero = output) >> Valid values: 0: f|false|disabled, 1: t|true|enabled >> MCA btl: parameter "btl_openib_warn_no_device_params_found" >> (current value: "true", data source: default, level: 9 dev/all, type: bool, >> synonyms: btl_openib_warn_no_hca_params_found) >> Warn when no device-specific parameters are found >> in the INI file specified by the btl_openib_device_param_files MCA parameter >> (0 = do not warn; any other value = warn) >> Valid values: 0: f|false|disabled, 1: t|true|enabled >> MCA btl: parameter "btl_openib_warn_no_hca_params_found" >> (current value: "true", data source: default, level: 9 dev/all, type: bool, >> deprecated, synonym of: btl_openib_warn_no_device_params_found) >> Warn when no device-specific parameters are found >> in the INI file specified by the btl_openib_device_param_files MCA parameter >> (0 = do not warn; any other value = warn) >> Valid values: 0: f|false|disabled, 1: t|true|enabled >> MCA btl: parameter "btl_openib_warn_default_gid_prefix" >> (current value: "true", data source: default, level: 9 dev/all, type: bool) >> Warn when there is more than one active ports and >> at least one of them connected to the network with only default GID prefix >> configured (0 = do not warn; any other value = warn) >> Valid values: 0: f|false|disabled, 1: t|true|enabled >> MCA btl: parameter "btl_openib_warn_nonexistent_if" (current >> value: "true", data source: default, level: 9 dev/all, type: bool) >> Warn if non-existent devices and/or ports are >> specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not warn; >> any other value = warn) >> Valid values: 0: f|false|disabled, 1: t|true|enabled >> MCA btl: parameter "btl_openib_abort_not_enough_reg_mem" >> (current value: "false", data source: default, level: 9 dev/all, type: bool) >> If there is not enough registered memory available >> on the system for Open MPI to function properly, Open MPI will issue a >> warning. If this MCA parameter is set to true, then Open MPI will also >> abort all MPI jobs (0 = warn, but do not abort; any other value = warn and >> abort) >> Valid values: 0: f|false|disabled, 1:
Re: [OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x
I think this is from a known issue. Try applying this and run again: https://github.com/open-mpi/ompi/commit/952d01db70eab4cbe11ff4557434acaa928685a4.patch -Nathan On Wed, Oct 14, 2015 at 06:33:07PM +0200, Paul Kapinos wrote: > Dear Open MPI developer, > > We're puzzled by reproducible performance (bandwidth) penalty observed when > comparing measurements via InfibiBand between two nodes, OpenMPI/1.10.0 > compiled with *GCC/5.2* instead of GCC 4.8 and Intel compiler. > > Take a look at the attached picture of two measurements of NetPIPE > http://bitspjoule.org/netpipe/ benchmark done with one MPI rank per node, > communicating via QDR InfiniBand (y axis: Mbps, y axis: sample number) > > Up to sample 64 (8195 bytes message size) the achieved performance is > virtually the same; from sample 65 (12285 bytes, *less* than 12k) the > version of GCC compiled using GCC 5.2 suffer form 20%+ penalty in bandwidth. > > The result is reproducible and independent from nodes and ever linux > distribution (both Scientific Linux 6 and CentOS 7 have the same results). > Both C and Fortran benchmarks offer the very same behaviour so it is *not* > an f08 issue. > > The acchieved bandwidth is definitely IB-range (gigabytes per second), the > communication is running via InfinfiBand in all cases (no failback to IP, > huh). > > The compile line is the same; the output of ompi_info --all and --params is > the very same (cf. attachments) up to added support for fortran-08 in /5.2 > version. > > We know about existence of 'eager_limit' parameter, which is *not* changed > and is 12288 in both versions (this is *less* that the first distinguishing > sample). > > Again, for us the *only* difference is usage of other (new) GCC release. > > Any idea about this 20%+ bandwidth loss? > > Best > > Paul Kapinos > -- > Dipl.-Inform. 
Paul Kapinos - High Performance Computing, > RWTH Aachen University, IT Center > Seffenter Weg 23, D 52074 Aachen (Germany) > Tel: +49 241/80-24915 > MCA btl: parameter "btl_openib_verbose" (current value: > "false", data source: default, level: 9 dev/all, type: bool) > Output some verbose OpenIB BTL information (0 = no > output, nonzero = output) > Valid values: 0: f|false|disabled, 1: t|true|enabled > MCA btl: parameter "btl_openib_warn_no_device_params_found" > (current value: "true", data source: default, level: 9 dev/all, type: bool, > synonyms: btl_openib_warn_no_hca_params_found) > Warn when no device-specific parameters are found > in the INI file specified by the btl_openib_device_param_files MCA parameter > (0 = do not warn; any other value = warn) > Valid values: 0: f|false|disabled, 1: t|true|enabled > MCA btl: parameter "btl_openib_warn_no_hca_params_found" > (current value: "true", data source: default, level: 9 dev/all, type: bool, > deprecated, synonym of: btl_openib_warn_no_device_params_found) > Warn when no device-specific parameters are found > in the INI file specified by the btl_openib_device_param_files MCA parameter > (0 = do not warn; any other value = warn) > Valid values: 0: f|false|disabled, 1: t|true|enabled > MCA btl: parameter "btl_openib_warn_default_gid_prefix" > (current value: "true", data source: default, level: 9 dev/all, type: bool) > Warn when there is more than one active ports and > at least one of them connected to the network with only default GID prefix > configured (0 = do not warn; any other value = warn) > Valid values: 0: f|false|disabled, 1: t|true|enabled > MCA btl: parameter "btl_openib_warn_nonexistent_if" (current > value: "true", data source: default, level: 9 dev/all, type: bool) > Warn if non-existent devices and/or ports are > specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not warn; > any other value = warn) > Valid values: 0: f|false|disabled, 1: t|true|enabled > MCA btl: parameter "btl_openib_abort_not_enough_reg_mem" > (current value: "false", data source: default, level: 9 dev/all, type: bool) > If there is not enough registered memory available > on the system for Open MPI to function properly, Open MPI will issue a > warning. If this MCA parameter is set to true, then Open MPI will also abort > all MPI jobs (0 = warn, but do not abort; any other value = warn and abort) > Valid values: 0: f|false|disabled, 1: t|true|enabled > MCA btl: parameter "btl_openib_poll_cq_batch" (current > value: "256", data source: default, level: 9 dev/all, type: unsigned) > Retrieve up to poll_cq_batch completions from CQ > MCA btl: parameter
[OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x
Dear Open MPI developer,

We're puzzled by a reproducible performance (bandwidth) penalty observed when comparing measurements via InfiniBand between two nodes, OpenMPI/1.10.0 compiled with *GCC/5.2* instead of GCC 4.8 and the Intel compiler.

Take a look at the attached picture of two measurements of the NetPIPE http://bitspjoule.org/netpipe/ benchmark done with one MPI rank per node, communicating via QDR InfiniBand (y axis: Mbps, x axis: sample number).

Up to sample 64 (8195 bytes message size) the achieved performance is virtually the same; from sample 65 (12285 bytes, *less* than 12k) the version of Open MPI compiled using GCC 5.2 suffers from a 20%+ penalty in bandwidth.

The result is reproducible and independent of the nodes and even of the Linux distribution (both Scientific Linux 6 and CentOS 7 have the same results). Both the C and Fortran benchmarks show the very same behaviour, so it is *not* an f08 issue.

The achieved bandwidth is definitely in the IB range (gigabytes per second); the communication is running via InfiniBand in all cases (no fallback to IP).

The compile line is the same; the output of ompi_info --all and --params is the very same (cf. attachments) up to the added support for fortran-08 in the /5.2 version.

We know about the existence of the 'eager_limit' parameter, which is *not* changed and is 12288 in both versions (this is *less* than the first distinguishing sample).

Again, for us the *only* difference is the use of the other (newer) GCC release.

Any idea about this 20%+ bandwidth loss?

Best

Paul Kapinos
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915

MCA btl: parameter "btl_openib_verbose" (current value: "false", data source: default, level: 9 dev/all, type: bool) Output some verbose OpenIB BTL information (0 = no output, nonzero = output) Valid values: 0: f|false|disabled, 1: t|true|enabled MCA btl: parameter "btl_openib_warn_no_device_params_found" (current value: "true", data source: default, level: 9 dev/all, type: bool, synonyms: btl_openib_warn_no_hca_params_found) Warn when no device-specific parameters are found in the INI file specified by the btl_openib_device_param_files MCA parameter (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled MCA btl: parameter "btl_openib_warn_no_hca_params_found" (current value: "true", data source: default, level: 9 dev/all, type: bool, deprecated, synonym of: btl_openib_warn_no_device_params_found) Warn when no device-specific parameters are found in the INI file specified by the btl_openib_device_param_files MCA parameter (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled MCA btl: parameter "btl_openib_warn_default_gid_prefix" (current value: "true", data source: default, level: 9 dev/all, type: bool) Warn when there is more than one active ports and at least one of them connected to the network with only default GID prefix configured (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled MCA btl: parameter "btl_openib_warn_nonexistent_if" (current value: "true", data source: default, level: 9 dev/all, type: bool) Warn if non-existent devices and/or ports are specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not warn; any other value = warn) Valid values: 0: f|false|disabled, 1: t|true|enabled MCA btl: parameter "btl_openib_abort_not_enough_reg_mem" (current value: "false", data source: default, level: 9 dev/all, type:
bool) If there is not enough registered memory available on the system for Open MPI to function properly, Open MPI will issue a warning. If this MCA parameter is set to true, then Open MPI will also abort all MPI jobs (0 = warn, but do not abort; any other value = warn and abort) Valid values: 0: f|false|disabled, 1: t|true|enabled MCA btl: parameter "btl_openib_poll_cq_batch" (current value: "256", data source: default, level: 9 dev/all, type: unsigned) Retrieve up to poll_cq_batch completions from CQ MCA btl: parameter "btl_openib_device_param_files" (current value: "/opt/MPI/openmpi-1.10.0/linux/gcc/share/openmpi/mca-btl-openib-device-params.ini", data source: default, level: 9 dev/all, type: string, synonyms: btl_openib_hca_param_files) Colon-delimited list of INI-style files that contain device vendor/part-specific parameters (use semicolon for Windows)
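For anyone who wants a reproducer smaller than NetPIPE, a minimal MPI ping-pong in C that brackets the default 12288-byte openib eager limit could look like the sketch below. The message sizes and repetition count are assumptions made for illustration; run it with exactly two ranks, one per node, and compare the GCC 4.8 and GCC 5.2 builds.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, reps = 1000;
        /* sizes chosen to straddle the default btl_openib eager_limit (12288) */
        int sizes[] = { 8192, 12288, 16384, 65536 };
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int s = 0; s < 4; s++) {
            char *buf = malloc(sizes[s]);
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < reps; i++) {
                if (rank == 0) {
                    MPI_Send(buf, sizes[s], MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, sizes[s], MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, sizes[s], MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(buf, sizes[s], MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t = MPI_Wtime() - t0;
            if (rank == 0)
                printf("%6d bytes: %.1f MB/s\n", sizes[s],
                       2.0 * sizes[s] * reps / t / 1e6);
            free(buf);
        }
        MPI_Finalize();
        return 0;
    }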
Re: [OMPI devel] 16 byte real in Fortran
On Wed, Oct 14, 2015 at 02:40:00PM +0100, Vladimír Fuka wrote:
> Hello,
>
> I have a problem with using the quadruple (128bit) or extended
> (80bit) precision reals in Fortran. I did my tests with gfortran-4.8.5
> and OpenMPI-1.7.2 (preinstalled OpenSuSE 13.2), but others confirmed
> this behaviour for more recent versions at
> http://stackoverflow.com/questions/33109040/strange-result-of-mpi-allreduce-for-16-byte-real?noredirect=1#comment54060649_33109040
>
> When I try to use REAL*16 variables (or equivalent kind-based
> definition) and MPI_REAL16 the reductions don't give correct results
> (see the link for the exact code). I was pointed to this issue ticket
> https://github.com/open-mpi/ompi/issues/63.

As that ticket notes, if REAL*16 <> long double, Open MPI should be disabling reductions on MPI_REAL16. I can take a look and see if I can determine why that is not working as expected.

> Is there a correct way how to use the extended or quadruple precision
> in OpenMPI? My intended usage is mainly checking if differences seen
> numerical computations are getting smaller with increasing precision
> and can therefore be attributed to rounding errors. If not they could
> be a sign of a bug.

Take a look at the following article:

http://dl.acm.org/citation.cfm?id=1988419

You may be able to use the method described to get the enhanced precision you need.

-Nathan
HPC-5, LANL
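A quick way to check whether the REAL*16 <> long double mismatch applies on a given system is to look at the C long double and at what the MPI library reports for MPI_REAL16. The snippet below is a hedged illustration (MPI_REAL16 is an optional datatype, so it is guarded against being unavailable); with gfortran on x86, REAL(16) is typically IEEE binary128 (16 bytes, 113-bit significand), while the C long double is usually the 80-bit extended type.

    #include <mpi.h>
    #include <stdio.h>
    #include <float.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        /* how the C compiler lays out long double on this platform */
        printf("sizeof(long double) = %zu, LDBL_MANT_DIG = %d\n",
               sizeof(long double), (int)LDBL_MANT_DIG);
        /* MPI_REAL16 is optional; guard against it being unavailable */
        if (MPI_REAL16 != MPI_DATATYPE_NULL) {
            int sz;
            MPI_Type_size(MPI_REAL16, &sz);
            printf("MPI_Type_size(MPI_REAL16) = %d bytes\n", sz);
        } else {
            printf("MPI_REAL16 not supported by this MPI library\n");
        }
        MPI_Finalize();
        return 0;
    }

If the two sizes disagree, reductions on MPI_REAL16 cannot be performed correctly by C code operating on long double, which is the situation the GitHub issue describes.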
[OMPI devel] 16 byte real in Fortran
Hello, I have a problem with using the quadruple (128bit) or extended (80bit) precision reals in Fortran. I did my tests with gfortran-4.8.5 and OpenMPI-1.7.2 (preinstalled OpenSuSE 13.2), but others confirmed this behaviour for more recent versions at http://stackoverflow.com/questions/33109040/strange-result-of-mpi-allreduce-for-16-byte-real?noredirect=1#comment54060649_33109040 . When I try to use REAL*16 variables (or equivalent kind-based definition) and MPI_REAL16 the reductions don't give correct results (see the link for the exact code). I was pointed to this issue ticket https://github.com/open-mpi/ompi/issues/63. I thought, maybe the underlying long double is 80-bit extended precision then and I tried to use REAL*10 variables and MPI_REAL16. I actually received a correct answer from the reduction, but when I tried to use REAL*10 or REAL(10) I am getting Error: There is no specific subroutine for the generic 'mpi_recv' at (1) Error: There is no specific subroutine for the generic 'mpi_ssend' at (1) That is strange, because I should be able to use even types and array ranks which I construct myself in point to point send/receives and which are unknown to the MPI library, so the explicit interface should not be required. Is there a correct way how to use the extended or quadruple precision in OpenMPI? My intended usage is mainly checking if differences seen numerical computations are getting smaller with increasing precision and can therefore be attributed to rounding errors. If not they could be a sign of a bug. Best regards, Vladimir
Re: [OMPI devel] [OMPI users] fatal error: openmpi-v2.x-dev-415-g5c9b192 and openmpi-dev-2696-gd579a07
Folks, i was able to reproduce the issue by adding CPPFLAGS=-I/tmp to my configure command line. here is what happens : opal/mca/pmix/pmix1xx/configure.m4 set the CPPFLAGS environment variable with -I/tmp and include paths for hwloc and libevent then opal/mca/pmix/pmix1xx/pmix/configure is invoked with CPPFLAGS=-I/tmp on the command line the CPPFLAGS environment variable is simply ignored, and only -I/tmp is used, which causes the compilation failure reported by Siegmar. at this stage, i do not know the best way to solve this issue : one option is not to pass CPPFLAGS=-I/tmp to the sub configure an other option is not to set the CPPFLAGS environment variable but invoke the sub configure with "CPPFLAGS=$CPPFLAGS" note this issue might not be limited to CPPFLAGS handling could you please advise on how to move forward ? Cheers, Gilles On Wed, Oct 7, 2015 at 4:42 PM, Siegmar Grosswrote: > Hi, > > I tried to build openmpi-v2.x-dev-415-g5c9b192 and > openmpi-dev-2696-gd579a07 on my machines (Solaris 10 Sparc, Solaris 10 > x86_64, and openSUSE Linux 12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. > I got the following error on all platforms with gcc and with Sun C only > on my Linux machine. I've already reported the problem September 8th > for the master trunk (at that time I didn't have the problem for the > v2.x trunk. I use the following configure command. > > ../openmpi-dev-2696-gd579a07/configure \ > --prefix=/usr/local/openmpi-master_64_gcc \ > --libdir=/usr/local/openmpi-master_64_gcc/lib64 \ > --with-jdk-bindir=/usr/local/jdk1.8.0/bin \ > --with-jdk-headers=/usr/local/jdk1.8.0/include \ > JAVA_HOME=/usr/local/jdk1.8.0 \ > LDFLAGS="-m64" CC="gcc" CXX="g++" FC="gfortran" \ > CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \ > CPP="cpp" CXXCPP="cpp" \ > CPPFLAGS="" CXXCPPFLAGS="" \ > --enable-mpi-cxx \ > --enable-cxx-exceptions \ > --enable-mpi-java \ > --enable-heterogeneous \ > --enable-mpi-thread-multiple \ > --with-hwloc=internal \ > --without-verbs \ > --with-wrapper-cflags="-std=c11 -m64" \ > --with-wrapper-cxxflags="-m64" \ > --with-wrapper-fcflags="-m64" \ > --enable-debug \ > |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc > > > openmpi-v2.x-dev-415-g5c9b192: > == > > linpc1 openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc 135 tail -15 > log.make.Linux.x86_64.64_gcc > CC src/class/pmix_pointer_array.lo > CC src/class/pmix_hash_table.lo > CC src/include/pmix_globals.lo > In file included from > ../../../../../../openmpi-v2.x-dev-415-g5c9b192/opal/mca/pmix/pmix1xx/pmix/src/include/pmix_globals.c:19:0: > /export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192/opal/mca/pmix/pmix1xx/pmix/include/private/types.h:43:27: > fatal error: opal/mca/event/libevent2022/libevent2022.h: No such file or > directory > compilation terminated. 
> make[4]: *** [src/include/pmix_globals.lo] Error 1 > make[4]: Leaving directory > `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx/pmix' > make[3]: *** [all-recursive] Error 1 > make[3]: Leaving directory > `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx/pmix' > make[2]: *** [all-recursive] Error 1 > make[2]: Leaving directory > `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx' > make[1]: *** [all-recursive] Error 1 > make[1]: Leaving directory > `/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc/opal' > make: *** [all-recursive] Error 1 > linpc1 openmpi-v2.x-dev-415-g5c9b192-Linux.x86_64.64_gcc 135 > > > openmpi-dev-2696-gd579a07: > == > > linpc1 openmpi-dev-2696-gd579a07-Linux.x86_64.64_gcc 158 tail -15 > log.make.Linux.x86_64.64_gcc > CC src/class/pmix_pointer_array.lo > CC src/class/pmix_hash_table.lo > CC src/include/pmix_globals.lo > In file included from > ../../../../../../openmpi-dev-2696-gd579a07/opal/mca/pmix/pmix1xx/pmix/src/include/pmix_globals.c:19:0: > /export2/src/openmpi-master/openmpi-dev-2696-gd579a07/opal/mca/pmix/pmix1xx/pmix/include/private/types.h:43:27: > fatal error: opal/mca/event/libevent2022/libevent2022.h: No such file or > directory > compilation terminated. > make[4]: *** [src/include/pmix_globals.lo] Error 1 > make[4]: Leaving directory > `/export2/src/openmpi-master/openmpi-dev-2696-gd579a07-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx/pmix' > make[3]: *** [all-recursive] Error 1 > make[3]: Leaving directory > `/export2/src/openmpi-master/openmpi-dev-2696-gd579a07-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx/pmix' > make[2]: *** [all-recursive] Error 1 > make[2]: Leaving directory > `/export2/src/openmpi-master/openmpi-dev-2696-gd579a07-Linux.x86_64.64_gcc/opal/mca/pmix/pmix1xx' > make[1]: *** [all-recursive] Error 1 > make[1]: Leaving directory >