Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
Hello, * Jeff Squyres wrote on Tue, Feb 17, 2009 at 07:01:01PM CET: > On Feb 17, 2009, at 11:18 AM, George Bosilca wrote: > >> I guess that if the free function supports the NULL pointer we should >> do the same... > > I'll agree with that if we know for sure that free(NULL) is universally > supported. You mentioned "a few man pages" -- how universal is this > support? The Cray T3E used to barf on free(NULL). I haven't personally encountered anything since. C89 requires it to work, and GNU software generally assumes it to work since a couple of years. Cheers, Ralf
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
On Feb 17, 2009, at 2:05 PM, George Bosilca wrote: I can confirm it at least on MAC OS X and Linux. Based on IEEE Std 1003.1-2008, if the argument of the free function is null, no action should occur. So I guess most POSIX compliant environments support the NULL argument. Sounds good. Terry just confirmed for me that it's also true for all versions of Solaris. It's amusing, though -- this check has been in the OMPI code base literally since the very beginning, and this is the first notable time it has been tripped. Should we leave it in there (and leave in my libevent change)? I.e., free(NULL) may be harmless, but it may signify a bug. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
I can confirm it at least on MAC OS X and Linux. Based on IEEE Std 1003.1-2008, if the argument of the free function is null, no action should occur. So I guess most POSIX compliant environments support the NULL argument. george. On Feb 17, 2009, at 13:01 , Jeff Squyres wrote: On Feb 17, 2009, at 11:18 AM, George Bosilca wrote: I guess that if the free function supports the NULL pointer we should do the same... I'll agree with that if we know for sure that free(NULL) is universally supported. You mentioned "a few man pages" -- how universal is this support? -- Jeff Squyres Cisco Systems ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
On Feb 17, 2009, at 11:18 AM, George Bosilca wrote: I guess that if the free function supports the NULL pointer we should do the same... I'll agree with that if we know for sure that free(NULL) is universally supported. You mentioned "a few man pages" -- how universal is this support? -- Jeff Squyres Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
I guess that if the free function supports the NULL pointer we should do the same... george. On Feb 17, 2009, at 07:35 , Jeff Squyres wrote: On Feb 16, 2009, at 9:16 PM, George Bosilca wrote: Based on several man pages, free is capable of handling a NULL argument. What is really puzzling is that on your system it doesn't ... I tried on two system a 64 bits Debian and on my MAC OS X with all memory allocator options on, and I'm unable to get such a warning :( Remember that the warning is in our code -- opal/util/malloc.c: - void opal_free(void *addr, const char *file, int line) { #if OMPI_ENABLE_DEBUG if (opal_malloc_debug_level > 1 && NULL == addr) { opal_output(opal_malloc_output, "Invalid free (%s, %d)", file, line); return; } #endif /* OMPI_ENABLE_DEBUG */ free(addr); } - Are you saying that we should remove this warning? -- Jeff Squyres Cisco Systems ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
On Feb 16, 2009, at 9:16 PM, George Bosilca wrote: Based on several man pages, free is capable of handling a NULL argument. What is really puzzling is that on your system it doesn't ... I tried on two system a 64 bits Debian and on my MAC OS X with all memory allocator options on, and I'm unable to get such a warning :( Remember that the warning is in our code -- opal/util/malloc.c: - void opal_free(void *addr, const char *file, int line) { #if OMPI_ENABLE_DEBUG if (opal_malloc_debug_level > 1 && NULL == addr) { opal_output(opal_malloc_output, "Invalid free (%s, %d)", file, line); return; } #endif /* OMPI_ENABLE_DEBUG */ free(addr); } - Are you saying that we should remove this warning? -- Jeff Squyres Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
Based on several man pages, free is capable of handling a NULL argument. What is really puzzling is that on your system it doesn't ... I tried on two system a 64 bits Debian and on my MAC OS X with all memory allocator options on, and I'm unable to get such a warning :( george. On Feb 16, 2009, at 20:13 , Jeff Squyres wrote: r20569 fixes the problem, but I'm not 100% sure it's the Right Way. Short version: now that we're guaranteeing to free the event base, we're exercising a code path that was never used before. Apparently the orted initializes the ev->timebase min_heap_t structure, but then never uses it. So the pointer to the array of events in the heap is still NULL when we get to the destructor. Previously, the destructor just unconditionally freed the array. I put in a NULL check, which avoids the problem. But it begs the question -- why is that data structure being initialized/freed if we're never using it? Is it something inherent in libevent? On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote: Unfortunately, this doesn't fully fix the problem -- I'm still getting bad frees: [16:47] svbu-mpi:~/mpi % ./hello stdout: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com) stderr: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com) malloc debug: Invalid free (min_heap.h, 58) [16:48] svbu-mpi:~/mpi % mpirun -np 1 hello [svbu-mpi001:27549] ** Parsing receive_queues stdout: Hello, world! I am 0 of 1 (svbu-mpi001) stderr: Hello, world! I am 0 of 1 (svbu-mpi001) malloc debug: Invalid free (min_heap.h, 58) On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote: > Author: bosilca > Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009) > New Revision: 20568 > URL: https://svn.open-mpi.org/trac/ompi/changeset/20568 > > Log: > Make sure we correctly unregister all persistent events > and signal handlers. > > Text files modified: > trunk/orte/orted/orted_main.c | 8 > trunk/orte/runtime/orte_wait.c | 4 ++-- > 2 files changed, 10 insertions(+), 2 deletions(-) > > Modified: trunk/orte/orted/orted_main.c > = > = > = > = > = > = > = > = > = = > --- trunk/orte/orted/orted_main.c (original) > +++ trunk/orte/orted/orted_main.c 2009-02-16 19:20:05 EST (Mon, 16 > Feb 2009) > @@ -754,6 +754,14 @@ > exit(ORTE_ERROR_DEFAULT_EXIT_CODE); > } > > +/* Release all local signal handlers */ > +opal_event_del(_handler); > +opal_event_del(_handler); > +#ifndef __WINDOWS__ > +opal_signal_del(_handler); > +opal_signal_del(_handler); > +#endif /* __WINDOWS__ */ > + > /* Finalize and clean up ourselves */ > ret = orte_finalize(); > exit(ret); > > Modified: trunk/orte/runtime/orte_wait.c > = > = > = > = > = > = > = > = > = = > --- trunk/orte/runtime/orte_wait.c(original) > +++ trunk/orte/runtime/orte_wait.c2009-02-16 19:20:05 EST (Mon, 16 > Feb 2009) > @@ -517,8 +517,8 @@ > /* define the event to fire when someone writes to the pipe */ > opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL); > > - /* Add it to the active events, without a timeout */ > - opal_event_add(*event, NULL); > +/* Add it to the active events, without a timeout */ > +opal_event_add(*event, NULL); > > /* all done */ > return ORTE_SUCCESS; > ___ > svn-full mailing list > svn-f...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/svn-full -- Jeff Squyres Cisco Systems ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres Cisco Systems ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
r20569 fixes the problem, but I'm not 100% sure it's the Right Way. Short version: now that we're guaranteeing to free the event base, we're exercising a code path that was never used before. Apparently the orted initializes the ev->timebase min_heap_t structure, but then never uses it. So the pointer to the array of events in the heap is still NULL when we get to the destructor. Previously, the destructor just unconditionally freed the array. I put in a NULL check, which avoids the problem. But it begs the question -- why is that data structure being initialized/freed if we're never using it? Is it something inherent in libevent? On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote: Unfortunately, this doesn't fully fix the problem -- I'm still getting bad frees: [16:47] svbu-mpi:~/mpi % ./hello stdout: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com) stderr: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com) malloc debug: Invalid free (min_heap.h, 58) [16:48] svbu-mpi:~/mpi % mpirun -np 1 hello [svbu-mpi001:27549] ** Parsing receive_queues stdout: Hello, world! I am 0 of 1 (svbu-mpi001) stderr: Hello, world! I am 0 of 1 (svbu-mpi001) malloc debug: Invalid free (min_heap.h, 58) On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote: > Author: bosilca > Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009) > New Revision: 20568 > URL: https://svn.open-mpi.org/trac/ompi/changeset/20568 > > Log: > Make sure we correctly unregister all persistent events > and signal handlers. > > Text files modified: > trunk/orte/orted/orted_main.c | 8 > trunk/orte/runtime/orte_wait.c | 4 ++-- > 2 files changed, 10 insertions(+), 2 deletions(-) > > Modified: trunk/orte/orted/orted_main.c > = > = > = > = > = > = > = > = > == > --- trunk/orte/orted/orted_main.c (original) > +++ trunk/orte/orted/orted_main.c 2009-02-16 19:20:05 EST (Mon, 16 > Feb 2009) > @@ -754,6 +754,14 @@ > exit(ORTE_ERROR_DEFAULT_EXIT_CODE); > } > > +/* Release all local signal handlers */ > +opal_event_del(_handler); > +opal_event_del(_handler); > +#ifndef __WINDOWS__ > +opal_signal_del(_handler); > +opal_signal_del(_handler); > +#endif /* __WINDOWS__ */ > + > /* Finalize and clean up ourselves */ > ret = orte_finalize(); > exit(ret); > > Modified: trunk/orte/runtime/orte_wait.c > = > = > = > = > = > = > = > = > == > --- trunk/orte/runtime/orte_wait.c(original) > +++ trunk/orte/runtime/orte_wait.c2009-02-16 19:20:05 EST (Mon, 16 > Feb 2009) > @@ -517,8 +517,8 @@ > /* define the event to fire when someone writes to the pipe */ > opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL); > > - /* Add it to the active events, without a timeout */ > - opal_event_add(*event, NULL); > +/* Add it to the active events, without a timeout */ > +opal_event_add(*event, NULL); > > /* all done */ > return ORTE_SUCCESS; > ___ > svn-full mailing list > svn-f...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/svn-full -- Jeff Squyres Cisco Systems ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
Unfortunately, this doesn't fully fix the problem -- I'm still getting bad frees: [16:47] svbu-mpi:~/mpi % ./hello stdout: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com) stderr: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com) malloc debug: Invalid free (min_heap.h, 58) [16:48] svbu-mpi:~/mpi % mpirun -np 1 hello [svbu-mpi001:27549] ** Parsing receive_queues stdout: Hello, world! I am 0 of 1 (svbu-mpi001) stderr: Hello, world! I am 0 of 1 (svbu-mpi001) malloc debug: Invalid free (min_heap.h, 58) On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote: Author: bosilca Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009) New Revision: 20568 URL: https://svn.open-mpi.org/trac/ompi/changeset/20568 Log: Make sure we correctly unregister all persistent events and signal handlers. Text files modified: trunk/orte/orted/orted_main.c | 8 trunk/orte/runtime/orte_wait.c | 4 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) Modified: trunk/orte/orted/orted_main.c = = = = = = = = == --- trunk/orte/orted/orted_main.c (original) +++ trunk/orte/orted/orted_main.c 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009) @@ -754,6 +754,14 @@ exit(ORTE_ERROR_DEFAULT_EXIT_CODE); } +/* Release all local signal handlers */ +opal_event_del(_handler); +opal_event_del(_handler); +#ifndef __WINDOWS__ +opal_signal_del(_handler); +opal_signal_del(_handler); +#endif /* __WINDOWS__ */ + /* Finalize and clean up ourselves */ ret = orte_finalize(); exit(ret); Modified: trunk/orte/runtime/orte_wait.c = = = = = = = = == --- trunk/orte/runtime/orte_wait.c (original) +++ trunk/orte/runtime/orte_wait.c 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009) @@ -517,8 +517,8 @@ /* define the event to fire when someone writes to the pipe */ opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL); - /* Add it to the active events, without a timeout */ - opal_event_add(*event, NULL); +/* Add it to the active events, without a timeout */ +opal_event_add(*event, NULL); /* all done */ return ORTE_SUCCESS; ___ svn-full mailing list svn-f...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn-full -- Jeff Squyres Cisco Systems