Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread Ralf Wildenhues
Hello,

* Jeff Squyres wrote on Tue, Feb 17, 2009 at 07:01:01PM CET:
> On Feb 17, 2009, at 11:18 AM, George Bosilca wrote:
>
>> I guess that if the free function supports the NULL pointer we should 
>> do the same...
>
> I'll agree with that if we know for sure that free(NULL) is universally 
> supported.  You mentioned "a few man pages" -- how universal is this 
> support?

The Cray T3E used to barf on free(NULL).  I haven't personally
encountered anything since.  C89 requires it to work, and GNU
software generally assumes it to work since a couple of years.

Cheers,
Ralf


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread Jeff Squyres

On Feb 17, 2009, at 2:05 PM, George Bosilca wrote:

I can confirm it at least on MAC OS X and Linux. Based on IEEE Std  
1003.1-2008, if the argument of the free function is null, no action  
should occur. So I guess most POSIX compliant environments support  
the NULL argument.


Sounds good.  Terry just confirmed for me that it's also true for all  
versions of Solaris.


It's amusing, though -- this check has been in the OMPI code base  
literally since the very beginning, and this is the first notable time  
it has been tripped.  Should we leave it in there (and leave in my  
libevent change)?  I.e., free(NULL) may be harmless, but it may  
signify a bug.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread George Bosilca
I can confirm it at least on MAC OS X and Linux. Based on IEEE Std  
1003.1-2008, if the argument of the free function is null, no action  
should occur. So I guess most POSIX compliant environments support the  
NULL argument.


  george.

On Feb 17, 2009, at 13:01 , Jeff Squyres wrote:


On Feb 17, 2009, at 11:18 AM, George Bosilca wrote:

I guess that if the free function supports the NULL pointer we  
should do the same...


I'll agree with that if we know for sure that free(NULL) is  
universally supported.  You mentioned "a few man pages" -- how  
universal is this support?


--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread Jeff Squyres

On Feb 17, 2009, at 11:18 AM, George Bosilca wrote:

I guess that if the free function supports the NULL pointer we  
should do the same...


I'll agree with that if we know for sure that free(NULL) is  
universally supported.  You mentioned "a few man pages" -- how  
universal is this support?


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread George Bosilca
I guess that if the free function supports the NULL pointer we should  
do the same...


  george.

On Feb 17, 2009, at 07:35 , Jeff Squyres wrote:


On Feb 16, 2009, at 9:16 PM, George Bosilca wrote:

Based on several man pages, free is capable of handling a NULL  
argument. What is really puzzling is that on your system it  
doesn't ...


I tried on two system a 64 bits Debian and on my MAC OS X with all  
memory allocator options on, and I'm unable to get such a warning :(


Remember that the warning is in our code -- opal/util/malloc.c:

-
void opal_free(void *addr, const char *file, int line)
{
#if OMPI_ENABLE_DEBUG
   if (opal_malloc_debug_level > 1 && NULL == addr) {
   opal_output(opal_malloc_output, "Invalid free (%s, %d)",  
file, line);

   return;
   }
#endif  /* OMPI_ENABLE_DEBUG */
   free(addr);
}
-

Are you saying that we should remove this warning?

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread Jeff Squyres

On Feb 16, 2009, at 9:16 PM, George Bosilca wrote:

Based on several man pages, free is capable of handling a NULL  
argument. What is really puzzling is that on your system it  
doesn't ...


I tried on two system a 64 bits Debian and on my MAC OS X with all  
memory allocator options on, and I'm unable to get such a warning :(


Remember that the warning is in our code -- opal/util/malloc.c:

-
void opal_free(void *addr, const char *file, int line)
{
#if OMPI_ENABLE_DEBUG
if (opal_malloc_debug_level > 1 && NULL == addr) {
opal_output(opal_malloc_output, "Invalid free (%s, %d)",  
file, line);

return;
}
#endif  /* OMPI_ENABLE_DEBUG */
free(addr);
}
-

Are you saying that we should remove this warning?

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-16 Thread George Bosilca
Based on several man pages, free is capable of handling a NULL  
argument. What is really puzzling is that on your system it doesn't ...


I tried on two system a 64 bits Debian and on my MAC OS X with all  
memory allocator options on, and I'm unable to get such a warning :(


  george.

On Feb 16, 2009, at 20:13 , Jeff Squyres wrote:


r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we're guaranteeing to free the event base,  
we're exercising a code path that was never used before.  Apparently  
the orted initializes the ev->timebase min_heap_t structure, but  
then never uses it.  So the pointer to the array of events in the  
heap is still NULL when we get to the destructor.  Previously, the  
destructor just unconditionally freed the array.  I put in a NULL  
check, which avoids the problem.


But it begs the question -- why is that data structure being  
initialized/freed if we're never using it?  Is it something inherent  
in libevent?



On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote:

Unfortunately, this doesn't fully fix the problem -- I'm still  
getting

bad frees:

[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)

[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)


On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:

> Author: bosilca
> Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> New Revision: 20568
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20568
>
> Log:
> Make sure we correctly unregister all persistent events
> and signal handlers.
>
> Text files modified:
>   trunk/orte/orted/orted_main.c  | 8 
>   trunk/orte/runtime/orte_wait.c | 4 ++--
>   2 files changed, 10 insertions(+), 2 deletions(-)
>
> Modified: trunk/orte/orted/orted_main.c
> =
> =
> =
> =
> =
> =
> =
> =
>  
= 
=

> --- trunk/orte/orted/orted_main.c (original)
> +++ trunk/orte/orted/orted_main.c 2009-02-16 19:20:05 EST  
(Mon, 16

> Feb 2009)
> @@ -754,6 +754,14 @@
> exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
> }
>
> +/* Release all local signal handlers */
> +opal_event_del(_handler);
> +opal_event_del(_handler);
> +#ifndef __WINDOWS__
> +opal_signal_del(_handler);
> +opal_signal_del(_handler);
> +#endif  /* __WINDOWS__ */
> +
> /* Finalize and clean up ourselves */
> ret = orte_finalize();
> exit(ret);
>
> Modified: trunk/orte/runtime/orte_wait.c
> =
> =
> =
> =
> =
> =
> =
> =
>  
= 
=

> --- trunk/orte/runtime/orte_wait.c(original)
> +++ trunk/orte/runtime/orte_wait.c2009-02-16 19:20:05 EST  
(Mon, 16

> Feb 2009)
> @@ -517,8 +517,8 @@
> /* define the event to fire when someone writes to the pipe */
> opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);
>
> - /* Add it to the active events, without a timeout */
> - opal_event_add(*event, NULL);
> +/* Add it to the active events, without a timeout */
> +opal_event_add(*event, NULL);
>
> /* all done */
> return ORTE_SUCCESS;
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full


--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-16 Thread Jeff Squyres

r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we're guaranteeing to free the event base,  
we're exercising a code path that was never used before.  Apparently  
the orted initializes the ev->timebase min_heap_t structure, but then  
never uses it.  So the pointer to the array of events in the heap is  
still NULL when we get to the destructor.  Previously, the destructor  
just unconditionally freed the array.  I put in a NULL check, which  
avoids the problem.


But it begs the question -- why is that data structure being  
initialized/freed if we're never using it?  Is it something inherent  
in libevent?



On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote:


Unfortunately, this doesn't fully fix the problem -- I'm still getting
bad frees:

[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)

[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)


On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:

> Author: bosilca
> Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> New Revision: 20568
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20568
>
> Log:
> Make sure we correctly unregister all persistent events
> and signal handlers.
>
> Text files modified:
>   trunk/orte/orted/orted_main.c  | 8 
>   trunk/orte/runtime/orte_wait.c | 4 ++--
>   2 files changed, 10 insertions(+), 2 deletions(-)
>
> Modified: trunk/orte/orted/orted_main.c
> =
> =
> =
> =
> =
> =
> =
> =
>  
==

> --- trunk/orte/orted/orted_main.c (original)
> +++ trunk/orte/orted/orted_main.c 2009-02-16 19:20:05 EST  
(Mon, 16

> Feb 2009)
> @@ -754,6 +754,14 @@
> exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
> }
>
> +/* Release all local signal handlers */
> +opal_event_del(_handler);
> +opal_event_del(_handler);
> +#ifndef __WINDOWS__
> +opal_signal_del(_handler);
> +opal_signal_del(_handler);
> +#endif  /* __WINDOWS__ */
> +
> /* Finalize and clean up ourselves */
> ret = orte_finalize();
> exit(ret);
>
> Modified: trunk/orte/runtime/orte_wait.c
> =
> =
> =
> =
> =
> =
> =
> =
>  
==

> --- trunk/orte/runtime/orte_wait.c(original)
> +++ trunk/orte/runtime/orte_wait.c2009-02-16 19:20:05 EST  
(Mon, 16

> Feb 2009)
> @@ -517,8 +517,8 @@
> /* define the event to fire when someone writes to the pipe */
> opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);
>
> - /* Add it to the active events, without a timeout */
> - opal_event_add(*event, NULL);
> +/* Add it to the active events, without a timeout */
> +opal_event_add(*event, NULL);
>
> /* all done */
> return ORTE_SUCCESS;
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full


--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-16 Thread Jeff Squyres
Unfortunately, this doesn't fully fix the problem -- I'm still getting  
bad frees:


[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)

[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)


On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
New Revision: 20568
URL: https://svn.open-mpi.org/trac/ompi/changeset/20568

Log:
Make sure we correctly unregister all persistent events
and signal handlers.

Text files modified:
  trunk/orte/orted/orted_main.c  | 8 
  trunk/orte/runtime/orte_wait.c | 4 ++--
  2 files changed, 10 insertions(+), 2 deletions(-)

Modified: trunk/orte/orted/orted_main.c
= 
= 
= 
= 
= 
= 
= 
= 
==

--- trunk/orte/orted/orted_main.c   (original)
+++ trunk/orte/orted/orted_main.c	2009-02-16 19:20:05 EST (Mon, 16  
Feb 2009)

@@ -754,6 +754,14 @@
exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
}

+/* Release all local signal handlers */
+opal_event_del(_handler);
+opal_event_del(_handler);
+#ifndef __WINDOWS__
+opal_signal_del(_handler);
+opal_signal_del(_handler);
+#endif  /* __WINDOWS__ */
+
/* Finalize and clean up ourselves */
ret = orte_finalize();
exit(ret);

Modified: trunk/orte/runtime/orte_wait.c
= 
= 
= 
= 
= 
= 
= 
= 
==

--- trunk/orte/runtime/orte_wait.c  (original)
+++ trunk/orte/runtime/orte_wait.c	2009-02-16 19:20:05 EST (Mon, 16  
Feb 2009)

@@ -517,8 +517,8 @@
/* define the event to fire when someone writes to the pipe */
opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);

-   /* Add it to the active events, without a timeout */
-   opal_event_add(*event, NULL);
+/* Add it to the active events, without a timeout */
+opal_event_add(*event, NULL);

/* all done */
return ORTE_SUCCESS;
___
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full



--
Jeff Squyres
Cisco Systems