Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain
I don't recommend our solution as a general approach - we moved the object 
instance to the framework base so it is never unloaded from memory.

Regardless, it seems to me that proper cleanup is the better solution, although 
it means work. I've asked that it be added to next week's telecon agenda so we 
can reach some resolution.


On Jul 18, 2014, at 9:35 AM, Gilles Gouaillardet 
 wrote:

> It would make sense, though I guess I always thought that was part of what 
> happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that 
> DEREGISTER would be the counter to INSTANCE, and I do want to keep this from 
> getting even more clunky - so maybe renaming INSTANCE to be REGISTER and 
> completing the initialization inside it would be the way to go. Or renaming 
> DEREGISTER to something more obviously the counter to INSTANCE?
> 
> 
> Just so we are clear:
> 
> On one hand, OBJ_CLASS_INSTANCE is a macro that must be invoked "outside" of 
> a function: it *statically* initializes a struct.
> 
> On the other hand, OBJ_CLASS_DEREGISTER is a macro that must be invoked 
> inside a function.
> 
> Using OBJ_CLASS_REGISTER is not only a matter of renaming; it also requires 
> moving all these invocations into functions.
> 
> My idea of having both OBJ_CLASS_INSTANCE and OBJ_CLASS_REGISTER is:
> - we do not need to move OBJ_CLASS_INSTANCE into functions
> - we can have two behaviours depending on OPAL_ENABLE_DEBUG:
> OBJ_CLASS_REGISTER would simply do nothing if OPAL_ENABLE_DEBUG is zero (and 
> opal_class_initialize would still be invoked in opal_obj_new). That could 
> also be a bit faster than having only one OBJ_CLASS_REGISTER macro in 
> optimized mode.
> 
> That being said, I am also fine with simplifying this: remove 
> OBJ_CLASS_INSTANCE and use OBJ_CLASS_REGISTER and OBJ_CLASS_DEREGISTER.
> 
> 
> About the bug you hit: did you already solve it, and how?
> A trivial workaround is not to dlclose the dynamic library (ok, that's 
> cheating ...)
> A simple workaround (if it is even doable) is to declare the class "somewhere 
> else" so the (library containing the) class struct is not dlclose'd before it 
> is invoked (ok, that's ugly ...).
> 
> What I wrote earlier was misleading:
> OBJ_CLASS_INSTANCE(class);
> foo = OBJ_NEW(class);
> roughly amounts to
> opal_class_t class_class = {...};
> foo->super.obj_class = &class_class;
> 
> class_class is no longer accessible when OBJ_RELEASE is called, since the 
> library was dlclose'd, so you do not even get a chance to invoke the 
> destructor ...
>  
> A possible workaround could be to malloc a copy of class_class, have 
> foo->super.obj_class point to it after each OBJ_NEW, and finally have its 
> cls_destruct_array point to NULL when closing the framework/component 
> (of course, that causes a leak ...).
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15198.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain

On Jul 18, 2014, at 10:24 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> 1. If I remember correctly, this topic has already been raised in the Forum. 
> And the decision was to maintain the current behavior (tools and MPI 
> init/fini are independent/disconnected).
> 
> 2. Having to manually set a global flag in order to correctly finalize a 
> library is HORRIBLE by any reasonable CS standards.

As I said in my original note, we don't have to set a global flag. All you have 
to do is decrement the already-existing reference counter that tracks how many 
times we called init_util, indicating that you are done with it so it can go 
ahead and truly finalize on the next invocation. This is a typical symmetrical 
operation. All we are doing is correctly communicating to the library that we 
don't want it to actually tear things down at this time.
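A minimal sketch of the symmetric, reference-counted init/finalize pattern described above. The names `util_init`, `util_finalize`, `util_refcount`, and `util_torn_down` are hypothetical stand-ins, not the real `opal_init_util()`/`opal_finalize_util()` API; the sketch only shows that each init takes a reference and real teardown happens on the last matching finalize.

```c
/* Hypothetical stand-ins for a reference-counted init/finalize pair; these
 * are NOT the real opal_init_util()/opal_finalize_util() signatures. */
static int util_refcount = 0;   /* number of callers currently holding the layer */
static int util_torn_down = 0;  /* set once the real teardown has run */

int util_init(void)
{
    if (util_refcount++ > 0) {
        return 0;               /* already initialized: just count the new caller */
    }
    util_torn_down = 0;
    /* ... real initialization (frameworks, memory pools) would go here ... */
    return 0;
}

int util_finalize(void)
{
    if (0 == util_refcount) {
        return -1;              /* unbalanced finalize: nothing to release */
    }
    if (--util_refcount > 0) {
        return 0;               /* another caller still needs the layer */
    }
    /* ... real teardown (free memory, close frameworks) would go here ... */
    util_torn_down = 1;
    return 0;
}
```

Under these assumptions, MPI_T_init_thread and MPI_Init would each take one reference, so the layer is torn down by whichever of MPI_Finalize or MPI_T_finalize runs last, and MPI_T calls made between the two still see an initialized layer.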

> 
> 3. Let's not go in shadowy corners of the MPI_T usage, and stay mainstream. 
> Here is a partial snippet of the most usual way the tool interface is 
> supposed to be used.
> 
> MPI_T_init_thread(MPI_THREAD_SINGLE, );
> ...
> MPI_Init(, );
> MPI_Finalize();
>   
>   With the proposed patch, we clean up all OPAL memory as soon as we reach 
> MPI_Finalize (i.e., without the call to MPI_T_finalize).

Are you referring to Nathan's patch? In that case, your statement isn't correct 
- the destructor only gets run at the end of the user's program, and thus the 
OPAL memory will not be cleaned up until that time.
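A minimal sketch of the destructor approach, assuming a compiler that supports `__attribute__((destructor))` (the thread reports PGI as the exception). The names `layer_init`, `layer_cleanup`, and `layer_at_exit` are hypothetical. It illustrates the timing point above: the teardown runs at program exit, not at MPI_Finalize, and an idempotent cleanup lets an explicit finalize path and the exit-time destructor coexist without double-freeing.

```c
/* Hypothetical state for a utility layer; not OPAL's actual variables. */
static int layer_initialized = 0;
static int layer_cleanup_count = 0;

void layer_init(void)
{
    layer_initialized = 1;
}

/* Idempotent teardown: safe to call from an explicit finalize path and
 * again from the exit-time destructor below. */
void layer_cleanup(void)
{
    if (!layer_initialized) {
        return;
    }
    /* ... free whatever memory the layer still holds ... */
    layer_initialized = 0;
    layer_cleanup_count++;
}

/* With GCC/Clang-style compilers this runs after main() returns or exit()
 * is called (and, for a shared library, typically when it is unloaded),
 * not at MPI_Finalize time. */
__attribute__((destructor))
static void layer_at_exit(void)
{
    layer_cleanup();
}
```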

>  All MPI_T calls after MPI_Finalize will trigger a segfault.
> 
>   George.
> 
> 
> 
> On Thu, Jul 17, 2014 at 10:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
> As I said, I don't know which solution is the one to follow - they both have 
> significant "ick" factors, though I wouldn't go so far as to characterize 
> either of them as "horrible". Not being "clean" after calling MPI_Finalize 
> seems just as strange.
> 
> Nathan and I did discuss the init-after-finalize issue, and he intends to 
> raise it with the Forum as it doesn't seem a logical thing to do. So that 
> issue may go away. Still leaves us pondering the right solution, and 
> hopefully coming up with something better than either of the ones we have so 
> far.
> 
> 
> On Jul 17, 2014, at 7:48 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> I think Case #1 is only a partial solution, as it only solves the example 
>> attached to the ticket. Based on my reading of the tool chapter, calling 
>> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the 
>> patch. But this is not the major issue I have with this patch. From a coding 
>> perspective, it makes the initialization of OPAL horribly unnatural, 
>> requiring any other layer using OPAL to perform horrible gymnastics just to 
>> tear it down correctly (setting opal_init_util_init_extra to the right 
>> value).
>> 
>>   George.
>> 
>> 
>> 
>> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r <howa...@lanl.gov> 
>> wrote:
>> Hi Folks,
>> 
>> I vote for solution #1.  Doesn't change current behavior.  Doesn't open the 
>> door to becoming dependent on availability of
>> ctor/dtor feature in future toolchains.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Wednesday, July 16, 2014 9:08 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function 
>> to opal
>> 
>> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
>> > I discussed this over IM with Nathan to try and get a better understanding 
>> > of the options. Basically, we have two approaches available to us:
>> >
>> > 1. my solution resolves the segv problem and eliminates leaks so long as 
>> > the user calls MPI_Init/Finalize after calling the MPI_T init/finalize 
>> > functions. This method will still leak memory if the user doesn't use MPI 
>> > after calling the MPI_T functions, but does mean that all memory used by 
>> > MPI will be released upon MPI_Finalize. So if the user program continues 
>> > beyond MPI, they won't be carrying the MPI memory footprint with them. 
>> > This continues our current behavior.
>> >
>> > 2. the destructor method, which releases the MPI memory footprint upon 
>> > final program termination instead of at MPI_Finalize. This also solves the 
>> > segv and leak problems, and ensures that someone calling only the MPI_T 
>&

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread George Bosilca
1. If I remember correctly, this topic has already been raised in the
Forum. And the decision was to maintain the current behavior (tools and MPI
init/fini are independent/disconnected).

2. Having to manually set a global flag in order to correctly finalize a
library is HORRIBLE by any reasonable CS standards.

3. Let's not go in shadowy corners of the MPI_T usage, and stay mainstream.
Here is a partial snippet of the most usual way the tool interface is
supposed to be used.

MPI_T_init_thread(MPI_THREAD_SINGLE, );
...
MPI_Init(, );
MPI_Finalize();

  With the proposed patch, we clean up all OPAL memory as soon as we reach
MPI_Finalize (i.e., without the call to MPI_T_finalize).  All MPI_T
calls after MPI_Finalize will trigger a segfault.

  George.



On Thu, Jul 17, 2014 at 10:55 PM, Ralph Castain <r...@open-mpi.org> wrote:

> As I said, I don't know which solution is the one to follow - they both
> have significant "ick" factors, though I wouldn't go so far as to
> characterize either of them as "horrible". Not being "clean" after calling
> MPI_Finalize seems just as strange.
>
> Nathan and I did discuss the init-after-finalize issue, and he intends to
> raise it with the Forum as it doesn't seem a logical thing to do. So that
> issue may go away. Still leaves us pondering the right solution, and
> hopefully coming up with something better than either of the ones we have
> so far.
>
>
> On Jul 17, 2014, at 7:48 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> I think Case #1 is only a partial solution, as it only solves the example
> attached to the ticket. Based on my reading of the tool chapter, calling
> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the
> patch. But this is not the major issue I have with this patch. From a
> coding perspective, it makes the initialization of OPAL horribly unnatural,
> requiring any other layer using OPAL to perform horrible gymnastics just to
> tear it down correctly (setting opal_init_util_init_extra to the right
> value).
>
>   George.
>
>
>
> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r <howa...@lanl.gov>
> wrote:
>
>> Hi Folks,
>>
>> I vote for solution #1.  Doesn't change current behavior.  Doesn't open
>> the door to becoming dependent on availability of
>> ctor/dtor feature in future toolchains.
>>
>> Howard
>>
>>
>> -----Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Wednesday, July 16, 2014 9:08 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor))
>> function to opal
>>
>> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
>> > I discussed this over IM with Nathan to try and get a better
>> understanding of the options. Basically, we have two approaches available
>> to us:
>> >
>> > 1. my solution resolves the segv problem and eliminates leaks so long
>> as the user calls MPI_Init/Finalize after calling the MPI_T init/finalize
>> functions. This method will still leak memory if the user doesn't use MPI
>> after calling the MPI_T functions, but does mean that all memory used by
>> MPI will be released upon MPI_Finalize. So if the user program continues
>> beyond MPI, they won't be carrying the MPI memory footprint with them. This
>> continues our current behavior.
>> >
>> > 2. the destructor method, which releases the MPI memory footprint upon
>> final program termination instead of at MPI_Finalize. This also solves the
>> segv and leak problems, and ensures that someone calling only the MPI_T
>> init/finalize functions will be valgrind-clean, but means that a user
>> program that runs beyond MPI will carry the MPI memory footprint with them.
>> This is a change in our current behavior.
>>
>> Correct. Though the only thing we will carry around until termination is
>> the memory associated with opal/mca/if, opal/mca/event, opal_net,
>> opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and
>> opal_class. Not sure how much memory this is.
>>
>> -Nathan
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15172.php
>>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15193.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15194.php
>


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Gilles Gouaillardet
>
> It would make sense, though I guess I always thought that was part of what
> happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that
> DEREGISTER would be the counter to INSTANCE, and I do want to keep this
> from getting even more clunky - so maybe renaming INSTANCE to be REGISTER
> and completing the initialization inside it would be the way to go. Or
> renaming DEREGISTER to something more obviously the counter to INSTANCE?
>
>
Just so we are clear:

On one hand, OBJ_CLASS_INSTANCE is a macro that must be invoked "outside" of
a function: it *statically* initializes a struct.

On the other hand, OBJ_CLASS_DEREGISTER is a macro that must be invoked
inside a function.

Using OBJ_CLASS_REGISTER is not only a matter of renaming; it also requires
moving all these invocations into functions.
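To illustrate why renaming alone is not enough: in C, a file-scope definition may only use constant initializers, so a macro that expands to a static initializer (as OBJ_CLASS_INSTANCE does) can live outside any function. Anything that runs code, as OBJ_CLASS_DEREGISTER does (or an OBJ_CLASS_REGISTER that called opal_class_initialize would), expands to a statement and must sit inside a function body. The `DEMO_*` macros and `demo_class_t` below are hypothetical simplifications, not the real OPAL definitions.

```c
#include <stddef.h>

/* Hypothetical simplified class descriptor (not OPAL's opal_class_t). */
typedef struct demo_class {
    const char *cls_name;
    void (*cls_destruct)(void *obj);
    int cls_initialized;
} demo_class_t;

/* Expands to a definition with a constant initializer, so it is valid at
 * file scope; it only performs static initialization and runs no code. */
#define DEMO_CLASS_INSTANCE(NAME, DTOR) \
    demo_class_t NAME##_class = { #NAME, DTOR, 0 }

/* Expands to a statement, so it can only appear inside a function body. */
#define DEMO_CLASS_DEREGISTER(NAME)          \
    do {                                     \
        NAME##_class.cls_destruct = NULL;    \
        NAME##_class.cls_initialized = 0;    \
    } while (0)

static void widget_destruct(void *obj) { (void)obj; }

DEMO_CLASS_INSTANCE(widget, widget_destruct);   /* file scope: fine */

void widget_component_close(void)
{
    DEMO_CLASS_DEREGISTER(widget);              /* would not compile at file scope */
}
```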

My idea of having both OBJ_CLASS_INSTANCE and OBJ_CLASS_REGISTER is:
- we do not need to move OBJ_CLASS_INSTANCE into functions
- we can have two behaviours depending on OPAL_ENABLE_DEBUG:
OBJ_CLASS_REGISTER would simply do nothing if OPAL_ENABLE_DEBUG is zero
(and opal_class_initialize would still be invoked in opal_obj_new). That
could also be a bit faster than having only one OBJ_CLASS_REGISTER macro in
optimized mode.

That being said, I am also fine with simplifying this: remove
OBJ_CLASS_INSTANCE and use OBJ_CLASS_REGISTER and OBJ_CLASS_DEREGISTER.


About the bug you hit: did you already solve it, and how?
A trivial workaround is not to dlclose the dynamic library (ok, that's
cheating ...)
A simple workaround (if it is even doable) is to declare the class
"somewhere else" so the (library containing the) class struct is not
dlclose'd before it is invoked (ok, that's ugly ...).

What I wrote earlier was misleading:
OBJ_CLASS_INSTANCE(class);
foo = OBJ_NEW(class);
roughly amounts to
opal_class_t class_class = {...};
foo->super.obj_class = &class_class;

class_class is no longer accessible when OBJ_RELEASE is called, since the
library was dlclose'd, so you do not even get a chance to invoke the
destructor ...

A possible workaround could be to malloc a copy of class_class, have
foo->super.obj_class point to it after each OBJ_NEW, and finally have its
cls_destruct_array point to NULL when closing the framework/component
(of course, that causes a leak ...).
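A rough sketch of the malloc'd-copy workaround just described, using hypothetical simplified types: a single `cls_destruct` pointer stands in for OPAL's `cls_destruct_array`, and `component_new_object`, `component_close`, and `object_release` are illustrative names, not OPAL functions. Objects point at a heap copy of the class descriptor, which survives the library being unloaded; closing the component nulls the destructor so a later release skips it instead of jumping into unmapped code. The copy is intentionally never freed, which is the leak mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for opal_class_t / opal_object_t; a
 * single destructor pointer replaces OPAL's cls_destruct_array. */
typedef struct sketch_class {
    void (*cls_destruct)(void *obj);
} sketch_class_t;

typedef struct sketch_object {
    sketch_class_t *obj_class;
    int obj_refcount;
} sketch_object_t;

/* Heap copy of the class descriptor; it outlives the component's library
 * and is deliberately never freed (the leak mentioned above). */
static sketch_class_t *heap_class = NULL;

sketch_object_t *component_new_object(const sketch_class_t *static_class)
{
    if (NULL == heap_class) {
        heap_class = malloc(sizeof(*heap_class));
        if (NULL == heap_class) {
            return NULL;
        }
        memcpy(heap_class, static_class, sizeof(*heap_class));
    }
    sketch_object_t *obj = malloc(sizeof(*obj));
    if (NULL == obj) {
        return NULL;
    }
    obj->obj_class = heap_class;   /* point at the copy, not the library's static struct */
    obj->obj_refcount = 1;
    return obj;
}

/* Called when the framework/component closes, before its library is
 * dlclose'd: the destructor code is about to be unmapped. */
void component_close(void)
{
    if (NULL != heap_class) {
        heap_class->cls_destruct = NULL;
    }
}

void object_release(sketch_object_t *obj)
{
    if (--obj->obj_refcount > 0) {
        return;
    }
    if (NULL != obj->obj_class->cls_destruct) {
        obj->obj_class->cls_destruct(obj);
    }
    free(obj);
}

/* Example "component" class; in the real scenario this would live in the
 * dlopen'ed library. */
static int example_destruct_calls = 0;
static void example_destruct(void *obj) { (void)obj; example_destruct_calls++; }
static const sketch_class_t example_class = { example_destruct };
```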

Cheers,

Gilles


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain

On Jul 18, 2014, at 8:25 AM, Gilles Gouaillardet 
 wrote:

> +1 for the overall idea !
> 
> On Fri, Jul 18, 2014 at 10:17 PM, Ralph Castain  wrote:
>> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be matched 
>> by deregister at close of the framework/component that instanced it. Of 
>> course, that requires that we protect the class system against someone 
>> releasing/deconstructing an object after the class was deregistered since we 
>> don't know who might be using that class outside of where it was created.
>> 
> 
> My understanding is that, in theory, we already have an issue and, 
> fortunately, we do not hit it:
> let's consider a framework/component that instantiates a class 
> (OBJ_CLASS_INSTANCE) *with a destructor*, allocates an object of this class 
> (OBJ_NEW) and expects "someone else" to free it (OBJ_RELEASE).
> If this framework/component ends up in a dynamic library that is dlclose'd 
> when the framework/component is no longer used, then OBJ_RELEASE will try to 
> call the destructor, which is no longer accessible (since the lib was dlclose'd).

FWIW: Intel has hit that exact scenario in our testing and got a glorious segv 
out of it. We now have an assert for NULL in the base object macros to warn us 
to fix it (which I can provide for review if we want to include it in our repo).
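A hedged sketch of an OBJ_CLASS_DEREGISTER-style guard: deregistering a class clears its destructor and a registered flag, and the release path refuses to call into a deregistered class. All names (`guard_class_t`, `guard_new`, `guard_class_deregister`, `guard_release`) are hypothetical, and this is not the Intel assert mentioned above. It also assumes the class descriptor itself stays mapped (for example, because it lives in the framework base); if the descriptor lived in the dlclose'd library, even this check would fault.

```c
#include <stdlib.h>

/* Hypothetical simplified class/object types; not OPAL's definitions. */
typedef struct guard_class {
    void (*cls_destruct)(void *obj);
    int cls_registered;          /* cleared by guard_class_deregister() */
} guard_class_t;

typedef struct guard_object {
    guard_class_t *obj_class;
    int obj_refcount;
} guard_object_t;

guard_object_t *guard_new(guard_class_t *cls)
{
    guard_object_t *obj = malloc(sizeof(*obj));
    if (NULL != obj) {
        obj->obj_class = cls;
        obj->obj_refcount = 1;
    }
    return obj;
}

/* Would be called from the framework/component close function. */
void guard_class_deregister(guard_class_t *cls)
{
    cls->cls_destruct = NULL;
    cls->cls_registered = 0;
}

/* Returns 0 on a normal release and -1 when the last reference is dropped
 * after the class was deregistered; a debug build could assert() here
 * instead, to flag the offending caller. */
int guard_release(guard_object_t *obj)
{
    if (--obj->obj_refcount > 0) {
        return 0;
    }
    int rc = 0;
    if (!obj->obj_class->cls_registered) {
        rc = -1;                  /* destructor skipped: its code may be gone */
    } else if (NULL != obj->obj_class->cls_destruct) {
        obj->obj_class->cls_destruct(obj);
    }
    free(obj);
    return rc;
}
```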

> 
> I have not run into such a scenario yet, and of course, that does not mean 
> there is no problem. I did run into a "kind of" similar situation, described in 
> http://www.open-mpi.org/community/lists/devel/2014/06/14937.php
> 
> Back to OBJ_CLASS_DEREGISTER: what about an OBJ_CLASS_REGISTER, in order to 
> make this symmetric and easier to debug?
> 
> Currently, OBJ_CLASS_REGISTER is "implied" the first time an object of a 
> given class is allocated. From opal_obj_new:
> if (0 == cls->cls_initialized) opal_class_initialize(cls);
> 
> That could be replaced by an error if 0 == cls->cls_initialized,
> and OBJ_CLASS_REGISTER would simply call opal_class_initialize.

It would make sense, though I guess I always thought that was part of what 
happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that 
DEREGISTER would be the counter to INSTANCE, and I do want to keep this from 
getting even more clunky - so maybe renaming INSTANCE to be REGISTER and 
completing the initialization inside it would be the way to go. Or renaming 
DEREGISTER to something more obviously the counter to INSTANCE?


> 
> Of course, this change could be enabled only when compiling
> with OPAL_ENABLE_DEBUG.
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15196.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Gilles Gouaillardet
+1 for the overall idea !

On Fri, Jul 18, 2014 at 10:17 PM, Ralph Castain  wrote:
>
> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be
> matched by deregister at close of the framework/component that instanced
> it. Of course, that requires that we protect the class system against
> someone releasing/deconstructing an object after the class was deregistered
> since we don't know who might be using that class outside of where it was
> created.
>
My understanding is that, in theory, we already have an issue and,
fortunately, we do not hit it:
let's consider a framework/component that instantiates a class
(OBJ_CLASS_INSTANCE) *with a destructor*, allocates an object of this class
(OBJ_NEW) and expects "someone else" to free it (OBJ_RELEASE).
If this framework/component ends up in a dynamic library that is dlclose'd
when the framework/component is no longer used, then OBJ_RELEASE will try to
call the destructor, which is no longer accessible (since the lib was
dlclose'd).

I have not run into such a scenario yet, and of course, that does not
mean there is no problem. I did run into a "kind of" similar situation,
described in http://www.open-mpi.org/community/lists/devel/2014/06/14937.php

Back to OBJ_CLASS_DEREGISTER: what about an OBJ_CLASS_REGISTER, in order to
make this symmetric and easier to debug?

Currently, OBJ_CLASS_REGISTER is "implied" the first time an object of a
given class is allocated. From opal_obj_new:
if (0 == cls->cls_initialized) opal_class_initialize(cls);

That could be replaced by an error if 0 == cls->cls_initialized,
and OBJ_CLASS_REGISTER would simply call opal_class_initialize.

Of course, this change could be enabled only when compiling
with OPAL_ENABLE_DEBUG.
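A minimal sketch of the proposed explicit registration, under stated assumptions: `SKETCH_ENABLE_DEBUG` plays the role of OPAL_ENABLE_DEBUG, `REG_CLASS_REGISTER` the role of OBJ_CLASS_REGISTER, and `reg_class_t`, `reg_class_initialize`, and `reg_obj_new` are hypothetical simplifications of opal_class_t, opal_class_initialize, and opal_obj_new. In debug builds, allocating from an unregistered class is reported as an error; in optimized builds, the register macro compiles to nothing and the current lazy initialization is kept.

```c
#include <stdlib.h>

/* Assumption: SKETCH_ENABLE_DEBUG stands in for OPAL_ENABLE_DEBUG. */
#ifndef SKETCH_ENABLE_DEBUG
#define SKETCH_ENABLE_DEBUG 1
#endif

/* Hypothetical simplified class descriptor. */
typedef struct reg_class {
    size_t cls_sizeof;
    int cls_initialized;
} reg_class_t;

static void reg_class_initialize(reg_class_t *cls)
{
    /* the real opal_class_initialize does much more (constructor and
     * destructor arrays); here we only mark the class as ready */
    cls->cls_initialized = 1;
}

#if SKETCH_ENABLE_DEBUG
#define REG_CLASS_REGISTER(cls) reg_class_initialize(cls)
#else
#define REG_CLASS_REGISTER(cls) ((void)0)   /* no-op in optimized builds */
#endif

void *reg_obj_new(reg_class_t *cls)
{
    if (0 == cls->cls_initialized) {
#if SKETCH_ENABLE_DEBUG
        return NULL;                  /* debug: class was never registered */
#else
        reg_class_initialize(cls);    /* optimized: keep today's lazy init */
#endif
    }
    return calloc(1, cls->cls_sizeof);
}
```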

Cheers,

Gilles


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain
I'm going to resurface this suggestion. Is there some reason why this wouldn't 
be the way to resolve the problem?

> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be matched 
> by deregister at close of the framework/component that instanced it. Of 
> course, that requires that we protect the class system against someone 
> releasing/deconstructing an object after the class was deregistered since we 
> don't know who might be using that class outside of where it was created.
> 
> * ensure each framework/component "deregisters" every declared MCA param when 
> finalizing/closing
> 
> * ensure every framework close gets called, and that every framework properly 
> closes all its components. We especially need to ensure that components that 
> were opened but not selected get closed!

I'm asking because it is apparent that this issue of reinitializing is going to 
recur under a variety of scenarios. The two methods we've discussed so far are 
really just bandaids - all we are doing is avoiding actually "finalizing" via 
different mechanisms. The root problem is that we *don't* cleanly finalize OPAL.

So why not address that problem? There is no technical reason we can't cleanly 
finalize the OPAL layer - is the root issue that we're just unwilling to make 
the effort to do it?
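A rough sketch of the third bullet above ("ensure every framework close ... closes all its components"), assuming a hypothetical component table rather than the real MCA base structures: `sketch_component_t`, `sketch_framework_t`, and the `framework_*` functions are illustrative only. The close path walks every component that was opened, not just the selected one, so opened-but-unselected components are also closed.

```c
#include <stddef.h>

/* Hypothetical component descriptor; not the real mca_base_component_t. */
typedef struct sketch_component {
    const char *name;
    int (*open)(void);    /* may be NULL */
    int (*close)(void);   /* may be NULL */
    int is_open;
} sketch_component_t;

typedef struct sketch_framework {
    sketch_component_t **components;
    size_t count;
    sketch_component_t *selected;
} sketch_framework_t;

void framework_open(sketch_framework_t *fw)
{
    for (size_t i = 0; i < fw->count; ++i) {
        sketch_component_t *c = fw->components[i];
        c->is_open = (NULL == c->open || 0 == c->open());
    }
}

sketch_component_t *framework_select(sketch_framework_t *fw, size_t idx)
{
    if (idx >= fw->count || !fw->components[idx]->is_open) {
        return NULL;
    }
    fw->selected = fw->components[idx];
    return fw->selected;
}

/* Closes every component that is still open, selected or not, and
 * returns how many were closed. */
int framework_close(sketch_framework_t *fw)
{
    int closed = 0;
    for (size_t i = 0; i < fw->count; ++i) {
        sketch_component_t *c = fw->components[i];
        if (!c->is_open) {
            continue;
        }
        if (NULL != c->close) {
            (void)c->close();
        }
        c->is_open = 0;
        ++closed;
    }
    fw->selected = NULL;
    return closed;
}
```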



On Jul 17, 2014, at 7:55 PM, Ralph Castain <r...@open-mpi.org> wrote:

> As I said, I don't know which solution is the one to follow - they both have 
> significant "ick" factors, though I wouldn't go so far as to characterize 
> either of them as "horrible". Not being "clean" after calling MPI_Finalize 
> seems just as strange.
> 
> Nathan and I did discuss the init-after-finalize issue, and he intends to 
> raise it with the Forum as it doesn't seem a logical thing to do. So that 
> issue may go away. Still leaves us pondering the right solution, and 
> hopefully coming up with something better than either of the ones we have so 
> far.
> 
> 
> On Jul 17, 2014, at 7:48 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> I think Case #1 is only a partial solution, as it only solves the example 
>> attached to the ticket. Based on my reading of the tool chapter, calling 
>> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the 
>> patch. But this is not the major issue I have with this patch. From a coding 
>> perspective, it makes the initialization of OPAL horribly unnatural, 
>> requiring any other layer using OPAL to perform horrible gymnastics just to 
>> tear it down correctly (setting opal_init_util_init_extra to the right 
>> value).
>> 
>>   George.
>> 
>> 
>> 
>> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r <howa...@lanl.gov> 
>> wrote:
>> Hi Folks,
>> 
>> I vote for solution #1.  Doesn't change current behavior.  Doesn't open the 
>> door to becoming dependent on availability of
>> ctor/dtor feature in future toolchains.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Wednesday, July 16, 2014 9:08 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function 
>> to opal
>> 
>> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
>> > I discussed this over IM with Nathan to try and get a better understanding 
>> > of the options. Basically, we have two approaches available to us:
>> >
>> > 1. my solution resolves the segv problem and eliminates leaks so long as 
>> > the user calls MPI_Init/Finalize after calling the MPI_T init/finalize 
>> > functions. This method will still leak memory if the user doesn't use MPI 
>> > after calling the MPI_T functions, but does mean that all memory used by 
>> > MPI will be released upon MPI_Finalize. So if the user program continues 
>> > beyond MPI, they won't be carrying the MPI memory footprint with them. 
>> > This continues our current behavior.
>> >
>> > 2. the destructor method, which releases the MPI memory footprint upon 
>> > final program termination instead of at MPI_Finalize. This also solves the 
>> > segv and leak problems, and ensures that someone calling only the MPI_T 
>> > init/finalize functions will be valgrind-clean, but means that a user 
>> > program that runs beyond MPI will carry the MPI memory footprint with 
>> > them. This is a change in our current behavior.
>> 
>> Correct. Though the only thing we will carry around until termination is the 
>> memory associated

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-17 Thread Ralph Castain
As I said, I don't know which solution is the one to follow - they both have 
significant "ick" factors, though I wouldn't go so far as to characterize 
either of them as "horrible". Not being "clean" after calling MPI_Finalize 
seems just as strange.

Nathan and I did discuss the init-after-finalize issue, and he intends to raise 
it with the Forum as it doesn't seem a logical thing to do. So that issue may 
go away. Still leaves us pondering the right solution, and hopefully coming up 
with something better than either of the ones we have so far.


On Jul 17, 2014, at 7:48 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> I think Case #1 is only a partial solution, as it only solves the example 
> attached to the ticket. Based on my reading of the tool chapter, calling 
> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the 
> patch. But this is not the major issue I have with this patch. From a coding 
> perspective, it makes the initialization of OPAL horribly unnatural, 
> requiring any other layer using OPAL to perform horrible gymnastics just to 
> tear it down correctly (setting opal_init_util_init_extra to the right value).
> 
>   George.
> 
> 
> 
> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r <howa...@lanl.gov> 
> wrote:
> Hi Folks,
> 
> I vote for solution #1.  Doesn't change current behavior.  Doesn't open the 
> door to becoming dependent on availability of
> ctor/dtor feature in future toolchains.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> Sent: Wednesday, July 16, 2014 9:08 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to 
> opal
> 
> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
> > I discussed this over IM with Nathan to try and get a better understanding 
> > of the options. Basically, we have two approaches available to us:
> >
> > 1. my solution resolves the segv problem and eliminates leaks so long as 
> > the user calls MPI_Init/Finalize after calling the MPI_T init/finalize 
> > functions. This method will still leak memory if the user doesn't use MPI 
> > after calling the MPI_T functions, but does mean that all memory used by 
> > MPI will be released upon MPI_Finalize. So if the user program continues 
> > beyond MPI, they won't be carrying the MPI memory footprint with them. This 
> > continues our current behavior.
> >
> > 2. the destructor method, which releases the MPI memory footprint upon final 
> > program termination instead of at MPI_Finalize. This also solves the segv 
> > and leak problems, and ensures that someone calling only the MPI_T 
> > init/finalize functions will be valgrind-clean, but means that a user 
> > program that runs beyond MPI will carry the MPI memory footprint with them. 
> > This is a change in our current behavior.
> 
> Correct. Though the only thing we will carry around until termination is the 
> memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, 
> opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not 
> sure how much memory this is.
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15172.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15193.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-17 Thread George Bosilca
I think Case #1 is only a partial solution, as it only solves the example
attached to the ticket. Based on my reading of the tool chapter, calling
MPI_T_init after MPI_Finalize is legit, and this case is not covered by the
patch. But this is not the major issue I have with this patch. From a
coding perspective, it makes the initialization of OPAL horribly unnatural,
requiring any other layer using OPAL to perform horrible gymnastics just to
tear it down correctly (setting opal_init_util_init_extra to the right
value).

  George.



On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r <howa...@lanl.gov>
wrote:

> Hi Folks,
>
> I vote for solution #1.  Doesn't change current behavior.  Doesn't open
> the door to becoming dependent on availability of
> ctor/dtor feature in future toolchains.
>
> Howard
>
>
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> Sent: Wednesday, July 16, 2014 9:08 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function
> to opal
>
> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
> > I discussed this over IM with Nathan to try and get a better
> understanding of the options. Basically, we have two approaches available
> to us:
> >
> > 1. my solution resolves the segv problem and eliminates leaks so long as
> the user calls MPI_Init/Finalize after calling the MPI_T init/finalize
> functions. This method will still leak memory if the user doesn't use MPI
> after calling the MPI_T functions, but does mean that all memory used by
> MPI will be released upon MPI_Finalize. So if the user program continues
> beyond MPI, they won't be carrying the MPI memory footprint with them. This
> continues our current behavior.
> >
> > 2. the destructor method, which releases the MPI memory footprint upon
> final program termination instead of at MPI_Finalize. This also solves the
> segv and leak problems, and ensures that someone calling only the MPI_T
> init/finalize functions will be valgrind-clean, but means that a user
> program that runs beyond MPI will carry the MPI memory footprint with them.
> This is a change in our current behavior.
>
> Correct. Though the only thing we will carry around until termination is
> the memory associated with opal/mca/if, opal/mca/event, opal_net,
> opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and
> opal_class. Not sure how much memory this is.
>
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15172.php
>


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-17 Thread Paul Hargrove
On Thu, Jul 17, 2014 at 5:55 PM, George Bosilca  wrote:

> Are these also called for shared libraries?
>
>   George.
>

If you are asking specifically about Solaris w/ the vendor compilers, then
apparently Yes:

-bash-3.00$ cat test.c
#include <stdio.h>
int X = 0;
__attribute__((__constructor__)) void hello(void) { X = 42; }
__attribute__((__destructor__)) void goodbye(void) { printf("X = %d\n", X);
}
-bash-3.00$ cc -fPIC -c test.c
-bash-3.00$ cc -shared -o libtest.so test.o

-bash-3.00$ cat main.c
int main(void) { return 0; }
-bash-3.00$ cc main.c -L. -ltest
-bash-3.00$ ./a.out
X = 42

That is the ancient toolchain in /usr/bin:

-bash-3.00$ cc -V
cc: Sun C 5.9 SunOS_sparc 2007/05/03
usage: cc [ options] files.  Use 'cc -flags' for details
-bash-3.00$ ld -V
ld: Software Generation Utilities - Solaris Link Editors: 5.10-1.489


Same result with Solaris Studio 12.3 compilers:

-bash-3.00$ cc -V
cc: Sun C 5.12 SunOS_sparc 2011/11/16
-bash-3.00$ cc -fPIC -c test.c
-bash-3.00$ cc -shared -o libtest.so test.o
-bash-3.00$ cc main.c -L. -ltest
-bash-3.00$ ./a.out
X = 42


-Paul

>
>
> On Wed, Jul 16, 2014 at 3:36 PM, Paul Hargrove  wrote:
>
>>
>> On Wed, Jul 16, 2014 at 7:36 AM, Nathan Hjelm  wrote:
>>
>>> Correction. xlc does support the destructor function attribute. The odd
>>> one out is PGI.
>>>
>>
>> Are the Solaris Studio compilers still of interest to the Open MPI
>> community?
>> If so, I've confirmed support using the following 5-line test on a
>> Solaris-10/SPARC platform.
>>
>> #include <stdio.h>
>> int X = 0;
>> __attribute__((__constructor__)) void hello(void) { X = 42; }
>> __attribute__((__destructor__)) void goodbye(void) { printf("X = %d\n", X); }
>> int main(void) { return 0; }
>>
>>
>>
>> -Paul
>>
>>
>>
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15183.php
>>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15191.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-17 Thread George Bosilca
Are these also called for shared libraries?

  George.



On Wed, Jul 16, 2014 at 3:36 PM, Paul Hargrove  wrote:

>
> On Wed, Jul 16, 2014 at 7:36 AM, Nathan Hjelm  wrote:
>
>> Correction. xlc does support the destructor function attribute. The odd
>> one out is PGI.
>>
>
> Are the Solaris Studio compilers still of interest to the Open MPI
> community?
> If so, I've confirmed support using the following 5-line test on a
> Solaris-10/SPARC platform.
>
> #include <stdio.h>
> int X = 0;
> __attribute__((__constructor__)) void hello(void) { X = 42; }
> __attribute__((__destructor__)) void goodbye(void) { printf("X = %d\n", X); }
> int main(void) { return 0; }
>
>
>
> -Paul
>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15183.php
>


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Paul Hargrove
On Wed, Jul 16, 2014 at 7:36 AM, Nathan Hjelm  wrote:

> Correction. xlc does support the destructor function attribute. The odd
> one out is PGI.
>

Are the Solaris Studio compilers still of interest to the Open MPI
community?
If so, I've confirmed support using the following 5-line test on a
Solaris-10/SPARC platform.

#include <stdio.h>
int X = 0;
__attribute__((__constructor__)) void hello(void) { X = 42; }
__attribute__((__destructor__)) void goodbye(void) { printf("X = %d\n", X); }
int main(void) { return 0; }



-Paul




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Kenneth A. Lloyd
What about providing garbage collection for both POSIX and MPI threads? This
problem hints at several underlying layers of "programming faults".

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 16, 2014 8:59 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function
to opal

I discussed this over IM with Nathan to try and get a better understanding
of the options. Basically, we have two approaches available to us:

1. my solution resolves the segv problem and eliminates leaks so long as the
user calls MPI_Init/Finalize after calling the MPI_T init/finalize
functions. This method will still leak memory if the user doesn't use MPI
after calling the MPI_T functions, but does mean that all memory used by MPI
will be released upon MPI_Finalize. So if the user program continues beyond
MPI, they won't be carrying the MPI memory footprint with them. This
continues our current behavior.

2. the destructor method, which releases the MPI memory footprint upon final
program termination instead of at MPI_Finalize. This also solves the segv
and leak problems, and ensures that someone calling only the MPI_T
init/finalize functions will be valgrind-clean, but means that a user
program that runs beyond MPI will carry the MPI memory footprint with them.
This is a change in our current behavior.

I'm not sure which approach is best, but I think this captures the heart of
the decision.


On Jul 16, 2014, at 7:36 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote:
>> A number of issues have been raised as part of this discussion. Here 
>> is what I have seen so far:
>> 
>> - constructor/destructor order not guaranteed: From an opal perspective
>>   this should not be a problem. Most components are unloaded by
>>   opal_finalize (), not opal_finalize_util (). So opal components
>>   should already be finalized by the time the destructor is called
>>   (or we can finalize them in the destructor if necessary).
>> 
>> - portability: All the compilers most of us care about (gcc, intel,
>>   clang) support the attribute. The exceptions appear to be xlc and pgi.
>>   For these compilers we can fall back on Ralph's solution and just leak
>>   if MPI_Finalize () is not called after MPI_T_Finalize (). Attached is
>>   an implementation that does that (needs some adjustment).
> 
> Correction. xlc does support the destructor function attribute. The 
> odd one out is PGI.
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15168.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15170.php





Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Pritchard, Howard r
Hi Folks,

I vote for solution #1. It doesn't change current behavior, and it doesn't
open the door to becoming dependent on the availability of the ctor/dtor
feature in future toolchains.

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
Sent: Wednesday, July 16, 2014 9:08 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to 
opal

On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
> I discussed this over IM with Nathan to try and get a better understanding of 
> the options. Basically, we have two approaches available to us:
> 
> 1. my solution resolves the segv problem and eliminates leaks so long as the 
> user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. 
> This method will still leak memory if the user doesn't use MPI after calling 
> the MPI_T functions, but does mean that all memory used by MPI will be 
> released upon MPI_Finalize. So if the user program continues beyond MPI, they 
> won't be carrying the MPI memory footprint with them. This continues our 
> current behavior.
> 
> 2. the destructor method, which releases the MPI memory footprint upon final 
> program termination instead of at MPI_Finalize. This also solves the segv and 
> leak problems, and ensures that someone calling only the MPI_T init/finalize 
> functions will be valgrind-clean, but means that a user program that runs 
> beyond MPI will carry the MPI memory footprint with them. This is a change in 
> our current behavior.

Correct. Though the only thing we will carry around until termination is the 
memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, 
opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not sure 
how much memory this is.

-Nathan


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
> I discussed this over IM with Nathan to try and get a better understanding of 
> the options. Basically, we have two approaches available to us:
> 
> 1. my solution resolves the segv problem and eliminates leaks so long as the 
> user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. 
> This method will still leak memory if the user doesn't use MPI after calling 
> the MPI_T functions, but does mean that all memory used by MPI will be 
> released upon MPI_Finalize. So if the user program continues beyond MPI, they 
> won't be carrying the MPI memory footprint with them. This continues our 
> current behavior.
> 
> 2. the destructor method, which releases the MPI memory footprint upon final 
> program termination instead of at MPI_Finalize. This also solves the segv and 
> leak problems, and ensures that someone calling only the MPI_T init/finalize 
> functions will be valgrind-clean, but means that a user program that runs 
> beyond MPI will carry the MPI memory footprint with them. This is a change in 
> our current behavior.

Correct. Though the only thing we will carry around until termination is
the memory associated with opal/mca/if, opal/mca/event, opal_net,
opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and
opal_class. Not sure how much memory this is.

-Nathan




Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Ralph Castain
I discussed this over IM with Nathan to try and get a better understanding of 
the options. Basically, we have two approaches available to us:

1. my solution resolves the segv problem and eliminates leaks so long as the 
user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. 
This method will still leak memory if the user doesn't use MPI after calling 
the MPI_T functions, but does mean that all memory used by MPI will be released 
upon MPI_Finalize. So if the user program continues beyond MPI, they won't be 
carrying the MPI memory footprint with them. This continues our current 
behavior.

2. the destructor method, which releases the MPI memory footprint upon final 
program termination instead of at MPI_Finalize. This also solves the segv and 
leak problems, and ensures that someone calling only the MPI_T init/finalize 
functions will be valgrind-clean, but means that a user program that runs 
beyond MPI will carry the MPI memory footprint with them. This is a change in 
our current behavior.

I'm not sure which approach is best, but I think this captures the heart of the 
decision.


On Jul 16, 2014, at 7:36 AM, Nathan Hjelm  wrote:

> On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote:
>> A number of issues have been raised as part of this discussion. Here is
>> what I have seen so far:
>> 
>> - constructor/destructor order not guaranteed: From an opal perspective
>>   this should not be a problem. Most components are unloaded by
>>   opal_finalize (), not opal_finalize_util (). So opal components
>>   should already be finalized by the time the destructor is called
>>   (or we can finalize them in the destructor if necessary).
>> 
>> - portability: All the compilers most of us care about (gcc, intel,
>>   clang) support the attribute. The exceptions appear to be xlc and pgi.
>>   For these compilers we can fall back on Ralph's solution and just leak
>>   if MPI_Finalize () is not called after MPI_T_Finalize (). Attached is
>>   an implementation that does that (needs some adjustment).
> 
> Correction. xlc does support the destructor function attribute. The odd
> one out is PGI.
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15168.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote:
> A number of issues have been raised as part of this discussion. Here is
> what I have seen so far:
> 
>  - constructor/destructor order not guaranteed: From an opal perspective
>    this should not be a problem. Most components are unloaded by
>    opal_finalize (), not opal_finalize_util (). So opal components
>    should already be finalized by the time the destructor is called
>    (or we can finalize them in the destructor if necessary).
> 
>  - portability: All the compilers most of us care about (gcc, intel,
>    clang) support the attribute. The exceptions appear to be xlc and pgi.
>    For these compilers we can fall back on Ralph's solution and just leak
>    if MPI_Finalize () is not called after MPI_T_Finalize (). Attached is
>    an implementation that does that (needs some adjustment).

Correction. xlc does support the destructor function attribute. The odd
one out is PGI.

-Nathan




Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
Forgot to attach the patch.

-Nathan

diff --git a/opal/runtime/opal_finalize.c b/opal/runtime/opal_finalize.c
index 318eba7..22b2e58 100644
--- a/opal/runtime/opal_finalize.c
+++ b/opal/runtime/opal_finalize.c
@@ -1,3 +1,4 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
  * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
  * University Research and Technology
@@ -10,7 +11,7 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
  * Copyright (c) 2008-2012 Cisco Systems, Inc.  All rights reserved.
- * Copyright (c) 2010-2013 Los Alamos National Security, LLC.
+ * Copyright (c) 2010-2014 Los Alamos National Security, LLC.
  * All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved
  * $COPYRIGHT$
@@ -57,6 +58,7 @@
 
 extern int opal_initialized;
 extern int opal_util_initialized;
+extern bool opal_init_util_called;
 
 int
 opal_finalize_util(void)
@@ -65,9 +67,21 @@ opal_finalize_util(void)
 if( opal_util_initialized < 0 ) {
 return OPAL_ERROR;
 }
-return OPAL_SUCCESS;
 }
 
+/* do not do anything more if the destructor attribute is available */
+#if OPAL_HAVE_ATTRIBUTE_DESTRUCTOR
+return OPAL_SUCCESS;
+}
+
+static void __attribute__((destructor)) opal_cleanup_resources (void) {
+if (!opal_init_util_called) {
+/* nothing to clean up */
+return;
+}
+#endif
+opal_init_util_called = false;
+
 /* close interfaces code. */
 if (opal_if_base_framework.framework_refcnt > 1) {
 /* opal if may have been opened many times -- FIXME */
@@ -89,6 +103,11 @@ opal_finalize_util(void)
 
 (void) mca_base_framework_close(_installdirs_base_framework);
 
+#if OPAL_HAVE_ATTRIBUTE_DESTRUCTOR
+/* there are issues with tearing down everything in opal_finalize_util. doing
+ * so will cause opal_init_util to segmentation fault if called after finalize.
+ * this cleanup is safe in a destructor function. */
+
 /* finalize the memory allocator */
 opal_malloc_finalize();
 
@@ -108,8 +127,9 @@ opal_finalize_util(void)
 
 /* finalize the class/object system */
 opal_class_finalize();
-
+#else
 return OPAL_SUCCESS;
+#endif
 }
 
 
diff --git a/opal/runtime/opal_init.c b/opal/runtime/opal_init.c
index 6567a9f..a48517b 100644
--- a/opal/runtime/opal_init.c
+++ b/opal/runtime/opal_init.c
@@ -71,6 +71,8 @@ const char opal_version_string[] = OPAL_IDENT_STRING;
 
 int opal_initialized = 0;
 int opal_util_initialized = 0;
+bool opal_init_util_called = false;
+
 /* We have to put a guess in here in case hwloc is not available.  If
hwloc is available, this value will be overwritten when the
hwloc data is loaded. */
@@ -247,13 +249,17 @@ opal_init_util(int* pargc, char*** pargv)
 int ret;
 char *error = NULL;
 
-if( ++opal_util_initialized != 1 ) {
+++opal_util_initialized;
+
+if (opal_init_util_called) {
 if( opal_util_initialized < 1 ) {
 return OPAL_ERROR;
 }
 return OPAL_SUCCESS;
 }
 
+opal_init_util_called = true;
+
 /* initialize the memory allocator */
 opal_malloc_init();
 

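For readers following the control flow of the patch, its OPAL_HAVE_ATTRIBUTE_DESTRUCTOR branch can be modeled in a few lines. This is a sketch under stated assumptions, not the patch itself: init_util, finalize_util, cleanup_util and util_state are hypothetical stand-ins for opal_init_util, opal_finalize_util, opal_cleanup_resources and OPAL's internal state, and a GCC-compatible compiler is assumed. finalize_util only drops the reference count, so a second init_util succeeds; the memory is released once, by the destructor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the destructor branch of the patch. */
static int util_initialized = 0;       /* like opal_util_initialized */
static bool init_util_called = false;  /* like opal_init_util_called */
static int *util_state = NULL;         /* hypothetical internal state */

static int init_util(void)
{
    ++util_initialized;
    if (init_util_called) {
        /* already set up once: only the reference count changes */
        return (util_initialized < 1) ? -1 : 0;
    }
    init_util_called = true;
    util_state = malloc(sizeof *util_state);
    if (util_state == NULL) {
        return -1;
    }
    *util_state = 42;
    return 0;
}

static int finalize_util(void)
{
    if (--util_initialized < 0) {
        return -1;
    }
    /* nothing is torn down here, so a later init_util() stays safe */
    return 0;
}

/* Runs automatically after main() returns or exit() is called. */
static void __attribute__((destructor)) cleanup_util(void)
{
    if (!init_util_called) {
        return;                /* nothing to clean up */
    }
    init_util_called = false;
    free(util_state);
    util_state = NULL;
}
```

Because cleanup_util resets the called flag, running it by hand (for example in a test) and then again automatically at exit is harmless. The #else branch of the patch, which tears everything down inside opal_finalize_util, is not modeled here.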



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
A number of issues have been raised as part of this discussion. Here is
what I have seen so far:

 - constructor/destructor order not guaranteed: From an opal perspective
   this should not be a problem. Most components are unloaded by
   opal_finalize (), not opal_finalize_util (). So opal components
   should already be finalized by the time the destructor is called
   (or we can finalize them in the destructor if necessary).

 - portability: All the compilers most of us care about (gcc, intel,
   clang) support the attribute. The exceptions appear to be xlc and pgi.
   For these compilers we can fall back on Ralph's solution and just leak
   if MPI_Finalize () is not called after MPI_T_Finalize (). Attached is
   an implementation that does that (needs some adjustment).


-Nathan







Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Gilles Gouaillardet
Ralph and all,

my understanding is that

opal_finalize_util

aggressively tries to free memory that would otherwise still be allocated.

another way of saying "make valgrind happy" is "fully automate memory
leak detection"
(Joost pointed to the -fsanitize=leak feature of gcc 4.9 in
http://www.open-mpi.org/community/lists/devel/2014/05/14672.php)

the following simple program :

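#include <mpi.h>

int main(int argc, char* argv[])
{
  int ret, provided;
  /* initialize and finalize only the MPI tool interface, never MPI itself */
  ret = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
  ret = MPI_T_finalize();
  (void) ret;
  return 0;
}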

leaks a *lot* of objects (and might remove some environment variables as
well) which have been half destroyed by opal_finalize_util, for example:
- classes are still marked as initialized *but* their cls_construct_array
  has been free'd
- the oob framework was not deallocated and is still marked as
  MCA_BASE_FRAMEWORK_FLAG_REGISTERED, but some of its mca variables were
  freed, which will cause problems when MPI_Init tries to (re)start the tcp
  component
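To illustrate the first item, here is a hypothetical, much simplified class descriptor; mini_class_t and its functions are illustrative only and are not the real opal_class_t API. A finalize routine that frees cls_construct_array but leaves the initialized flag set produces exactly the half-destroyed state described above; clearing the flag as well is one way to let a later initialization rebuild the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*ctor_fn)(void *obj);

/* Hypothetical, much simplified class descriptor (not opal_class_t). */
typedef struct {
    bool cls_initialized;
    ctor_fn *cls_construct_array;  /* NULL-terminated, built lazily */
} mini_class_t;

static void noop_ctor(void *obj) { (void) obj; }

static int mini_class_initialize(mini_class_t *cls)
{
    assert(cls != NULL);
    if (cls->cls_initialized) {
        return 0;  /* trusts the flag: a stale flag hides a freed array */
    }
    cls->cls_construct_array = malloc(2 * sizeof *cls->cls_construct_array);
    if (cls->cls_construct_array == NULL) {
        return -1;
    }
    cls->cls_construct_array[0] = noop_ctor;
    cls->cls_construct_array[1] = NULL;  /* terminator */
    cls->cls_initialized = true;
    return 0;
}

static void mini_class_finalize(mini_class_t *cls)
{
    free(cls->cls_construct_array);
    cls->cls_construct_array = NULL;
    /* Leaving this flag set is what makes the class look initialized
     * while its array is gone; resetting it allows re-initialization. */
    cls->cls_initialized = false;
}
```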

now my $0.02:

ideally, neither MPI_Finalize nor MPI_T_finalize would leak any memory, and
the framework would be re-initializable.
this could be a goal, and George gave some good explanations of why it is
hard to achieve.
from my pragmatic point of view, and for this test case only, i am very
happy with a simple working solution,
even if it means that MPI_T_finalize leaks way too much memory in order
to work around the non re-initializable framework.

Cheers,

Gilles

On 2014/07/16 12:49, Ralph Castain wrote:
> I've attached a solution that blocks the segfault without requiring any 
> gyrations. Can someone explain why this isn't adequate?
>
> Alternate solution was to simply decrement opal_util_initialized in 
> MPI_T_finalize rather than calling finalize itself. Either way resolves the 
> problem in a very simple manner.
>



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Ralph Castain
I've attached a solution that blocks the segfault without requiring any gyrations. Can someone explain why this isn't adequate?

Alternate solution was to simply decrement opal_util_initialized in MPI_T_finalize rather than calling finalize itself. Either way resolves the problem in a very simple manner.

fix.diff
Description: Binary data


mpit.c
Description: Binary data

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Paul Hargrove
On Tue, Jul 15, 2014 at 5:48 PM, George Bosilca  wrote:

> - Except for reinit, Open MPI works without it! If we provide such a
> capability it will be more a convenience capability to keep valgrind happy,
> than a necessity


A valgrind suppression file seems like the most appropriate tool for that
particular goal.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Ralph Castain
I'm unsure where Intel's compilers sit on that list.

When you say it works except for reinit, are you saying that the only issue 
here is that MPI_T_Finalize is calling opal_finalize_util solely because of the 
valgrind cleanup? And if it didn't do that, we would leak but would otherwise 
be just fine?

Just checking my understanding. Looking at the code, that would certainly 
appear to be true due to the reference counter in there, which would prevent us 
from eventually cleaning up because the counter wouldn't reach zero. However, 
couldn't we resolve that by (a) having MPI_T_Init set a global flag indicating 
it was called, and then (b) in opal_finalize, check the flag and add another 
call to opal_finalize_util if the flag is set?

Seems like all we really need to do is ensure that the init/finalize calls 
match, and that is far easier to ensure than doing the rest of this stuff.
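The flag-based matching proposed above can be sketched with a plain reference count. This assumes MPI_T is initialized at most once, and every function name here is hypothetical: model_mpit_init and model_mpi_init stand in for the MPI_T_init_thread and MPI_Init paths, and model_mpi_finalize stands in for the opal_finalize path where the extra opal_finalize_util call would go. The point is only that the count reaches zero, and the real teardown runs, exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for OPAL's util reference count. */
static int util_refcount = 0;   /* plays the role of opal_util_initialized */
static int util_teardowns = 0;  /* counts how often real cleanup ran */

static void model_init_util(void) { ++util_refcount; }

static void model_finalize_util(void)
{
    assert(util_refcount > 0);
    if (--util_refcount == 0) {
        ++util_teardowns;       /* real cleanup would happen here */
    }
}

/* (a) flag recording that the MPI_T init path ran */
static bool mpit_init_called = false;

static void model_mpit_init(void)
{
    mpit_init_called = true;
    model_init_util();
}

/* MPI_T finalize no longer tears anything down itself. */
static void model_mpit_finalize(void) { }

static void model_mpi_init(void) { model_init_util(); }

/* (b) drop MPI's own reference, then the extra one left behind
 * by the MPI_T init path. */
static void model_mpi_finalize(void)
{
    model_finalize_util();
    if (mpit_init_called) {
        mpit_init_called = false;
        model_finalize_util();
    }
}
```

If only the MPI_T functions are called, util_refcount stays at 1 and nothing is freed, which is the leak accepted by option 1 described above.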


On Jul 15, 2014, at 5:48 PM, George Bosilca  wrote:

> Enforcing the portability of this sounds like a huge [almost impossible] 
> mess, without a clean portable solution (more about this below). However, a few 
> things should be considered:
> - Except for reinit, Open MPI works without it! If we provide such a 
> capability it will be more a convenience capability to keep valgrind happy, 
> than a necessity
> - in case the constructor/destructor functionality is available we explicitly 
> control the ordering in which the shared libraries are opened/closed as we 
> control the dl_open/dl_close for most of the shared libraries.
> 
>   George.
> 
> PS: Other cases about shared libraries constructor/destructor.
> 
> The easy ones.
> Mac OS X: 
> https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryDesignGuidelines.html
> 
> Solaris: http://docs.oracle.com/cd/E18659_01/html/821-1383/bkamq.html
> 
> And the others
> 
> PGI: 
> http://www.pgroup.com/userforum/viewtopic.php?t=697=4efce7bfb4e914e42f48f219fc7e6a7e
> 
> XLC: beg for forgiveness (there is a -binitfini option but it must be 
> specified at link time)
> 
> 
> 
> 
> 
> 
> On Tue, Jul 15, 2014 at 8:06 PM, Paul Hargrove  wrote:
> The priority appears to have been added in gcc 4.3.
> You'll note it is not described in 
> https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html
> 
> I also don't think the presence of the priority argument fixes anything...
> 
> An OpenMPI code author cannot change the "priority" of a ctor or dtor in a 
> precompiled third-party library (libpmi comes to mind).  Nor can one know 
> what value the third party chose (in order to be higher or lower than theirs). 
>  You cannot even be assured the third-party didn't set priority to INT_MIN or 
> INT_MAX (or whatever).
> 
> That text also says nothing about dl_open() and dl_close() which must be 
> considered in Open MPI.
> 
> Before assuming constructor/destructor attributes are going to save the 
> world, wash your dog, and pick up the dry cleaning, one should probably 
> verify some minimal level of support on non-gnu tool-chains including vendor 
> compilers (PGI, XLC, etc) and system linkers (Darwin and Solaris).
> 
> -Paul
> 
> 
> On Tue, Jul 15, 2014 at 4:52 PM, Joshua Ladd  wrote:
> According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
> 
> "constructor 
>  destructor 
>  constructor (priority)
>  destructor (priority)
> The constructor attribute causes the function to be called automatically 
> before execution enters main (). Similarly, the destructor attribute causes 
> the function to be called automatically after main () completes or exit () is 
> called. Functions with these attributes are useful for initializing data that 
> is used implicitly during the execution of the program.
> You may provide an optional integer priority to control the order in which 
> constructor and destructor functions are run. A constructor with a smaller 
> priority number runs before a constructor with a larger priority number; the 
> opposite relationship holds for destructors. So, if you have a constructor 
> that allocates a resource and a destructor that deallocates the same 
> resource, both functions typically have the same priority. The priorities for 
> constructor and destructor functions are the same as those specified for 
> namespace-scope C++ objects (see C++ Attributes).
> 
> These attributes are not currently implemented for Objective-C."
> 
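> A small program can show the ordering rule quoted above. It assumes GCC
> (4.3 or later, per the note above) or a compiler accepting the same
> priority syntax; priorities 0 through 100 are reserved for the
> implementation, so the sketch uses 101 and 200. The function names are
> illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Records the order in which the constructors below actually run. */
static char ctor_log[8];
static size_t ctor_len = 0;

static void log_event(char c)
{
    if (ctor_len + 1 < sizeof ctor_log) {
        ctor_log[ctor_len++] = c;
        ctor_log[ctor_len] = '\0';
    }
}

/* Defined in "wrong" textual order on purpose: the smaller priority
 * number still runs first, so ctor_log ends up as "AB" before main(). */
static void __attribute__((constructor(200))) late_ctor(void) { log_event('B'); }
static void __attribute__((constructor(101))) early_ctor(void) { log_event('A'); }

/* For destructors the order is reversed: 200 runs before 101. */
static void __attribute__((destructor(200))) first_dtor(void) { puts("dtor 200"); }
static void __attribute__((destructor(101))) last_dtor(void) { puts("dtor 101"); }
```

> As Paul points out, this only orders functions in code you compile
> yourself; it does nothing about priorities chosen by a precompiled
> third-party library.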
> 
> 
> 
> On Tue, Jul 15, 2014 at 5:20 PM, Paul Hargrove  wrote:
> 
> On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r  
> wrote:
> I don't think there's anything wrong with using ctor/dtors in shared 
> libraries,
> but one does need to make sure that in these functions there are no assumptions
> about ordering of them wrt other ctors/dtors.
> 
> The ELF specification is clear that the order of execution of DT_INIT and 
> DT_FINI entries is 

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
Enforcing the portability of this sounds like a huge [almost impossible]
mess, without a clean portable solution (more about this below). However,
a few things should be considered:
- Except for reinit, Open MPI works without it! If we provide such a
capability it will be more a convenience capability to keep valgrind happy,
than a necessity
- in case the constructor/destructor functionality is available we
explicitly control the ordering in which the shared libraries are
opened/closed as we control the dl_open/dl_close for most of the shared
libraries.

  George.

PS: Other cases about shared libraries constructor/destructor.

The easy ones.
Mac OS X:
https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryDesignGuidelines.html

Solaris: http://docs.oracle.com/cd/E18659_01/html/821-1383/bkamq.html

And the others

PGI:
http://www.pgroup.com/userforum/viewtopic.php?t=697=4efce7bfb4e914e42f48f219fc7e6a7e

XLC: beg for forgiveness (there is a -binitfini option but it must be
specified at link time)






On Tue, Jul 15, 2014 at 8:06 PM, Paul Hargrove  wrote:

> The priority appears to have been added in gcc 4.3.
> You'll note it is not described in
> https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html
>
> I also don't think the presence of the priority argument fixes anything...
>
> An OpenMPI code author cannot change the "priority" of a ctor or dtor in a
> precompiled third-party library (libpmi comes to mind).  Nor can one know
> what value the third party chose (in order to be higher or lower than
> theirs).  You cannot even be assured the third-party didn't set priority to
> INT_MIN or INT_MAX (or whatever).
>
> That text also says nothing about dl_open() and dl_close() which must be
> considered in Open MPI.
>
> Before assuming constructor/destructor attributes are going to save the
> world, wash your dog, and pick up the dry cleaning, one should probably
> verify some minimal level of support on non-gnu tool-chains including
> vendor compilers (PGI, XLC, etc) and system linkers (Darwin and Solaris).
>
> -Paul
>
>
> On Tue, Jul 15, 2014 at 4:52 PM, Joshua Ladd  wrote:
>
>> According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
>>
>> "constructor
>>  destructor
>>  constructor (priority)
>>  destructor (priority)
>> The constructor attribute causes the function to be called automatically
>> before execution enters main (). Similarly, the destructor attribute causes
>> the function to be called automatically after main () completes or exit ()
>> is called. Functions with these attributes are useful for initializing data
>> that is used implicitly during the execution of the program.
>>
>> You may provide an optional integer priority to control the order in
>> which constructor and destructor functions are run. A constructor with a
>> smaller priority number runs before a constructor with a larger priority
>> number; the opposite relationship holds for destructors. So, if you have a
>> constructor that allocates a resource and a destructor that deallocates the
>> same resource, both functions typically have the same priority. The
>> priorities for constructor and destructor functions are the same as those
>> specified for namespace-scope C++ objects (see C++ Attributes).
>>
>> These attributes are not currently implemented for Objective-C."
>>
>>
>>
>> On Tue, Jul 15, 2014 at 5:20 PM, Paul Hargrove 
>> wrote:
>>
>>>
>>> On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r 
>>> wrote:
>>>
 I don't think there's anything wrong with using ctors/dtors in shared
 libraries, but one does need to make sure that these functions make no
 assumptions about their ordering with respect to other ctors/dtors.

>>>
>>> The ELF specification is clear that the order of execution of DT_INIT
>>> and DT_FINI entries is undefined.
>>> The .ctors and .dtors sections typically used by the GNU toolchain are,
>>> I believe, not part of any formal linker specification.
>>> So, I agree w/ Howard that one must take care not to assume anything
>>> about order.
>>>
>>> -Paul
>>>
>>>
>>> --
>>> Paul H. Hargrove  phhargr...@lbl.gov
>>> Future Technologies Group
>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/07/15153.php
>>>
>>
>>

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Ralph Castain
I wonder if we aren't using a howitzer to swat a gnat. It seems to me that this 
is loaded with potential problems, as Paul describes, and I shudder to think of 
how hard this is going to be when we consider all the compiler/environment 
combinations we support and the range of libraries our various pieces build 
against.

Wouldn't it be easier for us to spend a little time on each framework and set 
it up to better handle init/fini/init cycles? It seems to me that this is going 
to be far more involved than just cleaning up class object instantiations, and 
indeed in some cases is going to take careful teardown of 3rd party libraries 
we link against.

I know that will be more work than creating some simple "destructor", but I 
suspect it has far more likelihood for success and a much lower degree of risk.
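One way to read "set it up to better handle init/fini/init cycles" is sketched below in C, as an illustration only: a framework whose open/close calls are reference counted and whose last close resets every global to its load-time value, so that a later open starts from a clean slate. fw_open(), fw_close(), fw_param_value() and the single string parameter are hypothetical names, not Open MPI's framework API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical framework globals (illustrative names, not Open MPI's). */
static int   fw_refcount = 0;     /* outstanding fw_open() calls */
static char *fw_param    = NULL;  /* e.g. a string-valued parameter */

/* Portable strdup() replacement (strdup is POSIX, not ISO C). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) {
        memcpy(p, s, n);
    }
    return p;
}

/* First open sets up state; nested opens only bump the count. */
int fw_open(const char *param)
{
    if (fw_refcount > 0) {
        ++fw_refcount;
        return 0;
    }
    fw_param = copy_string(param != NULL ? param : "");
    if (fw_param == NULL) {
        return -1;                /* leave the count at 0 on failure */
    }
    fw_refcount = 1;
    return 0;
}

/* Last close frees everything AND resets the globals, so the next
   fw_open() behaves exactly like the first one. */
void fw_close(void)
{
    if (fw_refcount == 0 || --fw_refcount > 0) {
        return;
    }
    free(fw_param);
    fw_param = NULL;              /* no dangling char* left behind */
}

const char *fw_param_value(void)
{
    return fw_param;
}
```

The pattern would have to be applied framework by framework, which is the extra work described above; it does nothing by itself for third-party libraries that need their own careful teardown.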


On Jul 15, 2014, at 5:06 PM, Paul Hargrove  wrote:

> The priority appears to have been added in gcc 4.3.
> You'll note it is not described in 
> https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html
> 
> I also don't think the presence of the priority argument fixes anything...
> 
> An Open MPI code author cannot change the "priority" of a ctor or dtor in a
> precompiled third-party library (libpmi comes to mind).  Nor can one know
> what value the third party chose (in order to be higher or lower than
> theirs).  You cannot even be assured the third party didn't set priority to
> INT_MIN or INT_MAX (or whatever).
>
> That text also says nothing about dl_open() and dl_close(), which must be
> considered in Open MPI.
>
> Before assuming constructor/destructor attributes are going to save the
> world, wash your dog, and pick up the dry cleaning, one should probably
> verify some minimal level of support on non-GNU tool-chains including vendor
> compilers (PGI, XLC, etc) and system linkers (Darwin and Solaris).
> 
> -Paul
> 
> 
> On Tue, Jul 15, 2014 at 4:52 PM, Joshua Ladd  wrote:
> According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
> 
> "constructor 
>  destructor 
>  constructor (priority)
>  destructor (priority)
> The constructor attribute causes the function to be called automatically 
> before execution enters main (). Similarly, the destructor attribute causes 
> the function to be called automatically after main () completes or exit () is 
> called. Functions with these attributes are useful for initializing data that 
> is used implicitly during the execution of the program.
> You may provide an optional integer priority to control the order in which 
> constructor and destructor functions are run. A constructor with a smaller 
> priority number runs before a constructor with a larger priority number; the 
> opposite relationship holds for destructors. So, if you have a constructor 
> that allocates a resource and a destructor that deallocates the same 
> resource, both functions typically have the same priority. The priorities for 
> constructor and destructor functions are the same as those specified for 
> namespace-scope C++ objects (see C++ Attributes).
> 
> These attributes are not currently implemented for Objective-C."
> 
> 
> 
> 
> On Tue, Jul 15, 2014 at 5:20 PM, Paul Hargrove  wrote:
> 
> On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r  
> wrote:
> I don't think there's anything wrong with using ctors/dtors in shared
> libraries, but one does need to make sure that these functions make no
> assumptions about their ordering with respect to other ctors/dtors.
> 
> The ELF specification is clear that the order of execution of DT_INIT and 
> DT_FINI entries is undefined.
> The .ctors and .dtors sections typically used by the GNU toolchain are, I 
> believe, not part of any formal linker specification.
> So, I agree w/ Howard that one must take care not to assume anything about 
> order.
> 
> -Paul
> 
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15155.php
> 
> 
> 

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Paul Hargrove
The priority appears to have been added in gcc 4.3.
You'll note it is not described in
https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html

I also don't think the presence of the priority argument fixes anything...

An Open MPI code author cannot change the "priority" of a ctor or dtor in a
precompiled third-party library (libpmi comes to mind).  Nor can one know
what value the third party chose (in order to be higher or lower than
theirs).  You cannot even be assured the third party didn't set priority to
INT_MIN or INT_MAX (or whatever).

That text also says nothing about dl_open() and dl_close(), which must be
considered in Open MPI.

Before assuming constructor/destructor attributes are going to save the
world, wash your dog, and pick up the dry cleaning, one should probably
verify some minimal level of support on non-GNU tool-chains including
vendor compilers (PGI, XLC, etc) and system linkers (Darwin and Solaris).

-Paul


On Tue, Jul 15, 2014 at 4:52 PM, Joshua Ladd  wrote:

> According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
>
> "constructor
>  destructor
>  constructor (priority)
>  destructor (priority)
> The constructor attribute causes the function to be called automatically
> before execution enters main (). Similarly, the destructor attribute causes
> the function to be called automatically after main () completes or exit ()
> is called. Functions with these attributes are useful for initializing data
> that is used implicitly during the execution of the program.
>
> You may provide an optional integer priority to control the order in
> which constructor and destructor functions are run. A constructor with a
> smaller priority number runs before a constructor with a larger priority
> number; the opposite relationship holds for destructors. So, if you have a
> constructor that allocates a resource and a destructor that deallocates the
> same resource, both functions typically have the same priority. The
> priorities for constructor and destructor functions are the same as those
> specified for namespace-scope C++ objects (see C++ Attributes).
>
> These attributes are not currently implemented for Objective-C."
>
>
>
> On Tue, Jul 15, 2014 at 5:20 PM, Paul Hargrove  wrote:
>
>>
>> On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r 
>> wrote:
>>
>>> I don't think there's anything wrong with using ctors/dtors in shared
>>> libraries, but one does need to make sure that these functions make no
>>> assumptions about their ordering with respect to other ctors/dtors.
>>>
>>
>> The ELF specification is clear that the order of execution of DT_INIT and
>> DT_FINI entries is undefined.
>> The .ctors and .dtors sections typically used by the GNU toolchain are, I
>> believe, not part of any formal linker specification.
>> So, I agree w/ Howard that one must take care not to assume anything
>> about order.
>>
>> -Paul
>>
>>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Joshua Ladd
According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

"constructor
 destructor
 constructor (priority)
 destructor (priority)
The constructor attribute causes the function to be called automatically
before execution enters main (). Similarly, the destructor attribute causes
the function to be called automatically after main () completes or exit ()
is called. Functions with these attributes are useful for initializing data
that is used implicitly during the execution of the program.

You may provide an optional integer priority to control the order in which
constructor and destructor functions are run. A constructor with a smaller
priority number runs before a constructor with a larger priority number;
the opposite relationship holds for destructors. So, if you have a
constructor that allocates a resource and a destructor that deallocates the
same resource, both functions typically have the same priority. The
priorities for constructor and destructor functions are the same as those
specified for namespace-scope C++ objects (see C++ Attributes).

These attributes are not currently implemented for Objective-C."
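A small C sketch of the priority ordering described in the quoted documentation, assuming GCC 4.3 or later (or Clang). ctor_order, ctor_count, init_first and init_second are illustrative names. GCC reserves priorities 0 to 100 for the implementation, so the example uses 101 and 102. Because the ordering is resolved when a single executable or shared library is linked, it does not appear to give any control relative to constructors in a separately built library.

```c
#include <stddef.h>

/* Records the order in which the constructors below actually ran. */
int    ctor_order[2];
size_t ctor_count = 0;

/* Defined first, but its larger priority number makes it run second. */
__attribute__((constructor(102)))
static void init_second(void)
{
    ctor_order[ctor_count++] = 102;
}

/* Smaller priority number: runs before init_second(), and before main(). */
__attribute__((constructor(101)))
static void init_first(void)
{
    ctor_order[ctor_count++] = 101;
}
```

By the time main() runs, ctor_order should hold {101, 102}. Matching destructors with the same priorities would run in the opposite order (102 before 101).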



On Tue, Jul 15, 2014 at 5:20 PM, Paul Hargrove  wrote:

>
> On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r 
> wrote:
>
>> I don't think there's anything wrong with using ctors/dtors in shared
>> libraries, but one does need to make sure that these functions make no
>> assumptions about their ordering with respect to other ctors/dtors.
>>
>
> The ELF specification is clear that the order of execution of DT_INIT and
> DT_FINI entries is undefined.
> The .ctors and .dtors sections typically used by the GNU toolchain are, I
> believe, not part of any formal linker specification.
> So, I agree w/ Howard that one must take care not to assume anything about
> order.
>
> -Paul
>
>


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Paul Hargrove
On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r 
wrote:

> I don't think there's anything wrong with using ctors/dtors in shared
> libraries, but one does need to make sure that these functions make no
> assumptions about their ordering with respect to other ctors/dtors.
>

The ELF specification is clear that the order of execution of DT_INIT and
DT_FINI entries is undefined.
The .ctors and .dtors sections typically used by the GNU toolchain are, I
believe, not part of any formal linker specification.
So, I agree w/ Howard that one must take care not to assume anything about
order.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Pritchard, Howard r
Hi Folks,

Is the opal library explicitly closed by a dlclose?  

I don't think there's anything wrong with using ctors/dtors in shared libraries,
but one does need to make sure that these functions make no assumptions
about their ordering with respect to other ctors/dtors. Shared libraries
explicitly loaded/unloaded by the executable should be less affected by
these ordering issues.
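A minimal C sketch of a destructor written along those lines, assuming GCC/Clang's destructor attribute. The names lib_init(), lib_finalize(), lib_is_initialized() and the lib_state buffer are hypothetical. The finalize routine is idempotent and releases only memory this library allocated, so it should not matter whether an explicit finalize call or other libraries' destructors ran first.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state owned exclusively by this library. */
static int  *lib_state       = NULL;
static bool  lib_initialized = false;

void lib_init(void)
{
    if (lib_initialized) {
        return;
    }
    lib_state = calloc(16, sizeof *lib_state);
    lib_initialized = (lib_state != NULL);
}

/* Idempotent: safe to call explicitly and again from the destructor.
   It relies on no other library's state beyond the C runtime, so the
   order in which other ctors/dtors run does not matter. */
void lib_finalize(void)
{
    if (!lib_initialized) {
        return;
    }
    free(lib_state);
    lib_state = NULL;
    lib_initialized = false;
}

bool lib_is_initialized(void)
{
    return lib_initialized;
}

/* Runs at normal exit or when the library is unloaded. */
__attribute__((destructor))
static void lib_destructor(void)
{
    lib_finalize();
}
```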

Also, care needs to be taken with static linking. It may be necessary to use
--whole-archive etc. on the ld line to get the ctors/dtors actually linked into
the executable.

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Tuesday, July 15, 2014 12:45 PM
To: Open MPI Developers; Hjelm, Nathan Thomas
Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to 
opal

I withdraw my comment on this; it turns out I “misspoke” (or in other words I 
was wrong about the class cleanup). The base class structures are stored as 
objects in the corresponding shared library memory region, and these regions 
become unavailable once a shared library is unloaded. As a result, we are 
utterly unable to clean up the classes at the OPAL layer after the other shared 
libraries have been unloaded.

Moreover, Nathan was right in his proposal: the only possible cleanup approach 
is to use the destructor attribute of the OPAL library to clean up the mess once 
all libraries are unloaded.

  George.



On July 15, 2014 at 1:17:26 AM, George Bosilca (bosi...@icl.utk.edu) wrote:
> Nathan,
>  
> Fixing the classes to correctly tear down everything was a two-line patch.
> However, this doesn’t fix the bigger issue: not all frameworks are
> correctly torn down; when they are, they leave behind char* parameters not
> set to NULL; and the framework infrastructure is not keen on being
> reinitialized due to too many globals not being correctly handled.
>
> If I correctly understand the proposed destructor approach, it is only
> called when the library is being unloaded or when the application exits.
> Thus, adding the destructor is a band-aid, addressing a marginal annoyance
> (partially keeping valgrind happy) without addressing the real issue
> (being able to call MPI_Init after MPI_T_finalize).
>  
> George.
>  
>  
>  
> On July 14, 2014 at 6:07:08 PM, Nathan Hjelm (hje...@lanl.gov) wrote:
> >
> > What: Add a library destructor function to OPAL. The new function 
> > would take care of cleaning up some of OPAL's state (closing 
> > frameworks, shutting down MCA, etc).
> >
> > Why: OPAL cannot currently be re-initialized. There are numerous
> > problems throughout the project that will make it difficult (but not
> > impossible) to get OPAL into a state where we can allow
> > re-initialization. Additionally, there are probably arguments against
> > making OPAL re-initializable.
> >
> > OPAL not being re-initializable would not normally be a problem, except
> > that the following code sequence always crashes:
> >
> > MPI_T_init_thread (); <-- Calls opal_init_util()
> > MPI_T_finalize (); <-- Calls opal_finalize_util()
> >
> > MPI_Init (); <-- SEGV
> >
> > This happens because MPI_T_finalize() calls opal_finalize_util() to
> > ensure maximum valgrind cleanness. This call causes OPAL to tear down
> > OPAL classes (among other things), leading to the SEGV on the next call
> > to opal_init()/opal_init_util(). There is an open ticket on this issue:
> >
> > https://svn.open-mpi.org/trac/ompi/ticket/4490
> >
> > To fix this problem I want to add a destructor function to OPAL. This
> > function would take on some of the current functionality of
> > opal_finalize_util(). This would solve the above issue without having to
> > update OPAL to allow re-initialization.
> >
> > For those not familiar with destructor functions: they are called at
> > normal process exit or when the library is closed
> > (dl_close). Multiple destructor functions can be defined. Marking a
> > function as a destructor is simple:
> >
> > void __attribute__((destructor)) foo (void);
> >
> >
> > When: Setting a timeout for next Friday (July 25).
> >
> >
> > -Nathan
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/07/15140.php
>  
>  

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15150.php


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
I withdraw my comment on this; it turns out I “misspoke” (or in other words I 
was wrong about the class cleanup). The base class structures are stored as 
objects in the corresponding shared library memory region, and these regions 
become unavailable once a shared library is unloaded. As a result, we are 
utterly unable to clean up the classes at the OPAL layer after the other shared 
libraries have been unloaded.

Moreover, Nathan was right in his proposal: the only possible cleanup approach 
is to use the destructor attribute of the OPAL library to clean up the mess once 
all libraries are unloaded.

  George.



On July 15, 2014 at 1:17:26 AM, George Bosilca (bosi...@icl.utk.edu) wrote:
> Nathan,
>  
> Fixing the classes to correctly tear down everything was a two-line patch.
> However, this doesn’t fix the bigger issue: not all frameworks are
> correctly torn down; when they are, they leave behind char* parameters not
> set to NULL; and the framework infrastructure is not keen on being
> reinitialized due to too many globals not being correctly handled.
>
> If I correctly understand the proposed destructor approach, it is only
> called when the library is being unloaded or when the application exits.
> Thus, adding the destructor is a band-aid, addressing a marginal annoyance
> (partially keeping valgrind happy) without addressing the real issue
> (being able to call MPI_Init after MPI_T_finalize).
>  
> George.
>  
>  
>  
> On July 14, 2014 at 6:07:08 PM, Nathan Hjelm (hje...@lanl.gov) wrote:
> >
> > What: Add a library destructor function to OPAL. The new function would
> > take care of cleaning up some of OPAL's state (closing frameworks,
> > shutting down MCA, etc).
> >
> > Why: OPAL cannot currently be re-initialized. There are numerous
> > problems throughout the project that will make it difficult (but not
> > impossible) to get OPAL into a state where we can allow
> > re-initialization. Additionally, there are probably arguments against
> > making OPAL re-initializable.
> >
> > OPAL not being re-initializable would not normally be a problem, except
> > that the following code sequence always crashes:
> >
> > MPI_T_init_thread (); <-- Calls opal_init_util()
> > MPI_T_finalize (); <-- Calls opal_finalize_util()
> >
> > MPI_Init (); <-- SEGV
> >
> > This happens because MPI_T_finalize() calls opal_finalize_util() to
> > ensure maximum valgrind cleanness. This call causes OPAL to tear down
> > OPAL classes (among other things), leading to the SEGV on the next call
> > to opal_init()/opal_init_util(). There is an open ticket on this issue:
> >
> > https://svn.open-mpi.org/trac/ompi/ticket/4490
> >
> > To fix this problem I want to add a destructor function to OPAL. This
> > function would take on some of the current functionality of
> > opal_finalize_util(). This would solve the above issue without having to
> > update OPAL to allow re-initialization.
> >
> > For those not familiar with destructor functions: they are called at
> > normal process exit or when the library is closed
> > (dl_close). Multiple destructor functions can be defined. Marking a
> > function as a destructor is simple:
> >
> > void __attribute__((destructor)) foo (void);
> >
> >
> > When: Setting a timeout for next Friday (July 25).
> >
> >
> > -Nathan
>  
>  



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
Nathan,

Fixing the classes to correctly tear down everything was a two-line patch. 
However, this doesn’t fix the bigger issue: not all frameworks are correctly 
torn down; when they are, they leave behind char* parameters not set to NULL; 
and the framework infrastructure is not keen on being reinitialized due to too 
many globals not being correctly handled.

If I correctly understand the proposed destructor approach, it is only called 
when the library is being unloaded or when the application exits. Thus, adding 
the destructor is a band-aid, addressing a marginal annoyance (partially keeping 
valgrind happy) without addressing the real issue (being able to call MPI_Init 
after MPI_T_finalize).

  George.



On July 14, 2014 at 6:07:08 PM, Nathan Hjelm (hje...@lanl.gov) wrote:
>  
> What: Add a library destructor function to OPAL. The new function would
> take care of cleaning up some of OPAL's state (closing frameworks,
> shutting down MCA, etc).
>  
> Why: OPAL cannot currently be re-initialized. There are numerous
> problems throughout the project that will make it difficult (but not
> impossible) to get OPAL into a state where we can allow
> re-initialization. Additionally, there are probably arguments against
> making OPAL re-initializable.
>
> OPAL not being re-initializable would not normally be a problem, except
> that the following code sequence always crashes:
>
> MPI_T_init_thread (); <-- Calls opal_init_util()
> MPI_T_finalize (); <-- Calls opal_finalize_util()
>
> MPI_Init (); <-- SEGV
>
> This happens because MPI_T_finalize() calls opal_finalize_util() to
> ensure maximum valgrind cleanness. This call causes OPAL to tear down
> OPAL classes (among other things), leading to the SEGV on the next call
> to opal_init()/opal_init_util(). There is an open ticket on this issue:
>
> https://svn.open-mpi.org/trac/ompi/ticket/4490
>
> To fix this problem I want to add a destructor function to OPAL. This
> function would take on some of the current functionality of
> opal_finalize_util(). This would solve the above issue without having to
> update OPAL to allow re-initialization.
>
> For those not familiar with destructor functions: they are called at
> normal process exit or when the library is closed
> (dl_close). Multiple destructor functions can be defined. Marking a
> function as a destructor is simple:
>  
> void __attribute__((destructor)) foo (void);
>  
>  
> When: Setting a timeout for next Friday (July 25).
>  
>  
> -Nathan
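To make the proposed split concrete, here is a minimal C sketch of the crash sequence and the RFC's fix, assuming GCC/Clang's destructor attribute. util_init(), util_finalize(), util_classes_valid() and class_valid are hypothetical stand-ins for opal_init_util(), opal_finalize_util() and OPAL's statically initialized class state (cf. OBJ_CLASS_INSTANCE), not Open MPI code.

```c
#include <stdbool.h>

/* Models OPAL's class state: like a statically initialized class
   descriptor, it is valid from load time and is never rebuilt by
   util_init(). */
static bool class_valid   = true;
static int  util_refcount = 0;

int util_init(void)
{
    if (!class_valid) {
        return -1;                 /* stands in for the SEGV */
    }
    ++util_refcount;
    return 0;
}

void util_finalize(void)
{
    if (util_refcount > 0) {
        --util_refcount;
    }
    /* Before the RFC, class teardown happened here (class_valid = false),
       so the next util_init() failed. Under the RFC it is deferred. */
}

bool util_classes_valid(void)
{
    return class_valid;
}

/* Deferred teardown: runs at normal process exit or on dlclose(). */
__attribute__((destructor))
static void util_destructor(void)
{
    class_valid = false;
}
```

With this split, the MPI_T init/finalize followed by MPI_Init sequence maps to util_init(), util_finalize(), util_init(), and the second init succeeds. It only moves the class cleanup; it does not make the frameworks themselves re-initializable. Destructors are also skipped when a process ends through _exit() or an abnormal termination such as abort().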