Re: [OMPI devel] C/R code: opal_list_item_destruct: Assertion

2013-12-22 Thread Adrian Reber
That works. Thanks for your fix.

On Sun, Dec 22, 2013 at 12:23:44AM +0100, George Bosilca wrote:
> Adrian,
> 
> Yes, your patch is correct. However, I noticed that each framework cleans its 
> modules differently, so I tried to enforce some level of consistency. Please 
> try r30045 and let me know if it fixes your issue.
> 
> George.
> 
> 
> On Dec 21, 2013, at 22:05, Adrian Reber wrote:
> 
> > Trying to run Open MPI with C/R enabled I get the following error
> > with --enable-debug:
> > 
> > [dcbz:20360] orte_rml_base_select: initializing rml component oob
> > [dcbz:20360] orte_rml_base_select: initializing rml component ftrm
> > [dcbz:20360] orte_rml_base_select: module ftrm unloaded
> > orterun: ../../opal/class/opal_list.c:69: opal_list_item_destruct: Assertion `0 == item->opal_list_item_refcount' failed.
> > [dcbz:20360] *** Process received signal ***
> > [dcbz:20360] Signal: Aborted (6)
> > [dcbz:20360] Signal code:  (-6)
> > 
> > I fixed it like this:
> > 
> > diff --git a/orte/mca/rml/base/rml_base_frame.c b/orte/mca/rml/base/rml_base_frame.c
> > index 8759180..968884f 100644
> > --- a/orte/mca/rml/base/rml_base_frame.c
> > +++ b/orte/mca/rml/base/rml_base_frame.c
> > @@ -181,6 +181,7 @@ int orte_rml_base_select(void)
> > component->rml_version.mca_component_name);
> > 
> > mca_base_component_repository_release((mca_base_component_t *) component);
> > +opal_list_remove_item(&orte_rml_base_framework.framework_components, item);
> > OBJ_RELEASE(item);
> > }
> > item = next;
> > 
> > 
> > Is this the correct way to solve an error like this, and is this the
> > correct place for the fix?
> > 
> > Adrian
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Re: [OMPI devel] C/R code: opal_list_item_destruct: Assertion

2013-12-21 Thread George Bosilca
Adrian,

Yes, your patch is correct. However, I noticed that each framework cleans its 
modules differently, so I tried to enforce some level of consistency. Please 
try r30045 and let me know if it fixes your issue.

George.


On Dec 21, 2013, at 22:05, Adrian Reber wrote:

> Trying to run Open MPI with C/R enabled I get the following error
> with --enable-debug:
> 
> [dcbz:20360] orte_rml_base_select: initializing rml component oob
> [dcbz:20360] orte_rml_base_select: initializing rml component ftrm
> [dcbz:20360] orte_rml_base_select: module ftrm unloaded
> orterun: ../../opal/class/opal_list.c:69: opal_list_item_destruct: Assertion `0 == item->opal_list_item_refcount' failed.
> [dcbz:20360] *** Process received signal ***
> [dcbz:20360] Signal: Aborted (6)
> [dcbz:20360] Signal code:  (-6)
> 
> I fixed it like this:
> 
> diff --git a/orte/mca/rml/base/rml_base_frame.c b/orte/mca/rml/base/rml_base_frame.c
> index 8759180..968884f 100644
> --- a/orte/mca/rml/base/rml_base_frame.c
> +++ b/orte/mca/rml/base/rml_base_frame.c
> @@ -181,6 +181,7 @@ int orte_rml_base_select(void)
> component->rml_version.mca_component_name);
> 
> mca_base_component_repository_release((mca_base_component_t *) component);
> +opal_list_remove_item(&orte_rml_base_framework.framework_components, item);
> OBJ_RELEASE(item);
> }
> item = next;
> 
> 
> Is this the correct way to solve an error like this, and is this the
> correct place for the fix?
> 
>   Adrian



Re: [OMPI devel] C/R code: opal_list_item_destruct: Assertion

2013-12-21 Thread Ralph Castain
Should be okay.

On Dec 21, 2013, at 1:05 PM, Adrian Reber wrote:

> Trying to run Open MPI with C/R enabled I get the following error
> with --enable-debug:
> 
> [dcbz:20360] orte_rml_base_select: initializing rml component oob
> [dcbz:20360] orte_rml_base_select: initializing rml component ftrm
> [dcbz:20360] orte_rml_base_select: module ftrm unloaded
> orterun: ../../opal/class/opal_list.c:69: opal_list_item_destruct: Assertion `0 == item->opal_list_item_refcount' failed.
> [dcbz:20360] *** Process received signal ***
> [dcbz:20360] Signal: Aborted (6)
> [dcbz:20360] Signal code:  (-6)
> 
> I fixed it like this:
> 
> diff --git a/orte/mca/rml/base/rml_base_frame.c b/orte/mca/rml/base/rml_base_frame.c
> index 8759180..968884f 100644
> --- a/orte/mca/rml/base/rml_base_frame.c
> +++ b/orte/mca/rml/base/rml_base_frame.c
> @@ -181,6 +181,7 @@ int orte_rml_base_select(void)
> component->rml_version.mca_component_name);
> 
> mca_base_component_repository_release((mca_base_component_t *) component);
> +opal_list_remove_item(&orte_rml_base_framework.framework_components, item);
> OBJ_RELEASE(item);
> }
> item = next;
> 
> 
> Is this the correct way to solve an error like this, and is this the
> correct place for the fix?
> 
>   Adrian



[OMPI devel] C/R code: opal_list_item_destruct: Assertion

2013-12-21 Thread Adrian Reber
Trying to run Open MPI with C/R enabled I get the following error
with --enable-debug:

[dcbz:20360] orte_rml_base_select: initializing rml component oob
[dcbz:20360] orte_rml_base_select: initializing rml component ftrm
[dcbz:20360] orte_rml_base_select: module ftrm unloaded
orterun: ../../opal/class/opal_list.c:69: opal_list_item_destruct: Assertion `0 == item->opal_list_item_refcount' failed.
[dcbz:20360] *** Process received signal ***
[dcbz:20360] Signal: Aborted (6)
[dcbz:20360] Signal code:  (-6)

I fixed it like this:

diff --git a/orte/mca/rml/base/rml_base_frame.c b/orte/mca/rml/base/rml_base_frame.c
index 8759180..968884f 100644
--- a/orte/mca/rml/base/rml_base_frame.c
+++ b/orte/mca/rml/base/rml_base_frame.c
@@ -181,6 +181,7 @@ int orte_rml_base_select(void)
 component->rml_version.mca_component_name);

 mca_base_component_repository_release((mca_base_component_t *) component);
+opal_list_remove_item(&orte_rml_base_framework.framework_components, item);
 OBJ_RELEASE(item);
 }
 item = next;


Is this the correct way to solve an error like this, and is this the
correct place for the fix?

Adrian
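
The failure mode in this thread can be illustrated in miniature. The sketch below is not OPAL code: toy_item_t, toy_list_t and every toy_* function are hypothetical stand-ins, written on the assumption that a debug build counts how many lists an item is linked into and asserts that the count is zero when the item is destroyed. Under that assumption, an item has to be unlinked from its list before it is released, and the loop has to save the next pointer before unlinking, which is the pattern the patch follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a list item and list; NOT the real OPAL types. */
typedef struct toy_item {
    struct toy_item *prev, *next;
    int on_list_count;   /* plays the role of opal_list_item_refcount */
    int id;
} toy_item_t;

typedef struct {
    toy_item_t sentinel; /* circular doubly linked list with a sentinel */
    size_t length;
} toy_list_t;

static void toy_list_init(toy_list_t *l)
{
    l->sentinel.prev = l->sentinel.next = &l->sentinel;
    l->sentinel.on_list_count = 0;
    l->sentinel.id = 0;
    l->length = 0;
}

static toy_item_t *toy_item_new(int id)
{
    toy_item_t *it = calloc(1, sizeof(*it));
    if (it != NULL) {
        it->id = id;
    }
    return it;
}

static void toy_list_append(toy_list_t *l, toy_item_t *it)
{
    it->prev = l->sentinel.prev;
    it->next = &l->sentinel;
    l->sentinel.prev->next = it;
    l->sentinel.prev = it;
    it->on_list_count++;          /* item is now linked into one list */
    l->length++;
}

static void toy_list_remove(toy_list_t *l, toy_item_t *it)
{
    it->prev->next = it->next;
    it->next->prev = it->prev;
    it->prev = it->next = NULL;
    it->on_list_count--;          /* item is no longer linked */
    l->length--;
}

/* Destructor with the same kind of check that fired in the report. */
static void toy_item_free(toy_item_t *it)
{
    assert(0 == it->on_list_count);
    free(it);
}

/* Drop every item with an odd id (a stand-in for "component unloaded"). */
static size_t toy_list_prune_odd(toy_list_t *l)
{
    size_t removed = 0;
    toy_item_t *it = l->sentinel.next;

    while (it != &l->sentinel) {
        toy_item_t *next = it->next;  /* save before unlinking */
        if (it->id % 2 != 0) {
            toy_list_remove(l, it);   /* the step the patch adds */
            toy_item_free(it);        /* safe: count is back to zero */
            removed++;
        }
        it = next;
    }
    return removed;
}
```

Calling toy_list_prune_odd() on a list holding ids 1 through 5 leaves the items with ids 2 and 4 linked. Dropping the toy_list_remove() call from the loop would make the assert in toy_item_free() fire in a build with assertions enabled, which corresponds to the abort shown in the log above.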