Re: [OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-10-01 Thread Gilles Gouaillardet
Thanks Ralph !

it did fix the problem

Cheers,

Gilles

On 2014/10/01 3:04, Ralph Castain wrote:
> I fixed this in r32818 - the components shouldn't be passing back success if 
> the requested info isn't found. Hope that fixes the problem.



Re: [OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-09-30 Thread Ralph Castain
I fixed this in r32818 - the components shouldn't be passing back success if 
the requested info isn't found. Hope that fixes the problem.





[OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-09-30 Thread Gilles Gouaillardet
Folks,

the dynamic/spawn test from the ibm test suite crashes if the openib btl
is detected
(the test can be run on one node with an IB port)

here is what happens :

in mca_btl_openib_proc_create,
the macro
    OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
                    proc, &message, &msg_size);
does not find any information *but*
rc is OPAL_SUCCESS
msg_size is not updated (i.e. left uninitialized)
message is not updated (i.e. left uninitialized)

then, if the uninitialized msg_size happens to hold a nonzero value and the
uninitialized message happens to hold an invalid address, a crash occurs
when message is accessed.

/* i am not debating here the fact that there is no information returned,
i am simply discussing the crash */

a simple workaround is to initialize msg_size to zero.
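
to make the workaround concrete, here is a minimal standalone sketch. it is not
Open MPI code: DEMO_SUCCESS, demo_recv_nothing and demo_has_payload are
hypothetical names, and the stub only imitates the behaviour described above
(success returned, outputs untouched). it also initializes message to NULL,
which goes slightly beyond zeroing msg_size.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUCCESS 0 /* hypothetical stand-in for OPAL_SUCCESS */

/* Hypothetical stand-in for the buggy receive: it reports success
 * but never writes to its output arguments. */
static int demo_recv_nothing(void **data, size_t *size)
{
    assert(data != NULL && size != NULL);
    return DEMO_SUCCESS;
}

/* Caller-side workaround: give the outputs known values before the call,
 * so a "successful" call that found nothing can still be detected. */
static int demo_has_payload(void)
{
    void *message = NULL;
    size_t msg_size = 0;
    int rc = demo_recv_nothing(&message, &msg_size);

    /* Only trust the payload if every indicator agrees. */
    return DEMO_SUCCESS == rc && 0 != msg_size && NULL != message;
}
```

with the initialization in place, demo_has_payload() reports that there is no
payload instead of letting the caller read through a garbage pointer.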

that being said, is this the correct fix ?

one possible alternate fix is to update the OPAL_MODEX_RECV_STRING macro
like this :

/* from opal/mca/pmix/pmix.h */
#define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                          \
    do {                                                                \
        opal_value_t *kv;                                               \
        if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,       \
                                                 (s), &kv))) {          \
            if (NULL != kv) {                                           \
                *(d) = kv->data.bo.bytes;                               \
                *(sz) = kv->data.bo.size;                               \
                kv->data.bo.bytes = NULL; /* protect the data */        \
                OBJ_RELEASE(kv);                                        \
            } else {                                                    \
                *(sz) = 0;                                              \
                (r) = OPAL_ERR_NOT_FOUND;                               \
            }                                                           \
        }                                                               \
    } while(0);

/*
setting both *(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as
redundant: setting either *(sz) *or* (r) would be enough
*/
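
the same idea, written as a plain function, could look like the sketch below.
everything in it is an assumption made for illustration: demo_value_t, demo_get,
demo_recv and the DEMO_* codes are hypothetical and only mimic the shape of the
macro. unlike the macro, the sketch resets the outputs before the lookup, so
they are defined on every path, and it has no counterpart to OBJ_RELEASE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, not the real OPAL values. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_FOUND = -13 };

typedef struct {
    char *bytes;
    size_t size;
} demo_value_t;

/* Hypothetical lower-level getter: it may return success while
 * leaving *kv NULL, which is the case the wrapper has to handle. */
static int demo_get(demo_value_t *stored, demo_value_t **kv)
{
    *kv = stored;
    return DEMO_SUCCESS;
}

static int demo_recv(demo_value_t *stored, char **data, size_t *size)
{
    demo_value_t *kv = NULL;
    int rc;

    assert(data != NULL && size != NULL);
    *data = NULL; /* outputs are defined on every path */
    *size = 0;

    rc = demo_get(stored, &kv);
    if (DEMO_SUCCESS != rc) {
        return rc;
    }
    if (NULL == kv) {
        return DEMO_ERR_NOT_FOUND; /* success without a value is an error */
    }
    *data = kv->bytes;
    *size = kv->size;
    kv->bytes = NULL; /* protect the data: the caller owns it now */
    return DEMO_SUCCESS;
}
```

a caller can then rely on the return code alone, and the size is zero whenever
nothing was found.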

another alternative fix is to update the end of the native_get
function like this :

/* from opal/mca/pmix/native/pmix_native.c */

    if (found) {
        return OPAL_SUCCESS;
    }
    *kv = NULL;
    if (OPAL_SUCCESS == rc) {
        if (OPAL_SUCCESS == ret) {
            rc = OPAL_ERR_NOT_FOUND;
        } else {
            rc = ret;
        }
    }
    return rc;
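
to check the intended outcomes of that tail, the decision can be isolated in a
small helper. this is only a sketch: demo_finish_get and the DEMO_* codes are
hypothetical, and rc and ret are treated as two opaque status values, since the
rest of native_get is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, not the real OPAL values. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_FOUND = -13 };

/* Mirrors the proposed tail: success only when the key was found;
 * otherwise clear *kv and report the first error available, falling
 * back to "not found" when neither status value carries an error. */
static int demo_finish_get(int found, int rc, int ret, void **kv)
{
    assert(kv != NULL);
    if (found) {
        return DEMO_SUCCESS;
    }
    *kv = NULL;
    if (DEMO_SUCCESS == rc) {
        rc = (DEMO_SUCCESS == ret) ? DEMO_ERR_NOT_FOUND : ret;
    }
    return rc;
}
```

the point of the design is that the function can no longer return success while
leaving the caller with a NULL value.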

Could you please advise ?

Cheers,

Gilles