Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-07 Thread Gilles Gouaillardet
>>>>>>> OPAL_PROCESS_NAME_xTOy
>>>>>>>   on little endian arch if heterogeneous mode is supported.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> does that make sense?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
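A minimal sketch of the conversion idea quoted above, assuming the 64-bit
identifier packs two 32-bit fields and that the swap is applied field-wise on
little-endian hosts; all names here (name_fields_t, name_hton) are illustrative
stand-ins, not the actual OPAL_PROCESS_NAME_xTOy macros:

    /* Sketch only: convert the two 32-bit halves of the 64-bit
     * identifier to network byte order field-wise, so a peer of the
     * opposite endianness reading {jobid, vpid} out of the same 64
     * bits sees the values the sender intended. */
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    typedef uint64_t opal_identifier_t;

    typedef struct {
        uint32_t jobid;
        uint32_t vpid;
    } name_fields_t;                 /* stand-in for the real layout */

    static inline opal_identifier_t name_hton(opal_identifier_t id)
    {
        name_fields_t f;
        memcpy(&f, &id, sizeof(f));  /* view the 64 bits as two fields */
        f.jobid = htonl(f.jobid);    /* swap each field, not the pair */
        f.vpid  = htonl(f.vpid);
        memcpy(&id, &f, sizeof(id));
        return id;
    }

On big-endian hosts htonl() is a no-op, which matches the quoted point that the
conversion only costs anything on little-endian builds.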
>>>>>>> On 2014/07/31 1:29, George Bosilca wrote:
>>>>>>>
>>>>>>> The underlying structure changed, so a little bit of fiddling is normal.
>>>>>>> Instead of using a field in the ompi_proc_t you are now using a field 
>>>>>>> down
>>>>>>> in opal_proc_t, a field that simply cannot have the same type as before
>>>>>>> (orte_process_name_t).
>>>>>>>
>>>>>>>   George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> George - my point was that we regularly tested using the method in that
>>>>>>> routine, and now we have to do something a little different. So it is an
>>>>>>> "issue" in that we have to make changes across the code base to ensure 
>>>>>>> we
>>>>>>> do things the "new" way, that's all
>>>>>>>
>>>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>
>>>>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>>>>> correctly (aka only via the exposed accessors).
>>>>>>>
>>>>>>>   George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>>>>>> issue across the code base now, so we'll have to troll and fix it. I was
>>>>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>>>>
>>>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>
>>>>>>> Yes. opal_process_name_t has basically no meaning by itself; it is a
>>>>>>> 64-bit storage location used by the upper layer to save some local key
>>>>>>> that can later be used to extract information. Calling the OPAL-level
>>>>>>> compare function might be a better fit there.
>>>>>>>
>>>>>>>   George.
>>>>>>>
>>>>>>>
>>>>>>>
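A hedged sketch of the OPAL-level comparison George suggests, treating the
identifier as an opaque 64-bit value instead of dereferencing ORTE struct
fields; the helper name is illustrative, not the actual trunk API:

    /* Sketch only: compare two names through the opaque identifier.
     * No struct layout, pointer cast, or endianness assumption leaks
     * into the caller -- the values are passed and compared as plain
     * integers. */
    #include <stdint.h>

    typedef uint64_t opal_identifier_t;

    static inline int compare_opal_names(opal_identifier_t a,
                                         opal_identifier_t b)
    {
        return (a < b) ? -1 : ((a > b) ? 1 : 0);
    }

Because callers hand over values rather than orte_process_name_t pointers, this
style also rules out the bogus-pointer crash discussed later in the thread.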
>>>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
>>>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> was it really that simple?
>>>>>>>
>>>>>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>>>> typedef uint64_t opal_identifier_t;
>>>>>>>
>>>>>>> *but*
>>>>>>>
>>>>>>> item_ptr->peer has type orte_process_name_t:
>>>>>>> struct orte_process_name_t {
>>>>>>>     orte_jobid_t jobid;
>>>>>>>     orte_vpid_t vpid;
>>>>>>> };
>>>>>>>
>>>>>>> bottom line: is r32357 still valid on a big-endian arch?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
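Gilles' concern is concrete: overlaying the packed uint64_t on the two-field
struct yields a different jobid/vpid split depending on host byte order. A
standalone illustration, assuming 32-bit fields (the struct is a local
stand-in, not the ORTE header):

    /* Sketch only: pack {jobid, vpid} into a uint64_t the way a raw
     * overlay does, and show that the packed value is endian-dependent. */
    #include <inttypes.h>
    #include <stdio.h>
    #include <string.h>

    struct name { uint32_t jobid; uint32_t vpid; };

    int main(void)
    {
        struct name n = { .jobid = 0x11111111, .vpid = 0x00000002 };
        uint64_t packed;
        memcpy(&packed, &n, sizeof(packed)); /* what the overlay really does */
        /* little-endian prints 0x0000000211111111,
         * big-endian prints    0x1111111100000002 */
        printf("packed = 0x%016" PRIx64 "\n", packed);
        return 0;
    }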
>>>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread Ralph Castain
 "issue" in that we have to make changes across the code base to ensure we
>>>>>> do things the "new" way, that's all
>>>>>> 
>>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>> 
>>>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>>>> correctly (aka only via the exposed accessors).
>>>>>> 
>>>>>>   George.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>>>>> issue across the code base now, so we'll have to troll and fix it. I was
>>>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>>> 
>>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>> 
>>>>>> Yes. opal_process_name_t has basically no meaning by itself; it is a
>>>>>> 64-bit storage location used by the upper layer to save some local key
>>>>>> that can later be used to extract information. Calling the OPAL-level
>>>>>> compare function might be a better fit there.
>>>>>> 
>>>>>>   George.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
>>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> Ralph,
>>>>>> 
>>>>>> was it really that simple?
>>>>>> 
>>>>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>>> typedef uint64_t opal_identifier_t;
>>>>>> 
>>>>>> *but*
>>>>>> 
>>>>>> item_ptr->peer has type orte_process_name_t:
>>>>>> struct orte_process_name_t {
>>>>>>     orte_jobid_t jobid;
>>>>>>     orte_vpid_t vpid;
>>>>>> };
>>>>>> 
>>>>>> bottom line: is r32357 still valid on a big-endian arch?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Gilles
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org>
>>>>>> wrote:
>>>>>> 
>>>>>> 
>>>>>> I just fixed this one - all that was required was an ampersand as the
>>>>>> name was being passed into the function instead of a pointer to the name
>>>>>> 
>>>>>> r32357
>>>>>> 
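A minimal sketch of the class of fix Ralph describes, using hypothetical names
rather than the actual r32357 diff: the compare routine takes pointers, so
passing the 64-bit name by value turns the name itself into a bogus address.

    /* Sketch only -- hypothetical names, not the r32357 change itself. */
    #include <stdint.h>

    typedef struct { uint32_t jobid; uint32_t vpid; } name_t;

    /* stand-in for orte_util_compare_name_fields() */
    extern int compare_names(const name_t *a, const name_t *b);

    int check_peer(uint64_t proc_name, const name_t *other)
    {
        /* buggy: the 64-bit name VALUE becomes the "pointer", e.g. the
         * 0x192350001 address seen later in this thread:
         *
         *   return compare_names((const name_t *)proc_name, other);
         *
         * fixed: one ampersand -- pass the address of the storage. */
        return compare_names((const name_t *)&proc_name, other);
    }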
>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
>>>>>> <gilles.gouaillar...@gmail.com>
>>>>>>  wrote:
>>>>>> 
>>>>>> Rolf,
>>>>>> 
>>>>>> r32353 can be seen as a suspect...
>>>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>>>> even more (e.g. we hit the bug 100% after the fix)
>>>>>> 
>>>>>> does the attached patch to #4815 fix the problem?
>>>>>> 
>>>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>>>> it and drop a note to #4815
>>>>>> (I am afk until tomorrow)
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Gilles
>>>>>> 
>>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>> 
>>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>>>>> ibm/pt2pt/send test running on a single node.

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread George Bosilca
select.c:372
>
> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>
> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>
> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>
> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>
> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
> requested=0, provided=0x7fffe79922e8) at
> ../../ompi/runtime/ompi_mpi_init.c:918
>
> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
> argv=0x7fffe7992340) at pinit.c:84
>
> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
> send.c:32
>
> (gdb) up
>
> #1  <signal handler called>
>
> (gdb) up
>
> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
> 522   if (name1->jobid < name2->jobid) {
>
> (gdb) print name1
>
> $1 = (const orte_process_name_t *) 0x192350001
>
> (gdb) print *name1
>
> Cannot access memory at address 0x192350001
>
> (gdb) print name2
>
> $2 = (const orte_process_name_t *) 0xbaf76c
>
> (gdb) print *name2
>
> $3 = {jobid = 2452946945, vpid = 1}
>
> (gdb)
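The two gdb values above line up: the bogus name1 "pointer" 0x192350001 is the
64-bit name itself as a little-endian host lays it out. A quick standalone
check, with stand-in types and a little-endian host assumed:

    /* Sketch only: decode gdb's name1 value the way a little-endian
     * host overlays {jobid, vpid} on a uint64_t -- it reproduces *name2. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct name { uint32_t jobid; uint32_t vpid; };

    int main(void)
    {
        uint64_t bogus_ptr = 0x192350001ULL; /* the value gdb shows as name1 */
        struct name n;
        memcpy(&n, &bogus_ptr, sizeof(n));
        printf("jobid = %u, vpid = %u\n", n.jobid, n.vpid);
        /* prints: jobid = 2452946945, vpid = 1 -- exactly *name2 above;
         * the name value was passed where a pointer was expected. */
        return 0;
    }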
>
>
>
>
>
>
>
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
> Gouaillardet
> Sent: Wednesday, July 30, 2014 2:16 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] trunk compilation errors in jenkins
> George,
> #4815 is indirectly related to the move:
> in bcol/basesmuma, we used to compare ompi_process_name_t, and now
> we (try to) compare an ompi_process_name_t and an opal_process_name_t
> (which causes a glorious SIGSEGV)
> I proposed a temporary patch which is both broken and inelegant; could
> you please advise a correct solution?
> Cheers,
> Gilles
> On 2014/07/27 7:37, George Bosilca wrote:
>
> If you have any issue with the move, I’ll be happy to help and/or
> support you on your last move toward a completely generic BTL. To
> facilitate your work I exposed a minimalistic set of OMPI information
> at the OPAL level. Take a look at opal/util/proc.h for more info, but
> please try not to expose more.
>
>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread Gilles Gouaillardet
a suspect...
>>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>>> even more (e.g. we hit the bug 100% after the fix)
>>>>>
>>>>> does the attached patch to #4815 fix the problem?
>>>>>
>>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>>> it and drop a note to #4815
>>>>> (I am afk until tomorrow)
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>
>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>>>> ibm/pt2pt/send test running on a single node.
>>>>>
>>>>>
>>>>>
>>>>> (gdb) where
>>>>>
>>>>> #0  0x7f6c0d1321d0 in ?? ()
>>>>>
>>>>> #1  <signal handler called>
>>>>>
>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>> ../../orte/util/name_fns.c:522
>>>>>
>>>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
>>>>> back_files=0x7f6bf3ffd6c8,
>>>>>
>>>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>>>> "sm_payload_mem_", map_all=false) at
>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>>
>>>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>> reg_data=0xba28c0)
>>>>>
>>>>> at
>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>
>>>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>>>> (ml_module=0xba5c40) at 
>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>
>>>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>>>> (ml_module=0xba5c40) at 
>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>
>>>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>
>>>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>> priority=0x7fffe7991b58) at
>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>
>>>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>
>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>>
>>>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>
>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>>
>>>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>
>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>>
>>>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>> comm=0x6037a0) at 
>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>>
>>>>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>>
>>>>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>> requested=0, provided=0x7fffe79922e8) at
>>>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>>>
>>>>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>> argv=0x7fffe7992340) at pinit.c:84
>>>>>
>>>>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
>>>>> send.c:32

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
>>>>
>>>>
>>>> George - my point was that we regularly tested using the method in that
>>>> routine, and now we have to do something a little different. So it is an
>>>> "issue" in that we have to make changes across the code base to ensure we
>>>> do things the "new" way, that's all
>>>>
>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>> correctly (aka only via the exposed accessors).
>>>>
>>>>   George.
>>>>
>>>>
>>>>
>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>
>>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>>> issue across the code base now, so we'll have to troll and fix it. I was
>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>
>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> Yes. opal_process_name_t has basically no meaning by itself; it is a
>>>> 64-bit storage location used by the upper layer to save some local key
>>>> that can later be used to extract information. Calling the OPAL-level
>>>> compare function might be a better fit there.
>>>>
>>>>   George.
>>>>
>>>>
>>>>
>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>
>>>> Ralph,
>>>>
>>>> was it really that simple?
>>>>
>>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>> typedef opal_identifier_t opal_process_name_t;
>>>> typedef uint64_t opal_identifier_t;
>>>>
>>>> *but*
>>>>
>>>> item_ptr->peer has type orte_process_name_t:
>>>> struct orte_process_name_t {
>>>>     orte_jobid_t jobid;
>>>>     orte_vpid_t vpid;
>>>> };
>>>>
>>>> bottom line: is r32357 still valid on a big-endian arch?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>>
>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org>
>>>> wrote:
>>>>
>>>>
>>>> I just fixed this one - all that was required was an ampersand as the
>>>> name was being passed into the function instead of a pointer to the name
>>>>
>>>> r32357
>>>>
>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>> Rolf,
>>>>
>>>> r32353 can be seen as a suspect...
>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>> even more (e.g. we hit the bug 100% after the fix)
>>>>
>>>> does the attached patch to #4815 fix the problem?
>>>>
>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>> it and drop a note to #4815
>>>> (I am afk until tomorrow)
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>
>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>>> ibm/pt2pt/send test running on a single node.
>>>>
>>>>
>>>>
>>>> (gdb) where
>>>>
>>>> #0  0x7f6c0d1321d0 in ?? ()
>>>>
>>>> #1  <signal handler called>
>>>>
>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>> ../../orte/util/name_fns.c:522
>>>>
>>>> #3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
>>>> back_files=0x7f6bf3ffd6c8,

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
<r...@open-mpi.org> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Yeah, my fix won't work for big endian machines - this is going to be 
>>>>>>>>> an
>>>>>>>>> issue across the code base now, so we'll have to troll and fix it. I 
>>>>>>>>> was
>>>>>>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>>>>>> 
>>>>>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Yes. opal_process_name_t has basically no meaning by itself; it is
>>>>>>>>> a 64-bit storage location used by the upper layer to save some
>>>>>>>>> local key that can later be used to extract information. Calling
>>>>>>>>> the OPAL-level compare function might be a better fit there.
>>>>>>>>> 
>>>>>>>>>   George.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <
>>>>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Ralph,
>>>>>>>>>> 
>>>>>>>>>> was it really that simple?
>>>>>>>>>> 
>>>>>>>>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>>>>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>>>>>>> typedef uint64_t opal_identifier_t;
>>>>>>>>>> 
>>>>>>>>>> *but*
>>>>>>>>>> 
>>>>>>>>>> item_ptr->peer has type orte_process_name_t:
>>>>>>>>>> struct orte_process_name_t {
>>>>>>>>>>     orte_jobid_t jobid;
>>>>>>>>>>     orte_vpid_t vpid;
>>>>>>>>>> };
>>>>>>>>>> 
>>>>>>>>>> bottom line: is r32357 still valid on a big-endian arch?
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> 
>>>>>>>>>> Gilles
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I just fixed this one - all that was required was an ampersand as 
>>>>>>>>>>> the
>>>>>>>>>>> name was being passed into the function instead of a pointer to the 
>>>>>>>>>>> name
>>>>>>>>>>> 
>>>>>>>>>>> r32357
>>>>>>>>>>> 
>>>>>>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
>>>>>>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Rolf,
>>>>>>>>>>> 
>>>>>>>>>>> r32353 can be seen as a suspect...
>>>>>>>>>>> Even if it is correct, it might have exposed the bug discussed in 
>>>>>>>>>>> #4815
>>>>>>>>>>> even more (e.g. we hit the bug 100% after the fix)
>>>>>>>>>>> 
>>>>>>>>>>> does the attached patch to #4815 fix the problem?
>>>>>>>>>>> 
>>>>>>>>>>> If yes, and if you see this issue as a showstopper, feel free to
>>>>>>>>>>> commit it and drop a note to #4815
>>>>>>>>>>> (I am afk until tomorrow)
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> 
>>>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
>>>
>>>
>>> I just fixed this one - all that was required was an ampersand as the
>>> name was being passed into the function instead of a pointer to the name
>>>
>>> r32357
>>>
>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
>>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Rolf,
>>>
>>> r32353 can be seen as a suspect...
>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>> even more (e.g. we hit the bug 100% after the fix)
>>>
>>> does the attached patch to #4815 fix the problem?
>>>
>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>> it and drop a note to #4815
>>> (I am afk until tomorrow)
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>
>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>> ibm/pt2pt/send test running on a single node.
>>>
>>>
>>>
>>> (gdb) where
>>>
>>> #0  0x7f6c0d1321d0 in ?? ()
>>>
>>> #1  <signal handler called>
>>>
>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>
>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
>>> back_files=0x7f6bf3ffd6c8,
>>>
>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>> "sm_payload_mem_", map_all=false) at
>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>
>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>> reg_data=0xba28c0)
>>>
>>> at
>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>
>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>
>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>
>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>
>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>> priority=0x7fffe7991b58) at
>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>
>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>
>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>
>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>
>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>
>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>
>>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>
>>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>> requested=0, provided=0x7fffe79922e8) at
>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>
>>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>> argv=0x7fffe7992340) at pinit.c:84
>>>
>>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
>>> send.c:32
>>>
>>> (gdb) up
>>>
>>> #1  <signal handler called>
>>>
>>> (gdb) up
>>>
>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>
>>> 522   if (name1->jobid < name2->jobid) {
>>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
>>>>>>>>
>>>>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <
>>>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Ralph,
>>>>>>>>> 
>>>>>>>>> was it really that simple?
>>>>>>>>> 
>>>>>>>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>>>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>>>>>> typedef uint64_t opal_identifier_t;
>>>>>>>>> 
>>>>>>>>> *but*
>>>>>>>>> 
>>>>>>>>> item_ptr->peer has type orte_process_name_t:
>>>>>>>>> struct orte_process_name_t {
>>>>>>>>>     orte_jobid_t jobid;
>>>>>>>>>     orte_vpid_t vpid;
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> bottom line: is r32357 still valid on a big-endian arch?
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> 
>>>>>>>>> Gilles
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I just fixed this one - all that was required was an ampersand as the
>>>>>>>>>> name was being passed into the function instead of a pointer to the 
>>>>>>>>>> name
>>>>>>>>>> 
>>>>>>>>>> r32357
>>>>>>>>>> 
>>>>>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
>>>>>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Rolf,
>>>>>>>>>> 
>>>>>>>>>> r32353 can be seen as a suspect...
>>>>>>>>>> Even if it is correct, it might have exposed the bug discussed in 
>>>>>>>>>> #4815
>>>>>>>>>> even more (e.g. we hit the bug 100% after the fix)
>>>>>>>>>> 
>>>>>>>>>> does the attached patch to #4815 fix the problem?
>>>>>>>>>> 
>>>>>>>>>> If yes, and if you see this issue as a showstopper, feel free to
>>>>>>>>>> commit it and drop a note to #4815
>>>>>>>>>> (I am afk until tomorrow)
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> 
>>>>>>>>>> Gilles
>>>>>>>>>> 
>>>>>>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Just an FYI that my trunk version (r32355) does not work at all 
>>>>>>>>>> anymore
>>>>>>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from
>>>>>>>>>> the ibm/pt2pt/send test running on a single node.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> (gdb) where
>>>>>>>>>> 
>>>>>>>>>> #0  0x7f6c0d1321d0 in ?? ()
>>>>>>>>>> 
>>>>>>>>>> #1  <signal handler called>
>>>>>>>>>> 
>>>>>>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>>>> 
>>>>>>>>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, 
>>>>>>>>>> peer_list=0x7f6c0c0a6748,
>>>>>>>>>> back_files=0x7f6bf3ffd6c8,

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
st=0x7f6c0c0a6748,
>> back_files=0x7f6bf3ffd6c8,
>>
>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>> "sm_payload_mem_", map_all=false) at
>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>
>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>> reg_data=0xba28c0)
>>
>> at
>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>
>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>
>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>
>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>
>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>> priority=0x7fffe7991b58) at
>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>
>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>
>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>
>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>
>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>
>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>
>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>
>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>> requested=0, provided=0x7fffe79922e8) at
>> ../../ompi/runtime/ompi_mpi_init.c:918
>>
>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>> argv=0x7fffe7992340) at pinit.c:84
>>
>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
>> send.c:32
>>
>> (gdb) up
>>
>> #1  <signal handler called>
>>
>> (gdb) up
>>
>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>
>> 522   if (name1->jobid < name2->jobid) {
>>
>> (gdb) print name1
>>
>> $1 = (const orte_process_name_t *) 0x192350001
>>
>> (gdb) print *name1
>>
>> Cannot access memory at address 0x192350001
>>
>> (gdb) print name2
>>
>> $2 = (const orte_process_name_t *) 0xbaf76c
>>
>> (gdb) print *name2
>>
>> $3 = {jobid = 2452946945, vpid = 1}
>>
>> (gdb)
>>
>>
>>
>>
>>
>>
>>
>>  -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>> Gouaillardet
>> Sent: Wednesday, July 30, 2014 2:16 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>> George,
>> #4815 is indirectly related to the move:
>> in bcol/basesmuma, we used to compare ompi_process_name_t, and now
>> we (try to) compare an ompi_process_name_t and an opal_process_name_t
>> (which causes a glorious SIGSEGV)
>> I proposed a temporary patch which is both broken and inelegant; could
>> you please advise a correct solution?
>> Cheers,
>> Gilles
>> On 2014/07/27 7:37, George Bosilca wrote:
>>
>> If you have any issue with the move, I’ll be happy to help and/or
>> support you on your last move toward a completely generic BTL. To
>> facilitate your work I exposed a minimalistic set of OMPI information
>> at the OPAL level. Take a look at opal/util/proc.h for more info, but
>> please try not to expose more.
>>
>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
>>>>>> name
>>>>>>>>> 
>>>>>>>>> r32357
>>>>>>>>> 
>>>>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
>>>>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Rolf,
>>>>>>>>> 
>>>>>>>>> r32353 can be seen as a suspect...
>>>>>>>>> Even if it is correct, it might have exposed the bug discussed in 
>>>>>>>>> #4815
>>>>>>>>> even more (e.g. we hit the bug 100% after the fix)
>>>>>>>>> 
>>>>>>>>> does the attached patch to #4815 fix the problem?
>>>>>>>>> 
>>>>>>>>> If yes, and if you see this issue as a showstopper, feel free to
>>>>>>>>> commit it and drop a note to #4815
>>>>>>>>> (I am afk until tomorrow)
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> 
>>>>>>>>> Gilles
>>>>>>>>> 
>>>>>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>>>>> 
>>>>>>>>> Just an FYI that my trunk version (r32355) does not work at all 
>>>>>>>>> anymore
>>>>>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from
>>>>>>>>> the ibm/pt2pt/send test running on a single node.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> (gdb) where
>>>>>>>>> 
>>>>>>>>> #0  0x7f6c0d1321d0 in ?? ()
>>>>>>>>> 
>>>>>>>>> #1  <signal handler called>
>>>>>>>>> 
>>>>>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>>> 
>>>>>>>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, 
>>>>>>>>> peer_list=0x7f6c0c0a6748,
>>>>>>>>> back_files=0x7f6bf3ffd6c8,
>>>>>>>>> 
>>>>>>>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>>>>>>>> "sm_payload_mem_", map_all=false) at
>>>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>>>>>> 
>>>>>>>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>>>>> reg_data=0xba28c0)
>>>>>>>>> 
>>>>>>>>> at
>>>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>>>> 
>>>>>>>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>>>>>>>> (ml_module=0xba5c40) at 
>>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>>>> 
>>>>>>>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>>>>>>>> (ml_module=0xba5c40) at 
>>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>>>> 
>>>>>>>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) 
>>>>>>>>> at
>>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>>>> 
>>>>>>>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>>>>> priority=0x7fffe7991b58) at
>>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>>>> 
>>>>>>>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
/base/coll_base_comm_select.c:117
>
> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
> requested=0, provided=0x7fffe79922e8) at
> ../../ompi/runtime/ompi_mpi_init.c:918
>
> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
> argv=0x7fffe7992340) at pinit.c:84
>
> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
> send.c:32
>
> (gdb) up
>
> #1  <signal handler called>
>
> (gdb) up
>
> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
> 522   if (name1->jobid < name2->jobid) {
>
> (gdb) print name1
>
> $1 = (const orte_process_name_t *) 0x192350001
>
> (gdb) print *name1
>
> Cannot access memory at address 0x192350001
>
> (gdb) print name2
>
> $2 = (const orte_process_name_t *) 0xbaf76c
>
> (gdb) print *name2
>
> $3 = {jobid = 2452946945, vpid = 1}
>
> (gdb)
>
>
>
>
>
>
>
>  -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
> Gouaillardet
> Sent: Wednesday, July 30, 2014 2:16 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] trunk compilation errors in jenkins
> George,
> #4815 is indirectly related to the move:
> in bcol/basesmuma, we used to compare ompi_process_name_t, and now
> we (try to) compare an ompi_process_name_t and an opal_process_name_t
> (which causes a glorious SIGSEGV)
> I proposed a temporary patch which is both broken and inelegant; could
> you please advise a correct solution?
> Cheers,
> Gilles
> On 2014/07/27 7:37, George Bosilca wrote:
>
> If you have any issue with the move, I’ll be happy to help and/or
> support you on your last move toward a completely generic BTL. To
> facilitate your work I exposed a minimalistic set of OMPI information
> at the OPAL level. Take a look at opal/util/proc.h for more info, but
> please try not to expose more.
>
>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
>>>>>>>>
>>>>>>>> 
>>>>>>>> (gdb) where
>>>>>>>> 
>>>>>>>> #0  0x7f6c0d1321d0 in ?? ()
>>>>>>>> 
>>>>>>>> #1  <signal handler called>
>>>>>>>> 
>>>>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>> 
>>>>>>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, 
>>>>>>>> peer_list=0x7f6c0c0a6748,
>>>>>>>> back_files=0x7f6bf3ffd6c8,
>>>>>>>> 
>>>>>>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>>>>>>> "sm_payload_mem_", map_all=false) at
>>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>>>>> 
>>>>>>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>>>> reg_data=0xba28c0)
>>>>>>>> 
>>>>>>>> at
>>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>>> 
>>>>>>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>>>>>>> (ml_module=0xba5c40) at 
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>>> 
>>>>>>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>>>>>>> (ml_module=0xba5c40) at 
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>>> 
>>>>>>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>>> 
>>>>>>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>>>> priority=0x7fffe7991b58) at
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>>> 
>>>>>>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>> 
>>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>>>>> 
>>>>>>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>> 
>>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>>>>> 
>>>>>>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>>>> 
>>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>>>>> 
>>>>>>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>>>>> comm=0x6037a0) at 
>>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>>>>> 
>>>>>>>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>>>>> 
>>>>>>>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>>>>> requested=0, provided=0x7fffe79922e8) at
>>>>>>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>>>>>> 
>>>>>>>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>>>>> argv=0x7fffe7992340) at pinit.c:84
>>>>>>>> 
>>>>>>>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
>>>>>>>> send.c:32
>>>>>>>> 
>>>>>>>> (gdb) up
>>>>>>>> 
>>>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
>>>>>
>>>>>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>>> reg_data=0xba28c0)
>>>>>>>
>>>>>>> at
>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>>
>>>>>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>>>>>> (ml_module=0xba5c40) at 
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>>
>>>>>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>>>>>> (ml_module=0xba5c40) at 
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>>
>>>>>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>>
>>>>>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>>> priority=0x7fffe7991b58) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>>
>>>>>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>>>>
>>>>>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>>>>
>>>>>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>>>
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>>>>
>>>>>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>>>> comm=0x6037a0) at 
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>>>>
>>>>>>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>>>>
>>>>>>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>>>> requested=0, provided=0x7fffe79922e8) at
>>>>>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>>>>>
>>>>>>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>>>> argv=0x7fffe7992340) at pinit.c:84
>>>>>>>
>>>>>>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
>>>>>>> send.c:32
>>>>>>>
>>>>>>> (gdb) up
>>>>>>>
>>>>>>> #1  <signal handler called>
>>>>>>>
>>>>>>> (gdb) up
>>>>>>>
>>>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>
>>>>>>> 522   if (name1->jobid < name2->jobid) {
>>>>>>>
>>>>>>> (gdb) print name1
>>>>>>>
>>>>>>> $1 = (const orte_process_name_t *) 0x192350001
>>>>>>>
>>>>>>> (gdb) print *name1
>>>>>>>
>>>>>>> Cannot access memory at address 0x192350001
>>>>>>>
>>>>>>> (gdb) print name2
>>>>>>>
>>>>>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>>>>>>
>>>>>>> (gdb) print *name2
>>>>>>>
>>>>>>> $3 = {jobid = 2452946945, vpid = 1}
>>>>>>>
>>>>>>> (gdb)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
>>>>>>>>
>>>>>>>> (gdb) where
>>>>>>>> 
>>>>>>>> #0  0x7f6c0d1321d0 in ?? ()
>>>>>>>> 
>>>>>>>> #1  <signal handler called>
>>>>>>>> 
>>>>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>> 
>>>>>>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, 
>>>>>>>> peer_list=0x7f6c0c0a6748,
>>>>>>>> back_files=0x7f6bf3ffd6c8,
>>>>>>>> 
>>>>>>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>>>>>>> "sm_payload_mem_", map_all=false) at
>>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>>>>> 
>>>>>>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>>>> reg_data=0xba28c0)
>>>>>>>> 
>>>>>>>> at
>>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>>> 
>>>>>>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>>>>>>> (ml_module=0xba5c40) at 
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>>> 
>>>>>>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>>>>>>> (ml_module=0xba5c40) at 
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>>> 
>>>>>>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>>> 
>>>>>>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>>>> priority=0x7fffe7991b58) at
>>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>>> 
>>>>>>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>> 
>>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>>>>> 
>>>>>>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>> 
>>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>>>>> 
>>>>>>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>>>> 
>>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>>>>> 
>>>>>>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>>>>> comm=0x6037a0) at 
>>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>>>>> 
>>>>>>>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>>>>> 
>>>>>>>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>>>>> requested=0, provided=0x7fffe79922e8) at
>>>>>>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>>>>>> 
>>>>>>>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>>>>> argv=0x7fffe7992340) at pinit.c:84
>>>>>>>> 
>>>>>>>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
>>>>>>>> send.c:32
>>>>>>>> 
>>>>>>>> (gdb) up
>>>>>>>> 
>>>>>>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
37
>>>>>>> 
>>>>>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>>> reg_data=0xba28c0)
>>>>>>> 
>>>>>>> at
>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>> 
>>>>>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>>>>>> (ml_module=0xba5c40) at 
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>> 
>>>>>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>>>>>> (ml_module=0xba5c40) at 
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>> 
>>>>>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>> 
>>>>>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>>> priority=0x7fffe7991b58) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>> 
>>>>>>> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>> 
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>>>> 
>>>>>>> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>> 
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>>>> 
>>>>>>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>>> 
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>>>> 
>>>>>>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>>>> comm=0x6037a0) at 
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>>>> 
>>>>>>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>>>> 
>>>>>>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>>>> requested=0, provided=0x7fffe79922e8) at
>>>>>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>>>>> 
>>>>>>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>>>> argv=0x7fffe7992340) at pinit.c:84
>>>>>>> 
>>>>>>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at
>>>>>>> send.c:32
>>>>>>> 
>>>>>>> (gdb) up
>>>>>>> 
>>>>>>> #1  <signal handler called>
>>>>>>> 
>>>>>>> (gdb) up
>>>>>>> 
>>>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>> 
>>>>>>> 522   if (name1->jobid < name2->jobid) {
>>>>>>> 
>>>>>>> (gdb) print name1
>>>>>>> 
>>>>>>> $1 = (const orte_process_name_t *) 0x192350001
>>>>>>> 
>>>>>>> (gdb) print *name1
>>>>>>> 
>>>>>>> Cannot access memory at address 0x192350001
>>>>>>> 
>>>>>>> (gdb) print name2
>>>>>>> 
>>>>>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>>>>>> 
>>>>>>> (gdb) print *name2
>>>>>>> 
>>>>>>> $3 = {jobid = 2452946945, vpid = 1}
>>>>>>> 
>>>>>>> (gdb)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
>>>>>
>>>>> was it really that simple?
>>>>>
>>>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>> typedef uint64_t opal_identifier_t;
>>>>>
>>>>> *but*
>>>>>
>>>>> item_ptr->peer has type orte_process_name_t:
>>>>> struct orte_process_name_t {
>>>>>     orte_jobid_t jobid;
>>>>>     orte_vpid_t vpid;
>>>>> };
>>>>>
>>>>> bottom line: is r32357 still valid on a big-endian arch?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org>
>>>>> wrote:
>>>>>
>>>>>> I just fixed this one - all that was required was an ampersand as the
>>>>>> name was being passed into the function instead of a pointer to the name
>>>>>>
>>>>>> r32357
>>>>>>
>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>
>>>>>> Rolf,
>>>>>>
>>>>>> r32353 can be seen as a suspect...
>>>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>>>> even more (e.g. we hit the bug 100% after the fix)
>>>>>>
>>>>>> does the attached patch to #4815 fix the problem?
>>>>>>
>>>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>>>> it and drop a note to #4815
>>>>>> (I am afk until tomorrow)
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>>
>>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>>>>> ibm/pt2pt/send test running on a single node.
>>>>>>
>>>>>>
>>>>>>
>>>>>> (gdb) where
>>>>>>
>>>>>> #0  0x7f6c0d1321d0 in ?? ()
>>>>>>
>>>>>> #1  <signal handler called>
>>>>>>
>>>>>> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>> ../../orte/util/name_fns.c:522
>>>>>>
>>>>>> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, 
>>>>>> peer_list=0x7f6c0c0a6748,
>>>>>> back_files=0x7f6bf3ffd6c8,
>>>>>>
>>>>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>>>>> "sm_payload_mem_", map_all=false) at
>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>>>
>>>>>> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>> reg_data=0xba28c0)
>>>>>>
>>>>>> at
>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>
>>>>>> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols
>>>>>> (ml_module=0xba5c40) at 
>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>
>>>>>> #6  0x7f6c0cced68f in ml_module_memory_initialization
>>>>>> (ml_module=0xba5c40) at 
>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>
>>>>>> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>
>>>>>> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>> priority=0x7fffe7991b58) at
>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>
>>>>>> #9  0x7f6c18cc5b09 

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
>>>> [quoted stack trace trimmed; identical to the backtrace above]
>>>>
>>>> >-Original Message-
>>>>
>>>> >From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>>>>
>>>> >Gouaillardet
>>>>
>>>> >Sent: Wednesday, July 30, 2014 2:16 AM
>>>>
>>>> >To: Open MPI Developers
>>>>
>>>> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>>>
>>>> >
>>>>
>>>> >George,
>>>>
>>>> >
>>>>
>>>> >#4815 is indirectly related to the move:
>>>>
>>>> >
>>>>
>>>> >in bcol/basesmuma, we used to compare two ompi_process_name_t, and now
>>>>
>>>> >we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>>>
>>>> >(which causes a glorious SIGSEGV)
>>>>
>>>> >
>>>>
>>>> >i proposed a temporary patch which is both broken and inelegant, could
>>>> >you please advise a correct solution?
>>>>
>>>> >Cheers,
>>>>
>>>> >Gilles
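
For concreteness, here is a minimal sketch of the mismatch described above.
The types are stand-ins rather than the real OMPI definitions, and
broken_compare() is a hypothetical illustration of the failing pattern, not
the actual basesmuma code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t opal_process_name_t;      /* opaque 64-bit key (stand-in) */

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} ompi_process_name_t;                     /* jobid/vpid pair (stand-in)   */

/* comparing two names of the same layer is well defined: the opal name
 * is a plain integer, so integer comparison is enough */
static bool opal_names_equal(opal_process_name_t a, opal_process_name_t b)
{
    return a == b;
}

/* the failing pattern, in spirit: treat the 64-bit key as if it were a
 * pointer to a jobid/vpid struct and dereference it */
static bool broken_compare(opal_process_name_t a, const ompi_process_name_t *b)
{
    const ompi_process_name_t *fake = (const ompi_process_name_t *)(uintptr_t)a;
    return fake->jobid == b->jobid && fake->vpid == b->vpid;  /* SIGSEGV */
}

int main(void)
{
    opal_process_name_t a = 0x192350001ULL;   /* the value seen in gdb above */
    opal_process_name_t b = 0x192350001ULL;

    printf("equal: %d\n", opal_names_equal(a, b));   /* prints equal: 1 */
    (void)broken_compare;   /* never call it: 0x192350001 is not a pointer */
    return 0;
}

One same-layer option, then, appears to be keeping opal names opaque and
comparing them as plain 64-bit integers, instead of reinterpreting one of
them as a jobid/vpid struct.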

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
>>> [quoted stack trace and earlier messages trimmed; see above]



Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
>>> [quoted stack trace and earlier messages trimmed; see above]


Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
>> [quoted stack trace and earlier messages trimmed; see above]



Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
>> [quoted stack trace and earlier messages trimmed; see above]


Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
Ralph,

was it really that simple?

proc_temp->super.proc_name has type opal_process_name_t :
typedef opal_identifier_t opal_process_name_t;
typedef uint64_t opal_identifier_t;

*but*

item_ptr->peer has type orte_process_name_t :
struct orte_process_name_t {
   orte_jobid_t jobid;
   orte_vpid_t vpid;
};

bottom line, is r32357 still valid on a big endian arch?

Cheers,

Gilles
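
A minimal sketch of the layout concern, reusing the 64-bit name from the gdb
session earlier in the thread. The struct below is a stand-in for
orte_process_name_t rather than the real definition, and it assumes the name
was packed on a little endian machine:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t opal_identifier_t;
typedef opal_identifier_t opal_process_name_t;

struct orte_name_view {              /* stand-in for orte_process_name_t */
    uint32_t jobid;
    uint32_t vpid;
};

int main(void)
{
    /* the 64-bit name observed in gdb (name1 == 0x192350001) */
    opal_process_name_t opal_name = 0x192350001ULL;

    /* alias the 64-bit storage as a jobid/vpid pair */
    struct orte_name_view *view = (struct orte_name_view *)&opal_name;

    /* little endian: jobid=92350001 vpid=00000001, matching gdb's
     *   $3 = {jobid = 2452946945, vpid = 1}
     * big endian:    jobid=00000001 vpid=92350001, i.e. the fields swap */
    printf("jobid=%08x vpid=%08x\n", view->jobid, view->vpid);
    return 0;
}

If the fields really do swap, any jobid/vpid comparison made through such a
cast is layout dependent, which is exactly the doubt raised above.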


On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I just fixed this one - all that was required was an ampersand, as the name
> itself was being passed into the function instead of a pointer to the name.
>
> r32357
>
> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
> gilles.gouaillar...@gmail.com> wrote:
>
> [quoted messages trimmed; see above]

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Rolf vandeVaart
Thanks Ralph and Gilles!  All is looking good for me again, and I think all
tests are passing. I will check the results again tomorrow.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 30, 2014 10:49 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

I just fixed this one - all that was required was an ampersand, as the name
itself was being passed into the function instead of a pointer to the name.

r32357

On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET
<gilles.gouaillar...@gmail.com> wrote:


[quoted thread trimmed; see the surrounding messages]

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
I just fixed this one - all that was required was an ampersand, as the name
itself was being passed into the function instead of a pointer to the name.

r32357
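
A hypothetical reconstruction of that one-character fix, with compare_names()
standing in for orte_util_compare_name_fields() and the values taken from the
gdb session earlier in the thread:

#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t jobid; uint32_t vpid; } orte_process_name_t;

/* simplified stand-in with the same shape as orte_util_compare_name_fields:
 * it dereferences both arguments, so a bogus pointer faults exactly as in
 * the backtrace above */
static int compare_names(const orte_process_name_t *name1,
                         const orte_process_name_t *name2)
{
    if (name1->jobid != name2->jobid) {
        return name1->jobid < name2->jobid ? -1 : 1;
    }
    if (name1->vpid != name2->vpid) {
        return name1->vpid < name2->vpid ? -1 : 1;
    }
    return 0;
}

int main(void)
{
    uint64_t proc_name = 0x192350001ULL;             /* opal-level 64-bit name */
    orte_process_name_t peer = { 0x92350001u, 1 };   /* orte-level name        */

    /* before the fix: the 64-bit value itself was handed over as the pointer,
     * so the callee saw name1 == (void *)0x192350001 and crashed:
     *
     *   compare_names((const orte_process_name_t *)proc_name, &peer);
     */

    /* after r32357: one ampersand - pass the address of the storage
     * (this still aliases the uint64_t as a struct, which is the big endian
     * question raised elsewhere in the thread) */
    printf("%d\n", compare_names((const orte_process_name_t *)&proc_name, &peer));
    return 0;
}

On a little endian machine this prints 0 (the two names match); the
commented-out call shows the pre-fix behavior and would fault.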

On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
<gilles.gouaillar...@gmail.com> wrote:

> [quoted messages trimmed; see above]

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles GOUAILLARDET
Rolf,

r32353 can be seen as a suspect...
Even if it is correct, it might have exposed the bug discussed in #4815 even
more (e.g., after that change we hit this bug 100% of the time).

does the attached patch to #4815 fix the problem?

If yes, and if you see this issue as a showstopper, feel free to commit it and
drop a note to #4815
(I am afk until tomorrow)

Cheers,

Gilles

Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
> [quoted stack trace and earlier messages trimmed; see above]