Re: [OMPI devel] trunk compilation errors in jenkins

[...] OPAL_PROCESS_NAME_xTOy on little endian arch if heterogeneous mode is supported.

does that make sense ?

Cheers,

Gilles
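The beginning of Gilles' proposal is cut off in the archive, but the macro family name suggests hton/ntoh-style byte-order converters for the process name. A speculative sketch of that idea, assuming the name is logically a pair of 32-bit jobid/vpid fields; every name and signature below is illustrative, not the actual trunk API:

    #include <stdint.h>
    #include <arpa/inet.h>   /* htonl(): identity on big endian hosts */

    typedef struct {
        uint32_t jobid;
        uint32_t vpid;
    } opal_process_name_fields_t;

    /* Field-wise swap: a no-op on big endian, a per-field 32-bit byte swap
     * on little endian, so the on-wire format is always big endian. */
    static inline opal_process_name_fields_t
    opal_process_name_swap(opal_process_name_fields_t n)
    {
        n.jobid = htonl(n.jobid);
        n.vpid  = htonl(n.vpid);
        return n;
    }

    /* htonl() is its own inverse, so one helper serves both directions. */
    #define OPAL_PROCESS_NAME_HTON(n) opal_process_name_swap(n)
    #define OPAL_PROCESS_NAME_NTOH(n) opal_process_name_swap(n)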
On 2014/07/31 1:29, George Bosilca wrote:

The underlying structure changed, so a little bit of fiddling is normal. Instead of using a field in the ompi_proc_t you are now using a field down in opal_proc_t, a field that simply cannot have the same type as before (orte_process_name_t).

George.

On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> wrote:

George - my point was that we regularly tested using the method in that routine, and now we have to do something a little different. So it is an "issue" in that we have to make changes across the code base to ensure we do things the "new" way, that's all.

On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

No, this is not going to be an issue if the opal_identifier_t is used correctly (aka only via the exposed accessors).

George.

On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:

Yeah, my fix won't work for big endian machines - this is going to be an issue across the code base now, so we'll have to troll and fix it. I was doing the minimal change required to fix the trunk in the meantime.

On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

Yes. opal_process_name_t has basically no meaning by itself; it is a 64-bit storage location used by the upper layer to save some local key that can later be used to extract information. Calling the OPAL-level compare function might be a better fit there.

George.
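A minimal sketch of what George describes: treat the identifier as an opaque 64-bit key and compare it only through an OPAL-level helper, never by casting it back to orte_process_name_t. The helper name is illustrative, not the real API:

    #include <stdint.h>

    typedef uint64_t opal_identifier_t;
    typedef opal_identifier_t opal_process_name_t;

    /* Compare two identifiers as plain 64-bit integers.  Both values were
     * stored through the same accessor, so the result is consistent on any
     * endianness - no per-field interpretation is needed or wanted. */
    static inline int opal_compare_identifiers(opal_identifier_t a,
                                               opal_identifier_t b)
    {
        return (a < b) ? -1 : ((a > b) ? 1 : 0);
    }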
On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Ralph,

was it really that simple ?

proc_temp->super.proc_name has type opal_process_name_t :

    typedef opal_identifier_t opal_process_name_t;
    typedef uint64_t opal_identifier_t;

*but*

item_ptr->peer has type orte_process_name_t :

    struct orte_process_name_t {
        orte_jobid_t jobid;
        orte_vpid_t vpid;
    };

bottom line, is r32357 still valid on a big endian arch ?

Cheers,

Gilles
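Gilles' concern is easy to reproduce. A small self-contained demo, reusing the jobid/vpid values that show up in Rolf's gdb session further down the thread (the types are simplified stand-ins for the real ORTE definitions):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef uint32_t orte_jobid_t;
    typedef uint32_t orte_vpid_t;
    struct orte_process_name_t { orte_jobid_t jobid; orte_vpid_t vpid; };

    int main(void)
    {
        struct orte_process_name_t name = { 0x92350001u, 1u }; /* jobid, vpid */
        uint64_t id;
        memcpy(&id, &name, sizeof(id));   /* bytewise reinterpretation */
        /* Little endian prints id = 0x192350001; big endian prints
         * id = 0x9235000100000001.  Same bytes, two different 64-bit
         * "names" - which is exactly why such casts are endian-sensitive. */
        printf("id = 0x%llx\n", (unsigned long long)id);
        return 0;
    }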
"issue" in that we have to make changes across the code base to ensure we >>>>>> do things the "new" way, that's all >>>>>> >>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> >>>>>> <bosi...@icl.utk.edu> wrote: >>>>>> >>>>>> No, this is not going to be an issue if the opal_identifier_t is used >>>>>> correctly (aka only via the exposed accessors). >>>>>> >>>>>> George. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> >>>>>> <r...@open-mpi.org> wrote: >>>>>> >>>>>> >>>>>> Yeah, my fix won't work for big endian machines - this is going to be an >>>>>> issue across the code base now, so we'll have to troll and fix it. I was >>>>>> doing the minimal change required to fix the trunk in the meantime. >>>>>> >>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> >>>>>> <bosi...@icl.utk.edu> wrote: >>>>>> >>>>>> Yes. opal_process_name_t has basically no meaning by itself, it is a 64 >>>>>> bits storage location used by the upper layer to save some local key that >>>>>> can be later used to extract information. Calling the OPAL level compare >>>>>> function might be a better fit there. >>>>>> >>>>>> George. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet >>>>>> <gilles.gouaillar...@gmail.com> wrote: >>>>>> >>>>>> >>>>>> Ralph, >>>>>> >>>>>> was it really that simple ? >>>>>> >>>>>> proc_temp->super.proc_name has type opal_process_name_t : >>>>>> typedef opal_identifier_t opal_process_name_t; >>>>>> typedef uint64_t opal_identifier_t; >>>>>> >>>>>> *but* >>>>>> >>>>>> item_ptr->peer has type orte_process_name_t : >>>>>> struct orte_process_name_t { >>>>>>orte_jobid_t jobid; >>>>>>orte_vpid_t vpid; >>>>>> }; >>>>>> >>>>>> bottom line, is r32357 still valid on a big endian arch ? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Gilles >>>>>> >>>>>> >>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> >>>>>> <r...@open-mpi.org> >>>>>> >>>>>> wrote: >>>>>> >>>>>> >>>>>> I just fixed this one - all that was required was an ampersand as the >>>>>> name was being passed into the function instead of a pointer to the name >>>>>> >>>>>> r32357 >>>>>> >>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET >>>>>> <gilles.gouaillar...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> Rolf, >>>>>> >>>>>> r32353 can be seen as a suspect... >>>>>> Even if it is correct, it might have exposed the bug discussed in #4815 >>>>>> even more (e.g. we hit the bug 100% after the fix) >>>>>> >>>>>> does the attached patch to #4815 fixes the problem ? >>>>>> >>>>>> If yes, and if you see this issue as a showstopper, feel free to commit >>>>>> it and drop a note to #4815 >>>>>> ( I am afk until tomorrow) >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Gilles >>>>>> >>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> <rvandeva...@nvidia.com> wrote: >>>>>> >>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore >>>>>> if I do not include "--mca coll ^ml".Here is a stack trace from the >>>>>> ibm/pt2pt/send test running on a si
On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <gilles.gouaillar...@gmail.com> wrote:

Rolf,

r32353 can be seen as a suspect... Even if it is correct, it might have exposed the bug discussed in #4815 even more (e.g. we hit the bug 100% of the time after the fix).

does the attached patch to #4815 fix the problem ?

If yes, and if you see this issue as a showstopper, feel free to commit it and drop a note to #4815 (I am afk until tomorrow).

Cheers,

Gilles
Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

Just an FYI that my trunk version (r32355) does not work at all anymore if I do not include "--mca coll ^ml". Here is a stack trace from the ibm/pt2pt/send test running on a single node.

(gdb) where
#0  0x00007f6c0d1321d0 in ?? ()
#1  <signal handler called>
#2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
#3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8, comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
#4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, reg_data=0xba28c0) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
#5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
#6  0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
#7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
#8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
#9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
#10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
#11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
#12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
#13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
#14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918
#15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84
#16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
(gdb) up
#1  <signal handler called>
(gdb) up
#2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
522         if (name1->jobid < name2->jobid) {
(gdb) print name1
$1 = (const orte_process_name_t *) 0x192350001
(gdb) print *name1
Cannot access memory at address 0x192350001
(gdb) print name2
$2 = (const orte_process_name_t *) 0xbaf76c
(gdb) print *name2
$3 = {jobid = 2452946945, vpid = 1}
(gdb)
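A quick sanity check on that trace: the faulting "pointer" is not an address at all, but the little endian byte image of a process name - it decodes to exactly the name2 = {jobid = 2452946945, vpid = 1} printed above (same layout assumption as in the earlier demo):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t bad_ptr = 0x192350001ull;           /* gdb: name1 */
        uint32_t jobid = (uint32_t)bad_ptr;          /* low half on LE  */
        uint32_t vpid  = (uint32_t)(bad_ptr >> 32);  /* high half on LE */
        assert(jobid == 2452946945u);                /* 0x92350001 */
        assert(vpid  == 1u);
        return 0;
    }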
-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Wednesday, July 30, 2014 2:16 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] trunk compilation errors in jenkins

George,

#4815 is indirectly related to the move: in bcol/basesmuma, we used to compare two ompi_process_name_t, and now we (try to) compare an ompi_process_name_t and an opal_process_name_t (which causes a glorious SIGSEGV).

I proposed a temporary patch which is both broken and inelegant; could you please advise a correct solution ?

Cheers,

Gilles

On 2014/07/27 7:37, George Bosilca wrote:

If you have any issue with the move, I'll be happy to help and/or support you on your last move toward a completely generic BTL. To facilitate your work I exposed a minimalistic set of OMPI information at the OPAL level. Take a look at opal/util/proc.h for more info, but please try not to expose more.
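For context, a rough sketch of the kind of minimal, runtime-agnostic process information George is pointing at; see opal/util/proc.h in the trunk for the real structure - the fields below are illustrative only:

    #include <stdint.h>

    typedef uint64_t opal_process_name_t;

    /* Hypothetical shape of an OPAL-level proc: an opaque name plus the few
     * fields the lower layers need, with no ORTE/OMPI types leaking in. */
    typedef struct opal_proc_t {
        opal_process_name_t proc_name;      /* opaque 64-bit identifier */
        uint32_t            proc_flags;     /* e.g. locality hints      */
        char               *proc_hostname;  /* may be NULL              */
    } opal_proc_t;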
_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15514.php
If you are not the intended recipient, >>> please contact the sender by reply email and destroy all copies of the >>> original message. >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >>> http://www.open-mpi.org/community/lists/devel/2014/07/15355.php >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15356.php >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15363.php >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15364.php > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15365.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15366.php
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins
ery (component=0x7f6c0cf50940, >>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) >>> >>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355 >>> >>> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0, >>> component=0x7f6c0cf50940, module=0x7fffe7991b90) >>> >>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317 >>> >>> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0, >>> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281 >>> >>> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at >>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117 >>> >>> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, >>> requested=0, provided=0x7fffe79922e8) at >>> ../../ompi/runtime/ompi_mpi_init.c:918 >>> >>> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, >>> argv=0x7fffe7992340) at pinit.c:84 >>> >>> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32 >>> >>> (gdb) up >>> >>> #1 >>> >>> (gdb) up >>> >>> #2 0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 >>> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522 >>> >>> 522 if (name1->jobid < name2->jobid) { >>> >>> (gdb) print name1 >>> >>> $1 = (const orte_process_name_t *) 0x192350001 >>> >>> (gdb) print *name1 >>> >>> Cannot access memory at address 0x192350001 >>> >>> (gdb) print name2 >>> >>> $2 = (const orte_process_name_t *) 0xbaf76c >>> >>> (gdb) print *name2 >>> >>> $3 = {jobid = 2452946945, vpid = 1} >>> >>> (gdb) >>> >>> >>> >>> >>> >>> >>> >-Original Message- >>> >>> >From: devel [mailto:devel-boun...@open-mpi.org >>> <devel-boun...@open-mpi.org>] On Behalf Of Gilles >>> >>> >Gouaillardet >>> >>> >Sent: Wednesday, July 30, 2014 2:16 AM >>> >>> >To: Open MPI Developers >>> >>> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins >>> >>> > >>> >>> >George, >>> >>> > >>> >>> >#4815 is indirectly related to the move : >>> >>> > >>> >>> >in bcol/basesmuma, we used to compare ompi_process_name_t, and now >>> >>> >we (try to) compare an ompi_process_name_t and an opal_process_name_t >>> >>> >(which causes a glory SIGSEGV) >>> >>> > >>> >>> >i proposed a temporary patch which is both broken and unelegant, could >>> you >>> >>> >please advise a correct solution ? >>> >>> > >>> >>> >Cheers, >>> >>> > >>> >>> >Gilles >>> >>> > >>> >>> >On 2014/07/27 7:37, George Bosilca wrote: >>> >>> >> If you have any issue with the move, I’ll be happy to help and/or >>> support >>> >>> >you on your last move toward a completely generic BTL. To facilitate >>> your >>> >>> >work I exposed a minimalistic set of OMPI information at the OPAL >>> level. Take >>> >>> >a look at opal/util/proc.h for more info, but please try not to expose >>> more. >>> >>> > >>> >>> >___ >>> >>> >devel mailing list >>> >>> >de...@open-mpi.org >>> >>> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> >>> >Link to this post: http://www.open- >>> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >>> >>> >mpi.org/community/lists/devel/2014/07/15348.php >>> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >>> -- >>> This email message is for the sole use of the intended recipient(s) >>> and may contain confidential information. Any unauthorized review, use, >>> disclosure or distribution is prohibited. If you are not the intended >>> recipient, please contact the sender by reply email and destroy all copies >>> of the original message. 
>>> -- >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >>> http://www.open-mpi.org/community/lists/devel/2014/07/15355.php >>> >>> >>> >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >>> http://www.open-mpi.org/community/lists/devel/2014/07/15356.php >>> >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15363.php >> > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15364.php > > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15365.php >
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins
_coll_base_comm_select (comm=0x6037a0) at >> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117 >> >> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, >> requested=0, provided=0x7fffe79922e8) at >> ../../ompi/runtime/ompi_mpi_init.c:918 >> >> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, >> argv=0x7fffe7992340) at pinit.c:84 >> >> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32 >> >> (gdb) up >> >> #1 >> >> (gdb) up >> >> #2 0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', >> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522 >> >> 522 if (name1->jobid < name2->jobid) { >> >> (gdb) print name1 >> >> $1 = (const orte_process_name_t *) 0x192350001 >> >> (gdb) print *name1 >> >> Cannot access memory at address 0x192350001 >> >> (gdb) print name2 >> >> $2 = (const orte_process_name_t *) 0xbaf76c >> >> (gdb) print *name2 >> >> $3 = {jobid = 2452946945, vpid = 1} >> >> (gdb) >> >> >> >> >> >> >> >-Original Message- >> >> >From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles >> >> >Gouaillardet >> >> >Sent: Wednesday, July 30, 2014 2:16 AM >> >> >To: Open MPI Developers >> >> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins >> >> > >> >> >George, >> >> > >> >> >#4815 is indirectly related to the move : >> >> > >> >> >in bcol/basesmuma, we used to compare ompi_process_name_t, and now >> >> >we (try to) compare an ompi_process_name_t and an opal_process_name_t >> >> >(which causes a glory SIGSEGV) >> >> > >> >> >i proposed a temporary patch which is both broken and unelegant, could you >> >> >please advise a correct solution ? >> >> > >> >> >Cheers, >> >> > >> >> >Gilles >> >> > >> >> >On 2014/07/27 7:37, George Bosilca wrote: >> >> >> If you have any issue with the move, I’ll be happy to help and/or support >> >> >you on your last move toward a completely generic BTL. To facilitate your >> >> >work I exposed a minimalistic set of OMPI information at the OPAL level. >> >Take >> >> >a look at opal/util/proc.h for more info, but please try not to expose more. >> >> > >> >> >___ >> >> >devel mailing list >> >> >de...@open-mpi.org >> >> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >Link to this post: http://www.open- >> >> >mpi.org/community/lists/devel/2014/07/15348.php >> >> This email message is for the sole use of the intended recipient(s) and may >> contain confidential information. Any unauthorized review, use, disclosure >> or distribution is prohibited. If you are not the intended recipient, >> please contact the sender by reply email and destroy all copies of the >> original message. >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15355.php > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15356.php > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15363.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15364.php
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins
it (argc=0x7fffe799234c, >> argv=0x7fffe7992340) at pinit.c:84 >> >> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32 >> >> (gdb) up >> >> #1 >> >> (gdb) up >> >> #2 0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 >> '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522 >> >> 522 if (name1->jobid < name2->jobid) { >> >> (gdb) print name1 >> >> $1 = (const orte_process_name_t *) 0x192350001 >> >> (gdb) print *name1 >> >> Cannot access memory at address 0x192350001 >> >> (gdb) print name2 >> >> $2 = (const orte_process_name_t *) 0xbaf76c >> >> (gdb) print *name2 >> >> $3 = {jobid = 2452946945, vpid = 1} >> >> (gdb) >> >> >> >> >> >> >> >> >-Original Message- >> >> >From: devel [mailto:devel-boun...@open-mpi.org >> <devel-boun...@open-mpi.org>] On Behalf Of Gilles >> >> >Gouaillardet >> >> >Sent: Wednesday, July 30, 2014 2:16 AM >> >> >To: Open MPI Developers >> >> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins >> >> > >> >> >George, >> >> > >> >> >#4815 is indirectly related to the move : >> >> > >> >> >in bcol/basesmuma, we used to compare ompi_process_name_t, and now >> >> >we (try to) compare an ompi_process_name_t and an opal_process_name_t >> >> >(which causes a glory SIGSEGV) >> >> > >> >> >i proposed a temporary patch which is both broken and unelegant, could >> you >> >> >please advise a correct solution ? >> >> > >> >> >Cheers, >> >> > >> >> >Gilles >> >> > >> >> >On 2014/07/27 7:37, George Bosilca wrote: >> >> >> If you have any issue with the move, I’ll be happy to help and/or >> support >> >> >you on your last move toward a completely generic BTL. To facilitate your >> >> >work I exposed a minimalistic set of OMPI information at the OPAL level. >> Take >> >> >a look at opal/util/proc.h for more info, but please try not to expose >> more. >> >> > >> >> >___ >> >> >devel mailing list >> >> >de...@open-mpi.org >> >> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >Link to this post: http://www.open- >> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >> >> >mpi.org/community/lists/devel/2014/07/15348.php >> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >> -- >> This email message is for the sole use of the intended recipient(s) and >> may contain confidential information. Any unauthorized review, use, >> disclosure or distribution is prohibited. If you are not the intended >> recipient, please contact the sender by reply email and destroy all copies >> of the original message. >> -- >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15355.php >> >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15356.php >> > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15363.php >
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins
Ralph,

was it really that simple ?

proc_temp->super.proc_name has type opal_process_name_t :

    typedef opal_identifier_t opal_process_name_t;
    typedef uint64_t opal_identifier_t;

*but*

item_ptr->peer has type orte_process_name_t :

    struct orte_process_name_t {
        orte_jobid_t jobid;
        orte_vpid_t vpid;
    };

bottom line, is r32357 still valid on a big endian arch ?

Cheers,

Gilles

On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I just fixed this one - all that was required was an ampersand, as the name
> was being passed into the function instead of a pointer to the name.
>
> r32357
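To see why Gilles' big endian question matters, here is a minimal standalone sketch in C. It is not OMPI code: the field packing (jobid in the low 32 bits, vpid in the high 32 bits) is an assumption inferred from the gdb session quoted further down, where the 64-bit identifier 0x192350001 decodes on little endian to jobid = 2452946945, vpid = 1.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef uint32_t orte_jobid_t;
    typedef uint32_t orte_vpid_t;

    struct orte_process_name_t {
        orte_jobid_t jobid;
        orte_vpid_t vpid;
    };

    typedef uint64_t opal_identifier_t;

    int main(void)
    {
        /* Assumed packing: vpid in the high word, jobid in the low word,
         * matching the values seen in the gdb session. */
        opal_identifier_t id = ((uint64_t)1 << 32) | 2452946945U;

        /* Overlaying the struct on the 64-bit storage is what a cast
         * like (orte_process_name_t *)&proc_name amounts to. */
        struct orte_process_name_t name;
        memcpy(&name, &id, sizeof(name));

        /* Little endian: jobid = 2452946945, vpid = 1 (fields line up).
         * Big endian:    jobid = 1, vpid = 2452946945 (fields swapped). */
        printf("jobid = %u, vpid = %u\n", name.jobid, name.vpid);
        return 0;
    }

On a big endian machine the same overlay swaps the two fields, so any cast between the opal_identifier_t storage and orte_process_name_t is byte-order dependent.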
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins
Thanks Ralph and Gilles! All is looking good for me again. I think all tests are passing again. Will check results again tomorrow.

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 30, 2014 10:49 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

I just fixed this one - all that was required was an ampersand, as the name was being passed into the function instead of a pointer to the name.

r32357
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins
I just fixed this one - all that was required was an ampersand, as the name was being passed into the function instead of a pointer to the name.

r32357

On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <gilles.gouaillar...@gmail.com> wrote:

> Rolf,
>
> r32353 can be seen as a suspect...
> Even if it is correct, it might have exposed the bug discussed in #4815 even
> more (e.g. we hit the bug 100% of the time after the fix).
>
> does the attached patch to #4815 fix the problem ?
>
> If yes, and if you see this issue as a showstopper, feel free to commit it
> and drop a note to #4815
> (I am afk until tomorrow)
>
> Cheers,
>
> Gilles
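For readers following the thread, a minimal sketch of the bug class behind r32357. The names here are hypothetical, not the actual diff: a compare routine that expects pointers was handed the 64-bit name value itself, so the callee dereferenced the packed identifier as an address (the unmappable 0x192350001 in Rolf's gdb session). The fix is literally one ampersand.

    #include <stdint.h>

    typedef uint32_t orte_jobid_t;
    typedef uint32_t orte_vpid_t;

    typedef struct {
        orte_jobid_t jobid;
        orte_vpid_t vpid;
    } orte_process_name_t;

    typedef uint64_t opal_process_name_t;

    /* Stand-in for orte_util_compare_name_fields(): dereferences both names. */
    static int compare_names(const orte_process_name_t *name1,
                             const orte_process_name_t *name2)
    {
        if (name1->jobid != name2->jobid)
            return (name1->jobid < name2->jobid) ? -1 : 1;
        if (name1->vpid != name2->vpid)
            return (name1->vpid < name2->vpid) ? -1 : 1;
        return 0;
    }

    int example(opal_process_name_t proc_name, orte_process_name_t *peer)
    {
        /* BROKEN: passing the 64-bit value where a pointer is expected
         * makes the callee dereference the packed identifier as an
         * address, hence "Cannot access memory at address 0x192350001":
         *
         *     return compare_names((orte_process_name_t *)(uintptr_t)proc_name, peer);
         */

        /* FIXED: one ampersand, i.e. pass the address of the storage
         * (still endian-sensitive, per the question above). */
        return compare_names((const orte_process_name_t *)&proc_name, peer);
    }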
Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins
Rolf,

r32353 can be seen as a suspect...
Even if it is correct, it might have exposed the bug discussed in #4815 even more (e.g. we hit the bug 100% of the time after the fix).

does the attached patch to #4815 fix the problem ?

If yes, and if you see this issue as a showstopper, feel free to commit it and drop a note to #4815
(I am afk until tomorrow)

Cheers,

Gilles

Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

>Just an FYI that my trunk version (r32355) does not work at all anymore if I
>do not include "--mca coll ^ml". Here is a stack trace from the
>ibm/pt2pt/send test running on a single node.
>
>(gdb) where
>#0  0x7f6c0d1321d0 in ?? ()
>#1  <signal handler called>
>#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
>    name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>#3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>    (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
>    back_files=0x7f6bf3ffd6c8, comm=0x6037a0, input=...,
>    base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false)
>    at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>#4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60,
>    data_offset=64, bcol_module=0x7f6bf3b68040, reg_data=0xba28c0)
>    at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>#5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40)
>    at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>#6  0x7f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40)
>    at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>#7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40)
>    at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>#8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>    priority=0x7fffe7991b58) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>#9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0,
>    priority=0x7fffe7991b58, module=0x7fffe7991b90)
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>#10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0,
>    priority=0x7fffe7991b58, module=0x7fffe7991b90)
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>#11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
>    component=0x7f6c0cf50940, module=0x7fffe7991b90)
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>#12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>    comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>#13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0)
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>#14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>    requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918
>#15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340)
>    at pinit.c:84
>#16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>(gdb) up
>#1  <signal handler called>
>(gdb) up
>#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
>    name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>522         if (name1->jobid < name2->jobid) {
>(gdb) print name1
>$1 = (const orte_process_name_t *) 0x192350001
>(gdb) print *name1
>Cannot access memory at address 0x192350001
>(gdb) print name2
>$2 = (const orte_process_name_t *) 0xbaf76c
>(gdb) print *name2
>$3 = {jobid = 2452946945, vpid = 1}
>(gdb)
>
>>-----Original Message-----
>>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>>Gouaillardet
>>Sent: Wednesday, July 30, 2014 2:16 AM
>>To: Open MPI Developers
>>Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>
>>George,
>>
>>#4815 is indirectly related to the move :
>>
>>in bcol/basesmuma, we used to compare ompi_process_name_t, and now
>>we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>(which causes a glorious SIGSEGV)
>>
>>i proposed a temporary patch which is both broken and inelegant, could you
>>please advise a correct solution ?
>>
>>Cheers,
>>
>>Gilles
>>
>>On 2014/07/27 7:37, George Bosilca wrote:
>>> If you have any issue with the move, I’ll be happy to help and/or support
>>> you on your last move toward a completely generic BTL. To facilitate your
>>> work I exposed a minimalistic set of OMPI information at the OPAL level.
>>> Take a look at opal/util/proc.h for more info, but please try not to
>>> expose more.
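George's pointer to opal/util/proc.h hints at the longer-term fix the thread converges on: treat the process name as opaque 64-bit storage and compare it at the OPAL level, rather than casting it to the ORTE structure. Here is a sketch of that idea; the function name is hypothetical, since the real accessor is whatever opal/util/proc.h exposes.

    #include <stdint.h>

    typedef uint64_t opal_identifier_t;

    /* Hypothetical OPAL-level compare: it works on the opaque 64-bit key
     * directly, needs no knowledge of the jobid/vpid layout, and gives
     * the same equal/not-equal answer regardless of endianness. */
    static inline int opal_compare_identifiers(opal_identifier_t a,
                                               opal_identifier_t b)
    {
        return (a < b) ? -1 : ((a > b) ? 1 : 0);
    }

Until the rest of the code base is updated, Rolf's workaround of excluding the coll/ml component still applies; as a full command line (process count and test path assumed) it would look something like:

    mpirun -np 2 --mca coll ^ml ibm/pt2pt/send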