[hwloc-devel] nightly builds failed

2013-09-20 Thread Jeff Squyres (jsquyres)
IU moved the nightly build cron jobs to a new machine today, and they failed.  
I'm manually running the build cron jobs on the old build machine (eddie) right 
now.

I've alerted IU to what I think the error was in the move; hopefully they'll be 
able to fix it over the weekend.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [hwloc-devel] xml file load incompatibilities

2013-09-20 Thread Ralph Castain
Hmmm...nope, not a peep (no extra output at all). Just segfaulted like before.

On Sep 20, 2013, at 4:06 PM, Brice Goglin wrote:

> Try adding HWLOC_DEBUG_CHECK=1 to your environment; it enables many
> assertions at the end of hwloc_topology_load().
> 
> Brice



Re: [hwloc-devel] xml file load incompatibilities

2013-09-20 Thread Brice Goglin
Try adding HWLOC_DEBUG_CHECK=1 to your environment; it enables many
assertions at the end of hwloc_topology_load().

Brice
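
One way to apply this suggestion without touching the shell profile is to set the variable only for the process that reproduces the crash. The sketch below is in Python and is only an illustration: `run_with_env` and `died_from_segfault` are hypothetical helpers, and the command to run is whatever loads the XML topology locally. On POSIX systems a negative return code from `subprocess.run` means the child was killed by that signal, so a segfault shows up as `-signal.SIGSEGV`.

```python
import os
import signal
import subprocess


def run_with_env(cmd, extra_env):
    """Run cmd with extra_env layered over the current environment.

    Hypothetical helper for illustration; returns the CompletedProcess
    with stdout and stderr captured as text.
    """
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


def died_from_segfault(result):
    """True if the child was killed by SIGSEGV (POSIX: negative return code)."""
    return result.returncode == -signal.SIGSEGV
```

A call would look like `run_with_env(cmd, {"HWLOC_DEBUG_CHECK": "1"})`, where `cmd` is the argument list of the failing program. Anything the assertions print ends up in `result.stderr`.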






Re: [hwloc-devel] xml file load incompatibilities

2013-09-20 Thread Ralph Castain
I didn't try loading it with lstopo - just tried the OMPI trunk. It loads okay,
but segfaults when you try to find an object by depth:

#0  0x0001005fe5dc in opal_hwloc172_hwloc_get_obj_by_depth (topology=Cannot access memory at address 0xfff7
) at traversal.c:623
#1  0x000100b6dfaa in opal_hwloc172_hwloc_get_root_obj (topology=Cannot access memory at address 0xfff7
) at rmaps_rr_mappers.c:747
#2  0x000100b6e139 in orte_rmaps_rr_byslot (jdata=Cannot access memory at address 0xff77
) at rmaps_rr_mappers.c:774
#3  0x000100b6d6da in orte_rmaps_rr_map (jdata=Cannot access memory at address 0xff17
) at rmaps_rr.c:211
#4  0x000100353098 in orte_rmaps_base_map_job (fd=Cannot access memory at address 0xfe7b
) at base/rmaps_base_map_job.c:320
#5  0x0001005ce28c in event_process_active_single_queue (base=Cannot access memory at address 0xffe7
) at event.c:1367
#6  0x0001005ce500 in event_process_active (base=Cannot access memory at address 0xffe7
) at event.c:1437
#7  0x0001005ceb71 in opal_libevent2021_event_base_loop (base=Cannot access memory at address 0xffb7
) at event.c:1645
#8  0x0001002c5158 in orterun (argc=Cannot access memory at address 0xfd1b
) at orterun.c:3039
#9  0x0001002c32a4 in main (argc=Cannot access memory at address 0xfffb
) at main.c:14

Looks to me like memory may be getting hosed


On Sep 20, 2013, at 2:59 PM, Brice Goglin wrote:

> I can't see any segfault. Where does the segfault occur for you? In OMPI
> only (or lstopo too)? When loading or when using the topology?
>
> I tried lstopo on that file with and without HWLOC_NO_LIBXML_IMPORT=1 (in
> case the bug is in one of the XML backends); it looks OK.
> 
> Brice



Re: [hwloc-devel] xml file load incompatibilities

2013-09-20 Thread Brice Goglin
I can't see any segfault. Where does the segfault occur for you? In
OMPI only (or lstopo too)? When loading or when using the topology?

I tried lstopo on that file with and without HWLOC_NO_LIBXML_IMPORT=1
(in case the bug is in one of the XML backends); it looks OK.

Brice





On 20/09/2013 23:53, Ralph Castain wrote:
> Here are the two files I tried - not from the same machine. The foo.xml
> works, the topo.xml segfaults
>
> One of our users reported it from their machine, but I don't have their topo
> file.



Re: [hwloc-devel] xml file load incompatibilities

2013-09-20 Thread Ralph Castain
Here are the two files I tried - not from the same machine. The foo.xml works, 
the topo.xml segfaults




topo.xml
Description: XML document


foo.xml
Description: XML document


One of our users reported it from their machine, but I don't have their topo 
file.

On Sep 20, 2013, at 2:41 PM, Brice Goglin wrote:

> Hello,
> I don't see any reason for such an incompatibility. But there are
> many combinations, and we can't test everything.
> I can't reproduce that on my machines. Can you send the XML output of
> both versions on one of your machines?
> Brice



Re: [hwloc-devel] xml file load incompatibilities

2013-09-20 Thread Brice Goglin
Hello,
I don't see any reason for such an incompatibility. But there are
many combinations, and we can't test everything.
I can't reproduce that on my machines. Can you send the XML output of
both versions on one of your machines?
Brice






[hwloc-devel] xml file load incompatibilities

2013-09-20 Thread Ralph Castain
Hi folks

I've run across a rather strange behavior. We have two branches in OMPI - the 
devel trunk (using hwloc v1.7.2) and our feature release series (using hwloc 
1.5.2). I have found the following:

* the feature series can correctly load an xml file generated by lstopo of
versions 1.5 or greater

* the devel series can correctly load an xml file generated by lstopo of 
versions 1.7 or greater, but not files generated by prior versions. In the 
latter case, I segfault as soon as I try to use the loaded topology.

Any ideas why the discrepancy? Can I at least detect the version used to create 
a file when loading it so I can error out instead of segfaulting?

Ralph
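
On the last question, one possible guard is to inspect the XML before handing it to hwloc. The sketch below is in Python and rests on an assumption that nothing in this thread confirms: that the exporting hwloc recorded its version as an `info` element named `hwlocVersion`, as later hwloc releases document. Files written by older lstopo versions may carry no such element, in which case the helper returns None and the caller has to decide whether to refuse the file. The names `exporter_version` and `version_tuple` and the sample XML strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def exporter_version(xml_text):
    """Return the hwlocVersion info value found in the XML, or None.

    Assumption: the exporter stored <info name="hwlocVersion" value="..."/>
    somewhere in the topology; older exporters may not have done so.
    """
    root = ET.fromstring(xml_text)
    for info in root.iter("info"):
        if info.get("name") == "hwlocVersion":
            return info.get("value")
    return None


def version_tuple(version):
    """Turn '1.7.2' into (1, 7, 2); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


# Made-up miniature documents, not real lstopo output.
tagged = ('<topology><object type="Machine">'
          '<info name="hwlocVersion" value="1.7.2"/>'
          '</object></topology>')
untagged = '<topology><object type="Machine"/></topology>'

print(exporter_version(tagged))    # prints 1.7.2
print(exporter_version(untagged))  # prints None
```

A caller could then compare `version_tuple(found)` against `(1, 7)` and error out when the version is older or missing, rather than loading the file and crashing later.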