Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-24 Thread Ralph Castain
Thanks!

On Sep 24, 2011, at 2:18 PM, Brice Goglin wrote:

> I fixed one parsing bug in commit 3660 on the v1.2-ompi branch. Things
> should work better now.
> 
> Parsing XML distance matrices was broken when the XML file came from the
> no-libxml exporter. That's why you had problems on your dual-amd machine
> (those have distance matrices) and not on your mac (single processor, no
> distances, no bug).
> 
> The v1.2 branch doesn't report parsing failure well, so it just crashed.
> Trunk exits with an error instead of crashing.
> 
> Brice

Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-24 Thread Brice Goglin
I fixed one parsing bug in commit 3660 on the v1.2-ompi branch. Things
should work better now.

Parsing XML distance matrices was broken when the XML file came from the
no-libxml exporter. That's why you had problems on your dual-amd machine
(those have distance matrices) and not on your mac (single processor, no
distances, no bug).

The v1.2 branch doesn't report parsing failure well, so it just crashed.
Trunk exits with an error instead of crashing.

Brice
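
Whether a bad import crashes or exits with an error depends on whether the parser tells its caller that the input was short or malformed. As a minimal sketch only (this is not hwloc's code; the function name parse_distance_matrix and the whitespace-separated input format are assumptions made for illustration), a parser can refuse to report success unless it has read exactly n*n values:

```c
#include <assert.h>
#include <stdlib.h>

/* Parse exactly n*n whitespace-separated floats from `text` into `out`.
 * Returns 0 on success and -1 on short or malformed input, so the caller
 * can abort the import instead of using a half-filled matrix.
 * The input format is hypothetical, not hwloc's XML layout. */
int parse_distance_matrix(const char *text, unsigned n, float *out)
{
    assert(text != NULL && out != NULL);
    const char *p = text;

    for (unsigned i = 0; i < n * n; i++) {
        char *end;
        float v = strtof(p, &end);
        if (end == p)        /* no number here: input too short or garbage */
            return -1;
        out[i] = v;
        p = end;
    }

    /* Reject anything other than whitespace after the last value. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    return *p == '\0' ? 0 : -1;
}
```

A caller that checks the return value can abandon the import cleanly rather than exporting from a partially initialized topology.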




On 24/09/2011 at 20:37, Ralph Castain wrote:
> Yep, it fails. Runs on my Mac, but not under Linux.
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x2acdbedd in hwloc_bitmap_snprintf () from 
> /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> (gdb) where
> #0  0x2acdbedd in hwloc_bitmap_snprintf () from 
> /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #1  0x2acdc060 in hwloc_bitmap_asprintf () from 
> /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #2  0x2acd9b34 in hwloc__xml_export_object () from 
> /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #3  0x2acda325 in hwloc___nolibxml_prepare_export () from 
> /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #4  0x2acda39c in hwloc__nolibxml_prepare_export () from 
> /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #5  0x2acda4be in hwloc_topology_export_xmlbuffer () from 
> /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #6  0x004009b8 in main () at xmlbuffer.c:31

Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-24 Thread Ralph Castain
This is 1.2-ompi, running on Linux 2.6.18-274.el5 on x86_64

$ uname -a
Linux xxx 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 
x86_64 GNU/Linux


On Sep 24, 2011, at 12:43 PM, Brice Goglin wrote:

> What platform and distribution do you have?
> 
> Brice

Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-24 Thread Ralph Castain
Yep, it fails. Runs on my Mac, but not under Linux.

Program terminated with signal 11, Segmentation fault.
#0  0x2acdbedd in hwloc_bitmap_snprintf () from 
/nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
(gdb) where
#0  0x2acdbedd in hwloc_bitmap_snprintf () from 
/nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#1  0x2acdc060 in hwloc_bitmap_asprintf () from 
/nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#2  0x2acd9b34 in hwloc__xml_export_object () from 
/nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#3  0x2acda325 in hwloc___nolibxml_prepare_export () from 
/nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#4  0x2acda39c in hwloc__nolibxml_prepare_export () from 
/nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#5  0x2acda4be in hwloc_topology_export_xmlbuffer () from 
/nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#6  0x004009b8 in main () at xmlbuffer.c:31

On Sep 24, 2011, at 9:45 AM, Brice Goglin wrote:

> Indeed, this object contains invalid pointers.
> 
> Can you try to run tests/xmlbuffer.c from hwloc's tree? It does
> export+import+export+compare on the same machine. It would be good to
> know if it fails on one of the machines you're using here.
> 
> https://svn.open-mpi.org/trac/hwloc/browser/branches/v1.2-ompi/tests/xmlbuffer.c?rev=3837=txt
> 
> thanks
> Brice

Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-24 Thread Ralph Castain
FWIW: I tried just printing out the contents of that root object immediately 
after importing the xml, and it clearly has a problem:

(gdb) print *obj
$2 = {type = OPAL_HWLOC122_hwloc_OBJ_SYSTEM, os_index = 0, name = 0x101 
, memory = {
total_memory = 46912502995240, local_memory = 46912502995240, 
page_types_len = 0, page_types = 0x0}, attr = 0x2, 
  depth = 6900112, logical_index = 0, os_level = 6571424, next_cousin = 0x0, 
prev_cousin = 0x, parent = 0x0, 
  sibling_rank = 0, next_sibling = 0x0, prev_sibling = 0x0, arity = 145, 
children = 0x2b139738, 
  first_child = 0x2b139738, last_child = 0x0, userdata = 0x0, cpuset = 0x0, 
complete_cpuset = 0x0, 
  online_cpuset = 0x644700, allowed_cpuset = 0x691970, nodeset = 0x6919e0, 
complete_nodeset = 0x644c90, 
  allowed_nodeset = 0x644cb0, distances = 0x6948b0, distances_count = 690, 
infos = 0x0, infos_count = 0}


On Sep 24, 2011, at 9:02 AM, Ralph Castain wrote:

> Here's the trace:
> 
> #0  0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, 
> topology=0x695f10, obj=0x2b139b28)
>at topology-xml.c:1094
> #1  0x2ae61b69 in hwloc___nolibxml_prepare_export (topology=0x695f10, 
>xmlbuffer=0x698a70 "\n topology SYSTEM \"hwloc.dtd\">\n\n   os_level=\"-1424778408\" os_index=\"10922\" cpuset=\"0xf...f\" 
> complete_cpuset=\"0xf...f\" onl"..., 
>buflen=16384) at topology-xml.c:1193
> #2  0x2ae61be0 in hwloc__nolibxml_prepare_export (topology=0x695f10, 
> bufferp=0x7fffd988, buflenp=0x7fffd97c)
>at topology-xml.c:1207
> #3  0x2ae61d02 in opal_hwloc122_hwloc_topology_export_xmlbuffer 
> (topology=0x695f10, xmlbuffer=0x7fffd988, 
>buflen=0x7fffd97c) at topology-xml.c:1281
> #4  0x2ae529f4 in opal_hwloc_compare (topo1=0x695f10, topo2=0x6915c0, 
> type=22 '\026') at base/hwloc_base_dt.c:183
> #5  0x2adf348c in opal_dss_compare (value1=0x695f10, value2=0x6915c0, 
> type=22 '\026') at dss/dss_compare.c:39
> #6  0x2ad9b5f7 in process_orted_launch_report (fd=-1, event=1, 
> data=0x6444d0) at base/plm_base_launch_support.c:564
> #7  0x2ae3881f in event_process_active_single_queue (base=0x60dd60, 
> activeq=0x6111e0) at event.c:1329
> #8  0x2ae38c71 in event_process_active (base=0x60dd60) at event.c:1396
> #9  0x2ae3902b in opal_libevent2012_event_base_loop (base=0x60dd60, 
> flags=1) at event.c:1598
> #10 0x2adf080d in opal_progress () at runtime/opal_progress.c:189
> #11 0x2ad9bbfa in orte_plm_base_daemon_callback (num_daemons=2) at 
> base/plm_base_launch_support.c:666
> #12 0x2ada49e1 in plm_slurm_launch_job (jdata=0x67a500) at 
> plm_slurm_module.c:404
> #13 0x00403822 in orterun (argc=4, argv=0x7fffe1d8) at 
> orterun.c:817
> #14 0x00402aa3 in main (argc=4, argv=0x7fffe1d8) at main.c:13
> 
> And the error report
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, 
> topology=0x695f10, obj=0x2b139b28)
>at topology-xml.c:1094
> 1094  sprintf(tmp, "%llu", (unsigned long long) 
> obj->memory.page_types[i].count);
> (gdb) print obj
> $1 = (opal_hwloc122_hwloc_obj_t) 0x2b139b28
> (gdb) print *obj
> $2 = {type = 2870188824, os_index = 10922, name = 0x2b139b18 
> "\b\233\023\253\252*", memory = {total_memory = 6579376, 
>local_memory = 6579376, page_types_len = 2870188856, page_types = 
> 0x2b139b38}, attr = 0x2b139b48, 
>  depth = 2870188872, logical_index = 10922, os_level = -1424778408, 
> next_cousin = 0x2b139b58, 
>  prev_cousin = 0x2b139b68, parent = 0x2b139b68, sibling_rank = 
> 2870188920, next_sibling = 0x2b139b78, 
>  prev_sibling = 0x2b139b88, arity = 2870188936, children = 
> 0x2b139b98, first_child = 0x2b139b98, 
>  last_child = 0x2b139ba8, userdata = 0x2b139ba8, cpuset = 
> 0x2b139bb8, complete_cpuset = 0x2b139bb8, 
>  online_cpuset = 0x2b139bc8, allowed_cpuset = 0x2b139bc8, nodeset = 
> 0x2b139bd8, 
>  complete_nodeset = 0x2b139bd8, allowed_nodeset = 0x2b139be8, 
> distances = 0x2b139be8, 
>  distances_count = 2870189048, infos = 0x2b139bf8, infos_count = 
> 2870189064}
> (gdb) print obj->memory
> $3 = {total_memory = 6579376, local_memory = 6579376, page_types_len = 
> 2870188856, page_types = 0x2b139b38}
> (gdb) print obj->memory.page_types
> $4 = (struct opal_hwloc122_hwloc_obj_memory_page_type_s *) 0x2b139b38
> (gdb) print i
> $5 = 1612
> (gdb) print obj->memory.page_types[1600]
> $6 = {size = 0, count = 0}
> (gdb) print obj->memory.page_types[1612]
> Cannot access memory at address 0x2b13fff8
> (gdb) print obj->memory.page_types[1611]
> $7 = {size = 0, count = 0}
> (gdb) 
> 
> 
> The whole obj looks like trash to me. I looked a little more - the object 
> referenced is the root object:
> 
> 1193hwloc__xml_export_object (, topology, 
> 

Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-24 Thread Ralph Castain
Here's the trace:

#0  0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, 
topology=0x695f10, obj=0x2b139b28)
at topology-xml.c:1094
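#1  0x2ae61b69 in hwloc___nolibxml_prepare_export (topology=0x695f10, 
xmlbuffer=0x698a70 "\n topology SYSTEM \"hwloc.dtd\">\n\n   os_level=\"-1424778408\" os_index=\"10922\" cpuset=\"0xf...f\" 
complete_cpuset=\"0xf...f\" onl"..., 
buflen=16384) at topology-xml.c:1193
#2  0x2ae61be0 in hwloc__nolibxml_prepare_export (topology=0x695f10, 
bufferp=0x7fffd988, buflenp=0x7fffd97c)
at topology-xml.c:1207
#3  0x2ae61d02 in opal_hwloc122_hwloc_topology_export_xmlbuffer 
(topology=0x695f10, xmlbuffer=0x7fffd988, 
buflen=0x7fffd97c) at topology-xml.c:1281
#4  0x2ae529f4 in opal_hwloc_compare (topo1=0x695f10, topo2=0x6915c0, 
type=22 '\026') at base/hwloc_base_dt.c:183
#5  0x2adf348c in opal_dss_compare (value1=0x695f10, value2=0x6915c0, 
type=22 '\026') at dss/dss_compare.c:39
#6  0x2ad9b5f7 in process_orted_launch_report (fd=-1, event=1, 
data=0x6444d0) at base/plm_base_launch_support.c:564
#7  0x2ae3881f in event_process_active_single_queue (base=0x60dd60, 
activeq=0x6111e0) at event.c:1329
#8  0x2ae38c71 in event_process_active (base=0x60dd60) at event.c:1396
#9  0x2ae3902b in opal_libevent2012_event_base_loop (base=0x60dd60, 
flags=1) at event.c:1598
#10 0x2adf080d in opal_progress () at runtime/opal_progress.c:189
#11 0x2ad9bbfa in orte_plm_base_daemon_callback (num_daemons=2) at 
base/plm_base_launch_support.c:666
#12 0x2ada49e1 in plm_slurm_launch_job (jdata=0x67a500) at 
plm_slurm_module.c:404
#13 0x00403822 in orterun (argc=4, argv=0x7fffe1d8) at 
orterun.c:817
#14 0x00402aa3 in main (argc=4, argv=0x7fffe1d8) at main.c:13

And the error report

Program received signal SIGSEGV, Segmentation fault.
0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, 
topology=0x695f10, obj=0x2b139b28)
at topology-xml.c:1094
1094  sprintf(tmp, "%llu", (unsigned long long) 
obj->memory.page_types[i].count);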
(gdb) print obj
$1 = (opal_hwloc122_hwloc_obj_t) 0x2b139b28
(gdb) print *obj
$2 = {type = 2870188824, os_index = 10922, name = 0x2b139b18 
"\b\233\023\253\252*", memory = {total_memory = 6579376, 
local_memory = 6579376, page_types_len = 2870188856, page_types = 
0x2b139b38}, attr = 0x2b139b48, 
  depth = 2870188872, logical_index = 10922, os_level = -1424778408, 
next_cousin = 0x2b139b58, 
  prev_cousin = 0x2b139b68, parent = 0x2b139b68, sibling_rank = 
2870188920, next_sibling = 0x2b139b78, 
  prev_sibling = 0x2b139b88, arity = 2870188936, children = 0x2b139b98, 
first_child = 0x2b139b98, 
  last_child = 0x2b139ba8, userdata = 0x2b139ba8, cpuset = 
0x2b139bb8, complete_cpuset = 0x2b139bb8, 
  online_cpuset = 0x2b139bc8, allowed_cpuset = 0x2b139bc8, nodeset = 
0x2b139bd8, 
  complete_nodeset = 0x2b139bd8, allowed_nodeset = 0x2b139be8, 
distances = 0x2b139be8, 
  distances_count = 2870189048, infos = 0x2b139bf8, infos_count = 
2870189064}
(gdb) print obj->memory
$3 = {total_memory = 6579376, local_memory = 6579376, page_types_len = 
2870188856, page_types = 0x2b139b38}
(gdb) print obj->memory.page_types
$4 = (struct opal_hwloc122_hwloc_obj_memory_page_type_s *) 0x2b139b38
(gdb) print i
$5 = 1612
(gdb) print obj->memory.page_types[1600]
$6 = {size = 0, count = 0}
(gdb) print obj->memory.page_types[1612]
Cannot access memory at address 0x2b13fff8
(gdb) print obj->memory.page_types[1611]
$7 = {size = 0, count = 0}
(gdb) 


The whole obj looks like trash to me. I looked a little more - the object 
referenced is the root object:

1193  hwloc__xml_export_object (, topology, 
hwloc_get_root_obj(topology));

I'm continuing to look in case I'm doing something stupid, but the code is 
pretty linear here - unpack, import, export for compare.
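
The faulting address in the gdb session above can be checked by hand. Assuming each page_types entry is a pair of 64-bit fields (16 bytes), as the {size, count} output suggests, and assuming 4 KiB pages, element 1612 starts exactly at the address gdb refuses to read, and its count field is the first byte of the next page. The helper names below are hypothetical, and the addresses are used as printed in this archive:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one page-type entry: two 64-bit fields. */
struct page_type {
    uint64_t size;
    uint64_t count;
};

#define PAGE_SIZE_BYTES 4096u   /* assumed base page size on x86_64 */

/* Address of element `i` of an array that starts at `base`. */
uint64_t element_address(uint64_t base, uint64_t i)
{
    return base + i * sizeof(struct page_type);
}

/* Address of the `count` field inside element `i`. */
uint64_t count_field_address(uint64_t base, uint64_t i)
{
    assert(sizeof(struct page_type) == 16);  /* the arithmetic relies on this */
    return element_address(base, i) + offsetof(struct page_type, count);
}

/* Nonzero if `addr` is the first byte of a page. */
int is_page_start(uint64_t addr)
{
    return addr % PAGE_SIZE_BYTES == 0;
}
```

With base 0x2b139b38 and i = 1612, element_address gives 0x2b13fff8 and count_field_address gives 0x2b140000, which is page-aligned. That fits the picture of a garbage page_types_len letting the loop at topology-xml.c:1094 run until page_types[i].count fell onto an unmapped page, so the corruption would be in the object itself rather than in the export loop.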


On Sep 24, 2011, at 8:59 AM, Jeff Squyres wrote:

> Here's some feedback from Ralph -- any idea what's going wrong here?
> 
> -
> 
> 1. I export a topology into xml using
> 
>   hwloc_topology_export_xmlbuffer(t, , );
> 
> I then pack and send the string.
> 
> 2. I unpack the string on the other end and import it into a topology
>   hwloc_topology_init();
>   if (0 != (rc = hwloc_topology_set_xmlbuffer(t, xmlbuffer, 
> strlen(xmlbuffer)))) {
>   hwloc_topology_destroy(t);
>   goto cleanup;
>   }
>   hwloc_topology_load(t);
> 
> 3. I then need to compare two topologies, so I export the topology I received 
> into another xml string
>   hwloc_topology_export_xmlbuffer(t1, , );
> 
> It is this export that fails, which implies to me that somehow the import 
> didn't work right. Note that this code worked fine with libxml2, so this is a 
> regression.

Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-24 Thread Jeff Squyres
Here's some feedback from Ralph -- any idea what's going wrong here?

-

1. I export a topology into xml using

   hwloc_topology_export_xmlbuffer(t, , );

I then pack and send the string.

2. I unpack the string on the other end and import it into a topology
   hwloc_topology_init();
   if (0 != (rc = hwloc_topology_set_xmlbuffer(t, xmlbuffer, 
strlen(xmlbuffer)))) {
   hwloc_topology_destroy(t);
   goto cleanup;
   }
   hwloc_topology_load(t);

3. I then need to compare two topologies, so I export the topology I received 
into another xml string
   hwloc_topology_export_xmlbuffer(t1, , );

It is this export that fails, which implies to me that somehow the import 
didn't work right. Note that this code worked fine with libxml2, so this is a 
regression.
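
The three steps above amount to an export, import, export, compare round trip. A minimal sketch of that pattern follows, using a toy key=value serializer as a stand-in for hwloc's XML routines; struct toy_topology and the toy_* functions are invented for illustration and are not part of hwloc. The point is only that each step's result is checked before the next step uses it:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a topology; hwloc's real structure is far richer. */
struct toy_topology {
    unsigned nodes;
    unsigned cores;
};

/* Export to text. Returns 0 on success, -1 if the buffer is too small. */
int toy_export(const struct toy_topology *t, char *buf, size_t len)
{
    int n = snprintf(buf, len, "nodes=%u cores=%u", t->nodes, t->cores);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Import from text. Returns 0 on success, -1 on a parse failure. */
int toy_import(const char *buf, struct toy_topology *t)
{
    return sscanf(buf, "nodes=%u cores=%u", &t->nodes, &t->cores) == 2 ? 0 : -1;
}

/* export -> import -> export -> compare.
 * Returns 1 if both exports match, 0 if they differ, -1 on any error. */
int toy_roundtrip(const struct toy_topology *orig)
{
    char first[64], second[64];
    struct toy_topology copy;

    assert(orig != NULL);
    if (toy_export(orig, first, sizeof first) != 0)
        return -1;
    if (toy_import(first, &copy) != 0)   /* check the import before reusing it */
        return -1;
    if (toy_export(&copy, second, sizeof second) != 0)
        return -1;
    return strcmp(first, second) == 0;
}
```

Running the round trip on one machine separates import bugs from transport bugs: if it already fails locally, packing and sending the string is not the culprit.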


On Sep 22, 2011, at 9:39 AM, Jeff Squyres wrote:

> Yes, I can get some testing of the ompi branch pretty quickly.  I can bring 
> in a new copy of this later today and see what we can see.
> 
> Many thanks!
> 
> 
> On Sep 19, 2011, at 9:05 AM, Brice Goglin wrote:
> 
>> I pushed the new minimalistic XML import/export implementation without
>> libxml2 to the nolibxml branch. If libxml2 is available, it's still used
>> by default. --disable-libxml2 or some env variables can be used to
>> force the minimalistic implementation if needed. The minimalistic implementation
>> is only guaranteed to import XML files that were generated by hwloc
>> (even if libxml was enabled there).
>> 
>> I also backported most of this to the new v1.2-ompi branch (required to
>> backport some other XML cleanups from trunk). This branch will now serve
>> as a base for Open MPI's embedded hwloc. The idea is to have a complete
>> v1.2 + nolibxml somewhere so that we can at least run make check (Open
>> MPI does not embed enough to run hwloc's make check).
>> 
>> How do we proceed now? Can we have the OMPI guys test the new code soon?
>> Should I wait for their feedback before merging the nolibxml branch into
>> the trunk? I'd like to merge this in v1.3 too (and basically release rc2
>> as the actual first feature-complete RC), so getting feedback early
>> might be appreciated.
>> 
>> Brice
>> 
>> ___
>> hwloc-devel mailing list
>> hwloc-de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-22 Thread Jeff Squyres
On Sep 22, 2011, at 11:41 AM, Brice Goglin wrote:

> This is strange. I just tried building the hwloc tree with prefixing
> enabled, I could not find any problem (except one missing symbol that
> doesn't matter here, see next commits).
> 
> Basically nothing has changed outside of src/topology-xml.c, the above
> symbols existed before, they still exist. I don't understand why their
> renaming would fail now. However, those were in #ifdef HWLOC_HAVE_XML
> before, but this symbol isn't used anymore. Did you rerun autogen?

Disregard -- I think this is a problem in how we're slurping OMPI's hwloc into 
our tree... nothing wrong in hwloc itself...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-22 Thread Jeff Squyres
I have to run out ATM so I can't dig into this deeply for a few hours, but with 
a first take, I'm getting this error:

Making all in src
  CC topology.lo
topology.c: In function 'hwloc_discover':
topology.c:1673: error: 'OPAL_HWLOC121hwloc_BACKEND_XML' undeclared (first use 
in this function)
topology.c:1673: error: (Each undeclared identifier is reported only once
topology.c:1673: error: for each function it appears in.)
topology.c:1674: error: implicit declaration of function 
'opal_hwloc121hwloc_look_xml'
topology.c: In function 'hwloc_backend_exit':
topology.c:2078: error: 'OPAL_HWLOC121hwloc_BACKEND_XML' undeclared (first use 
in this function)
topology.c:2079: error: implicit declaration of function 
'opal_hwloc121hwloc_backend_xml_exit'
topology.c: In function 'opal_hwloc121hwloc_topology_set_xml':
topology.c:2134: error: implicit declaration of function 
'opal_hwloc121hwloc_backend_xml_init'
make[1]: *** [topology.lo] Error 1
make: *** [all-recursive] Error 1

We use the hwloc prefix stuff in the OMPI embedded build; did something not get 
prefixed properly in the minimal XML stuff?

Back in a few hours...



On Sep 22, 2011, at 9:39 AM, Jeff Squyres wrote:

> Yes, I can get some testing of the ompi branch pretty quickly.  I can bring 
> in a new copy of this later today and see what we can see.
> 
> Many thanks!
> 
> 
> On Sep 19, 2011, at 9:05 AM, Brice Goglin wrote:
> 
>> I pushed the new minimalistic XML import/export implementation without
>> libxml2 to the nolibxml branch. If libxml2 is available, it's still used
>> by default. --disable-libxml2 or some env variables can be used to
>> force the minimalistic implementation if needed. The minimalistic implem
>> is only guaranteed to import XML files that were generated by hwloc
>> (even if libxml was enabled there).
>> 
>> I also backported most of this to the new v1.2-ompi branch (required to
>> backport some other XML cleanups from trunk). This branch will now serve
>> as a base for Open MPI's embedded hwloc. The idea is to have a complete
>> v1.2 + nolibxml somewhere so that we can at least run make check (Open
>> MPI does not embed enough to run hwloc's make check).
>> 
>> How do we proceed now? Can we have the OMPI guys test the new code soon?
>> Should I wait for their feedback before merging the nolibxml branch into
>> the trunk? I'd like to merge this in v1.3 too (and basically release rc2
>> as the actual first feature-complete RC), so getting feedback early
>> might be appreciated.
>> 
>> Brice
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
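
The identifiers in the build errors above (for example 'opal_hwloc121hwloc_look_xml') have the shape "prefix + original name", which is what a token-pasting rename macro produces. The following is a minimal sketch of that mechanism, not hwloc's actual header: the DEMO_* macro names and the prefix opal_demo121 are assumptions made up for illustration. If a file refers to a renamed symbol without seeing a matching renamed declaration, the compiler reports the prefixed name as undeclared, as in the log.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical prefix; an embedding project would pick its own. */
#define DEMO_SYM_PREFIX opal_demo121

/* Two-level pasting so that DEMO_SYM_PREFIX is expanded before ## is applied. */
#define DEMO_PASTE2(a, b) a ## b
#define DEMO_PASTE(a, b) DEMO_PASTE2(a, b)
#define DEMO_NAME(name) DEMO_PASTE(DEMO_SYM_PREFIX, hwloc_ ## name)

/* Two-level stringification, used only to show the resulting name. */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

/* The rename itself: every later use of hwloc_look_xml is rewritten. */
#define hwloc_look_xml DEMO_NAME(look_xml)

/* Written as hwloc_look_xml, compiled as opal_demo121hwloc_look_xml.
 * The body is a placeholder: it only reports whether the path is non-empty. */
static int hwloc_look_xml(const char *xmlpath)
{
    assert(xmlpath != NULL);
    return xmlpath[0] != '\0';
}

/* Returns the symbol name the compiler actually sees. */
static const char *demo_compiled_name(void)
{
    return DEMO_STR(hwloc_look_xml);
}
```

Callers keep writing hwloc_look_xml("topo.xml"); only the symbol that reaches the compiler and linker carries the prefix.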




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-19 Thread Brice Goglin
I pushed the new minimalistic XML import/export implementation without
libxml2 to the nolibxml branch. If libxml2 is available, it's still used
by default. --disable-libxml2 or some env variables can be used to
force the minimalistic implementation if needed. The minimalistic implem
is only guaranteed to import XML files that were generated by hwloc
(even if libxml was enabled there).

I also backported most of this to the new v1.2-ompi branch (required to
backport some other XML cleanups from trunk). This branch will now serve
as a base for Open MPI's embedded hwloc. The idea is to have a complete
v1.2 + nolibxml somewhere so that we can at least run make check (Open
MPI does not embed enough to run hwloc's make check).

How do we proceed now? Can we have the OMPI guys test the new code soon?
Should I wait for their feedback before merging the nolibxml branch into
the trunk? I'd like to merge this in v1.3 too (and basically release rc2
as the actual first feature-complete RC), so getting feedback early
might be appreciated.

Brice



Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-06 Thread Jeff Squyres
My $0.02: for simplicity, let's force ASCII-only.  If we get complaints/feature 
requests, we can see about updating to include non-ASCII.

But then again, I'm biased because I'm an American.  You guys might have 
different views -- e.g., you need non-ASCII for your organization's name.



On Sep 5, 2011, at 11:04 AM, Brice Goglin wrote:

> Regarding XML encoding:
> 
> It seems that libxml2 rewrites the following characters as XML entities:
> \n
> \r
> \t
> "
> <
> >
> &
> 
> 
> hwloc already tells libxml2 to export as UTF-8. However, a quick check
> seems to say that the output is not UTF8 when the locale isn't UTF8. We
> may need to double-check/clarify/fix this.
> 
> Or we can enforce ASCII-only for all strings. Should be OK for all
> strings we import from the OS. Will need to be enforced for user-given
> strings (object info attributes).
> 
> Brice
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
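
The ASCII-only option discussed in this message could be enforced on user-given strings with a check along these lines. This is only a sketch under assumptions: demo_string_is_ascii is a hypothetical helper, not an hwloc function, and the thread does not say where such a check would be called or how a rejected string would be reported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of hwloc): returns 1 if every byte of the
 * NUL-terminated string is 7-bit ASCII, 0 otherwise. */
static int demo_string_is_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    for (; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0;   /* high bit set: not plain ASCII */
    }
    return 1;
}
```

A string such as "Socket P#0" passes, while any UTF-8 encoded accented character fails because its bytes have the high bit set.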




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-05 Thread Jeff Squyres
On Sep 5, 2011, at 2:22 AM, Brice Goglin wrote:

> Samuel thinks we could stay with XML and reimplement our own
> parsing/dumping without libxml2.
> 
> My feeling about this is:
> + We would have a single file format for import/export.
> + Saving would likely be easy (copy-paste from the current code and/or
> from the JSON export)
> - Parsing would require some work (the libxml2-based parser isn't easy
> to modify, but we could adapt the JSON parser)

Is there a way to make the parsing easier?  I.e., do we have to accept fully 
generic XML?  Or can we restrict it somehow such that the parsing becomes much 
more deterministic / simpler?

> - Encoding may be annoying. libxml2 does a lot of things to manage
> strings properly. There are not many special characters in a usual XML
> output, but there can be (because the user can annotate the objects).
> - I am a bit afraid that we would go from a well-working XML support to
> something much less reliable (do we need to be fully XML compliant so
> that external programs can load our XML files and play with them?)

A fair point.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
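
One way to read the "restrict it" question above: if the importer only has to accept what the exporter writes, attribute parsing can be a strict scan for name="value" pairs. The sketch below is an illustration under that assumption, not hwloc's parser. demo_next_attr is a hypothetical name, only double-quoted values are accepted, names are limited to letters, digits and underscores, and entity decoding is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strict scanner: reads one name="value" pair starting at p.
 * On success, *name/*value point into the input (not NUL-terminated), the
 * lengths are stored, and a pointer just past the closing quote is returned.
 * Returns NULL when there is no further well-formed attribute. */
static const char *demo_next_attr(const char *p,
                                  const char **name, size_t *namelen,
                                  const char **value, size_t *valuelen)
{
    const char *q, *end;

    assert(p && name && namelen && value && valuelen);
    while (*p == ' ')
        p++;                                /* skip separating spaces */

    q = p;
    while ((*q >= 'a' && *q <= 'z') || (*q >= 'A' && *q <= 'Z') ||
           (*q >= '0' && *q <= '9') || *q == '_')
        q++;                                /* scan the attribute name */

    if (q == p || q[0] != '=' || q[1] != '"')
        return NULL;                        /* no name, or not name=" */

    end = strchr(q + 2, '"');
    if (end == NULL)
        return NULL;                        /* unterminated value */

    *name = p;
    *namelen = (size_t)(q - p);
    *value = q + 2;
    *valuelen = (size_t)(end - (q + 2));
    return end + 1;
}
```

Calling it repeatedly on the inside of a tag such as type="Core" os_index="3"/> yields each pair in turn and then NULL at the "/>". The attribute names here are only illustrative.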




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-05 Thread Christopher Samuel

On 02/09/11 19:54, Jeff Squyres wrote:

> Fail enough,

Nice Freudian slip. :-)

> but do the back-end nodes have libxml?

Apparently so..

rpm --root /bgsys/drivers/ppcfloor/linux/OS -qa | grep -i xml
libxml2-2.6.23-22

That's the I/O node filesystem, which is what the compute node
kernel maps I/O's back to I believe.  Mind you most people on
BG do static linking as dynamic linking is rather new there.

>  For us to do what we want, it would need to be available on
> all nodes because the OMPI orted processes would be querying
> hwloc for the local topology and then sending it to the "head"
> node process (usually mpirun) for further analysis and process
> mapping.

Umm, not sure that'll work on a BG because you can't fork() or
execve() on a BG, the IBM mpirun runs on the login node and talks
to an mpirund on the service node which then launches the users
code on the compute nodes via the Navigator API.

cheers,
Chris
- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-05 Thread Brice Goglin
Samuel thinks we could stay with XML and reimplement our own
parsing/dumping without libxml2.

My feeling about this is:
+ We would have a single file format for import/export.
+ Saving would likely be easy (copy-paste from the current code and/or
from the JSON export)
- Parsing would require some work (the libxml2-based parser isn't easy
to modify, but we could adapt the JSON parser)
- Encoding may be annoying. libxml2 does a lot of things to manage
strings properly. There are not many special characters in a usual XML
output, but there can be (because the user can annotate the objects).
- I am a bit afraid that we would go from a well-working XML support to
something much less reliable (do we need to be fully XML compliant so
that external programs can load our XML files and play with them?)

Opinions?

Brice
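
To make the encoding concern concrete, here is a minimal sketch of the escaping an exporter without libxml2 would need for attribute values. The set of characters is the one listed elsewhere in this thread (\n, \r, \t, ", <, >, &). The replacement strings are standard XML predefined entities and numeric character references, but whether libxml2 emits exactly these spellings is an assumption, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the replacement for one character, or NULL
 * if the character can be copied unchanged. */
static const char *demo_xml_entity(char c)
{
    switch (c) {
    case '\n': return "&#10;";
    case '\r': return "&#13;";
    case '\t': return "&#9;";
    case '"':  return "&quot;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '&':  return "&amp;";
    default:   return NULL;
    }
}

/* Escapes src into dst, which has room for dstlen bytes including the NUL.
 * Returns the escaped length, or (size_t)-1 if dst is too small. */
static size_t demo_xml_escape(const char *src, char *dst, size_t dstlen)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return (size_t)-1;

    for (; *src != '\0'; src++) {
        const char *ent = demo_xml_entity(*src);
        size_t n = ent ? strlen(ent) : 1;

        if (used + n + 1 > dstlen)      /* keep one byte for the NUL */
            return (size_t)-1;
        if (ent)
            memcpy(dst + used, ent, n);
        else
            dst[used] = *src;
        used += n;
    }
    dst[used] = '\0';
    return used;
}
```

The matching importer would have to reverse the same substitutions, which is part of the parsing work mentioned above.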



Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-04 Thread Brice Goglin
JSON looks a bit more verbose than YAML, but JSON also looked better for
our hierarchical information, so I gave JSON a try. I just pushed the
result to the new json branch.

Notes:
* You can only load/save from/to a memory buffer (set_jsonbuffer and
export_jsonbuffer), but lstopo needed to load/save from/to a file, so I
could add the corresponding routines (set_json and export_json) to the
public interface to match what we have for XML
* We don't care about the validity of our JSON output, but some quick
tests seem to say that it's OK anyway
* I tried to handle most parsing errors, it should not crash during
parsing, but it may crash later after the discovery (e.g. if you get an
error within a child before finishing its parent). It's not clear that
it's worse than XML. Loading a bogus JSON or XML topology is a user
error anyway :)
* Distances need rework (the same I did for XML recently). I didn't do
it because it would make backporting to 

Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-02 Thread Marcelo Alaniz

On Fri, Sep 02, 2011 at 05:57:11AM -0400, Jeff Squyres wrote:
> JSON: sure, it's an easy format, but we're not really targeting web-ish kinds 
> of things here, are we?  
The format isn't only for web-ish things. A lot of embedded apps use it.
I'm attaching an example; I used this site to generate it:
http://www.thomasfrank.se/xml_to_json.html

Cheers! 
> 
> YAML: ya, that's also an easy format.
> 
> But the goal here is to do something utterly trivial that has no support 
> library requirement.  Unless someone has specific requirements for these 
> formats, I'm ok with a totally trivial and 
> not-necessarily-compatible-with-anyone-else's-format format.
> 
> 
> On Sep 1, 2011, at 9:38 PM, Christopher Samuel wrote:
> 
> > 
> > On 02/09/11 01:30, Jeff Squyres wrote:
> > 
> >> Is there any chance that a lighter-weight, simple string
> >> parsing module could be added to hwloc?
> > 
> > What about something based on YAML ?
> > 
> > http://www.yaml.org/spec/1.2/spec.html
> > 
> > Designed to be easy to read by a human..
> > 
> > - -- 
> >Christopher Samuel - Senior Systems Administrator
> > VLSCI - Victorian Life Sciences Computation Initiative
> > Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> > http://www.vlsci.unimelb.edu.au/
> > 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 

-- 
Degree Alaniz Marcelo
Frontend Development 
HPC PhD Student


out.json
Description: application/json


signature.asc
Description: Digital signature


Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-02 Thread Jeff Squyres (jsquyres)
I don't know enough about either format to say. 

Sent from my phone. No type good. 

On Sep 2, 2011, at 6:03 AM, "Samuel Thibault"  wrote:

> Jeff Squyres, le Fri 02 Sep 2011 11:58:05 +0200, a écrit :
>> JSON: sure, it's an easy format, but we're not really targeting web-ish 
>> kinds of things here, are we?  
>> 
>> YAML: ya, that's also an easy format.
>> 
>> But the goal here is to do something utterly trivial that has no support 
>> library requirement.  Unless someone has specific requirements for these 
>> formats, I'm ok with a totally trivial and 
>> not-necessarily-compatible-with-anyone-else's-format format.
> 
> If we can easily implement our own parser for json or yaml (or some
> other standard format), we should simply go for one of them.
> 
> Samuel



Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-02 Thread Samuel Thibault
Jeff Squyres, le Fri 02 Sep 2011 11:58:05 +0200, a écrit :
> JSON: sure, it's an easy format, but we're not really targeting web-ish kinds 
> of things here, are we?  
> 
> YAML: ya, that's also an easy format.
> 
> But the goal here is to do something utterly trivial that has no support 
> library requirement.  Unless someone has specific requirements for these 
> formats, I'm ok with a totally trivial and 
> not-necessarily-compatible-with-anyone-else's-format format.

If we can easily implement our own parser for json or yaml (or some
other standard format), we should simply go for one of them.

Samuel


Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-02 Thread Jeff Squyres
JSON: sure, it's an easy format, but we're not really targeting web-ish kinds 
of things here, are we?  

YAML: ya, that's also an easy format.

But the goal here is to do something utterly trivial that has no support 
library requirement.  Unless someone has specific requirements for these 
formats, I'm ok with a totally trivial and 
not-necessarily-compatible-with-anyone-else's-format format.


On Sep 1, 2011, at 9:38 PM, Christopher Samuel wrote:

> 
> On 02/09/11 01:30, Jeff Squyres wrote:
> 
>> Is there any chance that a lighter-weight, simple string
>> parsing module could be added to hwloc?
> 
> What about something based on YAML ?
> 
> http://www.yaml.org/spec/1.2/spec.html
> 
> Designed to be easy to read by a human..
> 
> - -- 
>Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-02 Thread Jeff Squyres
On Sep 1, 2011, at 9:40 PM, Christopher Samuel wrote:

> Well BG/P doesn't support Open-MPI, but the service
> (management) node and the front end (login) nodes are
> PPC SLES10 and libxml2 is there..
> 
> tambo-m:~ # rpm -q libxml2
> libxml2-2.6.23-15.25.5

Fail enough, but do the back-end nodes have libxml?  For us to do what we want, 
it would need to be available on all nodes because the OMPI orted processes 
would be querying hwloc for the local topology and then sending it to the 
"head" node process (usually mpirun) for further analysis and process mapping.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Christopher Samuel

On 02/09/11 02:01, Jeff Squyres wrote:

> Blue Gene?

Well BG/P doesn't support Open-MPI, but the service
(management) node and the front end (login) nodes are
PPC SLES10 and libxml2 is there..

tambo-m:~ # rpm -q libxml2
libxml2-2.6.23-15.25.5

cheers,
Chris
- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Christopher Samuel

On 02/09/11 01:30, Jeff Squyres wrote:

> Is there any chance that a lighter-weight, simple string
> parsing module could be added to hwloc?

What about something based on YAML ?

 http://www.yaml.org/spec/1.2/spec.html

Designed to be easy to read by a human..

- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Samuel Thibault
Jeff Squyres, le Thu 01 Sep 2011 20:31:44 +0200, a écrit :
> hst: hwloc simple text

I like this one.

Samuel


Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Jeff Squyres
On Sep 1, 2011, at 12:17 PM, Brice Goglin wrote:

> Support for the most useful attributes would be done within a couple
> hours. The annoying thing is to support all attributes, distances, ...

If you kruft up some of the infrastructure and some examples, I could volunteer 
some grunt work to fill in the rest.

> Also we'd need to find a good name for this new backend. Something away
> from "text" because we already have the txt and ncurses outputs in
> lstopo :) Maybe "hwloc". hwloc_topology_export_hwlocbuffer() and "lstopo
> foo.hwloc" :) Or "htx" for "hwloc tiny xml".

Hah!  htx might work.  Or:

hst: hwloc simple text
hnx: hwloc NoXML
htt: hwloc trivial text
simple: obvious
serialized: obvious
string: obvious
newline: because the fields are newline-delimited

...?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Brice Goglin
Le 01/09/2011 18:01, Jeff Squyres a écrit :
> I could *probably* write this, but I'm guessing you guys could write it much 
> faster than I could...

Support for the most useful attributes would be done within a couple
hours. The annoying thing is to support all attributes, distances, ...

Also we'd need to find a good name for this new backend. Something away
from "text" because we already have the txt and ncurses outputs in
lstopo :) Maybe "hwloc". hwloc_topology_export_hwlocbuffer() and "lstopo
foo.hwloc" :) Or "htx" for "hwloc tiny xml".

Brice



Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Jeff Squyres
On Sep 1, 2011, at 11:49 AM, Brice Goglin wrote:

> Did you actually find many machines/distribs that don't have libxml2 
> installed by default? There are literally hundreds of packages that depend on 
> libxml2 (at least in Debian) so I am not sure depending on it is really a 
> problem.

Cray, for sure.  Josh told me off-list that it's a real PITA for them to 
build/support libxml on the ORNL Crays.

Blue Gene?  Windows?

> Also are there really some string space problems?

No.  The space savings is a minor benefit; I only included it for completeness.

> Otherwise, implementing this is likely easy, especially if you find somebody 
> to do it :) Start from the XML export, convert it into a text export, and 
> write the corresponding import (starting from the XML import may be hard 
> because it's recursive).
> 
> Would you need an export to a file or to a memory buffer or both?

Memory buffer would be most preferable, because we're going to generate it on 
the back end node, pack it to a network buffer, send it, receive it on the head 
node, unpack it from the network buffer, and slurp it into a hwloc topology.

> Last but not least: what's the deadline?

Ralph is actively working on code for the RFC I sent around yesterday:

http://www.open-mpi.org/community/lists/devel/2011/08/9737.php

We'll probably use XML just to get it going, but it would be good to not equate 
"libxml" with "hwloc" in OMPI developers' brains.  :-)  So -- "sometime soon" 
would be nice.

I could *probably* write this, but I'm guessing you guys could write it much 
faster than I could...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
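
The memory-buffer flow described above (export on the back-end node, pack, send, receive, unpack, import on the head node) mostly needs the exported text to travel with its length. The following is a minimal sketch of one way to frame such a buffer, assuming a 4-byte big-endian length prefix. demo_pack and demo_unpack are hypothetical helpers, not Open MPI's or hwloc's packing routines, and the exported buffer is stood in for by a plain string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 4-byte big-endian length followed by the payload into out.
 * Returns the number of bytes written, or 0 if out is too small. */
static size_t demo_pack(const char *payload, uint32_t len,
                        unsigned char *out, size_t outlen)
{
    assert(payload != NULL && out != NULL);
    if (outlen < 4 || outlen - 4 < len)
        return 0;
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    memcpy(out + 4, payload, len);
    return (size_t)len + 4;
}

/* Reads the length prefix and returns a pointer to the payload inside in.
 * Returns NULL if the frame is truncated. */
static const unsigned char *demo_unpack(const unsigned char *in, size_t inlen,
                                        uint32_t *len)
{
    uint32_t n;

    assert(in != NULL && len != NULL);
    if (inlen < 4)
        return NULL;
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (inlen - 4 < n)
        return NULL;            /* payload shorter than announced */
    *len = n;
    return in + 4;
}
```

On the receiving side the unpacked pointer and length would then be handed to whichever buffer-import routine the chosen format provides.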




Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Samuel Thibault
Jeff Squyres, le Thu 01 Sep 2011 17:31:05 +0200, a écrit :
> Do you think this would be easy to implement?

A quite strict format could probably be easy to implement and still be
extensible. The XML will probably remain useful for people who like XSLT
:)

Samuel


Re: [hwloc-devel] Something lighter-weight than XML?

2011-09-01 Thread Brice Goglin
Did you actually find many machines/distribs that don't have libxml2
installed by default? There are literally hundreds of packages that
depend on libxml2 (at least in Debian) so I am not sure depending on it
is really a problem.

Also are there really some string space problems? Even when talking
about 1000 nodes transferring 100kB once at the beginning on the job, it
doesn't look too bad to me (and these XMLs could be cached on the
frontend as long as the compute nodes don't change).

Otherwise, implementing this is likely easy, especially if you find
somebody to do it :) Start from the XML export, convert it into a text
export, and write the corresponding import (starting from the XML import
may be hard because it's recursive).

Would you need an export to a file or to a memory buffer or both?

Last but not least: what's the deadline?

Brice



Le 01/09/2011 17:30, Jeff Squyres a écrit :
> We're (finally) bringing full hwloc services up in Open MPI.
>
> One of the things we want to do is send server topologies from back-end 
> compute nodes to the front-end node.  The XML export/import functionality 
> would work for this, but a) it's a bit heavyweight, and b) it seems weird to 
> require XML to build MPI.
>
> Is there any chance that a lighter-weight, simple string parsing module could 
> be added to hwloc?  I'm guessing that we could save a modest amount of string 
> space (SWAG: 20%?), but we wouldn't need a dependency on libxml, which would 
> be good.
>
> I took a lstopo --no-io foo.xml output on an older xeon machine and, while 
> sitting on a boring teleconf, I manually converted it in emacs to a 
> (slightly) simpler text format.  I attached the two files.  There's a modest 
> space savings (about 17%).  But libxml clearly would not be necessary.
>
> Do you think this would be easy to implement?
>
>
>