Re: [hwloc-devel] Something lighter-weight than XML?
Thanks!

On Sep 24, 2011, at 2:18 PM, Brice Goglin wrote:

> I fixed one parsing bug in commit 3660 on the v1.2-ompi branch. Things
> should work better now.
>
> Parsing XML distance matrices was broken when the XML file came from the
> no-libxml exporter. That's why you had problems on your dual-amd machine
> (those have distance matrices) and not on your mac (single processor, no
> distances, no bug).
>
> The v1.2 branch doesn't report parsing failure well, so it just crashed.
> Trunk exits with an error instead of crashing.
>
> Brice
Re: [hwloc-devel] Something lighter-weight than XML?
I fixed one parsing bug in commit 3660 on the v1.2-ompi branch. Things should work better now.

Parsing XML distance matrices was broken when the XML file came from the no-libxml exporter. That's why you had problems on your dual-amd machine (those have distance matrices) and not on your mac (single processor, no distances, no bug).

The v1.2 branch doesn't report parsing failure well, so it just crashed. Trunk exits with an error instead of crashing.

Brice

On 24/09/2011 20:37, Ralph Castain wrote:

> Yep, it fails. Runs on my Mac, but not under Linux.
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x2acdbedd in hwloc_bitmap_snprintf () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> (gdb) where
> #0 0x2acdbedd in hwloc_bitmap_snprintf () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #1 0x2acdc060 in hwloc_bitmap_asprintf () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #2 0x2acd9b34 in hwloc__xml_export_object () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #3 0x2acda325 in hwloc___nolibxml_prepare_export () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #4 0x2acda39c in hwloc__nolibxml_prepare_export () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #5 0x2acda4be in hwloc_topology_export_xmlbuffer () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
> #6 0x004009b8 in main () at xmlbuffer.c:31
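The difference Brice describes between v1.2 (crash) and trunk (exit with an error) comes down to whether the importer reports a parse failure to its caller. Below is a minimal sketch of that pattern in C. It assumes a made-up text format (a count "nbobjs" followed by nbobjs*nbobjs values); `struct toy_distances` and `toy_parse_distances` are hypothetical names and are not hwloc's actual parser or data layout.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsed distance matrix; not hwloc's real type. */
struct toy_distances {
    unsigned nbobjs;   /* matrix is nbobjs x nbobjs */
    float *values;     /* row-major values, owned by the struct */
};

/*
 * Parse "nbobjs v0 v1 ..." from text.
 * Returns 0 on success and -1 on malformed input. On failure *out is left
 * zeroed, so the caller never sees half-initialized pointers.
 */
static int toy_parse_distances(const char *text, struct toy_distances *out)
{
    unsigned n = 0;
    int consumed = 0;
    char extra;

    out->nbobjs = 0;
    out->values = NULL;

    /* Read the object count; the 1024 cap is arbitrary and bounds n * n. */
    if (sscanf(text, "%u%n", &n, &consumed) != 1 || n == 0 || n > 1024)
        return -1;
    text += consumed;

    float *values = malloc((size_t)n * n * sizeof(*values));
    if (values == NULL)
        return -1;

    for (unsigned i = 0; i < n * n; i++) {
        if (sscanf(text, "%f%n", &values[i], &consumed) != 1) {
            free(values);      /* too few values: reject the whole matrix */
            return -1;
        }
        text += consumed;
    }

    /* Anything left besides whitespace means too many values. */
    if (sscanf(text, " %c", &extra) == 1) {
        free(values);
        return -1;
    }

    out->nbobjs = n;
    out->values = values;
    return 0;
}
```

A caller would check the return value and stop on -1, for example by refusing to load the topology, rather than going on to export an object whose fields were never filled in.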
Re: [hwloc-devel] Something lighter-weight than XML?
This is 1.2-ompi, running on Linux 2.6.18-274.el5 on x86_64

$ uname -a
Linux xxx 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

On Sep 24, 2011, at 12:43 PM, Brice Goglin wrote:

> What platform and distribution do you have?
>
> Brice
Re: [hwloc-devel] Something lighter-weight than XML?
Yep, it fails. Runs on my Mac, but not under Linux.

Program terminated with signal 11, Segmentation fault.
#0 0x2acdbedd in hwloc_bitmap_snprintf () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
(gdb) where
#0 0x2acdbedd in hwloc_bitmap_snprintf () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#1 0x2acdc060 in hwloc_bitmap_asprintf () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#2 0x2acd9b34 in hwloc__xml_export_object () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#3 0x2acda325 in hwloc___nolibxml_prepare_export () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#4 0x2acda39c in hwloc__nolibxml_prepare_export () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#5 0x2acda4be in hwloc_topology_export_xmlbuffer () from /nfs/rinfs/san/homedirs/rhc/lib/libhwloc.so.3
#6 0x004009b8 in main () at xmlbuffer.c:31

On Sep 24, 2011, at 9:45 AM, Brice Goglin wrote:

> Indeed, this object contains invalid pointers.
>
> Can you try to run tests/xmlbuffer.c from hwloc's tree? It does
> export+import+export+compare on the same machine. It would be good to
> know if it fails on one of the machines you're using here.
>
> https://svn.open-mpi.org/trac/hwloc/browser/branches/v1.2-ompi/tests/xmlbuffer.c?rev=3837=txt
>
> thanks
> Brice
Re: [hwloc-devel] Something lighter-weight than XML?
FWIW: I tried just printing out the contents of that root object immediately after importing the xml, and it clearly has a problem:

(gdb) print *obj
$2 = {type = OPAL_HWLOC122_hwloc_OBJ_SYSTEM, os_index = 0, name = 0x101 , memory = {
  total_memory = 46912502995240, local_memory = 46912502995240, page_types_len = 0, page_types = 0x0}, attr = 0x2,
  depth = 6900112, logical_index = 0, os_level = 6571424, next_cousin = 0x0, prev_cousin = 0x, parent = 0x0,
  sibling_rank = 0, next_sibling = 0x0, prev_sibling = 0x0, arity = 145, children = 0x2b139738,
  first_child = 0x2b139738, last_child = 0x0, userdata = 0x0, cpuset = 0x0, complete_cpuset = 0x0,
  online_cpuset = 0x644700, allowed_cpuset = 0x691970, nodeset = 0x6919e0, complete_nodeset = 0x644c90,
  allowed_nodeset = 0x644cb0, distances = 0x6948b0, distances_count = 690, infos = 0x0, infos_count = 0}

On Sep 24, 2011, at 9:02 AM, Ralph Castain wrote:

> Here's the trace:
>
> #0 0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, topology=0x695f10, obj=0x2b139b28)
>    at topology-xml.c:1094
> #1 0x2ae61b69 in hwloc___nolibxml_prepare_export (topology=0x695f10,
>    xmlbuffer=0x698a70 "\n topology SYSTEM \"hwloc.dtd\">\n\n os_level=\"-1424778408\" os_index=\"10922\" cpuset=\"0xf...f\" complete_cpuset=\"0xf...f\" onl"...,
>    buflen=16384) at topology-xml.c:1193
> #2 0x2ae61be0 in hwloc__nolibxml_prepare_export (topology=0x695f10, bufferp=0x7fffd988, buflenp=0x7fffd97c)
>    at topology-xml.c:1207
> #3 0x2ae61d02 in opal_hwloc122_hwloc_topology_export_xmlbuffer (topology=0x695f10, xmlbuffer=0x7fffd988,
>    buflen=0x7fffd97c) at topology-xml.c:1281
> #4 0x2ae529f4 in opal_hwloc_compare (topo1=0x695f10, topo2=0x6915c0, type=22 '\026') at base/hwloc_base_dt.c:183
> #5 0x2adf348c in opal_dss_compare (value1=0x695f10, value2=0x6915c0, type=22 '\026') at dss/dss_compare.c:39
> #6 0x2ad9b5f7 in process_orted_launch_report (fd=-1, event=1, data=0x6444d0) at base/plm_base_launch_support.c:564
> #7 0x2ae3881f in event_process_active_single_queue (base=0x60dd60, activeq=0x6111e0) at event.c:1329
> #8 0x2ae38c71 in event_process_active (base=0x60dd60) at event.c:1396
> #9 0x2ae3902b in opal_libevent2012_event_base_loop (base=0x60dd60, flags=1) at event.c:1598
> #10 0x2adf080d in opal_progress () at runtime/opal_progress.c:189
> #11 0x2ad9bbfa in orte_plm_base_daemon_callback (num_daemons=2) at base/plm_base_launch_support.c:666
> #12 0x2ada49e1 in plm_slurm_launch_job (jdata=0x67a500) at plm_slurm_module.c:404
> #13 0x00403822 in orterun (argc=4, argv=0x7fffe1d8) at orterun.c:817
> #14 0x00402aa3 in main (argc=4, argv=0x7fffe1d8) at main.c:13
>
> And the error report
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, topology=0x695f10, obj=0x2b139b28)
>    at topology-xml.c:1094
> 1094 sprintf(tmp, "%llu", (unsigned long long) obj->memory.page_types[i].count);
> (gdb) print obj
> $1 = (opal_hwloc122_hwloc_obj_t) 0x2b139b28
> (gdb) print *obj
> $2 = {type = 2870188824, os_index = 10922, name = 0x2b139b18 "\b\233\023\253\252*", memory = {total_memory = 6579376,
>   local_memory = 6579376, page_types_len = 2870188856, page_types = 0x2b139b38}, attr = 0x2b139b48,
>   depth = 2870188872, logical_index = 10922, os_level = -1424778408, next_cousin = 0x2b139b58,
>   prev_cousin = 0x2b139b68, parent = 0x2b139b68, sibling_rank = 2870188920, next_sibling = 0x2b139b78,
>   prev_sibling = 0x2b139b88, arity = 2870188936, children = 0x2b139b98, first_child = 0x2b139b98,
>   last_child = 0x2b139ba8, userdata = 0x2b139ba8, cpuset = 0x2b139bb8, complete_cpuset = 0x2b139bb8,
>   online_cpuset = 0x2b139bc8, allowed_cpuset = 0x2b139bc8, nodeset = 0x2b139bd8,
>   complete_nodeset = 0x2b139bd8, allowed_nodeset = 0x2b139be8, distances = 0x2b139be8,
>   distances_count = 2870189048, infos = 0x2b139bf8, infos_count = 2870189064}
> (gdb) print obj->memory
> $3 = {total_memory = 6579376, local_memory = 6579376, page_types_len = 2870188856, page_types = 0x2b139b38}
> (gdb) print obj->memory.page_types
> $4 = (struct opal_hwloc122_hwloc_obj_memory_page_type_s *) 0x2b139b38
> (gdb) print i
> $5 = 1612
> (gdb) print obj->memory.page_types[1600]
> $6 = {size = 0, count = 0}
> (gdb) print obj->memory.page_types[1612]
> Cannot access memory at address 0x2b13fff8
> (gdb) print obj->memory.page_types[1611]
> $7 = {size = 0, count = 0}
> (gdb)
>
> The whole obj looks like trash to me. I looked a little more - the object
> referenced is the root object:
>
> 1193 hwloc__xml_export_object (, topology, hwloc_get_root_obj(topology));
Re: [hwloc-devel] Something lighter-weight than XML?
Here's the trace:

#0 0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, topology=0x695f10, obj=0x2b139b28)
   at topology-xml.c:1094
#1 0x2ae61b69 in hwloc___nolibxml_prepare_export (topology=0x695f10,
   xmlbuffer=0x698a70 "\n topology SYSTEM \"hwloc.dtd\">\n\n os_level=\"-1424778408\" os_index=\"10922\" cpuset=\"0xf...f\" complete_cpuset=\"0xf...f\" onl"...,
   buflen=16384) at topology-xml.c:1193
#2 0x2ae61be0 in hwloc__nolibxml_prepare_export (topology=0x695f10, bufferp=0x7fffd988, buflenp=0x7fffd97c)
   at topology-xml.c:1207
#3 0x2ae61d02 in opal_hwloc122_hwloc_topology_export_xmlbuffer (topology=0x695f10, xmlbuffer=0x7fffd988,
   buflen=0x7fffd97c) at topology-xml.c:1281
#4 0x2ae529f4 in opal_hwloc_compare (topo1=0x695f10, topo2=0x6915c0, type=22 '\026') at base/hwloc_base_dt.c:183
#5 0x2adf348c in opal_dss_compare (value1=0x695f10, value2=0x6915c0, type=22 '\026') at dss/dss_compare.c:39
#6 0x2ad9b5f7 in process_orted_launch_report (fd=-1, event=1, data=0x6444d0) at base/plm_base_launch_support.c:564
#7 0x2ae3881f in event_process_active_single_queue (base=0x60dd60, activeq=0x6111e0) at event.c:1329
#8 0x2ae38c71 in event_process_active (base=0x60dd60) at event.c:1396
#9 0x2ae3902b in opal_libevent2012_event_base_loop (base=0x60dd60, flags=1) at event.c:1598
#10 0x2adf080d in opal_progress () at runtime/opal_progress.c:189
#11 0x2ad9bbfa in orte_plm_base_daemon_callback (num_daemons=2) at base/plm_base_launch_support.c:666
#12 0x2ada49e1 in plm_slurm_launch_job (jdata=0x67a500) at plm_slurm_module.c:404
#13 0x00403822 in orterun (argc=4, argv=0x7fffe1d8) at orterun.c:817
#14 0x00402aa3 in main (argc=4, argv=0x7fffe1d8) at main.c:13

And the error report

Program received signal SIGSEGV, Segmentation fault.
0x2ae61737 in hwloc__xml_export_object (output=0x7fffd890, topology=0x695f10, obj=0x2b139b28)
   at topology-xml.c:1094
1094 sprintf(tmp, "%llu", (unsigned long long) obj->memory.page_types[i].count);
(gdb) print obj
$1 = (opal_hwloc122_hwloc_obj_t) 0x2b139b28
(gdb) print *obj
$2 = {type = 2870188824, os_index = 10922, name = 0x2b139b18 "\b\233\023\253\252*", memory = {total_memory = 6579376,
  local_memory = 6579376, page_types_len = 2870188856, page_types = 0x2b139b38}, attr = 0x2b139b48,
  depth = 2870188872, logical_index = 10922, os_level = -1424778408, next_cousin = 0x2b139b58,
  prev_cousin = 0x2b139b68, parent = 0x2b139b68, sibling_rank = 2870188920, next_sibling = 0x2b139b78,
  prev_sibling = 0x2b139b88, arity = 2870188936, children = 0x2b139b98, first_child = 0x2b139b98,
  last_child = 0x2b139ba8, userdata = 0x2b139ba8, cpuset = 0x2b139bb8, complete_cpuset = 0x2b139bb8,
  online_cpuset = 0x2b139bc8, allowed_cpuset = 0x2b139bc8, nodeset = 0x2b139bd8,
  complete_nodeset = 0x2b139bd8, allowed_nodeset = 0x2b139be8, distances = 0x2b139be8,
  distances_count = 2870189048, infos = 0x2b139bf8, infos_count = 2870189064}
(gdb) print obj->memory
$3 = {total_memory = 6579376, local_memory = 6579376, page_types_len = 2870188856, page_types = 0x2b139b38}
(gdb) print obj->memory.page_types
$4 = (struct opal_hwloc122_hwloc_obj_memory_page_type_s *) 0x2b139b38
(gdb) print i
$5 = 1612
(gdb) print obj->memory.page_types[1600]
$6 = {size = 0, count = 0}
(gdb) print obj->memory.page_types[1612]
Cannot access memory at address 0x2b13fff8
(gdb) print obj->memory.page_types[1611]
$7 = {size = 0, count = 0}
(gdb)

The whole obj looks like trash to me. I looked a little more - the object referenced is the root object:

1193 hwloc__xml_export_object (, topology, hwloc_get_root_obj(topology));

I'm continuing to look in case I'm doing something stupid, but the code is pretty linear here - unpack, import, export for compare.

On Sep 24, 2011, at 8:59 AM, Jeff Squyres wrote:

> Here's some feedback from Ralph -- any idea what's going wrong here?
>
> -
>
> 1. I export a topology into xml using
>
> hwloc_topology_export_xmlbuffer(t, , );
>
> I then pack and send the string.
>
> 2. I unpack the string on the other end and import it into a topology
>
> hwloc_topology_init();
> if (0 != (rc = hwloc_topology_set_xmlbuffer(t, xmlbuffer, strlen(xmlbuffer)))) {
>     hwloc_topology_destroy(t);
>     goto cleanup;
> }
> hwloc_topology_load(t);
>
> 3. I then need to compare two topologies, so I export the topology I received
> into another xml string
>
> hwloc_topology_export_xmlbuffer(t1, , );
>
> It is this export that fails, which implies to me that somehow the import
> didn't work right. Note that this code worked fine with libxml2, so this is a
> regression.
Re: [hwloc-devel] Something lighter-weight than XML?
Here's some feedback from Ralph -- any idea what's going wrong here?

-

1. I export a topology into xml using

hwloc_topology_export_xmlbuffer(t, , );

I then pack and send the string.

2. I unpack the string on the other end and import it into a topology

hwloc_topology_init();
if (0 != (rc = hwloc_topology_set_xmlbuffer(t, xmlbuffer, strlen(xmlbuffer)))) {
    hwloc_topology_destroy(t);
    goto cleanup;
}
hwloc_topology_load(t);

3. I then need to compare two topologies, so I export the topology I received into another xml string

hwloc_topology_export_xmlbuffer(t1, , );

It is this export that fails, which implies to me that somehow the import didn't work right. Note that this code worked fine with libxml2, so this is a regression.

On Sep 22, 2011, at 9:39 AM, Jeff Squyres wrote:

> Yes, I can get some testing of the ompi branch pretty quickly. I can bring
> in a new copy of this later today and see what we can see.
>
> Many thanks!

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
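The sequence Ralph describes (export, import on the other side, export again, compare the two strings) is the same export+import+export+compare round trip discussed elsewhere in this thread. Here is a self-contained sketch of that check in C. It deliberately does not call hwloc: `struct toy_topology`, `toy_export`, `toy_import` and `toy_roundtrip` are hypothetical stand-ins, and the one-line XML format is invented for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Invented one-element format shared by the exporter and the importer. */
#define TOY_FMT "<topology sockets=\"%u\" cores=\"%u\" memory_kb=\"%llu\"/>"

/* Toy stand-in for a topology; the real hwloc_topology_t is opaque. */
struct toy_topology {
    unsigned sockets;
    unsigned cores;
    unsigned long long memory_kb;
};

/* Export into a freshly allocated string; the caller frees it. NULL on error. */
static char *toy_export(const struct toy_topology *t)
{
    int len = snprintf(NULL, 0, TOY_FMT, t->sockets, t->cores, t->memory_kb);
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;

    snprintf(buf, (size_t)len + 1, TOY_FMT, t->sockets, t->cores, t->memory_kb);
    return buf;
}

/* Import from a string. Returns 0 on success, -1 if the text does not match. */
static int toy_import(const char *text, struct toy_topology *t)
{
    memset(t, 0, sizeof(*t));
    return sscanf(text, TOY_FMT, &t->sockets, &t->cores, &t->memory_kb) == 3 ? 0 : -1;
}

/*
 * export + import + export + compare.
 * Returns 0 if both exports are identical, 1 if they differ,
 * and -1 if any export or import step failed.
 */
static int toy_roundtrip(const struct toy_topology *orig)
{
    int rc = -1;
    struct toy_topology copy;
    char *first = toy_export(orig);
    char *second = NULL;

    if (first != NULL && toy_import(first, &copy) == 0) {
        second = toy_export(&copy);
        if (second != NULL)
            rc = (strcmp(first, second) == 0) ? 0 : 1;
    }

    free(first);   /* free(NULL) is a no-op */
    free(second);
    return rc;
}
```

Comparing the two exported strings keeps the check independent of the in-memory layout, which is why a broken import tends to show up in the second export rather than in the import call itself.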
Re: [hwloc-devel] Something lighter-weight than XML?
On Sep 22, 2011, at 11:41 AM, Brice Goglin wrote:

> This is strange. I just tried building the hwloc tree with prefixing
> enabled, I could not find any problem (except one missing symbol that
> doesn't matter here, see next commits).
>
> Basically nothing has changed outside of src/topology-xml.c, the above
> symbols existed before, they still exist. I don't understand why their
> renaming would fail now. However, those were in #ifdef HWLOC_HAVE_XML
> before, but this symbol isn't used anymore. Did you rerun autogen?

Disregard -- I think this is a problem in how we're slurping OMPI's hwloc into our tree... nothing wrong in hwloc itself...

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [hwloc-devel] Something lighter-weight than XML?
I have to run out ATM so I can't dig into this deeply for a few hours, but with a first take, I'm getting this error:

Making all in src
  CC topology.lo
topology.c: In function 'hwloc_discover':
topology.c:1673: error: 'OPAL_HWLOC121hwloc_BACKEND_XML' undeclared (first use in this function)
topology.c:1673: error: (Each undeclared identifier is reported only once
topology.c:1673: error: for each function it appears in.)
topology.c:1674: error: implicit declaration of function 'opal_hwloc121hwloc_look_xml'
topology.c: In function 'hwloc_backend_exit':
topology.c:2078: error: 'OPAL_HWLOC121hwloc_BACKEND_XML' undeclared (first use in this function)
topology.c:2079: error: implicit declaration of function 'opal_hwloc121hwloc_backend_xml_exit'
topology.c: In function 'opal_hwloc121hwloc_topology_set_xml':
topology.c:2134: error: implicit declaration of function 'opal_hwloc121hwloc_backend_xml_init'
make[1]: *** [topology.lo] Error 1
make: *** [all-recursive] Error 1

We use the hwloc prefix stuff in the OMPI embedded build; did something not get prefixed properly in the minimal XML stuff?

Back in a few hours...

On Sep 22, 2011, at 9:39 AM, Jeff Squyres wrote:

> Yes, I can get some testing of the ompi branch pretty quickly. I can bring
> in a new copy of this later today and see what we can see.
>
> Many thanks!

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [hwloc-devel] Something lighter-weight than XML?
I pushed the new minimalistic XML import/export implementation without libxml2 to the nolibxml branch. If libxml2 is available, it's still used by default. --disable-libxml2 or some env variables can be used to force the minimalistic implementation if needed. The minimalistic implementation is only guaranteed to import XML files that were generated by hwloc (even if libxml was enabled there).

I also backported most of this to the new v1.2-ompi branch (required to backport some other XML cleanups from trunk). This branch will now serve as a base for Open MPI's embedded hwloc. The idea is to have a complete v1.2 + nolibxml somewhere so that we can at least run make check (Open MPI does not embed enough to run hwloc's make check).

How do we proceed now? Can we have the OMPI guys test the new code soon? Should I wait for their feedback before merging the nolibxml branch into the trunk? I'd like to merge this in v1.3 too (and basically release rc2 as the actual first feature-complete RC), so getting feedback early might be appreciated.

Brice
Re: [hwloc-devel] Something lighter-weight than XML?
My $0.02: for simplicity, let's force ASCII-only. If we get complaints/feature requests, we can see about updating to include non-ASCII.

But then again, I'm biased because I'm an American. You guys might have different views -- e.g., you need non-ASCII for your organization's name.

On Sep 5, 2011, at 11:04 AM, Brice Goglin wrote:
> Regarding XML encoding:
>
> It seems that libxml2 rewrites the following characters as XML entities:
> \n
> \r
> \t
> "
> <
> >
> &
>
> hwloc already tells libxml2 to export as UTF-8. However, a quick check seems to say that the output is not UTF-8 when the locale isn't UTF-8. We may need to double-check/clarify/fix this.
>
> Or we can enforce ASCII-only for all strings. Should be OK for all strings we import from the OS. Will need to be enforced for user-given strings (object info attributes).
>
> Brice

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
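
To make the encoding question above concrete, here is a minimal sketch of what an ASCII-only exporter would have to do with the seven characters Brice lists. The function names (xml_entity, is_ascii_only, xml_escape) are hypothetical and are not hwloc or libxml2 API. Writing \n, \r and \t as numeric character references is an assumption about the output form; the thread only says they are rewritten as entities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the replacement for c, or NULL if c can be copied unchanged.
 * Covers only the characters listed in the thread; the numeric
 * references for \n, \r and \t are an assumed output form. */
static const char *xml_entity(char c)
{
    switch (c) {
    case '\n': return "&#10;";
    case '\r': return "&#13;";
    case '\t': return "&#9;";
    case '"':  return "&quot;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '&':  return "&amp;";
    default:   return NULL;
    }
}

/* Return 1 if every byte of s is 7-bit ASCII, 0 otherwise. */
static int is_ascii_only(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 0x7f)
            return 0;
    return 1;
}

/* Escape src into dst, which holds dstlen bytes including the final NUL.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * src contains non-ASCII bytes or dst is too small. */
static int xml_escape(const char *src, char *dst, size_t dstlen)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (!is_ascii_only(src))
        return -1;
    for (; *src != '\0'; src++) {
        const char *ent = xml_entity(*src);
        size_t n = ent ? strlen(ent) : 1;

        if (out + n + 1 > dstlen)   /* keep room for the final NUL */
            return -1;
        if (ent)
            memcpy(dst + out, ent, n);
        else
            dst[out] = *src;
        out += n;
    }
    if (out + 1 > dstlen)
        return -1;
    dst[out] = '\0';
    return (int)out;
}
```

Rejecting non-ASCII input up front, rather than transcoding it, is the "enforce ASCII-only" option; user-given strings such as object info attributes would go through the same check.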
Re: [hwloc-devel] Something lighter-weight than XML?
On Sep 5, 2011, at 2:22 AM, Brice Goglin wrote:
> Samuel thinks we could stay with XML and reimplement our own parsing/dumping without libxml2.
>
> My feeling about this is:
> + We would have a single file format for import/export.
> + Saving would likely be easy (copy-paste from the current code and/or from the JSON export)
> - Parsing would require some work (the libxml2-based parser isn't easy to modify, but we could adapt the JSON parser)

Is there a way to make the parsing easier? I.e., do we have to accept fully generic XML? Or can we restrict it somehow such that the parsing becomes much more deterministic / simpler?

> - Encoding may be annoying. libxml2 does a lot of things to manage strings properly. There's not a lot of special characters in a usual XML output, but there can be (because the user can annotate the objects).
> - I am a bit afraid that we would go from a well-working XML support to something much less reliable (do we need to be fully XML compliant so that external programs can load our XML files and play with them?)

A fair point.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
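
As an illustration of the "restrict it" idea above, here is a sketch of an attribute reader that accepts only the shape an exporter under our own control would emit: name="value" pairs separated by whitespace, always double-quoted, with no quote characters inside values. The helper name next_attr and these restrictions are assumptions for the example, not the actual hwloc parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read the next  name="value"  pair starting at *cursor.
 * The buffer is modified in place: NUL bytes are written after the name
 * and after the value, so no allocation is needed.
 * Returns 1 if a pair was read, 0 if the tag has no more attributes
 * (end of string, '>' or '/'), and -1 on anything unexpected. */
static int next_attr(char **cursor, char **name, char **value)
{
    char *p, *end;
    size_t len;

    assert(cursor != NULL && *cursor != NULL);
    assert(name != NULL && value != NULL);

    p = *cursor;
    p += strspn(p, " \t\n");                 /* skip leading whitespace */
    if (*p == '\0' || *p == '>' || *p == '/')
        return 0;

    len = strcspn(p, "= \t\n\"/>");          /* length of the name */
    if (len == 0 || p[len] != '=' || p[len + 1] != '"')
        return -1;

    end = strchr(p + len + 2, '"');          /* closing quote of the value */
    if (end == NULL)
        return -1;

    p[len] = '\0';
    *end = '\0';
    *name = p;
    *value = p + len + 2;
    *cursor = end + 1;
    return 1;
}
```

Because the input grammar is this narrow, the reader is a handful of string searches with no lookahead or state machine. The cost is the one raised in the quoted message: XML written by other tools (single quotes, spaces around '=') is rejected rather than parsed.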
Re: [hwloc-devel] Something lighter-weight than XML?
On 02/09/11 19:54, Jeff Squyres wrote:
> Fail enough,

Nice Freudian slip. :-)

> but do the back-end nodes have libxml?

Apparently so..

rpm --root /bgsys/drivers/ppcfloor/linux/OS -qa | grep -i xml
libxml2-2.6.23-22

That's the I/O node filesystem, which is what the compute node kernel maps I/Os back to, I believe. Mind you, most people on BG do static linking, as dynamic linking is rather new there.

> For us to do what we want, it would need to be available on all nodes because the OMPI orted processes would be querying hwloc for the local topology and then sending it to the "head" node process (usually mpirun) for further analysis and process mapping.

Umm, not sure that'll work on a BG because you can't fork() or execve() on a BG; the IBM mpirun runs on the login node and talks to an mpirund on the service node, which then launches the user's code on the compute nodes via the Navigator API.

cheers,
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Re: [hwloc-devel] Something lighter-weight than XML?
Samuel thinks we could stay with XML and reimplement our own parsing/dumping without libxml2.

My feeling about this is:
+ We would have a single file format for import/export.
+ Saving would likely be easy (copy-paste from the current code and/or from the JSON export)
- Parsing would require some work (the libxml2-based parser isn't easy to modify, but we could adapt the JSON parser)
- Encoding may be annoying. libxml2 does a lot of things to manage strings properly. There's not a lot of special characters in a usual XML output, but there can be (because the user can annotate the objects).
- I am a bit afraid that we would go from a well-working XML support to something much less reliable (do we need to be fully XML compliant so that external programs can load our XML files and play with them?)

Opinions?

Brice
Re: [hwloc-devel] Something lighter-weight than XML?
JSON looks a bit more verbose than YAML, but JSON also looked better for our hierarchical information, so I gave JSON a try. I just pushed the result to the new json branch.

Notes:
* You can only load/save from/to a memory buffer (set_jsonbuffer and export_jsonbuffer), but lstopo needed to load/save from/to a file, so I could add the corresponding routines (set_json and export_json) to the public interface to match what we have for XML
* We don't care about the validity of our JSON output, but some quick tests seem to say that it's OK anyway
* I tried to handle most parsing errors; it should not crash during parsing, but it may crash later after the discovery (e.g. if you get an error within a child before finishing its parent). It's not clear that it's worse than XML. Loading a bogus JSON or XML topology is a user error anyway :)
* Distances need rework (the same I did for XML recently). I didn't do it because it would make backporting to
Re: [hwloc-devel] Something lighter-weight than XML?
On Fri, Sep 02, 2011 at 05:57:11AM -0400, Jeff Squyres wrote:
> JSON: sure, it's an easy format, but we're not really targeting web-ish kinds of things here, are we?

The format isn't only for web-ish things. A lot of embedded apps use it. I'm sending an example as an attachment; I used this site to generate it:
http://www.thomasfrank.se/xml_to_json.html

Cheers!

> YAML: ya, that's also an easy format.
>
> But the goal here is to do something utterly trivial that has no support library requirement. Unless someone has specific requirements for these formats, I'm ok with a totally trivial and not-necessarily-compatible-with-anyone-else's-format format.
>
> On Sep 1, 2011, at 9:38 PM, Christopher Samuel wrote:
> > On 02/09/11 01:30, Jeff Squyres wrote:
> > > Is there any chance that a lighter-weight, simple string parsing module could be added to hwloc?
> >
> > What about something based on YAML ?
> >
> > http://www.yaml.org/spec/1.2/spec.html
> >
> > Designed to be easy to read by a human..

--
Degree Alaniz Marcelo
Frontend Development HPC
PhD Student

Attachments: out.json (application/json), signature.asc (digital signature)
Re: [hwloc-devel] Something lighter-weight than XML?
I don't know enough about either format to say.

Sent from my phone. No type good.

On Sep 2, 2011, at 6:03 AM, "Samuel Thibault" wrote:
> Jeff Squyres wrote on Fri 02 Sep 2011 11:58:05 +0200:
>> JSON: sure, it's an easy format, but we're not really targeting web-ish kinds of things here, are we?
>>
>> YAML: ya, that's also an easy format.
>>
>> But the goal here is to do something utterly trivial that has no support library requirement. Unless someone has specific requirements for these formats, I'm ok with a totally trivial and not-necessarily-compatible-with-anyone-else's-format format.
>
> If we can easily implement our own parser for json or yaml (or some other standard format), we should simply go for one of them.
>
> Samuel
Re: [hwloc-devel] Something lighter-weight than XML?
Jeff Squyres wrote on Fri 02 Sep 2011 11:58:05 +0200:
> JSON: sure, it's an easy format, but we're not really targeting web-ish kinds of things here, are we?
>
> YAML: ya, that's also an easy format.
>
> But the goal here is to do something utterly trivial that has no support library requirement. Unless someone has specific requirements for these formats, I'm ok with a totally trivial and not-necessarily-compatible-with-anyone-else's-format format.

If we can easily implement our own parser for json or yaml (or some other standard format), we should simply go for one of them.

Samuel
Re: [hwloc-devel] Something lighter-weight than XML?
JSON: sure, it's an easy format, but we're not really targeting web-ish kinds of things here, are we?

YAML: ya, that's also an easy format.

But the goal here is to do something utterly trivial that has no support library requirement. Unless someone has specific requirements for these formats, I'm ok with a totally trivial and not-necessarily-compatible-with-anyone-else's-format format.

On Sep 1, 2011, at 9:38 PM, Christopher Samuel wrote:
> On 02/09/11 01:30, Jeff Squyres wrote:
>> Is there any chance that a lighter-weight, simple string parsing module could be added to hwloc?
>
> What about something based on YAML ?
>
> http://www.yaml.org/spec/1.2/spec.html
>
> Designed to be easy to read by a human..

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [hwloc-devel] Something lighter-weight than XML?
On Sep 1, 2011, at 9:40 PM, Christopher Samuel wrote:
> Well BG/P doesn't support Open-MPI, but the service (management) node and the front end (login) nodes are PPC SLES10 and libxml2 is there..
>
> tambo-m:~ # rpm -q libxml2
> libxml2-2.6.23-15.25.5

Fail enough, but do the back-end nodes have libxml? For us to do what we want, it would need to be available on all nodes because the OMPI orted processes would be querying hwloc for the local topology and then sending it to the "head" node process (usually mpirun) for further analysis and process mapping.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [hwloc-devel] Something lighter-weight than XML?
On 02/09/11 02:01, Jeff Squyres wrote:
> Blue Gene?

Well BG/P doesn't support Open-MPI, but the service (management) node and the front end (login) nodes are PPC SLES10 and libxml2 is there..

tambo-m:~ # rpm -q libxml2
libxml2-2.6.23-15.25.5

cheers,
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Re: [hwloc-devel] Something lighter-weight than XML?
On 02/09/11 01:30, Jeff Squyres wrote:
> Is there any chance that a lighter-weight, simple string parsing module could be added to hwloc?

What about something based on YAML ?

http://www.yaml.org/spec/1.2/spec.html

Designed to be easy to read by a human..

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Re: [hwloc-devel] Something lighter-weight than XML?
Jeff Squyres wrote on Thu 01 Sep 2011 20:31:44 +0200:
> hst: hwloc simple text

I like this one.

Samuel
Re: [hwloc-devel] Something lighter-weight than XML?
On Sep 1, 2011, at 12:17 PM, Brice Goglin wrote:
> Support for the most useful attributes would be done within a couple hours. The annoying thing is to support all attributes, distances, ...

If you kruft up some of the infrastructure and some examples, I could volunteer some grunt work to fill in the rest.

> Also we'd need to find a good name for this new backend. Something away from "text" because we already have the txt and ncurses outputs in lstopo :) Maybe "hwloc". hwloc_topology_export_hwlocbuffer() and "lstopo foo.hwloc" :) Or "htx" for "hwloc tiny xml".

Hah! htx might work. Or:

hst: hwloc simple text
hnx: hwloc NoXML
htt: hwloc trivial text
simple: obvious
serialized: obvious
string: obvious
newline: because the fields are newline-delimited
...?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [hwloc-devel] Something lighter-weight than XML?
On 01/09/2011 18:01, Jeff Squyres wrote:
> I could *probably* write this, but I'm guessing you guys could write it much faster than I could...

Support for the most useful attributes would be done within a couple hours. The annoying thing is to support all attributes, distances, ...

Also we'd need to find a good name for this new backend. Something away from "text" because we already have the txt and ncurses outputs in lstopo :) Maybe "hwloc". hwloc_topology_export_hwlocbuffer() and "lstopo foo.hwloc" :) Or "htx" for "hwloc tiny xml".

Brice
Re: [hwloc-devel] Something lighter-weight than XML?
On Sep 1, 2011, at 11:49 AM, Brice Goglin wrote:
> Did you actually find many machines/distribs that don't have libxml2 installed by default? There are literally hundreds of packages that depend on libxml2 (at least in Debian) so I am not sure depending on it is really a problem.

Cray, for sure. Josh told me off-list that it's a real PITA for them to build/support libxml on the ORNL Crays.

Blue Gene? Windows?

> Also are there really some string space problems?

No. The space savings is a minor benefit; I only included it for completeness.

> Otherwise, implementing this is likely easy, especially if you find somebody to do it :) Start from the XML export, convert it into a text export, and write the corresponding import (starting from the XML import may be hard because it's recursive).
>
> Would you need an export to a file or to a memory buffer or both?

Memory buffer would be most preferable, because we're going to generate it on the back end node, pack it to a network buffer, send it, receive it on the head node, unpack it from the network buffer, and slurp it into a hwloc topology.

> Last but not least: what's the deadline?

Ralph is actively working on code for the RFC I sent around yesterday:

http://www.open-mpi.org/community/lists/devel/2011/08/9737.php

We'll probably use XML just to get it going, but it would be good to not equate "libxml" with "hwloc" in OMPI developers' brains. :-)

So -- "sometime soon" would be nice. I could *probably* write this, but I'm guessing you guys could write it much faster than I could...

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
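
For illustration only, here is a rough sketch of the memory-buffer export described above: a small object tree is written into a caller-supplied buffer as newline-delimited text, one object per line. The struct node type, the export_text function and the "depth type os_index" line format are all made up for this example; they are not hwloc types or a format anyone in the thread agreed on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a topology object. */
struct node {
    const char *type;                  /* e.g. "Machine", "Core", "PU" */
    int os_index;
    const struct node *first_child;
    const struct node *next_sibling;
};

/* Write obj, its siblings and all their descendants into buf, one line
 * per object: "<depth> <type> <os_index>\n".
 * Like snprintf, returns the number of bytes the complete output needs
 * (excluding the final NUL), even when buf is too small, so a caller can
 * call once with (NULL, 0) to size the buffer and then call again. */
static size_t export_text(const struct node *obj, int depth,
                          char *buf, size_t buflen)
{
    size_t used = 0;

    for (; obj != NULL; obj = obj->next_sibling) {
        size_t room = used < buflen ? buflen - used : 0;
        int n = snprintf(room ? buf + used : NULL, room, "%d %s %d\n",
                         depth, obj->type, obj->os_index);

        assert(n >= 0);
        used += (size_t)n;

        /* Children follow their parent, one level deeper. */
        room = used < buflen ? buflen - used : 0;
        used += export_text(obj->first_child, depth + 1,
                            room ? buf + used : NULL, room);
    }
    return used;
}
```

The result is a single contiguous string, so it can be packed into a network buffer as-is; the explicit depth on each line is one way to let the importer rebuild the tree in a simple loop instead of by recursion.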
Re: [hwloc-devel] Something lighter-weight than XML?
Jeff Squyres wrote on Thu 01 Sep 2011 17:31:05 +0200:
> Do you think this would be easy to implement?

A quite strict format could probably be easy to implement and still be extensible. The XML will probably remain useful for people who like XSLT :)

Samuel
Re: [hwloc-devel] Something lighter-weight than XML?
Did you actually find many machines/distribs that don't have libxml2 installed by default? There are literally hundreds of packages that depend on libxml2 (at least in Debian) so I am not sure depending on it is really a problem.

Also are there really some string space problems? Even when talking about 1000 nodes transferring 100kB once at the beginning of the job, it doesn't look too bad to me (and these XMLs could be cached on the frontend as long as the compute nodes don't change).

Otherwise, implementing this is likely easy, especially if you find somebody to do it :) Start from the XML export, convert it into a text export, and write the corresponding import (starting from the XML import may be hard because it's recursive).

Would you need an export to a file or to a memory buffer or both?

Last but not least: what's the deadline?

Brice

On 01/09/2011 17:30, Jeff Squyres wrote:
> We're (finally) bringing full hwloc services up in Open MPI.
>
> One of the things we want to do is send server topologies from back-end compute nodes to the front-end node. The XML export/import functionality would work for this, but a) it's a bit heavyweight, and b) it seems weird to require XML to build MPI.
>
> Is there any chance that a lighter-weight, simple string parsing module could be added to hwloc? I'm guessing that we could save a modest amount of string space (SWAG: 20%?), but we wouldn't need a dependency on libxml, which would be good.
>
> I took a lstopo --no-io foo.xml output on an older xeon machine and, while sitting on a boring teleconf, I manually converted it in emacs to a (slightly) simpler text format. I attached the two files. There's a modest space savings (about 17%). But libxml clearly would not be necessary.
>
> Do you think this would be easy to implement?