Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
Samuel Thibault, on Wed, 20 Dec 2017 18:26:37 +0100, wrote:
> Brice Goglin, on Wed, 20 Dec 2017 18:16:34 +0100, wrote:
> > On 20/12/2017 at 18:06, Samuel Thibault wrote:
> > > It has only one NUMA node, thus triggering the code I patched over.
> >
> > Well, this has been working fine for a while, since that's my daily
> > development machine and all our jenkins slaves.
> >
> > Can you give the usually requested details about the OS, kernel,
> > hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?
>
> Your commit 301c0f94e0a54823bfd530c36b5f9c9d9862332b seems to have fixed
> it.
>
> It's Debian Buster, kernel 4.14.0, and attached gathers.

Mmm, it seems the x86 backend gets triggered somehow. This is the first
hwloc_topology_reconnect call:

#0  hwloc_topology_reconnect (topology=topology@entry=0x5577f060, flags=flags@entry=0)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:2910
#1  0x77bc93e2 in hwloc_x86_discover (backend=0x5577f890)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology-x86.c:1264
#2  0x77ba1595 in hwloc_discover (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3008
#3  hwloc_topology_load (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3584
#4  0x84f6 in main (argc=, argv=)
    at /home/samy/recherche/hwloc/hwloc/utils/lstopo/lstopo.c:995

and then the second:

#0  hwloc_topology_reconnect (topology=0x5577f060, flags=0)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:2910
#1  0x77ba1707 in hwloc_discover (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3103
#2  hwloc_topology_load (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3584
#3  0x84f6 in main (argc=, argv=)
    at /home/samy/recherche/hwloc/hwloc/utils/lstopo/lstopo.c:995

and a third:

#0  hwloc_topology_reconnect (topology=0x5577f060, flags=0)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:2910
#1  0x77ba1789 in hwloc_discover (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3149
#2  hwloc_topology_load (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3584
#3  0x84f6 in main (argc=, argv=)
    at /home/samy/recherche/hwloc/hwloc/utils/lstopo/lstopo.c:995

(these are all with git 220ee3eb926ca6bfa175d9700ab56d14a554cea4)

I have attached the cpuid/ directory.

Samuel

cpuid.tgz
Description: application/gtar-compressed

___
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
Thanks. Your machine is very similar to mine, running a similar Debian,
with 4.14 too. And my local build still doesn't crash. Maybe a different
compiler is causing the bug to appear more often on yours (Debian's 7.2.0-16
here). Let's forget about it if it's fixed now :)

Brice

On 20/12/2017 at 18:26, Samuel Thibault wrote:
> Brice Goglin, on Wed, 20 Dec 2017 18:16:34 +0100, wrote:
>> On 20/12/2017 at 18:06, Samuel Thibault wrote:
>>> It has only one NUMA node, thus triggering the code I patched over.
>> Well, this has been working fine for a while, since that's my daily
>> development machine and all our jenkins slaves.
>>
>> Can you give the usually requested details about the OS, kernel,
>> hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?
> Your commit 301c0f94e0a54823bfd530c36b5f9c9d9862332b seems to have fixed
> it.
>
> It's Debian Buster, kernel 4.14.0, and attached gathers.
>
> Samuel
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
I have uploaded it to Debian experimental, so when it passes NEW, test
results for various architectures will show up on

https://buildd.debian.org/status/package.php?p=hwloc=experimental

so you can check the results on odd systems :)

Samuel
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
Brice Goglin, on Wed, 20 Dec 2017 18:16:34 +0100, wrote:
> On 20/12/2017 at 18:06, Samuel Thibault wrote:
> > It has only one NUMA node, thus triggering the code I patched over.
>
> Well, this has been working fine for a while, since that's my daily
> development machine and all our jenkins slaves.
>
> Can you give the usually requested details about the OS, kernel,
> hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?

Your commit 301c0f94e0a54823bfd530c36b5f9c9d9862332b seems to have fixed
it.

It's Debian Buster, kernel 4.14.0, and attached gathers.

Samuel

Machine (P#0 total=8022768KB DMIProductName="HP EliteBook 820 G2" DMIProductVersion=A3008C410003 DMIBoardVendor=Hewlett-Packard DMIBoardName=225A DMIBoardVersion="KBC Version 96.54" DMIBoardAssetTag= DMIChassisVendor=Hewlett-Packard DMIChassisType=10 DMIChassisVersion= DMIChassisAssetTag=5CG5201YZY DMIBIOSVendor=Hewlett-Packard DMIBIOSVersion="M71 Ver. 01.04" DMIBIOSDate=02/24/2015 DMISysVendor=Hewlett-Packard Backend=Linux LinuxCgroup=/ hwlocVersion=2.0.0a1-git ProcessName=lstopo-no-graphics)
  Package L#0 (P#0 total=8022768KB CPUModel="Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz" CPUVendor=GenuineIntel CPUFamilyNumber=6 CPUModelNumber=61 CPUStepping=4)
    NUMANode L#0 (P#0 local=8022768KB total=8022768KB)
    L3Cache L#0 (size=3072KB linesize=64 ways=12 Inclusive=1)
      L2Cache L#0 (size=256KB linesize=64 ways=8 Inclusive=0)
        L1dCache L#0 (size=32KB linesize=64 ways=8 Inclusive=0)
          L1iCache L#0 (size=32KB linesize=64 ways=8 Inclusive=0)
            Core L#0 (P#0)
              PU L#0 (P#0)
              PU L#1 (P#1)
      L2Cache L#1 (size=256KB linesize=64 ways=8 Inclusive=0)
        L1dCache L#1 (size=32KB linesize=64 ways=8 Inclusive=0)
          L1iCache L#1 (size=32KB linesize=64 ways=8 Inclusive=0)
            Core L#1 (P#1)
              PU L#2 (P#2)
              PU L#3 (P#3)
  HostBridge L#0 (buses=:[00-03])
    PCI L#0 (busid=:00:02.0 id=8086:1616 class=0300(VGA) PCIVendor="Intel Corporation" PCIDevice="HD Graphics 5500")
    PCI L#1 (busid=:00:19.0 id=8086:15a2 class=0200(Ethernet) PCIVendor="Intel Corporation" PCIDevice="Ethernet Connection (3) I218-LM")
      Network L#0 (Address=48:0f:cf:28:82:c3) "enp0s25"
    PCIBridge L#1 (busid=:00:1c.3 id=8086:9c96 class=0604(PCIBridge) buses=:[03-03] PCIVendor="Intel Corporation" PCIDevice="Wildcat Point-LP PCI Express Root Port #4")
      PCI L#2 (busid=:03:00.0 id=8086:095a class=0280(Network) PCIVendor="Intel Corporation" PCIDevice="Wireless 7265")
        Network L#1 (Address=34:02:86:2c:6a:19) "wlo1"
    PCI L#3 (busid=:00:1f.2 id=8086:9c83 class=0106(SATA) PCIVendor="Intel Corporation" PCIDevice="Wildcat Point-LP SATA Controller [AHCI Mode]")
      Block(Disk) L#2 (Size=250059096 SectorSize=512 LinuxDeviceID=8:0 Model=MTFDDAK256MBF-1AN15ABHA Revision=M6T3 SerialNumber=14380F25F377) "sda"
depth 0: 1 Machine (type #0)
 depth 1: 1 Package (type #1)
  depth 2: 1 L3Cache (type #6)
   depth 3: 2 L2Cache (type #5)
    depth 4: 2 L1dCache (type #4)
     depth 5: 2 L1iCache (type #9)
      depth 6: 2 Core (type #2)
       depth 7: 4 PU (type #3)
Special depth -3: 1 NUMANode (type #13)
Special depth -4: 2 Bridge (type #14)
Special depth -5: 4 PCIDev (type #15)
Special depth -6: 3 OSDev (type #16)
Topology not from this system

gather.xml
Description: XML document

gather.tar.bz2
Description: Binary data
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
On 20/12/2017 at 18:06, Samuel Thibault wrote:
> It has only one NUMA node, thus triggering the code I patched over.

Well, this has been working fine for a while, since that's my daily
development machine and all our jenkins slaves.

Can you give the usually requested details about the OS, kernel,
hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?

Brice
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
Brice Goglin, on Wed, 20 Dec 2017 17:53:54 +0100, wrote:
> On 20/12/2017 at 17:49, Samuel Thibault wrote:
> > Samuel Thibault, on Wed, 20 Dec 2017 13:57:45 +0100, wrote:
> >> Brice Goglin, on Tue, 19 Dec 2017 11:48:39 +0100, wrote:
> >>> The Hardware Locality (hwloc) team is pleased to announce the first
> >>> beta release for v2.0.0:
> >>>
> >>>    http://www.open-mpi.org/projects/hwloc/
> >> I tried to build the Debian package, there are a few failures in the
> >> testsuite:
> >>
> >> FAIL: test-lstopo.sh
> >> FAIL: hwloc_bind
> >> FAIL: hwloc_get_last_cpu_location
> >> FAIL: hwloc_get_area_memlocation
> >> FAIL: hwloc_object_userdata
> >> FAIL: hwloc_backends
> >> FAIL: hwloc_pci_backend
> >> FAIL: hwloc_is_thissystem
> >> FAIL: hwloc_topology_diff
> >> FAIL: hwloc_topology_abi
> >> FAIL: hwloc_obj_infos
> >> FAIL: glibc-sched
> >> ../.././config/test-driver: line 107: 27886 Segmentation fault "$@" > $log_file 2>&1
> >> FAIL: hwloc-hello
> >> ../.././config/test-driver: line 107: 27905 Segmentation fault "$@" > $log_file 2>&1
> >> FAIL: hwloc-hello-cpp
> >>
> >> This is running inside a Debian Buster system.
> > It seems to be fixed by the attached patch.
> >
>
> I can't reproduce the issue, what's specific about your system?

It has only one NUMA node, thus triggering the code I patched over.

Samuel
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
On 20/12/2017 at 17:49, Samuel Thibault wrote:
> Samuel Thibault, on Wed, 20 Dec 2017 13:57:45 +0100, wrote:
>> Brice Goglin, on Tue, 19 Dec 2017 11:48:39 +0100, wrote:
>>> The Hardware Locality (hwloc) team is pleased to announce the first
>>> beta release for v2.0.0:
>>>
>>>    http://www.open-mpi.org/projects/hwloc/
>> I tried to build the Debian package, there are a few failures in the
>> testsuite:
>>
>> FAIL: test-lstopo.sh
>> FAIL: hwloc_bind
>> FAIL: hwloc_get_last_cpu_location
>> FAIL: hwloc_get_area_memlocation
>> FAIL: hwloc_object_userdata
>> FAIL: hwloc_backends
>> FAIL: hwloc_pci_backend
>> FAIL: hwloc_is_thissystem
>> FAIL: hwloc_topology_diff
>> FAIL: hwloc_topology_abi
>> FAIL: hwloc_obj_infos
>> FAIL: glibc-sched
>> ../.././config/test-driver: line 107: 27886 Segmentation fault "$@" > $log_file 2>&1
>> FAIL: hwloc-hello
>> ../.././config/test-driver: line 107: 27905 Segmentation fault "$@" > $log_file 2>&1
>> FAIL: hwloc-hello-cpp
>>
>> This is running inside a Debian Buster system.
> It seems to be fixed by the attached patch.
>

I can't reproduce the issue. What's specific about your system?
I tried inside a debian-build-chroot, etc.

Brice
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
Samuel Thibault, on Wed, 20 Dec 2017 13:57:45 +0100, wrote:
> Brice Goglin, on Tue, 19 Dec 2017 11:48:39 +0100, wrote:
> > The Hardware Locality (hwloc) team is pleased to announce the first
> > beta release for v2.0.0:
> >
> >    http://www.open-mpi.org/projects/hwloc/
>
> I tried to build the Debian package, there are a few failures in the
> testsuite:
>
> FAIL: test-lstopo.sh
> FAIL: hwloc_bind
> FAIL: hwloc_get_last_cpu_location
> FAIL: hwloc_get_area_memlocation
> FAIL: hwloc_object_userdata
> FAIL: hwloc_backends
> FAIL: hwloc_pci_backend
> FAIL: hwloc_is_thissystem
> FAIL: hwloc_topology_diff
> FAIL: hwloc_topology_abi
> FAIL: hwloc_obj_infos
> FAIL: glibc-sched
> ../.././config/test-driver: line 107: 27886 Segmentation fault "$@" > $log_file 2>&1
> FAIL: hwloc-hello
> ../.././config/test-driver: line 107: 27905 Segmentation fault "$@" > $log_file 2>&1
> FAIL: hwloc-hello-cpp
>
> This is running inside a Debian Buster system.

It seems to be fixed by the attached patch.

Samuel

diff --git a/hwloc/topology.c b/hwloc/topology.c
index d827d5f5..e0bf7beb 100644
--- a/hwloc/topology.c
+++ b/hwloc/topology.c
@@ -1,7 +1,7 @@
 /*
  * Copyright © 2009 CNRS
  * Copyright © 2009-2017 Inria.  All rights reserved.
- * Copyright © 2009-2012 Université Bordeaux
+ * Copyright © 2009-2012, 2017 Université Bordeaux
  * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
  * See COPYING in top-level directory.
  */
@@ -1596,6 +1596,7 @@ hwloc__insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t root
    */
 #endif
 
+  topology->modified = 1;
   if (hwloc_obj_type_is_memory(obj->type)) {
     if (!root) {
       root = hwloc__find_insert_memory_parent(topology, obj, report_error);
@@ -3044,6 +3045,7 @@ next_cpubackend:
       memcpy(&node->attr->numanode, &topology->machine_memory, sizeof(topology->machine_memory));
       memset(&topology->machine_memory, 0, sizeof(topology->machine_memory));
       hwloc_insert_object_by_cpuset(topology, node);
+      hwloc_topology_reconnect(topology, 0);
     } else {
       /* if we're sure we found all NUMA nodes without their sizes (x86 backend?),
        * we could split topology->total_memory in all of them.
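Editorial note: the patch above does two things: it sets topology->modified
when an object is inserted, and it calls hwloc_topology_reconnect() right
after the single NUMA node is inserted. One way to picture why both could
matter is a dirty-flag pattern, where lookups go through a cached view that
is only rebuilt when the flag is set. The C sketch below is a hypothetical
toy model under that assumption. The names struct toy_topology, toy_insert,
toy_reconnect and toy_get are invented for illustration and are not part of
hwloc, and the sketch does not claim to reproduce hwloc's internals.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OBJS 8

/* Hypothetical toy model; these are NOT hwloc's data structures. */
struct toy_topology {
    int objs[TOY_MAX_OBJS];   /* objects in insertion order (stand-in for the tree) */
    size_t nb_objs;
    int level[TOY_MAX_OBJS];  /* cached flat view, only valid while modified == 0 */
    size_t nb_level;
    int modified;             /* set when objs changed since the last reconnect */
};

void toy_init(struct toy_topology *t)
{
    t->nb_objs = 0;
    t->nb_level = 0;
    t->modified = 0;
}

/* Insert an object and mark the cached view as stale.
 * Returns 0 on success, -1 if the toy table is full. */
int toy_insert(struct toy_topology *t, int id)
{
    if (t->nb_objs == TOY_MAX_OBJS)
        return -1;
    t->objs[t->nb_objs++] = id;
    t->modified = 1;
    return 0;
}

/* Rebuild the cached view if anything changed.
 * Returns 1 if a rebuild happened, 0 if there was nothing to do. */
int toy_reconnect(struct toy_topology *t)
{
    assert(t->nb_objs <= TOY_MAX_OBJS);
    if (!t->modified)
        return 0;
    for (size_t i = 0; i < t->nb_objs; i++)
        t->level[i] = t->objs[i];
    t->nb_level = t->nb_objs;
    t->modified = 0;
    return 1;
}

/* Look up an object through the cached view.
 * Returns NULL if the view is stale or idx is out of range. */
const int *toy_get(const struct toy_topology *t, size_t idx)
{
    if (t->modified || idx >= t->nb_level)
        return NULL;
    return &t->level[idx];
}
```

In this model, a caller that inserts an object and then calls toy_get()
without an intervening toy_reconnect() gets NULL back, and a second
toy_reconnect() in a row is a cheap no-op. Whether the segfaults reported
in this thread came from an equivalent stale view in hwloc is not
established by the messages here.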
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
Brice Goglin, on Tue, 19 Dec 2017 11:48:39 +0100, wrote:
> The Hardware Locality (hwloc) team is pleased to announce the first
> beta release for v2.0.0:
>
>    http://www.open-mpi.org/projects/hwloc/

I tried to build the Debian package, there are a few failures in the
testsuite:

FAIL: test-lstopo.sh
FAIL: hwloc_bind
FAIL: hwloc_get_last_cpu_location
FAIL: hwloc_get_area_memlocation
FAIL: hwloc_object_userdata
FAIL: hwloc_backends
FAIL: hwloc_pci_backend
FAIL: hwloc_is_thissystem
FAIL: hwloc_topology_diff
FAIL: hwloc_topology_abi
FAIL: hwloc_obj_infos
FAIL: glibc-sched
../.././config/test-driver: line 107: 27886 Segmentation fault "$@" > $log_file 2>&1
FAIL: hwloc-hello
../.././config/test-driver: line 107: 27905 Segmentation fault "$@" > $log_file 2>&1
FAIL: hwloc-hello-cpp

This is running inside a Debian Buster system.

Samuel
Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released
Hello,

Brice Goglin, on Tue, 19 Dec 2017 11:48:39 +0100, wrote:
> The Hardware Locality (hwloc) team is pleased to announce the first
> beta release for v2.0.0:
>
>    http://www.open-mpi.org/projects/hwloc/

The tarball doesn't contain a netloc/ directory. This is not a problem
for ./configure && make && make install, but it makes it impossible to
run ./autogen.sh.

Samuel