Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Samuel Thibault
Samuel Thibault, on Wed. 20 Dec. 2017 18:26:37 +0100, wrote:
> Brice Goglin, on Wed. 20 Dec. 2017 18:16:34 +0100, wrote:
> > On 20/12/2017 at 18:06, Samuel Thibault wrote:
> > > It has only one NUMA node, thus triggering the code I patched over.
> > 
> > Well, this has been working fine for a while, since that's my daily
> > development machine and all our jenkins slaves.
> > 
> > Can you give the usually requested details about the OS, kernel,
> > hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?
> 
> Your commit 301c0f94e0a54823bfd530c36b5f9c9d9862332b seems to have fixed
> it.
> 
> It's Debian Buster, kernel 4.14.0, and attached gathers.

Mmm, it seems the x86 backend gets triggered somehow: this is the first
hwloc_topology_reconnect call:

#0  hwloc_topology_reconnect (topology=topology@entry=0x5577f060, flags=flags@entry=0)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:2910
#1  0x77bc93e2 in hwloc_x86_discover (backend=0x5577f890)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology-x86.c:1264
#2  0x77ba1595 in hwloc_discover (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3008
#3  hwloc_topology_load (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3584
#4  0x84f6 in main (argc=, argv=)
    at /home/samy/recherche/hwloc/hwloc/utils/lstopo/lstopo.c:995

and then the second:

#0  hwloc_topology_reconnect (topology=0x5577f060, flags=0)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:2910
#1  0x77ba1707 in hwloc_discover (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3103
#2  hwloc_topology_load (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3584
#3  0x84f6 in main (argc=, argv=)
    at /home/samy/recherche/hwloc/hwloc/utils/lstopo/lstopo.c:995

and a third:

#0  hwloc_topology_reconnect (topology=0x5577f060, flags=0)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:2910
#1  0x77ba1789 in hwloc_discover (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3149
#2  hwloc_topology_load (topology=0x5577f060)
    at /home/samy/recherche/hwloc/hwloc/hwloc/topology.c:3584
#3  0x84f6 in main (argc=, argv=)
    at /home/samy/recherche/hwloc/hwloc/utils/lstopo/lstopo.c:995

(these are all with git 220ee3eb926ca6bfa175d9700ab56d14a554cea4)

I have attached the cpuid/ directory.

Samuel


cpuid.tgz
Description: application/gtar-compressed
___
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel

Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Brice Goglin
Thanks.

Your machine is very similar to mine, running a similar Debian, with
4.14 too. And my local build still doesn't crash. Maybe a different
compiler is causing the bug to appear more often on yours (Debian's
7.2.0-16 here). Let's forget about it if it's fixed now :)

Brice



On 20/12/2017 at 18:26, Samuel Thibault wrote:
> Brice Goglin, on Wed. 20 Dec. 2017 18:16:34 +0100, wrote:
>> On 20/12/2017 at 18:06, Samuel Thibault wrote:
>>> It has only one NUMA node, thus triggering the code I patched over.
>> Well, this has been working fine for a while, since that's my daily
>> development machine and all our jenkins slaves.
>>
>> Can you give the usually requested details about the OS, kernel,
>> hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?
> Your commit 301c0f94e0a54823bfd530c36b5f9c9d9862332b seems to have fixed
> it.
>
> It's Debian Buster, kernel 4.14.0, and attached gathers.
>
> Samuel
>
>


Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Samuel Thibault
I have uploaded it to Debian experimental, so when it passes NEW,
various arch test results will show up on 

https://buildd.debian.org/status/package.php?p=hwloc&suite=experimental

so you can check the results on odd systems :)

Samuel


Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Samuel Thibault
Brice Goglin, on Wed. 20 Dec. 2017 18:16:34 +0100, wrote:
> On 20/12/2017 at 18:06, Samuel Thibault wrote:
> > It has only one NUMA node, thus triggering the code I patched over.
> 
> Well, this has been working fine for a while, since that's my daily
> development machine and all our jenkins slaves.
> 
> Can you give the usually requested details about the OS, kernel,
> hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?

Your commit 301c0f94e0a54823bfd530c36b5f9c9d9862332b seems to have fixed
it.

It's Debian Buster, kernel 4.14.0, and attached gathers.

Samuel
Machine (P#0 total=8022768KB DMIProductName="HP EliteBook 820 G2" DMIProductVersion=A3008C410003 DMIBoardVendor=Hewlett-Packard DMIBoardName=225A DMIBoardVersion="KBC Version 96.54" DMIBoardAssetTag= DMIChassisVendor=Hewlett-Packard DMIChassisType=10 DMIChassisVersion= DMIChassisAssetTag=5CG5201YZY DMIBIOSVendor=Hewlett-Packard DMIBIOSVersion="M71 Ver. 01.04" DMIBIOSDate=02/24/2015 DMISysVendor=Hewlett-Packard Backend=Linux LinuxCgroup=/ hwlocVersion=2.0.0a1-git ProcessName=lstopo-no-graphics)
  Package L#0 (P#0 total=8022768KB CPUModel="Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz" CPUVendor=GenuineIntel CPUFamilyNumber=6 CPUModelNumber=61 CPUStepping=4)
    NUMANode L#0 (P#0 local=8022768KB total=8022768KB)
    L3Cache L#0 (size=3072KB linesize=64 ways=12 Inclusive=1)
      L2Cache L#0 (size=256KB linesize=64 ways=8 Inclusive=0)
        L1dCache L#0 (size=32KB linesize=64 ways=8 Inclusive=0)
          L1iCache L#0 (size=32KB linesize=64 ways=8 Inclusive=0)
            Core L#0 (P#0)
              PU L#0 (P#0)
              PU L#1 (P#1)
      L2Cache L#1 (size=256KB linesize=64 ways=8 Inclusive=0)
        L1dCache L#1 (size=32KB linesize=64 ways=8 Inclusive=0)
          L1iCache L#1 (size=32KB linesize=64 ways=8 Inclusive=0)
            Core L#1 (P#1)
              PU L#2 (P#2)
              PU L#3 (P#3)
  HostBridge L#0 (buses=:[00-03])
    PCI L#0 (busid=:00:02.0 id=8086:1616 class=0300(VGA) PCIVendor="Intel Corporation" PCIDevice="HD Graphics 5500")
    PCI L#1 (busid=:00:19.0 id=8086:15a2 class=0200(Ethernet) PCIVendor="Intel Corporation" PCIDevice="Ethernet Connection (3) I218-LM")
      Network L#0 (Address=48:0f:cf:28:82:c3) "enp0s25"
    PCIBridge L#1 (busid=:00:1c.3 id=8086:9c96 class=0604(PCIBridge) buses=:[03-03] PCIVendor="Intel Corporation" PCIDevice="Wildcat Point-LP PCI Express Root Port #4")
      PCI L#2 (busid=:03:00.0 id=8086:095a class=0280(Network) PCIVendor="Intel Corporation" PCIDevice="Wireless 7265")
        Network L#1 (Address=34:02:86:2c:6a:19) "wlo1"
    PCI L#3 (busid=:00:1f.2 id=8086:9c83 class=0106(SATA) PCIVendor="Intel Corporation" PCIDevice="Wildcat Point-LP SATA Controller [AHCI Mode]")
      Block(Disk) L#2 (Size=250059096 SectorSize=512 LinuxDeviceID=8:0 Model=MTFDDAK256MBF-1AN15ABHA Revision=M6T3 SerialNumber=14380F25F377) "sda"
depth 0:           1 Machine (type #0)
 depth 1:          1 Package (type #1)
  depth 2:         1 L3Cache (type #6)
   depth 3:        2 L2Cache (type #5)
    depth 4:       2 L1dCache (type #4)
     depth 5:      2 L1iCache (type #9)
      depth 6:     2 Core (type #2)
       depth 7:    4 PU (type #3)
Special depth -3:  1 NUMANode (type #13)
Special depth -4:  2 Bridge (type #14)
Special depth -5:  4 PCIDev (type #15)
Special depth -6:  3 OSDev (type #16)
Topology not from this system


gather.xml
Description: XML document


gather.tar.bz2
Description: Binary data

Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Brice Goglin
On 20/12/2017 at 18:06, Samuel Thibault wrote:
> It has only one NUMA node, thus triggering the code I patched over.

Well, this has been working fine for a while, since that's my daily
development machine and all our jenkins slaves.

Can you give the usually requested details about the OS, kernel,
hwloc-gather-topology? hwloc-gather-cpuid if the x86 backend is involved?

Brice


Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Samuel Thibault
Brice Goglin, on Wed. 20 Dec. 2017 17:53:54 +0100, wrote:
> On 20/12/2017 at 17:49, Samuel Thibault wrote:
> > Samuel Thibault, on Wed. 20 Dec. 2017 13:57:45 +0100, wrote:
> >> Brice Goglin, on Tue. 19 Dec. 2017 11:48:39 +0100, wrote:
> >>> The Hardware Locality (hwloc) team is pleased to announce the first
> >>> beta release for v2.0.0:
> >>>
> >>>http://www.open-mpi.org/projects/hwloc/
> >> I tried to build the Debian package, there are a few failures in the
> >> testsuite:
> >>
> >> FAIL: test-lstopo.sh
> >> FAIL: hwloc_bind
> >> FAIL: hwloc_get_last_cpu_location
> >> FAIL: hwloc_get_area_memlocation
> >> FAIL: hwloc_object_userdata
> >> FAIL: hwloc_backends
> >> FAIL: hwloc_pci_backend
> >> FAIL: hwloc_is_thissystem
> >> FAIL: hwloc_topology_diff
> >> FAIL: hwloc_topology_abi
> >> FAIL: hwloc_obj_infos
> >> FAIL: glibc-sched
> >> ../.././config/test-driver: line 107: 27886 Segmentation fault  "$@" > $log_file 2>&1
> >> FAIL: hwloc-hello
> >> ../.././config/test-driver: line 107: 27905 Segmentation fault  "$@" > $log_file 2>&1
> >> FAIL: hwloc-hello-cpp
> >>
> >> This is running inside a Debian Buster system.
> > It seems to be fixed by the attached patch.
> >
> 
> I can't reproduce the issue, what's specific about your system?

It has only one NUMA node, thus triggering the code I patched over.

Samuel

Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Brice Goglin
On 20/12/2017 at 17:49, Samuel Thibault wrote:
> Samuel Thibault, on Wed. 20 Dec. 2017 13:57:45 +0100, wrote:
>> Brice Goglin, on Tue. 19 Dec. 2017 11:48:39 +0100, wrote:
>>> The Hardware Locality (hwloc) team is pleased to announce the first
>>> beta release for v2.0.0:
>>>
>>>http://www.open-mpi.org/projects/hwloc/
>> I tried to build the Debian package, there are a few failures in the
>> testsuite:
>>
>> FAIL: test-lstopo.sh
>> FAIL: hwloc_bind
>> FAIL: hwloc_get_last_cpu_location
>> FAIL: hwloc_get_area_memlocation
>> FAIL: hwloc_object_userdata
>> FAIL: hwloc_backends
>> FAIL: hwloc_pci_backend
>> FAIL: hwloc_is_thissystem
>> FAIL: hwloc_topology_diff
>> FAIL: hwloc_topology_abi
>> FAIL: hwloc_obj_infos
>> FAIL: glibc-sched
>> ../.././config/test-driver: line 107: 27886 Segmentation fault  "$@" > $log_file 2>&1
>> FAIL: hwloc-hello
>> ../.././config/test-driver: line 107: 27905 Segmentation fault  "$@" > $log_file 2>&1
>> FAIL: hwloc-hello-cpp
>>
>> This is running inside a Debian Buster system.
> It seems to be fixed by the attached patch.
>

I can't reproduce the issue. What's specific about your system? I tried
inside a debian-build-chroot, etc.

Brice



Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Samuel Thibault
Samuel Thibault, on Wed. 20 Dec. 2017 13:57:45 +0100, wrote:
> Brice Goglin, on Tue. 19 Dec. 2017 11:48:39 +0100, wrote:
> > The Hardware Locality (hwloc) team is pleased to announce the first
> > beta release for v2.0.0:
> > 
> >http://www.open-mpi.org/projects/hwloc/
> 
> I tried to build the Debian package, there are a few failures in the
> testsuite:
> 
> FAIL: test-lstopo.sh
> FAIL: hwloc_bind
> FAIL: hwloc_get_last_cpu_location
> FAIL: hwloc_get_area_memlocation
> FAIL: hwloc_object_userdata
> FAIL: hwloc_backends
> FAIL: hwloc_pci_backend
> FAIL: hwloc_is_thissystem
> FAIL: hwloc_topology_diff
> FAIL: hwloc_topology_abi
> FAIL: hwloc_obj_infos
> FAIL: glibc-sched
> ../.././config/test-driver: line 107: 27886 Segmentation fault  "$@" > $log_file 2>&1
> FAIL: hwloc-hello
> ../.././config/test-driver: line 107: 27905 Segmentation fault  "$@" > $log_file 2>&1
> FAIL: hwloc-hello-cpp
> 
> This is running inside a Debian Buster system.

It seems to be fixed by the attached patch.

Samuel
diff --git a/hwloc/topology.c b/hwloc/topology.c
index d827d5f5..e0bf7beb 100644
--- a/hwloc/topology.c
+++ b/hwloc/topology.c
@@ -1,7 +1,7 @@
 /*
  * Copyright © 2009 CNRS
  * Copyright © 2009-2017 Inria.  All rights reserved.
- * Copyright © 2009-2012 Université Bordeaux
+ * Copyright © 2009-2012, 2017 Université Bordeaux
  * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
  * See COPYING in top-level directory.
  */
@@ -1596,6 +1596,7 @@ hwloc__insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t root
    */
 #endif
 
+  topology->modified = 1;
   if (hwloc_obj_type_is_memory(obj->type)) {
     if (!root) {
       root = hwloc__find_insert_memory_parent(topology, obj, report_error);
@@ -3044,6 +3045,7 @@ next_cpubackend:
     memcpy(&node->attr->numanode, &topology->machine_memory, sizeof(topology->machine_memory));
     memset(&topology->machine_memory, 0, sizeof(topology->machine_memory));
     hwloc_insert_object_by_cpuset(topology, node);
+    hwloc_topology_reconnect(topology, 0);
   } else {
     /* if we're sure we found all NUMA nodes without their sizes (x86 backend?),
      * we could split topology->total_memory in all of them.
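
The patch above does two things: it marks the topology as modified whenever an object is inserted, and it calls hwloc_topology_reconnect() right after the single NUMA node is inserted. As a rough illustration of that "dirty flag, then rebuild" pattern, here is a minimal sketch in C. It is not hwloc's implementation: struct toy_topology and the toy_* functions are hypothetical names invented for this example, and the "levels" are reduced to a per-depth object count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_OBJS  16
#define TOY_MAX_DEPTH 8

/* Hypothetical stand-in for a topology; NOT hwloc's real structure. */
struct toy_topology {
    int depth_of[TOY_MAX_OBJS];      /* depth of each inserted object */
    size_t nb_objs;                  /* number of inserted objects */
    int modified;                    /* set when the object list changes */
    size_t per_depth[TOY_MAX_DEPTH]; /* derived data: objects per depth */
    unsigned rebuilds;               /* how many times the data was rebuilt */
};

/* Insert an object and mark the derived data as stale. */
int toy_insert(struct toy_topology *t, int depth)
{
    if (t->nb_objs >= TOY_MAX_OBJS || depth < 0 || depth >= TOY_MAX_DEPTH)
        return -1;
    t->depth_of[t->nb_objs++] = depth;
    t->modified = 1; /* without this, toy_reconnect() would skip the rebuild */
    return 0;
}

/* Rebuild the derived data, but only if something changed. */
void toy_reconnect(struct toy_topology *t)
{
    size_t i;

    if (!t->modified)
        return;
    memset(t->per_depth, 0, sizeof(t->per_depth));
    for (i = 0; i < t->nb_objs; i++)
        t->per_depth[t->depth_of[i]]++;
    t->modified = 0;
    t->rebuilds++;
}

/* Readers must only look at the derived data once it is up to date. */
size_t toy_count_at_depth(const struct toy_topology *t, int depth)
{
    assert(!t->modified);
    assert(depth >= 0 && depth < TOY_MAX_DEPTH);
    return t->per_depth[depth];
}
```

In this toy model, forgetting to set the flag in toy_insert() leaves per_depth stale after an insertion, which is the kind of inconsistency the two added lines in the patch appear to guard against. Calling toy_reconnect() twice in a row is harmless, since the second call returns early.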

Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Samuel Thibault
Brice Goglin, on Tue. 19 Dec. 2017 11:48:39 +0100, wrote:
> The Hardware Locality (hwloc) team is pleased to announce the first
> beta release for v2.0.0:
> 
>http://www.open-mpi.org/projects/hwloc/

I tried to build the Debian package, there are a few failures in the
testsuite:

FAIL: test-lstopo.sh
FAIL: hwloc_bind
FAIL: hwloc_get_last_cpu_location
FAIL: hwloc_get_area_memlocation
FAIL: hwloc_object_userdata
FAIL: hwloc_backends
FAIL: hwloc_pci_backend
FAIL: hwloc_is_thissystem
FAIL: hwloc_topology_diff
FAIL: hwloc_topology_abi
FAIL: hwloc_obj_infos
FAIL: glibc-sched
../.././config/test-driver: line 107: 27886 Segmentation fault  "$@" > $log_file 2>&1
FAIL: hwloc-hello
../.././config/test-driver: line 107: 27905 Segmentation fault  "$@" > $log_file 2>&1
FAIL: hwloc-hello-cpp

This is running inside a Debian Buster system.

Samuel

Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-20 Thread Samuel Thibault
Hello,

Brice Goglin, on Tue. 19 Dec. 2017 11:48:39 +0100, wrote:
> The Hardware Locality (hwloc) team is pleased to announce the first
> beta release for v2.0.0:
> 
>http://www.open-mpi.org/projects/hwloc/

The tarball doesn't contain a netloc/ directory. This is not a problem
for ./configure && make && make install, but it prevents running
./autogen.sh.

Samuel