Re: [hwloc-users] hwloc error in SuperMicro AMD Opteron 6238

2017-06-30 Thread fabricio
On 30-06-2017 17:28, Brice Goglin wrote: On 30/06/2017 22:08, fabricio wrote: On 30-06-2017 16:21, Brice Goglin wrote: Yes, it's possible but very easy. Before we go that way: Can you also pass HWLOC_COMPONENTS_VERBOSE=1 in the environment and send the verbose output?

Re: [hwloc-users] hwloc error in SuperMicro AMD Opteron 6238

2017-06-30 Thread Brice Goglin
Hello, we have seen _many_ reports like these, but there are different kinds of errors. As far as I understand: * Julio's error is caused by the Linux kernel improperly reporting L3 cache affinities. It's specific to multi-socket 12-core processors because the kernel makes invalid assumptions

Re: [hwloc-users] hwloc error in SuperMicro AMD Opteron 6238

2017-06-30 Thread Brice Goglin
On 30/06/2017 22:08, fabricio wrote: > On 30-06-2017 16:21, Brice Goglin wrote: >> Yes, it's possible but very easy. Before we go that way: >> Can you also pass HWLOC_COMPONENTS_VERBOSE=1 in the environment and send >> the verbose output? > >

Re: [hwloc-users] hwloc error in SuperMicro AMD Opteron 6238

2017-06-30 Thread Belgin, Mehmet
We (Georgia Tech) have also been observing this on 16-core AMD Abu Dhabi machines (6378). We weren’t aware of the HWLOC_COMPONENTS workaround, which seems to mitigate the issue. Before: # ./lstopo * hwloc has encountered

Re: [hwloc-users] hwloc error in SuperMicro AMD Opteron 6238

2017-06-30 Thread fabricio
On 29-06-2017 02:24, Brice Goglin wrote: Hello Brice, I'm still seeing this error message even when passing the HWLOC_COMPONENTS=x86 variable. Is it possible to generate an XML file that can silence this error? TIA, Fabricio
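The environment variables that come up in this thread (HWLOC_COMPONENTS=x86, HWLOC_COMPONENTS_VERBOSE=1 and, further down, HWLOC_XMLFILE) can be combined when running lstopo. A minimal Python sketch, assuming lstopo is on the PATH; the helper names hwloc_env and run_lstopo are hypothetical and not part of hwloc.

```python
import os
import shutil
import subprocess


def hwloc_env(components=None, verbose=False, xmlfile=None, base=None):
    """Return a copy of the environment with the hwloc variables from this thread set."""
    env = dict(os.environ if base is None else base)
    if components is not None:
        # e.g. "x86": ask hwloc to use its x86 (CPUID-based) discovery component
        env["HWLOC_COMPONENTS"] = components
    if verbose:
        # Print information about which discovery components hwloc loads
        env["HWLOC_COMPONENTS_VERBOSE"] = "1"
    if xmlfile is not None:
        # Load the topology from this XML file instead of discovering it natively
        env["HWLOC_XMLFILE"] = xmlfile
    return env


def run_lstopo(args, env):
    """Run lstopo with the given arguments and environment; return the CompletedProcess."""
    lstopo = shutil.which("lstopo")
    if lstopo is None:
        raise FileNotFoundError("lstopo is not in PATH")
    return subprocess.run([lstopo, *args], env=env, capture_output=True, text=True)
```

For example, run_lstopo(["topology.xml"], hwloc_env(components="x86", verbose=True)) would ask lstopo to write an XML export (lstopo picks the output format from the file extension) with component details enabled; the file name is only illustrative.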

Re: [hwloc-users] hwloc error in SuperMicro AMD Opteron 6238

2017-06-28 Thread Brice Goglin
Hello, we've seen this issue many times (it's specific to 12-core Opterons), but I am surprised it still occurs with such a recent kernel. AMD was supposed to fix the kernel in early 2016, but I forgot to check whether something was actually pushed. Anyway, you can likely ignore the issue as

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-29 Thread Brice Goglin
On 28/10/2015 18:04, Fabian Wein wrote: > I hope I'm still on the right list for my current problem. Hello, it looks like this should go to us...@open-mpi.org now. > - > A request was made to bind a process, but at least one node does NOT > support binding processes to

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-28 Thread Fabian Wein
I hope I'm still on the right list for my current problem. Today we figured out, on a similar but older 48-core system with four Opteron 6100 processors, that mpiexec -bind-to numa is the key. This is what I want to achieve on my system. I already installed libnuma such that hwloc configure uses

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
I guess the next step would be to look at how these tasks are placed on the machine. There are 8 NUMA nodes on the machine. Maybe 9 tasks is the point where a second task has to be placed on one of the NUMA nodes? For OMPI, --report-bindings may help. I am not sure about MPICH. Brice On 27/10/2015 15:52, Fabian Wein

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Fabian Wein
On 10/27/2015 03:42 PM, Brice Goglin wrote: I guess the problem is that your OMPI uses an old hwloc internally. That one may be too old to understand recent XML exports. Try replacing "Package" with "Socket" everywhere in the XML file. Thanks! That was it. I now get almost perfectly

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
I guess the problem is that your OMPI uses an old hwloc internally. That one may be too old to understand recent XML exports. Try replacing "Package" with "Socket" everywhere in the XML file. Brice On 27/10/2015 15:31, Fabian Wein wrote: > Thank you very much for the file. > > When I try
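The suggested workaround is a plain textual substitution. A minimal Python sketch, assuming the XML export is UTF-8 text; the helper names are hypothetical. Because it replaces the word everywhere, as suggested, it would also rewrite any other occurrence of "Package" in the file.

```python
from pathlib import Path


def downgrade_package_names(xml_text: str) -> str:
    """Replace every "Package" with "Socket" so an older hwloc can read the export."""
    # Older hwloc releases name the processor-package object "Socket";
    # newer XML exports name it "Package".
    return xml_text.replace("Package", "Socket")


def downgrade_file(src: str, dst: str) -> None:
    """Read an XML export from src and write the renamed copy to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(downgrade_package_names(text), encoding="utf-8")
```

Writing to a separate file keeps the original export intact in case the older hwloc still rejects the renamed copy.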

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Fabian Wein
Thank you very much for the file. When I try it with PETSc, compiled with Open MPI and icc, I get: -- Failed to parse XML input with the minimalistic parser. If it was not generated by hwloc, try enabling full XML support with libxml2.

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
Here's the fixed XML. For the record, for each NUMA node, I extended the cpusets of the L3 to match the containing NUMA node and moved all L2 objects under that L3 as its children. You may now load that XML instead of using native discovery by setting HWLOC_XMLFILE=leo2.xml in your environment. Brice On
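The per-NUMA-node fix described above can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal illustration, not the procedure actually used on leo2.xml: it assumes the hwloc 1.x XML layout, where caches are type="Cache" objects with a depth attribute and NUMA nodes contain their caches, and it assumes exactly one L3 per NUMA node. The function names and the CPUSET_ATTRS list are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cpuset-style attributes assumed to appear on objects in the XML export.
CPUSET_ATTRS = ("cpuset", "complete_cpuset", "online_cpuset", "allowed_cpuset")


def is_cache(obj, level):
    """True if obj looks like a cache of the given level in hwloc 1.x-style XML."""
    return obj.get("type") == "Cache" and obj.get("depth") == str(level)


def fix_numa_node(numa):
    """Widen the L3 under this NUMA node to the node's cpusets and move every L2 below it."""
    l3s = [o for o in numa.iter("object") if is_cache(o, 3)]
    if not l3s:
        return
    target = l3s[0]  # assumption: one L3 per NUMA node
    for attr in CPUSET_ATTRS:
        if attr in numa.attrib:
            target.set(attr, numa.get(attr))
    # ElementTree has no parent pointers, so map each child to its parent first.
    parents = {child: parent for parent in numa.iter() for child in parent}
    l2s = [o for o in numa.iter("object") if is_cache(o, 2)]
    for l2 in l2s:
        parent = parents[l2]
        if parent is not target:
            parent.remove(l2)
            target.append(l2)


def fix_topology(root):
    """Apply fix_numa_node to every NUMANode object in the tree."""
    numa_nodes = [o for o in root.iter("object") if o.get("type") == "NUMANode"]
    for numa in numa_nodes:
        fix_numa_node(numa)
```

To apply it to a real export, one would parse the file with ET.parse, call fix_topology on its root and save it with tree.write(path, xml_declaration=True, encoding="UTF-8"). ElementTree drops the original DOCTYPE line and does not reorder the moved L2 objects, so compare the result against the original file before loading it through HWLOC_XMLFILE.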

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
Hello, good to know. Have you seen/tested the kernel patch yet? If possible, could you send a link to the kernel commit when it appears upstream? Thanks, Brice On 27/10/2015 09:21, Ondřej Vlček wrote: > Dear Brice, > thank you for your answer. Neither upgrade of BIOS nor using the latest > hwloc

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Ondřej Vlček
Dear Brice, thank you for your answer. Neither a BIOS upgrade nor the latest hwloc helped. Finally, we contacted AMD, and they fixed a kernel bug that caused problems with 12-core AMD processors. They should upstream the changes to kernel.org soon, so that all the distros

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread vlcek
, SUSE, etc.) can pick them up automatically as they create their respective next releases. Ondrej -- Original message -- From: Brice Goglin <brice.gog...@inria.fr> To: Ondrej Certik <ond...@certik.cz> Date: 24. 8. 2015 15:32:33 Subject: Re: [hwloc-users] hwloc e

Re: [hwloc-users] hwloc error with "node interleaving" disabled

2014-09-05 Thread Brice Goglin
I checked his .output file and it seems he has the same hardware as me. I see now why you said "yet another buggy AMD platform"! > > Sorry guys. > > > Date: Fri, 5 Sep 2014 13:46:25 +0200 > From: brice.gog...@inria.fr > To: hwloc-us...@open-mpi.org >

Re: [hwloc-users] hwloc error with "node interleaving" disabled

2014-09-05 Thread Jean-Pierre Adam
2014 13:46:25 +0200 From: brice.gog...@inria.fr To: hwloc-us...@open-mpi.org Subject: Re: [hwloc-users] hwloc error with "node interleaving" disabled Hello You sent the test.output file instead of test.tar.bz2 so I can't check for sure. Anyway

Re: [hwloc-users] hwloc error with "node interleaving" disabled

2014-09-05 Thread Jean-Pierre Adam
:46:25 +0200 From: brice.gog...@inria.fr To: hwloc-us...@open-mpi.org Subject: Re: [hwloc-users] hwloc error with "node interleaving" disabled Hello You sent the test.output file instead of test.tar.bz2 so I can't check for sure. Anyway I guess t

Re: [hwloc-users] hwloc error with "node interleaving" disabled

2014-09-05 Thread Brice Goglin
Hello, you sent the test.output file instead of test.tar.bz2, so I can't check for sure. Anyway, I guess this is yet another buggy AMD platform with Magny-Cours/Interlagos/Abu Dhabi Opterons (61xx, 62xx or 63xx). Sometimes upgrading the BIOS/kernel helps, sometimes not. Some L3 caches will be

Re: [hwloc-users] hwloc error

2014-08-16 Thread Andrej Prsa
Hi Brice, > Your kernel looks recent enough, can you try upgrading your BIOS? You > have version 3.0b and there's a 3.5 version at > http://www.supermicro.com/aplus/motherboard/opteron6000/sr56x0/h8qg6-f.cfm For completeness, I just tried updating the BIOS to 3.5; hwloc still throws the same error.