On 30-06-2017 17:28, Brice Goglin wrote:
On 30/06/2017 22:08, fabricio wrote:
On 30-06-2017 16:21, Brice Goglin wrote:
Yes, it's possible but not very easy. Before we go that way:
Can you also pass HWLOC_COMPONENTS_VERBOSE=1 in the environment and send
the verbose output?
Hello
We have seen _many_ reports like these. But there are different kinds of
errors. As far as I understand:
* Julio's error is caused by the Linux kernel improperly reporting L3
cache affinities. It's specific to multi-socket 12-core processors
because the kernel makes invalid assumptions
We (Georgia Tech) too have been observing this on 16-core AMD Abu Dhabi
machines (6378). We weren't aware of the HWLOC_COMPONENTS workaround, which
seems to mitigate the issue.
Before:
# ./lstopo
* hwloc has encountered
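The workaround mentioned above amounts to forcing hwloc's x86 CPUID backend instead of the kernel-reported cache information. A minimal sketch, assuming lstopo is installed (the guard only avoids an error where it is not):

```shell
# Force hwloc's x86 CPUID backend instead of the Linux one that
# reports the bogus L3 affinities; verbose component output shows
# which backends were actually loaded.
export HWLOC_COMPONENTS=x86
export HWLOC_COMPONENTS_VERBOSE=1
env | grep '^HWLOC_COMPONENTS'
command -v lstopo >/dev/null && lstopo --of console || echo "lstopo not available here"
```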
On 29-06-2017 02:24, Brice Goglin wrote:
Hello Brice
I'm still seeing this error message even when passing the
HWLOC_COMPONENTS=x86 variable.
Is it possible to generate an XML file that can silence this error?
TIA,
Fabricio
Hello
We've seen this issue many times (it's specific to 12-core Opterons),
but I am surprised it still occurs with such a recent kernel. AMD was
supposed to fix the kernel in early 2016 but I forgot to check whether
something was actually pushed.
Anyway, you can likely ignore the issue as
On 28/10/2015 18:04, Fabian Wein wrote:
> I hope I'm still on the right list for my current problem.
Hello
It looks like this should go to us...@open-mpi.org now.
> -
> A request was made to bind a process, but at least one node does NOT
> support binding processes to
I hope I'm still on the right list for my current problem.
Today we figured out, on a similar but older four-Opteron (6100) 48-core
system, that
mpiexec -bind-to numa is the key point.
This is what I want to set up on my system. I already installed libnuma
so that hwloc's configure
uses
I guess the next step would be to look at how these tasks are placed on
the machine. There are 8 NUMA nodes on the machine. Maybe 9 is where it
starts placing a second task per NUMA node?
For OMPI, --report-bindings may help. I am not sure about MPICH.
Brice
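Combining the two suggestions above, an Open MPI command line might look like the following. This is only a sketch: ./my_app and the rank count are placeholders, and MPICH spells its binding flags differently.

```shell
# Print the suggested Open MPI invocation: bind one rank per NUMA
# node and report the resulting bindings. Not executed against a real
# application here; ./my_app and -n 8 are placeholders.
echo 'mpiexec --bind-to numa --report-bindings -n 8 ./my_app'
```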
On 27/10/2015 15:52, Fabian Wein wrote:
On 10/27/2015 03:42 PM, Brice Goglin wrote:
I guess the problem is that your OMPI uses an old hwloc internally. That
one may be too old to understand recent XML exports.
Try replacing "Package" with "Socket" everywhere in the XML file.
Thanks! That was it.
I now get almost perfectly
I guess the problem is that your OMPI uses an old hwloc internally. That
one may be too old to understand recent XML exports.
Try replacing "Package" with "Socket" everywhere in the XML file.
Brice
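The substitution is mechanical, so sed can do it. A sketch on a one-line stand-in for a real export (real hwloc XML files are much larger, but the same sed applies; note that BSD/macOS sed needs `-i ''` instead of `-i`):

```shell
# Demo of the "Package" -> "Socket" rename on a tiny stand-in for an
# hwloc XML export, so an older embedded hwloc can parse the file.
printf '<object type="Package" os_index="0"/>\n' > topo-snippet.xml
sed -i 's/type="Package"/type="Socket"/g' topo-snippet.xml
cat topo-snippet.xml
# → <object type="Socket" os_index="0"/>
```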
On 27/10/2015 15:31, Fabian Wein wrote:
> Thank you very much for the file.
>
> When I try
Thank you very much for the file.
When I try with PETSc, compiled with Open MPI and icc, I get
--
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
Here's the fixed XML. For the record, for each NUMA node, I extended the
cpusets of the L3 to match the containing NUMA node, and moved all L2
objects as children of that L3.
Now you may load that XML instead of the native discovery by setting
HWLOC_XMLFILE=leo2.xml in your environment.
Brice
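The shape of the fix described above can be sketched as an hwloc 1.x-style XML fragment. This is only an illustration: the os_index values, cpuset masks, and abbreviated attributes are made up, and real exports carry many more attributes per object.

```xml
<!-- Sketch only: one NUMA node after the fix. The L3's cpuset now
     equals its containing NUMA node's cpuset, and the L2 objects sit
     below the L3 instead of beside it. Values are illustrative. -->
<object type="NUMANode" os_index="0" cpuset="0x0000003f">
  <object type="Cache" depth="3" cpuset="0x0000003f">
    <object type="Cache" depth="2" cpuset="0x00000003"/>
    <object type="Cache" depth="2" cpuset="0x0000000c"/>
    <object type="Cache" depth="2" cpuset="0x00000030"/>
  </object>
</object>
```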
Hello
Good to know. Did you see/test the kernel patch yet? If possible, could
you send a link to the kernel commit when it appears upstream?
Thanks
Brice
On 27/10/2015 09:21, Ondřej Vlček wrote:
> Dear Brice,
> thank you for your answer. Neither upgrade of BIOS nor using the latest
> hwloc
Dear Brice,
thank you for your answer. Neither an upgrade of the BIOS nor using the latest
hwloc helped. Finally we contacted AMD and they fixed a bug in the kernel which
caused problems with 12-core AMD processors. They should upstream the changes
to kernel.org soon, so that all the distros (SUSE etc.)
can pick them up automatically as they create their respective next
releases.
Ondrej
-- Original message --
From: Brice Goglin <brice.gog...@inria.fr>
To: Ondrej Certik <ond...@certik.cz>
Date: 24. 8. 2015 15:32:33
Subject: Re: [hwloc-users] hwloc e
I checked his .output file and it seems he got the same
hardware as me. I see now why you said "yet another buggy AMD
platform"!

Sorry guys.

Date: Fri, 5 Sep 2014 13:46:25 +0200
From: brice.gog...@inria.fr
To: hwloc-us...@open-mpi.org
Date: Fri, 5 Sep 2014 13:46:25 +0200
From: brice.gog...@inria.fr
To: hwloc-us...@open-mpi.org
Subject: Re: [hwloc-users] hwloc error with "node interleaving" disabled
Hello
You sent the test.output file instead of test.tar.bz2 so I can't
check for sure. Anyway
Hello
You sent the test.output file instead of test.tar.bz2 so I can't check
for sure. Anyway I guess this is yet another buggy AMD platform with
Magny-Cours/Interlagos/Abu Dhabi Opterons (61xx, 62xx or 63xx).
Sometimes upgrading the BIOS/kernel helps. Sometimes not.
Some L3 caches will be
Hi Brice,
> Your kernel looks recent enough, can you try upgrading your BIOS? You
> have version 3.0b and there's a 3.5 version at
> http://www.supermicro.com/aplus/motherboard/opteron6000/sr56x0/h8qg6-f.cfm
For completeness, I just tried updating the BIOS to 3.5; hwloc still throws
the same error.