Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
On 28/10/2015 18:04, Fabian Wein wrote:
> I hope I'm still on the right list for my current problem.

Hello

It looks like this should go to us...@open-mpi.org now.

> --------------------------------------------------------------------------
> A request was made to bind a process, but at least one node does NOT
> support binding processes to cpus.
>
> Node: leo
>
> This usually is due to not having libnumactl and libnumactl-devel
> installed on the node.
> --------------------------------------------------------------------------
>
> I cannot find these packages for Ubuntu 14.04.
>
> I find a lot of Ubuntu deb packages on
> https://launchpad.net/ubuntu/+source/numactl
> but there I only find libnuma, not libnumactl.
>
> Where do I get libnumactl and libnumactl-devel from?

On Deb-based distros (Debian, Ubuntu, etc.), the right package name is
"libnuma-dev". The OMPI message only mentions the RPM package names.

> Is this the wrong thread and the wrong list?

Yeah, Open MPI-specific issues should go to the Open MPI list (hwloc is a
subproject of the Open MPI consortium, but the software projects are
pretty much independent).

Brice

> I have a feeling that I'm quite close but just cannot reach it :(
>
> Thanks,
>
> Fabian
>
> On 10/27/2015 04:05 PM, Brice Goglin wrote:
>> I guess the next step would be to look at how these tasks are placed on
>> the machine. There are 8 NUMA nodes on the machine. Maybe 9 is where it
>> starts placing a second task per NUMA node?
>> For OMPI, --report-bindings may help. I am not sure about MPICH.
>>
>> Brice
>>
>> On 27/10/2015 15:52, Fabian Wein wrote:
>>> On 10/27/2015 03:42 PM, Brice Goglin wrote:
>>>> I guess the problem is that your OMPI uses an old hwloc internally.
>>>> That one may be too old to understand recent XML exports.
>>>> Try replacing "Package" with "Socket" everywhere in the XML file.
>>>
>>> Thanks! That was it.
>>>
>>> I now get almost perfectly reproducible results.
>>>
>>> np  speedup
>>>  1  1.0
>>>  2  1.99
>>>  3  2.98
>>>  4  3.98
>>>  5  4.89
>>>  6  5.9
>>>  7  6.89
>>>  8  7.87
>>>  9  5.44
>>> 10  6.04
>>> 11  6.55
>>> 12  7.0
>>> 13  7.75
>>> 14  8.24
>>> 15  8.41
>>> 16  9.4
>>> 17  7.33
>>> 18  7.16
>>> 19  8.05
>>> 20  8.39
>>>
>>> What still puzzles me is the almost perfect speedup up to eight and
>>> then the drop. But for a start, 8 is already good!
>>>
>>> Thanks again,
>>>
>>> Fabian
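For the archives, the two package worlds map roughly like this (a sketch;
"libnuma-dev" is the name given in the message above, and numactl/numactl-devel
are the usual CentOS/RHEL counterparts of what the OMPI hint calls
libnumactl/libnumactl-devel):

  # Debian/Ubuntu
  $ sudo apt-get install libnuma-dev

  # RPM-based distros (e.g. CentOS/RHEL)
  $ sudo yum install numactl numactl-devel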
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
I hope I'm still on the right list for my current problem.

Today we figured out on a similar but older four-Opteron (6100) 48-core
system that "mpiexec -bind-to numa" is the essential key point. This is
what I want to get working on my system.

I already installed libnuma such that the hwloc configure uses numa.
Then I configured openmpi-1.10.0, which also uses libnuma.

When I compile my PETSc example with MPIEXEC="orterun -bind-to numa" and
run the application I get

--------------------------------------------------------------------------
A request was made to bind a process, but at least one node does NOT
support binding processes to cpus.

Node: leo

This usually is due to not having libnumactl and libnumactl-devel
installed on the node.
--------------------------------------------------------------------------

I cannot find these packages for Ubuntu 14.04. Even when I compile
numactl-2.0.9 from http://oss.sgi.com/projects/libnuma/ it only
generates libnuma.

I find a lot of Ubuntu deb packages on
https://launchpad.net/ubuntu/+source/numactl
but there I only find libnuma, not libnumactl.

Where do I get libnumactl and libnumactl-devel from?

Is this the wrong thread and the wrong list? I have a feeling that I'm
quite close but just cannot reach it :(

Thanks,

Fabian

On 10/27/2015 04:05 PM, Brice Goglin wrote:
> I guess the next step would be to look at how these tasks are placed on
> the machine. There are 8 NUMA nodes on the machine. Maybe 9 is where it
> starts placing a second task per NUMA node?
> For OMPI, --report-bindings may help. I am not sure about MPICH.
>
> Brice
>
> On 27/10/2015 15:52, Fabian Wein wrote:
>> On 10/27/2015 03:42 PM, Brice Goglin wrote:
>>> I guess the problem is that your OMPI uses an old hwloc internally.
>>> That one may be too old to understand recent XML exports.
>>> Try replacing "Package" with "Socket" everywhere in the XML file.
>>
>> Thanks! That was it.
>>
>> I now get almost perfectly reproducible results.
>>
>> np  speedup
>>  1  1.0
>>  2  1.99
>>  3  2.98
>>  4  3.98
>>  5  4.89
>>  6  5.9
>>  7  6.89
>>  8  7.87
>>  9  5.44
>> 10  6.04
>> 11  6.55
>> 12  7.0
>> 13  7.75
>> 14  8.24
>> 15  8.41
>> 16  9.4
>> 17  7.33
>> 18  7.16
>> 19  8.05
>> 20  8.39
>>
>> What still puzzles me is the almost perfect speedup up to eight and
>> then the drop. But for a start, 8 is already good!
>>
>> Thanks again,
>>
>> Fabian
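Given the answer in the newer reply above (on Debian/Ubuntu the package is
libnuma-dev), a plausible sequence is: install the headers, rebuild Open MPI
so its configure step can detect them, then re-test the binding. The install
prefix and application name below are only examples:

  $ sudo apt-get install libnuma-dev
  $ cd openmpi-1.10.0
  $ ./configure --prefix=$HOME/sw/openmpi-1.10.0 2>&1 | tee config.log
  $ grep -i numa config.log     # check whether numa support was detected
  $ make -j8 && make install
  $ orterun -bind-to numa --report-bindings -np 4 ./my_petsc_app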
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
I guess the next step would be to look at how these tasks are placed on
the machine. There are 8 NUMA nodes on the machine. Maybe 9 is where it
starts placing a second task per NUMA node?
For OMPI, --report-bindings may help. I am not sure about MPICH.

Brice

On 27/10/2015 15:52, Fabian Wein wrote:
> On 10/27/2015 03:42 PM, Brice Goglin wrote:
>> I guess the problem is that your OMPI uses an old hwloc internally.
>> That one may be too old to understand recent XML exports.
>> Try replacing "Package" with "Socket" everywhere in the XML file.
>
> Thanks! That was it.
>
> I now get almost perfectly reproducible results.
>
> np  speedup
>  1  1.0
>  2  1.99
>  3  2.98
>  4  3.98
>  5  4.89
>  6  5.9
>  7  6.89
>  8  7.87
>  9  5.44
> 10  6.04
> 11  6.55
> 12  7.0
> 13  7.75
> 14  8.24
> 15  8.41
> 16  9.4
> 17  7.33
> 18  7.16
> 19  8.05
> 20  8.39
>
> What still puzzles me is the almost perfect speedup up to eight and
> then the drop. But for a start, 8 is already good!
>
> Thanks again,
>
> Fabian
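The hypothesis is easy to check: with 8 NUMA nodes, any rank count above 8
forces at least one node to host two ranks, which would explain the drop at
np = 9. A minimal sketch (the binary name is a placeholder):

  $ mpirun -np 9 --bind-to numa --report-bindings ./app
  # prints one binding line per rank; at np = 9, two ranks should
  # report the same NUMA node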
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
On 10/27/2015 03:42 PM, Brice Goglin wrote:
> I guess the problem is that your OMPI uses an old hwloc internally.
> That one may be too old to understand recent XML exports.
> Try replacing "Package" with "Socket" everywhere in the XML file.

Thanks! That was it.

I now get almost perfectly reproducible results.

np  speedup
 1  1.0
 2  1.99
 3  2.98
 4  3.98
 5  4.89
 6  5.9
 7  6.89
 8  7.87
 9  5.44
10  6.04
11  6.55
12  7.0
13  7.75
14  8.24
15  8.41
16  9.4
17  7.33
18  7.16
19  8.05
20  8.39

What still puzzles me is the almost perfect speedup up to eight and then
the drop. But for a start, 8 is already good!

Thanks again,

Fabian
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
I guess the problem is that your OMPI uses an old hwloc internally. That
one may be too old to understand recent XML exports.
Try replacing "Package" with "Socket" everywhere in the XML file.

Brice

On 27/10/2015 15:31, Fabian Wein wrote:
> Thank you very much for the file.
>
> When I try with PETSc, compiled with open-mpi and icc, I get
>
> --------------------------------------------------------------------------
> Failed to parse XML input with the minimalistic parser. If it was not
> generated by hwloc, try enabling full XML support with libxml2.
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   topology discovery failed
>   --> Returned value Not supported (-8) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
> Without export HWLOC_XMLFILE I get the well-known
>
> * hwloc has encountered what looks like an error from the operating
> * system.
> *
> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
> * 0x0000003f) without inclusion!
> * Error occurred in topology.c line 942
> *
> * The following FAQ entry in a recent hwloc documentation may help:
> *   What should I do when hwloc reports "operating system" warnings?
> * Otherwise please report this error message to the hwloc user's
> * mailing list, along with the output+tarball generated by the
> * hwloc-gather-topology script.
>
> and the poor scaling
>
> Triad: 55372.8884 Rate (MB/s)
>
> np  speedup
>  1  1.0
>  2  1.03
>  3  2.98
>  4  3.98
>  5  4.95
>  6  5.96
>  7  4.15
>  8  4.73
>  9  5.36
> 10  5.94
> 11  4.79
> 12  5.25
>
> which is very random upon repetition but never better than a maximal
> speedup of 7. I have 24 (48) cores and only one was in use at the time
> by another process.
>
> Using mpich instead of open-mpi I get no message about the hwloc issue
> but the same poor and random speedups.
>
> I tried to check the XML file myself via
>
>   xmllint --valid leo_brice.xml --loaddtd /usr/local/share/hwloc/hwloc.dtd
>
> However, xmllint complains about hwloc.dtd itself:
>
>   /usr/local/share/hwloc/hwloc.dtd:8: parser error : StartTag: invalid
>   element name
>
> I have to mention that I have a mixture of hwloc versions: the most
> recent installed locally and an older one as part of PETSc.
>
> Any ideas?
>
> Thanks,
>
> Fabian
>
> On 10/27/2015 10:21 AM, Brice Goglin wrote:
>> Here's the fixed XML. For the record, for each NUMA node, I extended
>> the cpusets of the L3 to match the container NUMA node, and moved all
>> L2 objects as children of that L3.
>> Now you may load that XML instead of the native discovery by setting
>> HWLOC_XMLFILE=leo2.xml in your environment.
>> Brice
>>
>> On 27/10/2015 10:08, Fabian Wein wrote:
>>> Brice,
>>>
>>> thank you very much for the offer. I attached the xml file.
>>> ..
>>> * hwloc 1.11.1 has encountered what looks like an error from the
>>> * operating system.
>>> *
>>> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
>>> * 0x0000003f) without inclusion!
>>> * Error occurred in topology.c line 981
>>> *
>>> ..
>>>
>>> So if you can afford the time, I appreciate it very much!
>>>
>>> Fabian
>>>
>>> On 10/27/2015 09:52 AM, Brice Goglin wrote:
>>>> Hello
>>>>
>>>> This bug is about L3 cache locality only; everything else should be
>>>> fine, including cache sizes. Few applications use that locality
>>>> information, so I assume it doesn't matter for PETSc scaling.
>>>> We can work around the bug by loading an XML topology. There's no
>>>> easy way to build that correct XML, but I can do it manually if you
>>>> send your current broken topology (lstopo foo.xml and send this
>>>> foo.xml).
>>>>
>>>> Brice
>>>>
>>>> On 27/10/2015 09:43, Fabian Wein wrote:
>>>>> Hello,
>>>>>
>>>>> I'm new to the list and new to the MPI business, too.
>>>>>
>>>>> Our 4*12 Opteron 6238 system is very similar to the one from the
>>>>> original poster and I get the same error message.
>>>>> Any use in posting my logs?
>>>>>
>>>>> I compiled the latest hwloc, no change. Our system is Ubuntu 14.04
>>>>> LTS with kernel 3.13 and our BIOS is not updated.
>>>>>
>>>>> The system scales very fine with OpenMP but fails to give any
>>>>> realistic scaling using PETSc (both for the standard streaming
>>>>> benchmark and quick tests with a given application).
>>>>>
>>>>> As far as I understand, the system is fine, just the information
>>>>> gathering fails, right?!
>>>>>
>>>>> Do you know if the hwloc issue relates to our poor PETSc scaling?
>>>>> Is there a way to configure the topology manually?
>>>>>
>>>>> To me it appears that a BIOS update wouldn't help, right?! I
>>>>> wouldn't try it if it is not necessary. I'm a user with sudo
>>>>> access, not an administrator, but we have no admin for the system.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Fabian
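Collecting the steps from this thread, the full workaround looks roughly
like this (file names are examples; the sed step is only needed because the
hwloc embedded in this Open MPI predates hwloc 1.11's Socket-to-Package
rename):

  $ lstopo leo.xml                    # export the broken native topology
  # hand-fix the L3 cpusets in leo.xml as described in the quoted message
  $ sed -i 's/"Package"/"Socket"/g' leo.xml    # for the old embedded hwloc
  $ export HWLOC_XMLFILE=leo.xml      # load the XML instead of native discovery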
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
Thank you very much for the file.

When I try with PETSc, compiled with open-mpi and icc, I get

--------------------------------------------------------------------------
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  topology discovery failed
  --> Returned value Not supported (-8) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

Without export HWLOC_XMLFILE I get the well-known

* hwloc has encountered what looks like an error from the operating
* system.
*
* L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
* 0x0000003f) without inclusion!
* Error occurred in topology.c line 942
*
* The following FAQ entry in a recent hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's
* mailing list, along with the output+tarball generated by the
* hwloc-gather-topology script.

and the poor scaling

Triad: 55372.8884 Rate (MB/s)

np  speedup
 1  1.0
 2  1.03
 3  2.98
 4  3.98
 5  4.95
 6  5.96
 7  4.15
 8  4.73
 9  5.36
10  5.94
11  4.79
12  5.25

which is very random upon repetition but never better than a maximal
speedup of 7. I have 24 (48) cores and only one was in use at the time
by another process.

Using mpich instead of open-mpi I get no message about the hwloc issue
but the same poor and random speedups.

I tried to check the XML file myself via

  xmllint --valid leo_brice.xml --loaddtd /usr/local/share/hwloc/hwloc.dtd

However, xmllint complains about hwloc.dtd itself:

  /usr/local/share/hwloc/hwloc.dtd:8: parser error : StartTag: invalid
  element name

I have to mention that I have a mixture of hwloc versions: the most
recent installed locally and an older one as part of PETSc.

Any ideas?

Thanks,

Fabian

On 10/27/2015 10:21 AM, Brice Goglin wrote:
> Here's the fixed XML. For the record, for each NUMA node, I extended
> the cpusets of the L3 to match the container NUMA node, and moved all
> L2 objects as children of that L3.
> Now you may load that XML instead of the native discovery by setting
> HWLOC_XMLFILE=leo2.xml in your environment.
> Brice
>
> On 27/10/2015 10:08, Fabian Wein wrote:
>> Brice,
>>
>> thank you very much for the offer. I attached the xml file.
>> ..
>> * hwloc 1.11.1 has encountered what looks like an error from the
>> * operating system.
>> *
>> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
>> * 0x0000003f) without inclusion!
>> * Error occurred in topology.c line 981
>> *
>> ..
>>
>> So if you can afford the time, I appreciate it very much!
>>
>> Fabian
>>
>> On 10/27/2015 09:52 AM, Brice Goglin wrote:
>>> Hello
>>>
>>> This bug is about L3 cache locality only; everything else should be
>>> fine, including cache sizes. Few applications use that locality
>>> information, so I assume it doesn't matter for PETSc scaling.
>>> We can work around the bug by loading an XML topology. There's no
>>> easy way to build that correct XML, but I can do it manually if you
>>> send your current broken topology (lstopo foo.xml and send this
>>> foo.xml).
>>>
>>> Brice
>>>
>>> On 27/10/2015 09:43, Fabian Wein wrote:
>>>> Hello,
>>>>
>>>> I'm new to the list and new to the MPI business, too.
>>>>
>>>> Our 4*12 Opteron 6238 system is very similar to the one from the
>>>> original poster and I get the same error message.
>>>> Any use in posting my logs?
>>>>
>>>> I compiled the latest hwloc, no change. Our system is Ubuntu 14.04
>>>> LTS with kernel 3.13 and our BIOS is not updated.
>>>>
>>>> The system scales very fine with OpenMP but fails to give any
>>>> realistic scaling using PETSc (both for the standard streaming
>>>> benchmark and quick tests with a given application).
>>>>
>>>> As far as I understand, the system is fine, just the information
>>>> gathering fails, right?!
>>>>
>>>> Do you know if the hwloc issue relates to our poor PETSc scaling?
>>>> Is there a way to configure the topology manually?
>>>>
>>>> To me it appears that a BIOS update wouldn't help, right?! I
>>>> wouldn't try it if it is not necessary. I'm a user with sudo
>>>> access, not an administrator, but we have no admin for the system.
>>>>
>>>> Thanks,
>>>>
>>>> Fabian
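Two side notes on the errors above. The xmllint failure is expected: without
--dtdvalid, xmllint treats the DTD path as a second XML document and fails to
parse it, so validating against an external DTD should be done as shown below.
The "minimalistic parser" failure suggests the hwloc embedded in this Open MPI
was built without libxml2, so the XML would have to be exported in hwloc's own
minimalistic format; HWLOC_NO_LIBXML_EXPORT is documented in recent hwloc
versions for that, but it is worth verifying against the versions in play:

  $ xmllint --noout --dtdvalid /usr/local/share/hwloc/hwloc.dtd leo_brice.xml
  $ HWLOC_NO_LIBXML_EXPORT=1 lstopo leo_brice.xml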
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
Here's the fixed XML. For the record, for each NUMA node, I extended the
cpusets of the L3 to match the container NUMA node, and moved all L2
objects as children of that L3.
Now you may load that XML instead of the native discovery by setting
HWLOC_XMLFILE=leo2.xml in your environment.

Brice

On 27/10/2015 10:08, Fabian Wein wrote:
> Brice,
>
> thank you very much for the offer. I attached the xml file.
> ..
> * hwloc 1.11.1 has encountered what looks like an error from the
> * operating system.
> *
> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
> * 0x0000003f) without inclusion!
> * Error occurred in topology.c line 981
> *
> ..
>
> So if you can afford the time, I appreciate it very much!
>
> Fabian
>
> On 10/27/2015 09:52 AM, Brice Goglin wrote:
>> Hello
>>
>> This bug is about L3 cache locality only; everything else should be
>> fine, including cache sizes. Few applications use that locality
>> information, so I assume it doesn't matter for PETSc scaling.
>> We can work around the bug by loading an XML topology. There's no
>> easy way to build that correct XML, but I can do it manually if you
>> send your current broken topology (lstopo foo.xml and send this
>> foo.xml).
>>
>> Brice
>>
>> On 27/10/2015 09:43, Fabian Wein wrote:
>>> Hello,
>>>
>>> I'm new to the list and new to the MPI business, too.
>>>
>>> Our 4*12 Opteron 6238 system is very similar to the one from the
>>> original poster and I get the same error message.
>>> Any use in posting my logs?
>>>
>>> I compiled the latest hwloc, no change. Our system is Ubuntu 14.04
>>> LTS with kernel 3.13 and our BIOS is not updated.
>>>
>>> The system scales very fine with OpenMP but fails to give any
>>> realistic scaling using PETSc (both for the standard streaming
>>> benchmark and quick tests with a given application).
>>>
>>> As far as I understand, the system is fine, just the information
>>> gathering fails, right?!
>>>
>>> Do you know if the hwloc issue relates to our poor PETSc scaling?
>>> Is there a way to configure the topology manually?
>>>
>>> To me it appears that a BIOS update wouldn't help, right?! I
>>> wouldn't try it if it is not necessary. I'm a user with sudo
>>> access, not an administrator, but we have no admin for the system.
>>>
>>> Thanks,
>>>
>>> Fabian
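In hwloc 1.x XML terms, the described fix amounts to something like the
sketch below. This is heavily simplified and partly hypothetical: real
exports carry many more attributes, all cache levels appear as type="Cache"
distinguished by a depth attribute, and only the cpuset values are taken
from this thread:

  <object type="NUMANode" os_index="0" cpuset="0x0000003f">
    <object type="Cache" depth="3" cpuset="0x0000003f"> <!-- widened from 0x000003f0 -->
      <!-- L2 objects moved here, as children of the L3 -->
      <object type="Cache" depth="2" cpuset="0x00000003"> ... </object>
      <object type="Cache" depth="2" cpuset="0x0000000c"> ... </object>
      <object type="Cache" depth="2" cpuset="0x00000030"> ... </object>
    </object>
  </object>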
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
Hello

Good to know. Did you see/test the kernel patch yet?
If possible, could you send a link to the kernel commit when it appears
upstream?

Thanks
Brice

On 27/10/2015 09:21, Ondřej Vlček wrote:
> Dear Brice,
> thank you for your answer. Neither an upgrade of the BIOS nor using the
> latest hwloc helped. Finally we contacted AMD and they fixed a bug in
> the kernel which caused problems with 12-core AMD processors. They
> should upstream the changes to kernel.org soon, so that all the distros
> (CentOS, RHEL, SUSE, etc.) can pick them up automatically as they
> create their respective next releases.
>
> Ondrej
>
> On Monday, August 24, 2015 15:32:12 Brice Goglin wrote:
>> Hello,
>>
>> hwloc 1.7 is very old, I am surprised CentOS 7 doesn't have anything
>> more recent, maybe not in "standard" packages?
>>
>> Anyway, this is a very common error on AMD 6200 and 6300 machines. See
>> http://www.open-mpi.org/projects/hwloc/doc/v1.11.0/a00030.php#faq_os_error
>> Assuming your kernel isn't too old (CentOS 7 should be fine), you
>> should try to upgrade the BIOS.
>>
>> Brice
>>
>> On 24/08/2015 15:06, Ondřej Vlček wrote:
>>> Dear all,
>>>
>>> I have encountered a hwloc error for the AMD Opteron 6300 processor
>>> family (see below). I am using hwloc.x86_64 v1.7-3.el7, which is the
>>> latest version available in standard packages for CentOS 7. Is this
>>> something that has already been encountered and fixed in newer
>>> versions of hwloc? Output from the hwloc-gather-topology.sh script
>>> is attached.
>>>
>>> Thank you.
>>> Ondrej Vlcek
>>>
>>> $ hwloc-info
>>> ****************************************************************************
>>> * Hwloc has encountered what looks like an error from the operating
>>> * system.
>>> *
>>> * object (L3 cpuset 0x000003f0) intersection without inclusion!
>>> * Error occurred in topology.c line 753
>>> *
>>> * Please report this error message to the hwloc user's mailing list,
>>> * along with the output from the hwloc-gather-topology.sh script.
>>> ****************************************************************************
>>> depth 0:          1 Machine (type #1)
>>>  depth 1:         4 Socket (type #3)
>>>   depth 2:        8 NUMANode (type #2)
>>>    depth 3:       8 L3Cache (type #4)
>>>     depth 4:     24 L2Cache (type #4)
>>>      depth 5:    24 L1iCache (type #4)
>>>       depth 6:   48 L1dCache (type #4)
>>>        depth 7:  48 Core (type #5)
>>>         depth 8: 48 PU (type #6)
>>>
>>> Special depth -3: 4 Bridge (type #9)
>>> Special depth -4: 6 PCI Device (type #10)
>>> Special depth -5: 9 OS Device (type #11)
Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family
Dear Brice,
thank you for your answer. Neither an upgrade of the BIOS nor using the
latest hwloc helped. Finally we contacted AMD and they fixed a bug in the
kernel which caused problems with 12-core AMD processors. They should
upstream the changes to kernel.org soon, so that all the distros (CentOS,
RHEL, SUSE, etc.) can pick them up automatically as they create their
respective next releases.

Ondrej

On Monday, August 24, 2015 15:32:12 Brice Goglin wrote:
> Hello,
>
> hwloc 1.7 is very old, I am surprised CentOS 7 doesn't have anything
> more recent, maybe not in "standard" packages?
>
> Anyway, this is a very common error on AMD 6200 and 6300 machines. See
> http://www.open-mpi.org/projects/hwloc/doc/v1.11.0/a00030.php#faq_os_error
> Assuming your kernel isn't too old (CentOS 7 should be fine), you
> should try to upgrade the BIOS.
>
> Brice
>
> On 24/08/2015 15:06, Ondřej Vlček wrote:
>> Dear all,
>>
>> I have encountered a hwloc error for the AMD Opteron 6300 processor
>> family (see below). I am using hwloc.x86_64 v1.7-3.el7, which is the
>> latest version available in standard packages for CentOS 7. Is this
>> something that has already been encountered and fixed in newer
>> versions of hwloc? Output from the hwloc-gather-topology.sh script
>> is attached.
>>
>> Thank you.
>> Ondrej Vlcek
>>
>> $ hwloc-info
>> ****************************************************************************
>> * Hwloc has encountered what looks like an error from the operating
>> * system.
>> *
>> * object (L3 cpuset 0x000003f0) intersection without inclusion!
>> * Error occurred in topology.c line 753
>> *
>> * Please report this error message to the hwloc user's mailing list,
>> * along with the output from the hwloc-gather-topology.sh script.
>> ****************************************************************************
>> depth 0:          1 Machine (type #1)
>>  depth 1:         4 Socket (type #3)
>>   depth 2:        8 NUMANode (type #2)
>>    depth 3:       8 L3Cache (type #4)
>>     depth 4:     24 L2Cache (type #4)
>>      depth 5:    24 L1iCache (type #4)
>>       depth 6:   48 L1dCache (type #4)
>>        depth 7:  48 Core (type #5)
>>         depth 8: 48 PU (type #6)
>>
>> Special depth -3: 4 Bridge (type #9)
>> Special depth -4: 6 PCI Device (type #10)
>> Special depth -5: 9 OS Device (type #11)