On 11/01/2014 00:27, Jeff Squyres (jsquyres) wrote:
> Jeff Becker (CC'ed) reported to me a failure with hwloc 1.7.2 (in OMPI
> trunk). I had him verify this with a standalone hwloc 1.7.2, and then had
> him try standalone hwloc 1.8 as well -- all got the same failure.
>
> Here's what he's see
On 11/01/2014 01:58, Chris Samuel wrote:
> On Sat, 11 Jan 2014 11:54:17 AM Chris Samuel wrote:
>
>> We've got both an older Altix XE cluster and a UV10 (both running RHEL) I
>> can test on if it's useful?
> Forgot I already had both 1.7.2 and 1.8 built for both - all fine (RHEL6.4).
>
This was
Hello,
Linux says socket 0 contains processors 0-7 and socket 1 contains 8-15,
while NUMA node 0 contains processors 0-3+8-11 and NUMA node 1 contains
processors 4-7+12-15. Given what I read about Opteron 6320 online, the
problem is that NUMA 0 should be replaced with two NUMA nodes with
processors
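A minimal sketch, in C, of the consistency check behind this kind of report: two objects can only be placed in one tree if their processor sets are disjoint or one contains the other. The masks use the processor numbers quoted above. The helper names cpu_range and sets_conflict are made up for illustration and are not part of the hwloc API.

```c
#include <assert.h>
#include <stdint.h>

/* Processor sets as 64-bit masks: bit i set means processor i is in the set.
 * Illustrative only; hwloc uses its own bitmap type. */

/* Build a mask covering processors first..last (inclusive). */
static uint64_t cpu_range(unsigned first, unsigned last)
{
    assert(first <= last && last < 64);
    uint64_t width = (uint64_t)last - first + 1;
    uint64_t ones = (width == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1);
    return ones << first;
}

/* Return 1 if the two sets overlap without one including the other,
 * i.e. neither object can be the parent of the other. */
static int sets_conflict(uint64_t a, uint64_t b)
{
    uint64_t common = a & b;
    return common != 0 && common != a && common != b;
}
```

With socket 0 = cpu_range(0, 7) and NUMA node 0 = cpu_range(0, 3) | cpu_range(8, 11), sets_conflict() returns 1: the two sets share processors 0-3 but neither contains the other.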
Maybe try to disable some dependencies such as pci in hwloc
(--disable-pci), I wouldn't be surprised if there were issues there.
If that helps, please let us know what was enabled before (libpciaccess
(default), or libpci/pciutils (--enable-libpci)).
Brice
On 18/01/2014 07:23, Robin Scher wro
Hello,
The CPUModel attribute should be only in Socket or machine/root objects.
At least, that's what I documented and what I seem to see in the code.
Did you actually see any other place?
So it may just mean that the CPUModel is not available on your machine?
Or maybe the code below is buggy som
Hello,
Is anybody familiar with ARM CPUs?
I am adding more CPU information because Intel needs more:
CPUVendor=GenuineIntel
CPUModel=Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
CPUModelNumber=45
CPUFamilyNumber=6
Would something similar be useful for ARM? What are the fields below
from /proc/cpuinf
On 28/01/2014 09:46, Robin Scher wrote:
> Hi, thanks for responding.
>
> The CPUModel is definitely available on this machine. A 32 bit process
> on the same machine correctly finds the model name using code that
> calls the cpuid inline assembly to get it, and the machine itself is a
> VM runn
On 28/01/2014 09:57, Brice Goglin wrote:
> I will debug a bit more to see if it's actually a 64bit cpuid problem
> on windows.
The x86 backend is entirely disabled in the 64bit windows build because
configure fails to compile the cpuid assembly (in my mingw64 with gcc 4.7).
It
On 28/01/2014 13:00, Samuel Thibault wrote:
> Brice Goglin, on Tue 28 Jan 2014 12:46:24 +0100, wrote:
>> 42: xchg %ebx,%rbx
>>
>> I guess having both ebx and rbx on these lines isn't OK. On Linux, I get
>> rsi instead of ebx, no problem.
>>
>> S
On 28/01/2014 14:31, Brice Goglin wrote:
> On 28/01/2014 13:00, Samuel Thibault wrote:
>> Brice Goglin, on Tue 28 Jan 2014 12:46:24 +0100, wrote:
>>> 42: xchg %ebx,%rbx
>>>
>>> I guess having both ebx and rbx on these lines isn't OK. On Linux,
models executing in the same SMP system)."
>>
>> He passed the question on to another ARM guy, asking for further detail.
>> I'll pass on what he says.
>>
>>
>>
>> On Jan 28, 2014, at 3:39 AM, Brice Goglin wrote:
>>
>>> Hello,
The bridge cannot be "not connected to anything". All objects have a
parent (and are a child of that parent) except the very-top root object.
Theoretically, the bridge could be connected anywhere. In practice it's
connected to a NUMA node, a root object, or (rarely) a group of NUMA nodes.
The prob
en-mpi.org/community/lists/hwloc-devel/2014/01/4043.php
On 29/01/2014 06:50, Robin Scher wrote:
> Hi Brice
>
> This works great now. Thank you for your help!
> -robin
>
> Robin Scher
> ro...@uberware.net
> +1 (213) 448-0443
>
>
>
> On Jan 28, 2014, at 7:4
Hello,
Your BIOS reports invalid L3 cache information. On these processors, each
L3 is shared by 6 cores: it covers the 6 cores of an entire half-socket NUMA
node. But the BIOS says that some L3 are shared between 4 cores, others
by 6 cores. And worse, it says that some L3 is shared by some cores from
a
Hello Brock,
Some people reported the same issue in the past and that's why we added
the "nvml" objects. CUDA reorders devices by "performance".
Batch-schedulers are somehow supposed to use "nvml" for managing GPUs
without actually using them with CUDA directly. And the "nvml" order is
the "normal
r following the PCI bus order?
We may want to talk to NVIDIA to get a clarification about all this.
Brice
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Feb 5, 2014, at 1:19 A
GPU L#3 "nvml2"
> GPU L#5 "nvml3"
> GPU L#7 "nvml0"
> GPU L#9 "nvml1"
>
> Is the L# always going to be in the order I would expect? Because then I
> already have my map then.
Brice
>
> Brock P
On 13/02/2014 22:25, Jiri Hladky wrote:
> Hi Brice,
>
> when compiling hwloc-1.8.1 I have seen these warnings. Could you
> please check them?
fread() warnings come from fread() on kernel sysfs files, so it's very
unlikely that we read totally buggy data from there. One day we'll fix
this, maybe
On 25/03/2014 07:51, Biddiscombe, John A. wrote:
>
> I'm compiling hwloc using clang (bgclang++11 from ANL) to run on IO
> nodes of a BGQ. It seems to have compiled ok, and when I run lstopo, I
> get an output like this (below), which looks reasonable, but there are
> 15 sockets instead of 16. I
x=/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/hwloc-1.8.1
>
>should I rerun with something set?
>
>Thanks
>
>JB
>
>
>From: hwloc-users [mailto:hwloc-users-boun...@open-mpi.org] On Behalf
>Of Brice Goglin
>Sent: 25 March 2014 08:04
>To: Hardware locality user list
>
ere we
> are trying to customise the IO.
>
>
>
> JB
>
>
>
> *From:*Brice Goglin [mailto:brice.gog...@inria.fr]
> *Sent:* 25 March 2014 08:43
> *To:* Hardware locality user list; Biddiscombe, John A.
> *Subject:* Re: [hwloc-users] BGQ question.
>
>
>
On 26/03/2014 01:00, Christopher Samuel wrote:
> On 26/03/14 01:34, Biddiscombe, John A. wrote:
>
> > If I compile on the login node, but run lstopo on the ION, I get
> > this (wrong, below)
>
> If you build this with GCC (the standard system one, not the
> cross-compiler for BGQ) does it still
Hello,
This is the main corner case of hwloc-distrib. It can return objects
only, not groups of objects. The distrib algorithm is:
1) start at the root, where there are M children, and you have to
distribute N processes
2) if there are no children, or if N is 1, return the entire object
3) split
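The steps above can be sketched on a toy tree, in C. This is not hwloc's implementation: the struct, the function toy_distrib and the sample machine are invented for illustration, and because step 3 is cut off in this excerpt, the proportional split used below is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy topology object for illustration only; this is NOT hwloc's object type. */
struct toy_obj {
    const char *name;
    int weight;                      /* number of PUs below this object */
    int nchildren;
    const struct toy_obj *children;
};

/* Distribute n processes below obj. One object name is stored in out[] per
 * process, starting at index pos. Returns the next free index in out[]. */
static int toy_distrib(const struct toy_obj *obj, int n, const char **out, int pos)
{
    assert(obj != NULL && out != NULL && n >= 1 && obj->weight >= 1);

    /* Step 2: no children, or a single process: return the entire object. */
    if (obj->nchildren == 0 || n == 1) {
        for (int i = 0; i < n; i++)
            out[pos++] = obj->name;
        return pos;
    }

    /* Step 3 (ASSUMED, the mail is truncated here): give each child a share
     * of the n processes proportional to its weight, then recurse. */
    int given = 0, weight_seen = 0;
    for (int i = 0; i < obj->nchildren; i++) {
        const struct toy_obj *child = &obj->children[i];
        weight_seen += child->weight;
        int upto = n * weight_seen / obj->weight;   /* cumulative share */
        if (upto > given)
            pos = toy_distrib(child, upto - given, out, pos);
        given = upto;
    }
    return pos;
}

/* Hypothetical example machine: 2 sockets, each with 4 single-PU cores. */
static const struct toy_obj cores0[4] = {
    {"core0", 1, 0, NULL}, {"core1", 1, 0, NULL},
    {"core2", 1, 0, NULL}, {"core3", 1, 0, NULL}
};
static const struct toy_obj cores1[4] = {
    {"core4", 1, 0, NULL}, {"core5", 1, 0, NULL},
    {"core6", 1, 0, NULL}, {"core7", 1, 0, NULL}
};
static const struct toy_obj sockets[2] = {
    {"socket0", 4, 4, cores0}, {"socket1", 4, 4, cores1}
};
static const struct toy_obj machine = {"machine", 8, 2, sockets};
```

Calling toy_distrib(&machine, 3, out, 0) on this toy machine gives the first process all of socket0, while the other two each get a single core of socket1 rather than a pair of cores. That is the "objects only, not groups of objects" limitation described above.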
hat this is considered a corner case. Could you
> please consider fixing this?
>
> Thanks,
> Tim
>
> Brice Goglin wrote:
>> Hello,
>>
>> This is the main corner case of hwloc-distrib. It can return objects
>> only, not groups of objects. The distrib algorit
>
> On Sun, Mar 30, 2014 at 05:32:38PM +0200, Brice Goglin wrote:
>> Don't worry, binding multithreaded processes is not a corner case. I was
>> rather talking about the general "distributing fewer processes than there
>> are objects and returning cpusets as large as po
On 01/04/2014 10:43, Jiri Hladky wrote:
> Hi Brice,
>
> I see some compiler warnings when building rpm package for Fedora:
>
> topology-windows.c: In function 'hwloc_win_get_VirtualAllocExNumaProc':
> topology-windows.c:338:30: warning: assignment from incompatible
> pointer type [enabled by def
oc-gui package) is still much
> lower compared to lstopo-no-graphics
> B) Compile it without libXNVCtrl but it will reduce the functionality.
>
> Is there any 3rd option? I guess not. It seems like A) is the best
> choice for Fedora.
>
> Any ideas on that?
>
> Thanks!
has latest version. If I should check some BIOS information,
> I have access to hardware. Tell me what variables from SMBIOS you want
> to see?
>
>
> On Fri, Jan 31, 2014 at 1:07 PM, Brice Goglin <mailto:brice.gog...@inria.fr>> wrote:
>
> Hello,
>
> Your BI
Hello,
This list is for hwloc users (hwloc is an Open MPI subproject).
You likely want Open MPI users instead: us...@open-mpi.org
Brice
On 16/04/2014 18:44, flavienne sayou wrote:
> Hello,
> I am Flavienne and I am a master student.
> I wrote a script which has to back up sequential applicatio
Please run "hwloc-gather-topology simics" and send the resulting
simics.tar.bz2 that it will create. However, I assume that the simulator
returns buggy x86 cpuid information, so we'll see if we want/can easily
workaround the bug or just let simics developers fix it.
Brice
On 29/04/2014 01:15, Fri
Aside from the BIOS config, are you sure that you have the exact same BIOS
*version* in each node? (can check in /sys/class/dmi/id/bios_*) Same
Linux kernel too?
Also, recently we've seen somebody fix such problems by unplugging and
replugging some CPUs on the motherboard. Seems crazy but it happene
> Thanks much,
>
> Craig
>
>
> On Wednesday, May 28, 2014 1:39 PM, Brice Goglin
> wrote:
>
>
> Aside from the BIOS config, are you sure that you have the exact same
> BIOS *version* in each node? (can check in /sys/class/dmi/id/bios_*)
> Same Linux kernel too?
>
On 28/05/2014 14:57, Craig Kapfer wrote:
>
>
> Hmm ... the slurm config defines that all nodes have 4 sockets with 16
> cores per socket (which corresponds to the hardware--all nodes are the
> same). Slurm node config is as follows:
>
> NodeName=n[001-008] RealMemory=258452 Sockets=4 CoresPerS
On 28/05/2014 15:46, Craig Kapfer wrote:
> Wait, I'm sorry, I must be missing something, please bear with me!
>
> By the way, your discussion of groups 1 and 2 below is wrong.
> Group 2 doesn't say that NUMA node == socket, and it doesn't
> report 8 sockets of 8 cores each. It report
>
> Thanks,
>
> Andrew
>
>> -Original Message-
>> From: Brice Goglin [mailto:brice.gog...@inria.fr]
>> Sent: Monday, May 5, 2014 1:03 PM
>> To: Friedley, Andrew
>> Subject: Re: [hwloc-users] divide by zero error?
>>
>> Thanks.
>>
amples of 6348 (all characteristics are same).
>
>
> On Tue, Apr 1, 2014 at 6:59 PM, Yury Vorobyov <mailto:teupol...@gmail.com>> wrote:
>
> The BIOS has latest version. If I should check some BIOS
> information, I have access to hardware. Tell me wh
Hello,
A quick look in Open MPI source code seems to say that it's manipulating
XML topologies in these lines.
Please go into your hwloc-1.9 build directory, and run "tests/xmlbuffer"
(you may have to build it first by running "make xmlbuffer -C tests").
If it works, try running "make check".
Also
On 09/07/2014 23:30, Nick Papior Andersen wrote:
> Dear Brice
>
> Here are my findings (apologies for not doing make check beforehand!)
>
> 2014-07-09 20:42 GMT+00:00 Brice Goglin <mailto:brice.gog...@inria.fr>>:
>
> Hello,
>
> A quick look in Op
4 23:42, Nick Papior Andersen wrote:
> Dear Brice
>
>
> 2014-07-09 21:34 GMT+00:00 Brice Goglin <mailto:brice.gog...@inria.fr>>:
>
> On 09/07/2014 23:30, Nick Papior Andersen wrote:
>> Dear Brice
>>
>> Here are my findings (apologies for not
This commit should fix it.
https://github.com/open-mpi/hwloc/commit/f46c983df58a41ec8f994f30f57154bd78392de8.patch
Brice
On 09/07/2014 23:42, Nick Papior Andersen wrote:
> Dear Brice
>
>
> 2014-07-09 21:34 GMT+00:00 Brice Goglin <mailto:brice.gog...@inria.fr>>:
>
>
Hello,
Your platform reports buggy L3 cache locality information. This is very
common on AMD 62xx and 63xx platforms unfortunately.
You have 8 L3 caches (one per 6-core NUMA node, two per socket), but the
platform reports 11 L3 caches instead:
Sockets 1, 2 and 4 report one L3 above 2 cores, one L3
On 15/08/2014 14:59, Andrej Prsa wrote:
> Hi Brice,
>
>> Your kernel looks recent enough, can you try upgrading your BIOS ? You
>> have version 3.0b and there's a 3.5 version at
>> http://www.supermicro.com/aplus/motherboard/opteron6000/sr56x0/h8qg6-f.cfm
> Flashing bios is not the easiest optio
On 16/08/2014 18:37, Andrej Prsa wrote:
> Hi Brice,
>
>> Your kernel looks recent enough, can you try upgrading your BIOS ? You
>> have version 3.0b and there's a 3.5 version at
>> http://www.supermicro.com/aplus/motherboard/opteron6000/sr56x0/h8qg6-f.cfm
> For completeness, I just tried updatin
On 19/08/2014 18:38, Aulwes, Rob wrote:
> Hi,
>
> I'm trying to write a custom C++ allocator that wraps hwloc calls.
> I've tried using various hwloc_alloc* functions to set the memory
> bindings, but when I call hwloc_get_area_membind_nodeset to verify, I
> don't get the same policy I passed t
* sizeof (T));
> hwloc_set_area_membind_nodeset(_topo, p, cnt * sizeof (T),
>
> mem_nodeset, HWLOC_MEMBIND_NEXTTOUCH, 0);
>
> where
>
> mem_nodeset = hwloc_topology_get_complete_nodeset(_topo);
>
> Thanks,Rob
>
> From: Brice Goglin
ould like to try 'replicate'.
>
> From: Brice Goglin mailto:brice.gog...@inria.fr>>
> Reply-To: Hardware locality user list <mailto:hwloc-us...@open-mpi.org>>
> Date: Tue, 19 Aug 2014 18:55:57 +0200
> To: Hardware locality user list <mailto:hwloc-us...
any doc?
>
> Thanks for the help! Rob
>
> From: Brice Goglin mailto:brice.gog...@inria.fr>>
> Reply-To: Hardware locality user list <mailto:hwloc-us...@open-mpi.org>>
> Date: Tue, 19 Aug 2014 19:03:56 +0200
> To: Hardware locality user list <mailto:hw
thout the STRICT flag. And I'll see if I add a good example
somewhere.
Brice
On 19/08/2014 19:00, Aulwes, Rob wrote:
> nope, no error. is there a way to find out what policies are
> supported? I would like to try 'replicate'.
>
> From: Brice Goglin mailto:brice.gog.
I added a new doc/examples/ repository to better show how to use
bitmaps, cpu and memory binding etc.
https://github.com/open-mpi/hwloc/tree/master/doc/examples
If you see anything missing, don't hesitate to ask.
Brice
On 19/08/2014 19:10, Aulwes, Rob wrote:
> ok, in the meantime, is th
Hello
You sent the test.output file instead of test.tar.bz2 so I can't check
for sure. Anyway I guess this is yet another buggy AMD platform with
magny-cours/interlagos/abu-dhabi Opterons (61xx, 62xx or 63xx).
Sometimes upgrading the BIOS/kernel helps. Sometimes not.
Some L3 caches will be missi
Don't be sorry, I used "yet another" to complain about all these buggy AMD
platforms, and not to complain about their owners ;)
Bug reports are always welcome, that's why the big warning says you should
report it.
Also these warnings vary a little bit with the platform and processor model so
i
t_numanode_obj_by_os_index?
>
> Thanks,Rob
>
>
> *From:* hwloc-users [hwloc-users-boun...@open-mpi.org] on behalf of
> Brice Goglin [brice.gog...@inria.fr]
> *Sent:* Thursday, September 04, 2014 6:25 AM
> *To:* hwloc-us...@open-mpi.org
> *Subject:* Re: [hwloc-users] setting
Can you send the output of configure, the generated config.log and your
unmodified Xutil.h? My solaris/openindiana doesn't have that problem.
thanks
Brice
On 16/09/2014 14:43, Siegmar Gross wrote:
> Hi,
>
> today I installed hwloc-1.9.1 on my machines (Solaris 10 Sparc (tyr),
> Solaris 10 x86
What is errno after load() failing?
Brice
On 17 September 2014 17:43:13 UTC+02:00, "Aulwes, Rob" wrote:
>Hi,
>
>A call to hwloc_topology_load is failing, but all that is returned is
>–1. Are there error reporting routines that can be called to get more
>details about the error? The doc for hwlo
$ errno 24
EMFILE 24 Too many open files
Ohoh that's a new one :)
Can you do a strace of the program and send the output?
If the file is big, you can send it to me in a private mail.
Brice
On 17/09/2014 18:14, Aulwes, Rob wrote:
> ERRNO = 24.
>
> From: Brice Goglin ma
Thanks,
I just pushed a fix. Can you verify that this tarball enables X
automatically and properly?
https://ci.inria.fr/hwloc/job/master-0-tarball/lastSuccessfulBuild/artifact/hwloc-master-20140918.1131.git005a7e8.tar.gz
I am looking at the warnings and make check failures you sent.
Brice
On 1
Hello
Are there any graphical formats in lstopo -h? If so, maybe Cairo can export to
png etc. but it can't draw an X11 window?
Check whether X11/Xlib.h and X11/Xutil.h are available.
Brice
On 24 September 2014 18:08:31 UTC+02:00, Dennis Jacobfeuerborn
wrote:
>Hi,
>I just compiled hwloc for Cen
On 25/09/2014 02:22, Dennis Jacobfeuerborn wrote:
> So I just recompiled again but using version 1.4.3 and the graphical
> output options reappeared. I also tried version 1.5.2 and this version
> will not show the graphical output options anymore so it seems something
> has changed between 1.4 a
On 29/09/2014 19:01, Aulwes, Rob wrote:
> Hi,
>
> I'm trying to allocate and bind memory on the same NUMA domain as the
> calling thread. The code I use is as follows.
>
> /* retrieve the single PU where the current thread actually
> runs within this process binding */
>
>
> i
Yes. Most of locality info comes from /sys/... on Linux.
Brice
On 29/09/2014 22:59, Vishwanath Venkatesan wrote:
> Thanks for the quick response, yes lstopo -l does make the numbers
> contiguous.
> Another question I had was, how does hwloc populate the information
> that certain cpus share a p
Dennis,
Did you have an opinion about this?
I am going to release the final hwloc v1.10 soon. So if there's
something to fix, I'd rather know it quickly.
thanks
Brice
On 25/09/2014 07:47, Brice Goglin wrote:
> On 25/09/2014 02:22, Dennis Jacobfeuerborn wrote:
>> So I ju
On 08/10/2014 01:52, Jiri Hladky wrote:
> 2) I have also some trouble with symlinks. The trouble is this:
>
> * when installed with ./configure && make && make install
> then hwloc-ls is symlink to lstopo-no-graphics and man pages
> { lstopo-no-graphics.1, hwloc-ls.1 } are symlinks to
On 08/10/2014 01:52, Jiri Hladky wrote:
> Hi Brice,
>
> glad to see the new version is out! :-)
>
> I have bumped into a couple of minor problems when building new RPM for
> Fedora:
>
> 1) desktop file
> desktop-file-validate hwloc-ls.desktop.back
> hwloc-ls.desktop.back: error: file contains key
On 09/10/2014 00:55, Jiri Hladky wrote:
>
> * if building without cairo/X11 support, lstopo and lstopo.1 are
> symlinks. Packagers can choose to ignore lstopo and lstopo.1.
> lstopo.desktop isn't installed.
>
>
> Could you please make (in the next version)
> lstopo-no-graphics.1
> a
On 09/10/2014 00:49, Jiri Hladky wrote:
> Hi Brice,
>
> this sounds perfectly reasonable to me. I will make the arrangements
> on packing side.
>
> Perhaps you could add this in README file?
>
The README file is autogenerated from the huge doxygen text, which is
really for users, not for packa
Hello,
There's an R&D engineer position opening in my research team at Inria
Bordeaux (France) for developing hwloc and netloc software. All details
available at
http://runtime.bordeaux.inria.fr/goglin/201410-Engineer-hwloc+netloc.en.pdf
or French version
http://runtime.bordeaux.inria.fr/goglin/
Hello,
There's an R&D engineer position opening in my research team at Inria
Bordeaux (France) for developing hwloc and netloc software (both Open
MPI subprojects).
All details available at
http://runtime.bordeaux.inria.fr/goglin/201410-Engineer-hwloc+netloc.en.pdf
or French version
http://runt
On 18/11/2014 14:46, Diego Regueira wrote:
> Hi, I'm getting an error from the lstopo command.
> Please, check the attachments.
>
> Thanks
Hello,
It's a very common problem on AMD platforms unfortunately.
http://www.open-mpi.org/projects/hwloc/doc/v1.10.0/a00028.php#faq_os_error
In your case,
Hello,
Thanks, I can reproduce the problem on Debian with -O3 -m32.
The issue is that -O3 makes gcc inline more. We have function A call B
multiple times, and B calls C which contains asm with a label. So in the
end A contains the asm label from C multiple times.
Google says we should use local lab
mas.vando...@gmail.com>
>
>
> On Wed, Nov 19, 2014 at 10:42 PM, Brice Goglin <mailto:brice.gog...@inria.fr>> wrote:
>
> Hello,
> Thanks, I can reproduce the problem on Debian with -O3 -m32.
> The issue is that -O3 makes gcc inline more. We have functio
> Makefile:615: recipe for target 'check-recursive' failed
> make: *** [check-recursive] Error 1
>
> I attached the output of all of the steps and the logs. Let me know if
> you need something else.
>
> Thanks!
>
> Thomas Van Doren
> thomas.vando...@gmail.com <mailto:thoma
On 21/11/2014 01:57, Thomas Van Doren wrote:
> Hi Brice
>
> Thank you for the quick response! That patch fixes the build issue and
> hwloc works as expected (make check has 1 failure on 32bit, but that
> also happens on master so I didn't worry about it).
This was an overzealous assertion in th
On 11/12/2014 21:51, Brock Palen wrote:
> When a system has HT enabled is one core presented the real one and one the
> fake partner? Or is that not the case?
>
> If wanting to test behavior without messing with the bios how do I select
> just the 'real cores' if this is the case?
>
> I a
Hello
I am seeing assert failures on AIX 6.1 because our PU os_index is off by
one. They go from -1 to 62 instead of 0 to 63.
We have a comment saying
/* It seems logical processors are numbered from 1 here, while the
* bindprocessor functions numbers them from 0... */
This contradicts
Hello,
As explained in another mail, this is yet another case of buggy AMD L3 cache
information reported by the hardware. The only way to *fix* this is to
tell your machine vendor to fix the L3 cache information.
The only thing we can do is remove the hwloc warning (if you don't care
about cache or NUMA aff
Hello
We don't have PCI support on Windows unfortunately. And on non-Linux
platforms, you would have PCI devices without their locality, not really
useful.
The hwloc I/O doc says:
"Note that I/O discovery requires significant help from the operating
system. The pciaccess library (the development
Hello,
hwloc_topology_init(&topology);
hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
hwloc_topology_load(topology);
Then you can use hwloc_get_next_pcidev() to iterate over the entire list
of PCI devices. If you want to know whether it's connected to a specific
NUMA node, start
t's enough for "comparing distances".
Brice
On 09/01/2015 10:30, Pradeep Kiruvale wrote:
> Hi Brice,
>
> Thanks for the reply. Is it possible to get the distance matrix for
> each cpu and the pci device from these hwloc apis?
>
> Regards,
> Pradee
Hello
This is a widespread problem with AMD machines. Buggy platform reporting
invalid L3 cache information in this case. Upgrading the BIOS may help.
Anyway, I guess Slurm doesn't care much about L3 cache affinity, so you
can ignore the error by setting HWLOC_HIDE_ERRORS=1 in the environment.
More
Hello
This is a widespread problem with AMD machines. Buggy platforms
reporting invalid L3 cache information in this case. Upgrading the BIOS
may help.
If your program doesn't care about cache affinity, you can hide/ignore
the message by setting HWLOC_HIDE_ERRORS=1 in the environment.
More detail
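A minimal sketch, in C, of setting that variable from inside a program instead of from the shell. HWLOC_HIDE_ERRORS=1 comes from the message above; the helper name hide_hwloc_errors is made up here, and the idea that it should run before the topology is loaded is an assumption, not something this excerpt states.

```c
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: export HWLOC_HIDE_ERRORS=1 for the current process.
 * Assumed to be called early, before any hwloc topology is loaded. */
static void hide_hwloc_errors(void)
{
    int rc = setenv("HWLOC_HIDE_ERRORS", "1", 1);  /* 1 = overwrite existing value */
    assert(rc == 0);
    (void)rc;
}
```

Exporting the variable in the job script or shell has the same effect and avoids touching the program, which is probably the simpler option for Slurm-style setups.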
Hello,
This is yet another example of buggy AMD topology information unfortunately.
See
http://www.open-mpi.org/projects/hwloc/doc/v1.10.1/a00028.php#faq_os_error
In your case, NUMA and processor package/socket information are
conflicting because NUMA information is buggy. Upgrading the BIOS may
Hello,
That's an interesting question:
Even if the GPU is physically-located inside the die, it is exposed as a
"virtual" PCI device (vendor number 1002 and model number 130f), and
that's how we detect it, and that's how the driver configures it. Many
components of the CPU die are configured throu
On 02/06/2015 23:27, Fabricio Cannini wrote:
> Hello there
>
> Is there a way to link 'libcudart.so' and 'libnvidia-ml.so' solely to
> their respective plugin .so files, not the main libraries/executables?
>
> This is the './configure' line i'm using:
>
> ./configure --enable-shared --enable-sta
On 04/06/2015 00:00, Fabricio Cannini wrote:
> Hi Brice, thanks for answering.
>
> Strangely, xml_libxml and pci works fine as plugins, but nvml and cuda
> not. I had no trouble making the 'pci' and 'xml_libxml' plugins link
> to their respective libraries, leaving 'libhwloc.so' alone, but no
>
On 04/06/2015 00:53, Fabricio Cannini wrote:
> On 03-06-2015 19:45, Brice Goglin wrote:
>> On 04/06/2015 00:00, Fabricio Cannini wrote:
>>> Hi Brice, thanks for answering.
>>>
>>> Strangely, xml_libxml and pci works fine as plugins, but nvml and cuda
>>
On 04/06/2015 01:02, Fabricio Cannini wrote:
> LDFLAGS = -L/usr/local/cuda-6.5/lib64 -lcudart -L/usr/lib64/nvidia -lnvidia-ml
Does this line come from your environment? hwloc isn't supposed to set
LDFLAGS unless it comes from the environment. I guess that's where your
problem comes from.
Bric
On 04/06/2015 01:17, Fabricio Cannini wrote:
> On 03-06-2015 20:10, Brice Goglin wrote:
>> On 04/06/2015 01:02, Fabricio Cannini wrote:
>>> LDFLAGS = -L/usr/local/cuda-6.5/lib64 -lcudart -L/usr/lib64/nvidia
>>> -lnvidia-ml
>>
>> Does this line come from
CUDA releases before 4.0 didn't support this attribute, the #ifdef
cannot work anymore on recent CUDA releases, I'll fix that, thanks.
Interesting to know that NUMAScale machines use PCI domains.
Brice
On 04/06/2015 14:13, Imre Kerr wrote:
> Hi,
> Never mind, I figured it out. hwloc_cudart_ge
Hello
I don't see any significant change in v1.11 regarding embedding,
especially with respect to CONFIGURE_DEPENDENCIES.
Does v1.10 work when running autogen with the same versions of
automake/libtool/autoconf? I am using 1.14.1/2.4.2/2.69 here.
If you enter hwloc-1.11.0/tests/embedded, does ".
Hello
The 3.13 kernel reports invalid L3 cache information in sysfs. 0x3f0 is
not possible on this processor, it should be either 0x3f or 0xfc
(there's exactly one L3 per NUMA node, with the same 6 cores in them).
Can you check whether the BIOS is also the same on these machines? (see
files in /s
09/07/2015 16:26, Åke Sandgren wrote:
> Yes the BIOS is the same.
>
> Anything else i should check?
>
> On 07/09/2015 04:10 PM, Brice Goglin wrote:
>> Hello
>>
>> The 3.13 kernel reports invalid L3 cache information in sysfs. 0x3f0 is
>> not possible on this
, Åke Sandgren wrote:
> Attached tar file with data from both systems. See Readme file for
> kernel versions
>
> On 07/09/2015 07:54 PM, Brice Goglin wrote:
>> Can you send the output of this command on both nodes?
>> cat /sys/devices/system/cpu/cpu{?,??}/cache/index3/sha
Hello
In 1.11, they are attached to root. In theory they should be attached to NUMA
nodes, so you iterate under those. However their locality information isn't
easy to find/trust (are we sure "DIMM A3" is in first numa node?) so we just
attach to root for now. It's not clear we'll fix that anyt
Hello,
hwloc 1.7 is very old, I am surprised CentOS 7 doesn't have anything
more recent, maybe not in "standard" packages?
Anyway, this is a very common error on AMD 6200 and 6300 machines.
See
http://www.open-mpi.org/projects/hwloc/doc/v1.11.0/a00030.php#faq_os_error
Assuming your kernel isn't to
gether :)
Brice
Forwarded message
Subject: EuroMPI 2015 Call for Participation - Early deadline Sept 1st
Date: Wed, 26 Aug 2015 10:41:39 +0200
From: Brice Goglin
To: Open MPI Users
EuroMPI 2015 Call for participation
EuroMPI 2015 in-cooperation status with ACM and SIG
ir respective next releases.
>
> Ondrej
>
>> On Monday, August 24, 2015 15:32:12 Brice Goglin wrote:
>> Hello,
>>
>> hwloc 1.7 is very old, I am surprised CentOS 7 doesn't have anything
>> more recent, maybe not in "standard" packages?
>>
>>
Hello
This bug is about L3 cache locality only, everything else should be
fine, including cache sizes. Few applications use that locality
information, so I assume it doesn't matter for PETSc scaling.
We can work around the bug by loading an XML topology. There's no easy
way to build that correct XM
P#0 cpuset
> 0x003f) without inclusion!
> * Error occurred in topology.c line 981
> *
> ..
>
> So if you can afford the time, I appreciate it very much!
>
> Fabian
>
>
>
> On 10/27/2015 09:52 AM, Brice Goglin wrote:
>> Hello
>>
>> This bug is
he same poor and random speedups.
>
> I tried to check the xml file by myself via
> xmllint --valid leo_brice.xml --loaddtd /usr/local/share/hwloc/hwloc.dtd
>
> However xmllint complains about hwloc.dtd itself
> /usr/local/share/hwloc/hwloc.dtd:8: parser error : StartTag: invalid
>
wrote:
> On 10/27/2015 03:42 PM, Brice Goglin wrote:
>> I guess the problem is that your OMPI uses an old hwloc internally. That
>> one may be too old to understand recent XML exports.
>> Try replacing "Package" with "Socket" everywhere in the XML file.
>