Re: [OMPI users] [omx-devel] Open-mx issue with ompi 1.6.1

2012-09-12 Thread Brice Goglin
> open-mx-devel to CC when you reply. >> >> Brice >> >> >> On 07/09/2012 00:10, Brice Goglin wrote: >>> Hello Doug, >>> >>> Did you use the same Open-MX version when it worked fine? Same kernel >>> too? >>> Any chance

Re: [OMPI users] [omx-devel] Open-mx issue with ompi 1.6.1

2012-09-10 Thread Brice Goglin
I replied a couple days ago (with OMPI users in CC) but got an error last night: Action: failed Status: 5.0.0 (permanent failure) Diagnostic-Code: smtp; 5.4.7 - Delivery expired (message too old) 'timeout' (delivery attempts: 0) I resent the mail this morning, it looks like it wasn't delivered

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
we're talking about only 1.6 MB here. So there's still something else eating all the memory. /proc/meminfo (MemFree) and numactl -H should again help. Brice > > > > 2012/9/6 Brice Goglin <brice.gog...@inria.fr> > > On 06/09/2012 1
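Brice suggests checking MemFree in /proc/meminfo. A minimal sketch of reading that field from C, assuming the usual `Name:   value kB` line layout; the helper names `meminfo_field_kb` and `read_memfree_kb` are hypothetical, and the sketch only covers the system-wide file, not the per-NUMA-node free memory that `numactl -H` reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the value (in kB) of a field such as "MemFree" from text in the
 * /proc/meminfo format ("MemFree:        123456 kB"), or -1 if absent. */
long meminfo_field_kb(const char *text, const char *field)
{
    assert(text != NULL && field != NULL);
    size_t flen = strlen(field);
    const char *line = text;

    while (*line) {
        /* Match "<field>:" only at the start of a line. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':')
            return strtol(line + flen + 1, NULL, 10); /* skips the spaces */
        line = strchr(line, '\n');
        if (!line)
            break;
        line++;                      /* move to the start of the next line */
    }
    return -1;
}

/* Read /proc/meminfo and return MemFree in kB, or -1 on error. */
long read_memfree_kb(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field_kb(buf, "MemFree");
}
```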

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 09:56, Gabriele Fatigati wrote: > Hi Brice, hi Jeff, > > >Can you add some printf inside hwloc_linux_set_area_membind() in > src/topology-linux.c to see if ENOMEM comes from the mbind >syscall or > not? > > I added printf inside that function, but ENOMEM does not come from there.

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
osed that these two case > was the two unique possibly. > > From the hwloc documentation: > > -1 with errno set to ENOSYS if the action is not supported > -1 with errno set to EXDEV if the binding cannot be enforced > > > Any other binding failure reason?
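The two documented failure modes can be told apart by inspecting errno right after a binding call returns -1. A minimal sketch with a hypothetical helper `describe_bind_error`; ENOMEM is listed only because this thread runs into it, and other errno values may be passed through from the operating system.

```c
#include <assert.h>
#include <errno.h>

/* Explain the errno left by an hwloc binding call that returned -1.
 * ENOSYS and EXDEV are the cases listed in the hwloc documentation;
 * anything else (ENOMEM included) comes from elsewhere, e.g. the OS. */
const char *describe_bind_error(int err)
{
    assert(err != 0);  /* only meaningful after a failed call */
    switch (err) {
    case ENOSYS:
        return "binding action not supported on this system";
    case EXDEV:
        return "binding cannot be enforced as requested";
    case ENOMEM:
        return "out of memory reported during the binding call";
    default:
        return "other error reported by the operating system";
    }
}
```

Read errno immediately after the failing call, before any other library call can overwrite it.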

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
upported > -1 with errno set to EXDEV if the binding cannot be enforced > > > Any other binding failure reason? The memory available is enough. > > 2012/9/5 Brice Goglin <brice.gog...@inria.fr> > > Hello Gabriele, > > The on

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
Hello Gabriele, The only limit that I can think of is the available physical memory on each NUMA node (numactl -H will tell you how much of each NUMA node's memory is still available). malloc usually only fails (it returns NULL?) when there is no *virtual* memory left, which is different. If you

Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Brice Goglin
On 28/08/2012 14:23, Samuel Thibault wrote: > Gabriele Fatigati, on Tue 28 Aug 2012 14:19:44 +0200, wrote: >> I'm using hwloc 1.5. I would like to see how GPUs are connected with the processor >> socket using lstopo command. > About connection with the socket, there is indeed no real graphical >

Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Brice Goglin
Hello, For now, you have to look at PCI ids. NVIDIA GPUs have "10de:" at the beginning of their vendor:device ids; that's what is shown in your boxes on the right. We should have better GPU support in the future. Right now, we only use what Linux knows, and it knows pretty much nothing about NVIDIA GPUs because of
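Until hwloc identifies GPUs directly, the manual check described here can be automated: a PCI id printed as `vvvv:dddd` belongs to an NVIDIA device when its vendor half is 0x10de. A minimal sketch; the function name is hypothetical and the device ids used in the test are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PCI_VENDOR_NVIDIA 0x10de  /* vendor id shown as "10de:" by lstopo */

/* Parse a "vvvv:dddd" vendor:device string and report whether the vendor
 * is NVIDIA. The device id is stored in *device_out when it is non-NULL. */
bool is_nvidia_pci_id(const char *ids, unsigned *device_out)
{
    unsigned vendor, device;

    assert(ids != NULL);
    if (sscanf(ids, "%4x:%4x", &vendor, &device) != 2)
        return false;             /* not in vendor:device form */
    if (device_out)
        *device_out = device;
    return vendor == PCI_VENDOR_NVIDIA;
}
```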

Re: [hwloc-users] [EXTERNAL] Re: hwloc_get_latency() failures and confusion

2012-08-06 Thread Brice Goglin
On 07/08/2012 00:36, Wheeler, Kyle Bruce wrote: > A, that's key! The documentation currently says "Look at ancestor > objects from the bottom to the top until one of them contains a > distance matrix that matches the objects exactly", which suggests to > me that it will traverse the object
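The lookup rule quoted from the documentation (walk the ancestors from the bottom to the top until one holds a matrix covering the objects) can be modelled as below. This is a simplified sketch with a hypothetical `struct obj`; in hwloc 1.x each matrix also records the relative depth of the objects it describes, which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object: a node may carry a distance matrix
 * that covers a set of descendant objects, identified by small ids. */
struct obj {
    struct obj *parent;
    const int *covered;   /* ids covered by this node's matrix, or NULL */
    size_t ncovered;
};

static int covers(const struct obj *o, int id)
{
    for (size_t i = 0; i < o->ncovered; i++)
        if (o->covered[i] == id)
            return 1;
    return 0;
}

/* Starting from the common ancestor 'start', walk up to the root and
 * return the first object whose matrix covers both ids, or NULL. */
const struct obj *find_matrix_owner(const struct obj *start, int a, int b)
{
    assert(start != NULL);
    for (const struct obj *o = start; o; o = o->parent)
        if (o->covered && covers(o, a) && covers(o, b))
            return o;
    return NULL;
}
```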

Re: [hwloc-users] hwloc_get_latency() failures and confusion

2012-08-06 Thread Brice Goglin
On 06/08/2012 23:47, Wheeler, Kyle Bruce wrote: > Hello, > > I'm failing to understand what hwloc (v1.5) is doing. I'm trying to use > hwloc_get_latency() to determine the distance between two cores. > > The two cores are on different sockets. According to libnuma's numactl, the > latency

Re: [hwloc-users] anyone seen problems with PCI on RHEL 6?

2012-07-03 Thread Brice Goglin
I think I remember a similar report but I can't find it in the archives. RHEL bugzilla found https://bugzilla.redhat.com/show_bug.cgi?id=740630 which is solved in pciutils >= 3.1.4-11. Which pciutils do you have? Brice On 03/07/2012 01:48, Carl Smith wrote: > I happened to run "lstopo

Re: [hwloc-users] Hwloc error.

2012-05-30 Thread Brice Goglin
We don't need any other info on the hwloc side. And we thank you for testing the big hwloc warning code :) For HP: * If you're lucky, the BIOS may talk about the number of NUMA nodes (either on the usual messages during boot, or in the BIOS configuration menu). See if it says 2 on the broken node

Re: [hwloc-users] Hwloc error.

2012-05-30 Thread Brice Goglin
On 30/05/2012 17:22, Samuel Thibault wrote: > Hello, > > John Hanks, on Wed 30 May 2012 17:03:47 +0200, wrote: >> * Hwloc has encountered what looks like an error from the operating system. >> * >> * object intersection without inclusion! >> * Error occurred in topology.c line 594 > There is

Re: [hwloc-users] Understanding hwloc-ps output

2012-05-30 Thread Brice Goglin
the OMPI v1.6 SVN branch) > > > On May 30, 2012, at 9:54 AM, Brice Goglin wrote: > >> Hello Youri, >> When using openmpi 1.4.4 with "--np 2 --bind-to-core --bycore" it reports the >> following: >>> [hostname:03339] [[17125,0],0] odls:default:fork binding child >

Re: [hwloc-users] hwloc_get_last_cpu_location on AIX

2012-05-29 Thread Brice Goglin
ing at get_last_cpu_location() for entire processes instead of individual threads. Brice On 08/05/2012 14:41, Brice Goglin wrote: > On 08/05/2012 14:33, Hendryk Bockelmann wrote: >> Hello, >> >> I just ran into trouble using hwloc_get_last_cpu_location on our >> PO

Re: [hwloc-users] hwloc - Build problem.

2012-05-20 Thread Brice Goglin
Hello Anatoly, You likely need to add libxml2.a to your link command-line. And some others may be missing later. Instead of linking with src/.libs/libhwloc.a, you should run "make install" and use libhwloc.a from there (use --prefix= to tell configure where to install). Once hwloc is installed,

Re: [hwloc-users] hwloc_get_last_cpu_location on AIX

2012-05-08 Thread Brice Goglin
On 08/05/2012 14:33, Hendryk Bockelmann wrote: > Hello, > > I just ran into trouble using hwloc_get_last_cpu_location on our > POWER6 cluster with AIX6.1 > My plan is to find out if the binding of the job-scheduler was correct > for MPI-tasks and OpenMP-threads. This is what I want to use: > >

Re: [hwloc-users] possible concurrency issue with reading /proc data on Linux

2012-04-23 Thread Brice Goglin
run out of retries I default to hwloc_get_last_cpu_location(... HWLOC_CPUBIND_THREAD) -- since presumably that can't fail and the result is technically valid given hwloc_get_last_cpu_location() semantics (it reads state that's inherently transient). On Apr 23, 2012, at 7:53 AM, Brice Goglin
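The retry-then-fallback strategy described here can be written as a small decision function plus a driver loop. A sketch assuming the process-wide query returns 0 on success or -1 with errno set to ENOSYS when it loses a race with /proc; `after_attempt` and `query_with_retry` are hypothetical names, and the caller supplies the actual query functions (for example wrappers around hwloc_get_last_cpu_location() with the PROCESS and THREAD flags).

```c
#include <assert.h>
#include <errno.h>

enum next_step { STEP_DONE, STEP_RETRY, STEP_FALLBACK, STEP_FAIL };

/* Decide what to do after one attempt (0-based) of the process-wide query.
 * rc and err are the return value and errno of that attempt. */
enum next_step after_attempt(int rc, int err, int attempt, int max_retries)
{
    assert(attempt >= 0 && max_retries >= 0);
    if (rc == 0)
        return STEP_DONE;
    if (err != ENOSYS)
        return STEP_FAIL;        /* unrelated error: report it */
    if (attempt < max_retries)
        return STEP_RETRY;       /* transient race: try again */
    return STEP_FALLBACK;        /* e.g. query the current thread only */
}

/* Driver: 'query' and 'fallback' return 0 on success or -1 with errno set. */
int query_with_retry(int (*query)(int *), int (*fallback)(int *),
                     int max_retries, int *cpu)
{
    for (int attempt = 0; ; attempt++) {
        int rc = query(cpu);
        switch (after_attempt(rc, errno, attempt, max_retries)) {
        case STEP_DONE:     return 0;
        case STEP_RETRY:    continue;
        case STEP_FALLBACK: return fallback(cpu);
        case STEP_FAIL:     return -1;
        }
    }
}
```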

Re: [hwloc-users] possible concurrency issue with reading /proc data on Linux

2012-04-21 Thread Brice Goglin
On 21/04/2012 23:36, Vlad wrote: Will try this within a day or two. At the moment I am simply using a retry loop on ENOSYS and usually no more than one retry is needed. Ok thanks. You are probably correct. I was thinking of this code from

Re: [hwloc-users] possible concurrency issue with reading /proc data on Linux

2012-04-21 Thread Brice Goglin
On 21/04/2012 23:08, Vlad wrote: Greetings, I use hwloc-1.4.1 stable on Red Hat 5 and am seeing a possible concurrency issue not covered by the "Thread Safety" guidelines: - I start a small number (4) of threads, each of which does some work and periodically executes

Re: [hwloc-users] Using distances

2012-04-21 Thread Brice Goglin
On 21/04/2012 13:15, Jeffrey Squyres wrote: On Apr 21, 2012, at 7:09 AM, Brice Goglin wrote: I assume you have the entire distance (latency) matrix between all NUMA nodes as usually reported by the BIOS. const struct hwloc_distance_s *distances = hwloc_get_whole_distance_matrix_by_type
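Once a whole-system matrix has been retrieved, the latency between two NUMA nodes is a single array lookup. A sketch using a simplified stand-in for the hwloc 1.x `struct hwloc_distances_s` fields `nbobjs` and `latency`, assuming the row-major layout documented for that version (latency from object i to object j at slot i*nbobjs+j, indexed by logical index); check the header of your hwloc version before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the two hwloc 1.x distance fields used here. */
struct distances {
    unsigned nbobjs;        /* number of objects (e.g. NUMA nodes) */
    const float *latency;   /* nbobjs * nbobjs values, row-major */
};

/* Latency from object i to object j (logical indexes), or a negative
 * value when an index is out of range. */
float latency_between(const struct distances *d, unsigned i, unsigned j)
{
    assert(d != NULL && d->latency != NULL);
    if (i >= d->nbobjs || j >= d->nbobjs)
        return -1.0f;
    return d->latency[(size_t)i * d->nbobjs + j];
}
```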

Re: [hwloc-users] Using distances

2012-04-21 Thread Brice Goglin
On 21/04/2012 12:23, Jeffrey Squyres wrote: I'm trying to use hwloc distances in Open MPI (e.g., find the distance from each OpenFabrics device to the PU(s) where this process is bound), and I'm a bit confused by the distances documentation. If I have a WHOLE_SYSTEM topology, and I know that

Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-12 Thread Brice Goglin
child > [[43552,1],0] to cpus 000f > > Regards, > Tetsuya Mishima > >> Here's a better patch. Still only compile tested :) >> Brice >> >> >> On 11/04/2012 10:36, Brice Goglin wrote: >> >> A quick look at the code seems to confirm my feeling. get/

Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
Here's a better patch. Still only compile tested :) Brice On 11/04/2012 10:36, Brice Goglin wrote: > A quick look at the code seems to confirm my feeling. get/set_module() > callbacks manipulate arrays of logical indexes, and they do not convert > them back to physical indexes befor

Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
A quick look at the code seems to confirm my feeling. get/set_module() callbacks manipulate arrays of logical indexes, and they do not convert them back to physical indexes before binding. Here's a quick patch that may help. Only compile tested... Brice On 11/04/2012 09:49, Brice Goglin
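The bug described here is a missing conversion step: logical indexes must be turned back into physical (OS) indexes before they reach the kernel. A sketch of that conversion, assuming a hypothetical lookup table `os_index[]` that maps each logical PU index to its OS index (in hwloc, each PU object's `os_index` field; the P# values printed by `lstopo -p`) and at most 64 CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build an OS-level CPU mask from logical PU indexes.
 * os_index[l] is the physical (OS) index of logical PU l. Passing logical
 * numbers straight to the kernel binds to the wrong CPUs whenever the two
 * numberings differ. Limited to 64 CPUs for brevity. */
uint64_t logical_to_os_mask(const unsigned *os_index, size_t npus,
                            const unsigned *logical, size_t nlogical)
{
    uint64_t mask = 0;

    assert(os_index != NULL && (logical != NULL || nlogical == 0));
    for (size_t k = 0; k < nlogical; k++) {
        unsigned l = logical[k];
        if (l < npus && os_index[l] < 64)
            mask |= UINT64_C(1) << os_index[l];   /* set the OS bit */
    }
    return mask;
}
```

For example, on a machine whose OS numbering interleaves sockets, the first four logical PUs map to OS CPUs 0, 2, 4 and 6, not 0 to 3.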

Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
On 11/04/2012 09:06, tmish...@jcity.maeda.co.jp wrote: > Hi, Brice. > > I installed the latest hwloc-1.4.1. > Here is the output of lstopo -p. > > [root@node03 bin]# ./lstopo -p > Machine (126GB) > Socket P#0 (32GB) > NUMANode P#0 (16GB) + L3 (5118KB) > L2 (512KB) + L1 (64KB) + Core

Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
Can you send the output of lstopo -p? (you'll have to install hwloc) Brice tmish...@jcity.maeda.co.jp wrote: Hi, I updated openmpi from version 1.5.4 to 1.5.5. Then, an execution speed of my application becomes quite slower than before, due to wrong core bindings. As far as I checked, it

Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-14 Thread Brice Goglin
We debugged this in private emails with Hartmut. His 48-core platform is now detected properly. Everything got fixed with a patch functionally identical to what Samuel sent earlier. There's a bit of work before we can commit the fix, but Windows support for more than 32 cores will be officially

Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-14 Thread Brice Goglin
On 13/03/2012 19:08, Hartmut Kaiser wrote: >> - hwloc_bitmap_from_ith_ulong(obj->cpuset, GroupMask[i].Group, >> GroupMask[i].Mask); >> + hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*GroupMask[i].Group, >> GroupMask[i].Mask & 0xfff); There's a missing 'f' above. Here's
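The `2*GroupMask[i].Group` in the patch reflects a width mismatch: a Windows processor group holds up to 64 logical processors, while unsigned long is 32 bits on Windows, so each group's mask spans two bitmap words. A sketch of that split with plain `uint32_t` words; the function name is hypothetical and this is an illustration of the indexing, not the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the 64-bit affinity mask of processor group 'group' into an array
 * of 32-bit words (the width of unsigned long on Windows).
 * Group g covers global CPUs 64*g .. 64*g+63, i.e. words 2*g and 2*g+1. */
void set_group_mask(uint32_t *words, size_t nwords,
                    unsigned group, uint64_t mask)
{
    assert(words != NULL && 2 * (size_t)group + 1 < nwords);
    words[2 * group]     = (uint32_t)(mask & 0xffffffffu); /* low 32 CPUs  */
    words[2 * group + 1] = (uint32_t)(mask >> 32);         /* high 32 CPUs */
}
```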

Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-13 Thread Brice Goglin
On 13/03/2012 17:04, Hartmut Kaiser wrote: >>> But the problems I was seeing were not MSVC specific. It's a >>> proliferation of arcane (non-POSIX) function use (like strcasecmp, >>> etc.) missing use of HAVE_UNISTD_H, HAVE_STRINGS_H to wrap >>> non-standard headers, unsafe mixing of >>>

Re: [hwloc-users] receive 0x0 from hwloc_cuda_get_device_cpuset

2012-02-21 Thread Brice Goglin
On 21/02/2012 15:42, Albert Solernou wrote: > Hi, > I have several questions in order to fix this issue from the machine > side. > > 1) I realised that on this machine neither libcpuset nor cpuset-utils > are installed. Could this be related to the problem? No, Linux "cpuset" are very

Re: [hwloc-users] bind process to built cpuset

2012-02-21 Thread Brice Goglin
t > > On Tue 21 Feb 2012 09:46:46 GMT, Albert Solernou wrote: >> Thank you very much, Brice! >> >> Best, >> Albert >> >> On Mon 20 Feb 2012 18:09:55 GMT, Brice Goglin wrote: >>> On 20/02/2012 19:06, Brice Goglin wrote: >>>>

Re: [hwloc-users] bind process to built cpuset

2012-02-20 Thread Brice Goglin
On 20/02/2012 19:06, Brice Goglin wrote: > On 20/02/2012 17:41, Albert Solernou wrote: >> Hi, >> I'd like to bind a process to a cpuset, so that when it spawns on >> several threads, those are trapped on that cpuset. >> >> In order to do so, I want to defin

Re: [hwloc-users] bind process to built cpuset

2012-02-20 Thread Brice Goglin
On 20/02/2012 17:41, Albert Solernou wrote: > Hi, > I'd like to bind a process to a cpuset, so that when it spawns on > several threads, those are trapped on that cpuset. > > In order to do so, I want to define my own cpuset. Let's say I want it > to include HWLOC_OBJ_CORE 2 and 5. How can I
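With hwloc, a typical approach is to take the cpuset of each core object (from hwloc_get_obj_by_type() with HWLOC_OBJ_CORE), combine them with hwloc_bitmap_or(), and bind with hwloc_set_cpubind() and HWLOC_CPUBIND_PROCESS; on Linux, threads created afterwards inherit that binding. The sketch below shows only the union step with plain 64-bit masks so it runs without hwloc; the per-core mask table is a hypothetical stand-in for the cores' cpusets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* core_pus[c] is the mask of PUs (hardware threads) inside core c.
 * Returns the union of the selected cores' masks, the plain-mask
 * equivalent of OR-ing the cores' cpusets together. Out-of-range core
 * numbers are ignored. */
uint64_t cores_to_cpuset(const uint64_t *core_pus, size_t ncores,
                         const unsigned *cores, size_t nsel)
{
    uint64_t set = 0;

    assert(core_pus != NULL && (cores != NULL || nsel == 0));
    for (size_t i = 0; i < nsel; i++)
        if (cores[i] < ncores)
            set |= core_pus[cores[i]];
    return set;
}
```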

Re: [hwloc-users] receive 0x0 from hwloc_cuda_get_device_cpuset

2012-02-16 Thread Brice Goglin
On 16/02/2012 15:26, Albert Solernou wrote: > Is there anything easy that the administrators of the cluster could > do? How could I persuade them that this is an easy task to do? They could upgrade the BIOS. But your machine is old and people didn't care much about I/O affinity in Intel

Re: [hwloc-users] PCI devices in the topology

2012-02-10 Thread Brice Goglin
On 10/02/2012 21:46, Jeff Squyres wrote: > On Feb 10, 2012, at 3:37 PM, Brice Goglin wrote: > >> All objects of the same type are *always* at the same depth (for caches >> and groups, replace "same type" with "same type and same level" so that >> L1

Re: [hwloc-users] PCI devices in the topology

2012-02-10 Thread Brice Goglin
On 10/02/2012 21:16, Jeff Squyres wrote: > When PCI devices are put into the tree, do they potentially make other > objects be at different depths? > > For example, http://www.open-mpi.org/projects/hwloc/devel09-pci.png has a PCI > bridge hanging off a socket. Are the cores on sockets P0

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Brice Goglin
On 31/01/2012 19:02, Dave Love wrote: >> FWIW, the Linux kernel (at least up to 3.2) still reports wrong L2 and >> L1i cache information on AMD Bulldozer. Kernel bug reported at >> https://bugzilla.kernel.org/show_bug.cgi?id=42607 > I assume that isn't relevant for open-mpi, just other things.

Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)

2012-01-31 Thread Brice Goglin
On 31/01/2012 14:24, Jeff Squyres wrote: > On Jan 31, 2012, at 6:18 AM, Dave Love wrote: > >> Core binding is broken on Interlagos with open-mpi 1.5.4. I guess it >> also bites on Magny-Cours, but all our systems are currently busy and I >> can't check. >> >> It does work, at least basically,

[hwloc-users] hwloc and HTX device ?

2012-01-27 Thread Brice Goglin
Hello, I'd like to see what hwloc reports on AMD machines with an HTX card (HyperTransport expansion card). The most widely known case would likely be a 3-to-5-year-old AMD cluster with PathScale InfiniPath network cards. But I think there are also some accelerators such as ClearSpeed, and the

[hwloc-users] removing old cpuset API?

2012-01-19 Thread Brice Goglin
Dear hwloc users, The cpuset API (hwloc_cpuset_*) was replaced by the bitmap API (hwloc_bitmap_*) in v1.1.0, back in December 2010. We kept backward compatibility by #define'ing the old API on top of the new one. So you may still use the old API in your application (but you would get "deprecated"
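The compatibility scheme described here, where old names are mapped onto the new API, looks roughly like this. A sketch with hypothetical `mybitmap_*`/`mycpuset_*` names standing in for hwloc_bitmap_*/hwloc_cpuset_*; it is not hwloc's actual header, and the `deprecated` attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* New API (hypothetical names standing in for hwloc_bitmap_*). */
typedef struct { uint64_t bits; } mybitmap_t;

static inline void mybitmap_zero(mybitmap_t *b) { b->bits = 0; }

static inline void mybitmap_set(mybitmap_t *b, unsigned i)
{
    assert(i < 64);
    b->bits |= UINT64_C(1) << i;
}

/* Old API kept on top of the new one (standing in for hwloc_cpuset_*).
 * Old code still compiles; uses of the wrapper below trigger a
 * "deprecated" warning with GCC/Clang, while the plain #define does not. */
typedef mybitmap_t mycpuset_t;

__attribute__((deprecated("use mybitmap_set")))
static inline void mycpuset_set(mycpuset_t *s, unsigned i)
{
    mybitmap_set(s, i);
}

#define mycpuset_zero mybitmap_zero
```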

Re: [hwloc-users] Memory replication on a linux NUMA server

2012-01-05 Thread Brice Goglin
Hello François, The replicate memory binding policy is not supported on Linux (and that is not going to change soon, unfortunately). For now, you should replicate manually. Best wishes to you too! Brice On 05/01/2012 11:33, François Galea wrote: > Hello, > > I am working on a Linux amd64 NUMA server running SUSE

Re: [hwloc-users] hwloc download link broken

2012-01-03 Thread Brice Goglin
On 03/01/2012 05:32, gareth.willi...@csiro.au wrote: > > On the page: http://www.open-mpi.org/projects/hwloc/ the 'download > page' link: http://www.open-mpi.org/software/hwloc/v1.3.1/ is broken. > > > > But http://www.open-mpi.org/software/hwloc/v1.3/ works so my work is > not stalled :) > >

Re: [OMPI users] [hwloc-devel] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-12-07 Thread Brice Goglin
On 07/12/2011 23:00, Rayson Ho wrote: > We are using hwloc-1.2.2 for topology binding in Open Grid > Scheduler/Grid Engine 2011.11, and a user is encountering similar > issues: > > http://gridengine.org/pipermail/users/2011-December/002126.html > > In Open MPI, there is the configure switch

Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-30 Thread Brice Goglin
On 30/11/2011 08:44, Stefan Eilemann wrote: > Let me know if I can help. We would be quite interested in this feature. You can help by asking the relevant people for help :) * ask the OpenCL board to add a device query property that tells us the locality of a device. If they return the BusID

Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
> Hwloc optional build support status (more details can be found above): > > Probe / display PCI devices: yes > Graphical output (Cairo): yes > XML output: full "XML output" should be "XML input/output" or "XML support". > Memory support: binding, set policy,

Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
Hello Stefan, hwloc 1.3 already has support for PCI device detection. These new objects contain a "class" field that can help you know if it's a NIC/GPU/... However, it's hard to know which PCI device is eth0 or eth1, so we also try to add OS device objects inside PCI device objects. If you're using Linux,
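The class field mentioned here can be decoded with the standard PCI class codes: base class 0x02 is a network controller (subclass 0x07 being InfiniBand) and base class 0x03 is a display controller. A sketch, assuming the field holds `base << 8 | subclass` as a 16-bit value; the function name is hypothetical.

```c
#include <assert.h>

/* Classify a 16-bit PCI class id (base class << 8 | subclass). */
const char *pci_class_kind(unsigned class_id)
{
    unsigned base = (class_id >> 8) & 0xff;

    assert(class_id <= 0xffff);
    switch (base) {
    case 0x02:                           /* network controller */
        return (class_id & 0xff) == 0x07 ? "InfiniBand NIC" : "network";
    case 0x03:                           /* VGA, 3D controller, ... */
        return "display/GPU";
    default:
        return "other";
    }
}
```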

Re: [hwloc-users] How to combine hwloc-bind and mpirun

2011-11-10 Thread Brice Goglin
On 10/11/2011 13:13, Rafael R. Pappalardo wrote: > I am trying to send a MPI job to selected cores on a 64 cores machine. With > taskset I use: > > mpirun -np 8 taskset -c 1,3,5,7,9,11,13,15 program > > but if I substitute taskset by hwloc-bind doing > > mpirun -np 8 hwloc-bind core:1 core:3

Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-29 Thread Brice Goglin
On 28/09/2011 23:02, Blosch, Edwin L wrote: > Jeff, > > I've tried it now adding --without-libnuma. Actually that did NOT fix the > problem, so I can send you the full output from configure if you want, to > understand why this "hwloc" function is trying to use a function which > appears

Re: [OMPI users] Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-28 Thread Brice Goglin
On 28/09/2011 17:55, Blosch, Edwin L wrote: > > I am getting some undefined references in building OpenMPI 1.5.4 and I > would like to know how to work around it. > > > > The errors look like this: > > > > /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o): > In

Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
ble? I just said "you have to touch right after malloc." Brice > > 2011/9/25 Brice Goglin <brice.gog...@inria.fr> > > On 25/09/2011 20:57, Gabriele Fatigati wrote: > > after done this, memory is allocated not

Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
On 25/09/2011 20:57, Gabriele Fatigati wrote: > after done this, memory is allocated not in a local node of thread > that does set_membind and malloc, but in node of thread that touches > it. And I don't understand this behaviour :( Memory is allocated when first-touched. If there's no
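Because physical pages are only allocated on first touch, the practical rule from this thread is to touch the buffer right after malloc, in the thread whose memory binding should apply. A minimal sketch for Linux, assuming the calling thread has already set its binding (for example with hwloc_set_membind() and HWLOC_MEMBIND_THREAD); `alloc_and_touch` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate 'len' bytes and immediately write one byte per page from the
 * calling thread, so each page is physically allocated now, under this
 * thread's memory policy, rather than later by whichever thread touches
 * it first. */
void *alloc_and_touch(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    char *p;

    assert(len > 0);
    if (page <= 0)
        page = 4096;                 /* conservative fallback */
    p = malloc(len);
    if (!p)
        return NULL;
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = 0;                  /* first touch of each page */
    p[len - 1] = 0;                  /* make sure the last page is touched */
    return p;
}
```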

Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
On 25/09/2011 20:27, Gabriele Fatigati wrote: > if(tid==0){ > > set_membind(HWLOCMEMBIND_BIND, node 0) > malloc(array)... > > } > > if (tid==1){ > set_membind(HWLOCMEMBIND_BIND, node 1) > > for(i...) > array(i) > } > > end parallel region > > > array is allocated on node 1, not node 0 as I

Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
On 25/09/2011 12:19, Gabriele Fatigati wrote: > Hi Brice, > > >The flag says "when the first touch occurs and the physical memory is > allocated for real, don't allocate on the local node (default), but > >rather allocate where specified by set_membind". > > If is it already allocated for real,

Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
On 25/09/2011 11:14, Gabriele Fatigati wrote: > > I report my questions in a different way (in the first question i did > a mistake): > > > 1) I don't understand the means of set_membind() function. Why I > should to allocate in a node "near" my cpuset and not in my local node > (where thread

Re: [hwloc-users] hwloc set membind function

2011-09-22 Thread Brice Goglin
On 22/09/2011 12:20, Gabriele Fatigati wrote: > NUMA node(s) near the specified cpuset. > > What does "nodes near the specified cpuset" means? The node where the > specified cpuset lives? > Set the default memory binding policy of the current process or thread > to prefer the The node near

Re: [OMPI users] #cpus/socket

2011-09-13 Thread Brice Goglin
On 13/09/2011 18:59, Peter Kjellström wrote: > On Tuesday, September 13, 2011 09:07:32 AM nn3003 wrote: >> Hello ! >> >> I am running wrf model on 4x AMD 6172 which is 12 core CPU. I use OpenMPI >> 1.4.3 and libgomp 4.3.4. I have binaries compiled for shared-memory and >> distributed-memory

Re: [hwloc-users] Process and thread binding

2011-09-12 Thread Brice Goglin
On 12/09/2011 14:17, Gabriele Fatigati wrote: > Mm, and why? In a hybrid code ( MPI + OpenMP), my idea is to bind a > single MPI process in one core, and his threads in other cores. > Otherwise I have all threads that runs on a single core. > The usual way to do that is to first bind the
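One way to follow the two-step approach for hybrid MPI + OpenMP codes: bind the MPI process to its set of PUs first, then give each OpenMP thread one PU taken from that set. A sketch of the selection step on a 64-bit mask; with hwloc, the mask would come from hwloc_get_cpubind() before the threads rebind, and each thread would bind itself with hwloc_set_cpubind() and HWLOC_CPUBIND_THREAD. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the (k mod n)-th set bit of 'process_mask', where n
 * is the number of set bits, or -1 if the mask is empty. Thread k then
 * binds to that PU, inside the process's own binding. */
int pu_for_thread(uint64_t process_mask, unsigned k)
{
    unsigned n = 0;

    for (uint64_t m = process_mask; m; m &= m - 1)
        n++;                             /* count set bits */
    if (n == 0)
        return -1;
    k %= n;
    for (int bit = 0; bit < 64; bit++)
        if ((process_mask >> bit) & 1) {
            if (k == 0)
                return bit;
            k--;
        }
    assert(0 && "unreachable: k < number of set bits");
    return -1;
}
```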

Re: [hwloc-users] Process and thread binding

2011-09-12 Thread Brice Goglin
On 12/09/2011 13:58, Gabriele Fatigati wrote: > Hi Brice, > > but in the manual is not written that get_cpubind() returns the > logical OR of the binding of all threads... I ever understand that > returns the bind of the caller, where the caller can be process or > thread.. A process is a

Re: [hwloc-users] Process and thread binding

2011-09-12 Thread Brice Goglin
On 12/09/2011 13:29, Gabriele Fatigati wrote: > Hi Brice, > > I'm so confused.. > > I'm binding MPI processes with set_cpu_bind and it works well. The > problem is when I try to bind process and threads. > > It seem that thread process influence bind of main thread. > > And from hwloc manual:

Re: [hwloc-users] Process and thread binding

2011-09-12 Thread Brice Goglin
On 12/09/2011 12:52, Gabriele Fatigati wrote: > Dear hwloc users, > > I'm binding process in a NUMA node and also associated OpenMP threads. > I've noted that, if I bind execution of all on different cores in > the same NUMA node , it works well. > > If I bind process in NUMA node 0 for

Re: [OMPI users] openmpi 1.5.4 paffinity with Magny-Cours

2011-09-09 Thread Brice Goglin
On 09/09/2011 21:03, Kaizaad Bilimorya wrote: > > We seem to have an issue similar to this thread > > "Bug in openmpi 1.5.4 in paffinity" > http://www.open-mpi.org/community/lists/users/2011/09/17151.php > > Using the following version of hwloc (from EPEL repo - we run CentOS 5.6) > > $

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-06 Thread Brice Goglin
contains hwloc 1.2.0) > > > On Sep 6, 2011, at 1:43 AM, Brice Goglin wrote: > >> On 05/09/2011 21:29, Brice Goglin wrote: >>> Dear Ake, >>> Could you try the attached patch? It's not optimized, but it's probably >>> going in the right direction.

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-06 Thread Brice Goglin
On 05/09/2011 21:29, Brice Goglin wrote: > Dear Ake, > Could you try the attached patch? It's not optimized, but it's probably > going in the right direction. > (and don't forget to remove the above comment-out if you tried it). Actually, now that I've seen your entire topology,

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-05 Thread Brice Goglin
On 04/09/2011 23:30, Brice Goglin wrote: > On 04/09/2011 22:35, Ake Sandgren wrote: >> On Sun, 2011-09-04 at 22:13 +0200, Brice Goglin wrote: >>> Hello, >>> >>> Could you log in again on this node (with same cgroups enabled), run >>> hwlo

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-04 Thread Brice Goglin
On 04/09/2011 22:35, Ake Sandgren wrote: > On Sun, 2011-09-04 at 22:13 +0200, Brice Goglin wrote: >> Hello, >> >> Could you log in again on this node (with same cgroups enabled), run >> hwloc-gather-topology >> and send the resulting .output and .tar.bz2? >

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-04 Thread Brice Goglin
Hello, Could you log in again on this node (with the same cgroups enabled), run hwloc-gather-topology and send the resulting .output and .tar.bz2? Send them to the hwloc-devel list or open a ticket on https://svn.open-mpi.org/trac/hwloc (or send them to me in private if you don't want to subscribe).

[hwloc-users] Re : Re : hwloc topology check initializing

2011-09-03 Thread Brice Goglin
nitializing Date: Sat., Sept. 3, 2011 15:26 Hi Brice, but it works only if the user assigns NULL to topology. hwloc_topology_init() does not check the argument passed? There are no ways to check if topology is initialized or not? Thanks. 2011/9/3 Brice Goglin <brice.gog...@inria.fr>

[hwloc-users] Re : hwloc topology check initializing

2011-09-03 Thread Brice Goglin
Assign NULL to the topology when declaring the variable. It will be changed into something else when init() is called. Brice - Reply message - From: "Gabriele Fatigati" To: "Hardware locality user list" Subject: [hwloc-users] hwloc
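A library cannot tell an uninitialized pointer from a valid handle, so the NULL convention has to live in the caller. A sketch with a hypothetical `topo_t` pointer type standing in for hwloc_topology_t (which is also a pointer type); with hwloc the same pattern is `hwloc_topology_t topology = NULL;` at declaration and `if (topology) hwloc_topology_destroy(topology);` at cleanup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle type standing in for hwloc_topology_t. */
struct topo { int loaded; };
typedef struct topo *topo_t;

/* Stand-in for an init function that sets the handle on success. */
int topo_init(topo_t *t)
{
    assert(t != NULL);
    *t = calloc(1, sizeof(**t));
    return *t ? 0 : -1;
}

/* Only meaningful if the caller declared the handle as NULL: an
 * uninitialized pointer holds garbage that no check can recognize. */
int topo_is_initialized(topo_t t)
{
    return t != NULL;
}
```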

Re: [hwloc-users] hwloc_get_last_cpu_location and PU

2011-08-29 Thread Brice Goglin
Yes. Brice On 29/08/2011 16:15, Gabriele Fatigati wrote: > Dear hwloc users, > > hwloc_get_last_cpu_location() return last CPU where process/thread > ran. On SMT machine, it return the PU where process/thread ran ? > > Thanks a lot. > > -- > Ing. Gabriele Fatigati > > HPC specialist > >

Re: [hwloc-users] Numa availability

2011-08-28 Thread Brice Goglin
On 28/08/2011 12:14, Gabriele Fatigati wrote: > Dear hwloc users, > > what happens if I use hwloc on a non-NUMA machine? I suppose memory > binding has no sense because there is not a memory locality concept. > And regards execution binding? are there some difference on a non-NUMA > machine?

Re: [hwloc-users] Bind current thread to a specific cpu

2011-08-18 Thread Brice Goglin
Are you talking about logical ids (the ones given by hwloc) or physical/OS ids (the ones given by the OS, possibly in a strange order)? You should avoid using physical ids, but... If logical, you can use hwloc_get_obj_by_type() to get the corresponding object, then use its ->cpuset. If physical, you

[hwloc-users] Re : lstopo on multiple machines

2011-08-16 Thread Brice Goglin
Hello Seb, Hwloc only looks at the local machine; there's no support for multinode topology detection so far. We are considering adding it, but we don't know yet what users want to do with it, whether it should be in the core or not, automatic or not. Your feedback is welcome. Brice - Reply

[hwloc-users] Re : hwloc varning flag

2011-08-15 Thread Brice Goglin
No, it just means that valgrind could properly check how hwloc uses mbind. But I checked the hwloc code again, things look OK, and the kernel is happy with our mbind parameters. Brice - Reply message - From: "Gabriele Fatigati" <g.fatig...@cineca.it> To: "Bri

Re: [hwloc-users] hwloc varning flag

2011-08-14 Thread Brice Goglin
For what it's worth, it's a "bug" in valgrind. The manpage of mbind does not exactly match the kernel requirements on mbind parameters. And valgrind fails to respect the manpage anyway. See https://bugs.kde.org/show_bug.cgi?id=280083 for the mess... Brice On 13/08/2011 15:07, Br

Re: [hwloc-users] hwloc varning flag

2011-08-13 Thread Brice Goglin
I think I am seeing this too on a custom program, so it's probably not your application's fault. Brice On 13/08/2011 10:37, Gabriele Fatigati wrote: > > > Dear hwloc users and developers, > > I'm using hwloc 1.2 stable version Intel 11 compiled and checking my > little application with valgrind

Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
ine, > PU# are sequential (page 17), and in a non NUMA machine are not > sequential? ( page 16) > > 2011/8/1 Brice Goglin <brice.gog...@inria.fr> > > You're confusing object types with index types. > > PU is an object

Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
> "P#" prefix means "physical index". > > But from the hwloc manual, page 58: > > > HWLOC_OBJ_PU: Processing Unit, or (Logical) Processor.. > > > but it is in conflict with what you said :( > > > 2011/8/1 Brice Goglin <brice.gog...@inria.

Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
On 01/08/2011 14:47, Gabriele Fatigati wrote: > Hi Brice, > > so, if I understand well, PU P# numbers are not the same specified > as HWLOC_OBJ_PU flag? > > 2011/8/1 Brice Goglin <brice.gog...@inria.fr> > > On 01/08/2011 12:

Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
On 01/08/2011 12:16, Gabriele Fatigati wrote: > Hi, > > reading a hwloc-v1.2-a4 manual, on page 15, i look an example > with 4-socket 2-core machine with hyperthreading. > > Core id's are not exclusive as said before. PU's id are exclusive but > not physically sequential (I suppose) > > PU P#0

Re: [hwloc-users] on using hwloc_get_area_membind_nodeset

2011-07-06 Thread Brice Goglin
be I can get away with a single call to get_mempolicy (no need to > check for all the pages in the memory area). > Thanks again > > best regards > alfredo > > > On Tue, Jul 5, 2011 at 8:34 PM, Brice Goglin <brice.gog...@inria.fr> wrote: >> Hello, >> >>

Re: [hwloc-users] on using hwloc_get_area_membind_nodeset

2011-07-05 Thread Brice Goglin
On 05/07/2011 20:13, Alfredo Buttari wrote: > Hi all, > if I understand correctly this routine can tell on which NUMA node(s) > a specific memory area resides, is this correct? > Will this routine work on any memory area allocated with any > allocation routine other than those provided by

Re: [OMPI users] Program hangs when using OpenMPI and CUDA

2011-06-05 Thread Brice Goglin
On 05/06/2011 00:15, Fengguang Song wrote: > Hi, > > I'm confronting a problem when using OpenMPI 1.5.1 on a GPU cluster. My > program uses MPI to exchange data > between nodes, and uses cudaMemcpyAsync to exchange data between Host and GPU > devices within a node. > When the MPI message size

Re: [OMPI users] anybody tried OMPI with gpudirect?

2011-03-09 Thread Brice Goglin
penib BTL flags. > > --mca btl_openib_flags 304 > > Rolf > > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Brice Goglin > Sent: Monday, February 28, 2011 11:16 AM > To: us...@open-mpi.org > Subject

Re: [OMPI users] anybody tried OMPI with gpudirect?

2011-02-28 Thread Brice Goglin
On 28/02/2011 19:49, Rolf vandeVaart wrote: > For the GPU Direct to work with Infiniband, you need to get some updated OFED > bits from your Infiniband vendor. > > In terms of checking the driver updates, you can do a grep on the string > get_driver_pages in the file /proc/kallsyms. If it is

Re: [OMPI users] anybody tried OMPI with gpudirect?

2011-02-28 Thread Brice Goglin
On 28/02/2011 17:30, Rolf vandeVaart wrote: > Hi Brice: > Yes, I have tried OMPI 1.5 with gpudirect and it worked for me. You > definitely need the patch or you will see the behavior just as you described, > a hang. One thing you could try is disabling the large message RDMA in OMPI > and

[OMPI users] anybody tried OMPI with gpudirect?

2011-02-28 Thread Brice Goglin
ever looked at this? FWIW, we're using OMPI 1.5, OFED 1.5.2, Intel MPI 4.0.0.28 and SLES11 w/ and w/o the gpudirect patch. Thanks Brice Goglin

Re: [hwloc-users] hwloc-ps output - how to verify process binding on the core level?

2011-02-14 Thread Brice Goglin
On 14/02/2011 07:43, Siew Yin Chan wrote: > >> >> > > No. Each hwloc-bind command in the mpirun above doesn't know that > there are other hwloc-bind instances on the same machine. All of > them bind their process to all cores in the first socket. > > => Agree. For socket:0.core:0-3
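
Because each hwloc-bind instance is unaware of the others, a different set has to be chosen per process. A minimal sketch of one way to do that, assuming Linux/glibc `sched_setaffinity()` and Open MPI's `OMPI_COMM_WORLD_LOCAL_RANK` environment variable: every process pins itself to a distinct CPU taken from its node-local rank. The helpers `bind_to_rank_cpu` and `local_rank_from_env` are hypothetical. Unlike hwloc, this ignores the topology, so on SMT machines two ranks may still end up on the same physical core.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical helper: bind the calling process to the (rank % n)-th CPU of
 * its current affinity mask, so co-located ranks do not share one CPU.
 * Returns the chosen CPU id, or -1 on error. */
static int bind_to_rank_cpu(int local_rank)
{
    cpu_set_t allowed, target;
    assert(local_rank >= 0);
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    int n = CPU_COUNT(&allowed);
    if (n == 0)
        return -1;
    int wanted = local_rank % n, seen = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        if (seen++ == wanted) {          /* the wanted-th allowed CPU */
            CPU_ZERO(&target);
            CPU_SET(cpu, &target);
            return sched_setaffinity(0, sizeof target, &target) == 0 ? cpu : -1;
        }
    }
    return -1;
}

/* Open MPI exports the node-local rank in this variable; default to 0. */
static int local_rank_from_env(void)
{
    const char *s = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    return s != NULL ? atoi(s) : 0;
}
```

Each rank would call `bind_to_rank_cpu(local_rank_from_env())` early in `main`, before starting any threads.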

Re: [hwloc-users] Problem getting cpuset of MPI task

2011-02-09 Thread Brice Goglin
On 09/02/2011 16:53, Hendryk Bockelmann wrote: > Since I am new to hwloc there might be a misunderstanding from my > side, but I have a problem getting the cpuset of MPI tasks. I just > want to run a simple MPI program to see on which cores (or CPUs in > case of hyperthreading or SMT) the tasks
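
In hwloc, this query is `hwloc_get_cpubind()`. On Linux, the underlying mask can also be read directly with glibc's `sched_getaffinity()`. The minimal sketch below prints that mask. The helper `format_affinity` is hypothetical. A task that was never bound reports every CPU it is allowed to use, so seeing the full list usually means no binding was applied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the calling process's CPU affinity as a
 * comma-separated list (e.g. "0,1,4") into buf and return how many CPUs it
 * contains, or -1 on error or if buf is too small. */
static int format_affinity(char *buf, size_t len)
{
    cpu_set_t set;
    size_t used = 0;
    int count = 0;
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &set))
            continue;
        int w = snprintf(buf + used, len - used, count ? ",%d" : "%d", cpu);
        if (w < 0 || (size_t)w >= len - used)
            return -1;                   /* buffer too small */
        used += (size_t)w;
        count++;
    }
    return count;
}
```

In an MPI program, each rank could print this string next to its rank number after `MPI_Init` to see where it runs.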

Re: [hwloc-users] Identifying NIC in a topology using HWLOC

2010-12-27 Thread Brice Goglin
Hello Saktheesh, NICs do not appear in the topology yet. This is under development in the libpci branch. You can take a look at https://svn.open-mpi.org/svn/hwloc/branches/libpci and tell us what you think of the interface. If you're talking about infiniband NICs, hwloc/openfabrics-verbs.h

Re: [hwloc-users] hwloc@SC10

2010-11-12 Thread Brice Goglin
; > Drop by the Cisco booth for the exact schedule; we're right next to the main > SciNet NOC. > > See you there! > > > > On Nov 8, 2010, at 11:22 AM, Brice Goglin wrote: > > >> Hello, >> For those of you going to SC10 @ New Orleans next week, you should kno

Re: [hwloc-users] xmlbuffer test failure

2010-11-05 Thread Brice Goglin
Here's the patch :) On 05/11/2010 08:10, Brice Goglin wrote: > Interesting, you don't have any hugepage information, it's probably > disabled in the kernel. Can you apply the attached patch and check that > the XML output only contains a single "page_type" line and th

Re: [hwloc-users] xmlbuffer test failure

2010-11-05 Thread Brice Goglin
Looks like there's something specific to your machine. Can you send the XML output of lstopo? Thanks, Brice On 05/11/2010 05:41, ryuuta wrote: > Hi, > > I'd like to report the failure of one of the tests run by 'make > check': > > exported to buffer 0x8546408 length 3070 > re-exported

Re: [hwloc-users] hwloc_set/get_thread_cpubind

2010-07-15 Thread Brice Goglin
On 14/07/2010 20:28, Αλέξανδρος Παπαδογιαννάκης wrote: > hwloc_set_thread_cpubind and hwloc_get_thread_cpubind are missing from the > html documentation > http://www.open-mpi.org/projects/hwloc/doc/v1.0.1/group__hwlocality__binding.php > > It may be
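
On Linux, thread-level binding like that of these two hwloc calls can be illustrated with glibc's non-portable `pthread_setaffinity_np()` and `pthread_getaffinity_np()`. The following is a minimal sketch that assumes Linux with glibc. It is not hwloc's implementation, and `pin_thread_to_cpu` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: pin `thread` to a single CPU, then read the mask back
 * to confirm. Returns 0 on success, otherwise an error number (these pthread
 * calls return the error instead of setting errno); EINVAL if the read-back
 * mask differs from the requested one. */
static int pin_thread_to_cpu(pthread_t thread, int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    int rc = pthread_setaffinity_np(thread, sizeof set, &set);
    if (rc != 0)
        return rc;
    CPU_ZERO(&set);
    rc = pthread_getaffinity_np(thread, sizeof set, &set);
    if (rc != 0)
        return rc;
    return (CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set)) ? 0 : EINVAL;
}
```

Unlike the process-wide `sched_setaffinity(0, ...)`, this affects only the given thread, which is the distinction the thread variants of the hwloc binding calls draw.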

Re: [hwloc-users] Getting a graphics view for anon graphic system...

2010-07-02 Thread Brice Goglin
On 09/06/2010 21:52, Brice Goglin wrote: > On 09/06/2010 21:41, Jeff Squyres wrote: > >> On Jun 6, 2010, at 4:03 PM, Olivier Cessenat wrote: >> >> >> >>> What you write is clear to computer scientists, but I failed to figure

Re: [hwloc-users] hwloc sockets support on solaris

2010-06-23 Thread Brice Goglin
On 23/06/2010 22:27, Jeff Squyres wrote: > Hm. We should be. Here's the hwloc plugin code for setting CPU affinity > (it's static because it's invoked by function pointer): > > static int module_set(opal_paffinity_base_cpu_set_t mask) > { > int i, ret = OPAL_SUCCESS; > hwloc_cpuset_t

Re: [hwloc-users] hwloc sockets support on solaris

2010-06-23 Thread Brice Goglin
I see this in the Solaris binding code: if (hwloc_cpuset_weight(hwloc_set) != 1) { errno = EXDEV; return -1; } Doesn't OMPI get this error? Brice On 23/06/2010 21:56, Terry Dontje wrote: > Does hwloc think it supports binding processes to sockets or multiple > cpus? I am

Re: [hwloc-users] hwloc on cray

2010-06-23 Thread Brice Goglin
Hello Norman, I don't think anybody ever tried. But we have an entry in the TODO list saying "port to cray catamount" :) If anybody wants to port hwloc to Cray, we'd be happy to help. Getting us access to a Cray machine might also help :) Brice On 23/06/2010 04:05, Norman Lo wrote: >
