Re: [hwloc-users] setting memory bindings

2014-08-19 Thread Aulwes, Rob
I'll give this a try. Thanks, Brice! From: Brice Goglin Reply-To: Hardware locality user list List-Post: hwloc-users@lists.open-mpi.org Date: Tue, 19 Aug 2014 19:26:17 +0200 To:

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-19 Thread Reuti
Hi, On 19.08.2014 at 19:06, Oscar Mojica wrote: > I discovered what the error was. I forgot to include '-fopenmp' when I > compiled the objects in the Makefile, so the program ran but did not > divide the work among threads. Now the program is working and I can use up to 15 > cores for

Re: [hwloc-users] setting memory bindings

2014-08-19 Thread Brice Goglin
You have to pass HWLOC_MEMBIND_STRICT if you want an error code when the policy isn't supported. Assuming you get the nodeset of your current binding with get_area_membind_nodeset() in bindset, you can do something like this (untested): hwloc_bitmap_t bindset, totalset, newset; int i; /* get

Re: [hwloc-users] setting memory bindings

2014-08-19 Thread Aulwes, Rob
OK, in the meantime, is there a way to manually 'replicate'? That is, if I allocate a node, I would like to find out which NUMA domain it resides in, and then allocate replicas in the other domains. Are there code examples that show how to use the bitmaps for this? I've been unsuccessful in

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-19 Thread Oscar Mojica
Reuti, I discovered what the error was. I forgot to include '-fopenmp' when I compiled the objects in the Makefile, so the program ran but did not divide the work among threads. Now the program is working and I can use up to 15 cores per machine in the queue one.q. Anyway, I would like to try
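The symptom described above (the program runs, but never splits into threads) is what one would expect when OpenMP sources are compiled without '-fopenmp': the compiler skips the OpenMP pragmas and the standard `_OPENMP` macro is left undefined. A minimal sketch of a build-time check follows; the function names `built_with_openmp` and `openmp_max_threads` are hypothetical helpers for illustration, not part of the program discussed in the thread.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only usable when the compiler has OpenMP enabled */
#endif

/* Hypothetical helper: returns 1 if this file was compiled with OpenMP
   enabled (for example gcc -fopenmp), 0 otherwise. */
int built_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: upper bound on the threads a parallel region may use.
   Without OpenMP the code is serial, so the answer is 1. */
int openmp_max_threads(void)
{
#ifdef _OPENMP
    int n = omp_get_max_threads();
    assert(n >= 1);   /* there is always at least the calling thread */
    return n;
#else
    return 1;
#endif
}
```

Printing both values at program start makes a missing compiler flag visible immediately, instead of showing up later as unexpectedly low CPU usage.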

Re: [hwloc-users] setting memory bindings

2014-08-19 Thread Aulwes, Rob
Nope, no error. Is there a way to find out which policies are supported? I would like to try 'replicate'. From: Brice Goglin Reply-To: Hardware locality user list List-Post:

Re: [hwloc-users] setting memory bindings

2014-08-19 Thread Brice Goglin
On 19/08/2014 18:38, Aulwes, Rob wrote: > Hi, > > I'm trying to write a custom C++ allocator that wraps hwloc calls. > I've tried using various hwloc_alloc* functions to set the memory > bindings, but when I call hwloc_get_area_membind_nodeset to verify, I > don't get the same policy I passed

[hwloc-users] setting memory bindings

2014-08-19 Thread Aulwes, Rob
Hi, I'm trying to write a custom C++ allocator that wraps hwloc calls. I've tried using various hwloc_alloc* functions to set the memory bindings, but when I call hwloc_get_area_membind_nodeset to verify, I don't get the same policy I passed to alloc. Are there example codes that show how to

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
I am also filing a bug at Adaptive Computing since, while I do set CUDA_VISIBLE_DEVICES myself, the default value set by Torque in that case is also wrong. Maxime On 2014-08-19 10:47, Rolf vandeVaart wrote: Glad it was solved. I will submit a bug at NVIDIA as that does not seem like a

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Rolf vandeVaart
Glad it was solved. I will submit a bug at NVIDIA as that does not seem like a very friendly way to handle that error. >-----Original Message----- >From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault >Sent: Tuesday, August 19, 2014 10:39 AM >To: Open MPI Users

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
Hi, I believe I found what the problem was. My script set the CUDA_VISIBLE_DEVICES based on the content of $PBS_GPUFILE. Since the GPUs were listed twice in the $PBS_GPUFILE because of the two nodes, I had CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7 instead of
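As an illustration of the duplicated list described above, here is a minimal sketch that keeps only the first occurrence of each entry in a comma-separated device list. The function name `dedup_device_list` is hypothetical, and the excerpt does not say how the original script was actually fixed; on a multi-node job it may be more appropriate to filter the entries per node before building the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the comma-separated list in `in` to `out`,
   keeping only the first occurrence of each entry and preserving order.
   Returns 0 on success, -1 if `out` (capacity `cap` bytes) is too small. */
int dedup_device_list(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';

    const char *p = in;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current entry */
        int seen = 0;

        /* Compare against every entry already written to `out`. */
        const char *q = out;
        while (len > 0 && *q != '\0') {
            size_t qlen = strcspn(q, ",");
            if (qlen == len && strncmp(q, p, len) == 0) {
                seen = 1;
                break;
            }
            q += qlen;
            if (*q == ',')
                q++;
        }

        if (len > 0 && !seen) {
            size_t need = len + (used > 0 ? 1 : 0);  /* entry plus separator */
            if (used + need + 1 > cap)               /* +1 for the final NUL */
                return -1;
            if (used > 0)
                out[used++] = ',';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }

        p += len;
        if (*p == ',')
            p++;
    }
    return 0;
}
```

Entries are compared as whole strings, so "1" and "10" are treated as different devices, and empty entries (for example from a trailing comma) are skipped.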

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Rolf vandeVaart
Hi: This problem does not appear to have anything to do with MPI. We are getting a SEGV during the initial call into the CUDA driver. Can you log on to gpu-k20-08, compile your simple program without MPI, and run it there? Also, maybe run dmesg on gpu-k20-08 and see if there is anything in

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
Hi, I recompiled OMPI 1.8.1 without Cuda and with debug, but it did not give me much more information. [mboisson@gpu-k20-07 simple_cuda_mpi]$ ompi_info | grep debug Prefix: /software-gpu/mpi/openmpi/1.8.1-debug_gcc4.8_nocuda Internal debug support: yes Memory debugging

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
Indeed, there were those two problems. I took the code from here and simplified it. http://cudamusing.blogspot.ca/2011/08/cuda-mpi-and-infiniband.html However, even with the modified code here http://pastebin.com/ax6g10GZ The symptoms are still the same. Maxime On 2014-08-19 07:59, Alex A.

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Alex A. Granovsky
Also, you need to check the return code from cudaMalloc before calling cudaFree - the pointer may be invalid because you did not initialize CUDA properly. Alex -----Original Message----- From: Maxime Boissonneault Sent: Tuesday, August 19, 2014 2:19 AM To: Open MPI Users Subject: Re: [OMPI users]

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Alex A. Granovsky
Hello, I think your CUDA program may be incorrect. Add a proper cudaSetDevice call at the beginning and check it again. Kind regards, Alex Granovsky -----Original Message----- From: Maxime Boissonneault Sent: Tuesday, August 19, 2014 2:19 AM To: Open MPI Users Subject: Re: [OMPI users]

Re: [OMPI users] No log_num_mtt in Ubuntu 14.04

2014-08-19 Thread Mike Dubman
So it seems you have an old OFED without this parameter. Can you install the latest Mellanox OFED, or check which community OFED has it? On Tue, Aug 19, 2014 at 9:34 AM, Rio Yokota wrote: > Here is what "modinfo mlx4_core" gives > > filename: > >

Re: [OMPI users] No log_num_mtt in Ubuntu 14.04

2014-08-19 Thread Rio Yokota
Here is what "modinfo mlx4_core" gives filename: /lib/modules/3.13.0-34-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko version: 2.2-1 license: Dual BSD/GPL description: Mellanox ConnectX HCA low-level driver author: Roland Dreier srcversion: