Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it works with 1 node but not more

2015-05-26 Thread Jeff Squyres (jsquyres)
I agree with Gilles -- when you compile with one MPI implementation, but then accidentally use the mpirun/mpiexec from a different MPI implementation to launch it, it's quite a common symptom to see an MPI_COMM_WORLD size of 1 (i.e., each MPI process is rank 0 in MPI_COMM_WORLD). Make sure
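
A quick way to check for this mismatch (a sketch; the ./xhpl path is a placeholder for wherever the benchmark binary lives) is to confirm the launcher and the binary come from the same installation:

    # the launcher and wrapper compiler should resolve to the same install
    which mpirun mpicc
    mpirun --version
    # the binary should be linked against that same Open MPI, not another MPI
    ldd ./xhpl | grep -i mpi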

Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it works with 1 node but not more

2015-05-26 Thread Gilles Gouaillardet
First, you can run: mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl. If all tasks report they believe they are task 0, then this is the origin of the problem. Then you can run ldd mpirun and ldd xhpl; they should use the same MPI flavor. Then mpirun -machinefile ~/machinefile -np 4
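
Spelled out as commands (a sketch; the binary and machinefile paths follow the thread and are not verified here):

    mpirun -machinefile ~/machinefile -np 4 -tag-output ./xhpl
    # both of the following should point at the same MPI flavor
    ldd $(which mpirun) | grep -i mpi
    ldd ./xhpl | grep -i mpi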

Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it works with 1 node but not more

2015-05-26 Thread Heerdt, Lanze M.
I have run a hello world program for any number of processes. If I say "-n 16" I get 4 responses from each node saying "Hello world! I am process (0-15) of 16 on RPI-0(1-4)" so I know the cluster can work how I want it to. I also tested with just a plain hostname run and I see the names of each of

Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it works with 1 node but not more

2015-05-26 Thread Gilles Gouaillardet
At first glance, it seems all MPI tasks believe they are rank zero and the MPI_COMM_WORLD size is 1 (!). Did you compile xhpl with Open MPI (and not a stub library for the serial version only)? Can you make sure there is nothing wrong with your LD_LIBRARY_PATH and you do not mix MPI libraries (e.g.
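
A minimal sketch of that check (the hostname RPI-01 and the ./xhpl path are placeholders taken from the thread); note that a non-interactive ssh may see a different LD_LIBRARY_PATH than a login shell, which is itself a common source of this problem:

    # inspect the runtime library path locally and on a remote node
    echo $LD_LIBRARY_PATH
    ssh RPI-01 'echo $LD_LIBRARY_PATH'
    # libmpi should resolve to the same Open MPI install everywhere
    ldd ./xhpl | grep libmpi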

Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it works with 1 node but not more

2015-05-26 Thread Ralph Castain
I don't know enough about HPL to resolve the problem. However, I would suggest that you first just try to run the example programs in the examples directory to ensure you have everything working. If they work, then the problem is clearly in the HPL arena. I do note that your image reports that
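
For instance, from the top of the Open MPI source tree (machinefile path assumed from the thread):

    cd examples
    make
    mpirun -np 4 -machinefile ~/machinefile ./hello_c
    mpirun -np 4 -machinefile ~/machinefile ./ring_c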

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Jeff Squyres (jsquyres)
Unless the compiler can find the MXM headers/libraries without the --with-mxm value, e.g.: ./configure CPPFLAGS=-I/path/to/mxm/headers LDFLAGS=-L/path/to/mxm/libs --with-mxm ... (or otherwise sets the compiler/linker default search paths, etc.). It seems like, however it is happening, somehow
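
Laid out on multiple lines (placeholder paths exactly as in the message):

    ./configure CPPFLAGS=-I/path/to/mxm/headers \
                LDFLAGS=-L/path/to/mxm/libs \
                --with-mxm ...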

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
In that case, OPAL_CHECK_PACKAGE will disqualify mxm because it will not find the mxm_api.h header file in the _OPAL_CHECK_PACKAGE_HEADER macro. From https://github.com/open-mpi/ompi/blob/master/config/ompi_check_mxm.m4#L43 and from the config.log generated after "./configure --with-mxm": configure:263059:
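
A quick way to see which install configure could find (a sketch; the exact mxm/api/ sub-path is assumed from the header name in the m4 check, and the two prefixes are the usual install locations, not guaranteed on any given system):

    ls /usr/include/mxm/api/mxm_api.h
    ls /opt/mellanox/mxm/include/mxm/api/mxm_api.h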

[OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it works with 1 node but not more

2015-05-26 Thread Heerdt, Lanze M.
I realize this may be a bit off topic, but since what I am doing seems to be a pretty commonly done thing, I am hoping to find someone who has done it before/can help, since I've been at my wits' end for so long they are calling me Mr. Whittaker. I am trying to run HPL on a Raspberry Pi cluster.

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Jeff Squyres (jsquyres)
Mike -- I don't think that's right. If you just pass "--with-mxm", then $with_mxm will equal "yes", and therefore neither of those two blocks of code is executed. Hence, ompi_check_mxm_libdir will be empty. Right? > On May 26, 2015, at 1:28 PM, Mike Dubman
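
A rough shell paraphrase of the configure logic under discussion (not the verbatim m4; see the ompi_check_mxm.m4 link above):

    # if --with-mxm is given without a value, $with_mxm is just "yes",
    # so no explicit search directory is recorded and the libdir stays empty
    if test -n "$with_mxm" && test "$with_mxm" != "yes"; then
        ompi_check_mxm_dir="$with_mxm"
    fi
    if test -n "$with_mxm_libdir" && test "$with_mxm_libdir" != "yes"; then
        ompi_check_mxm_libdir="$with_mxm_libdir"
    fi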

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
if just "./configure" was used - it can detect mxm only if it is installed in /usr/include/... by default mxm is installed in /opt/mellanox/mxm/... I just checked with: "./configure" and it did not detect mxm which is installed in the system space "./configure --with-mxm" and it did not detect

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread David Shrader
Hello Mike, I'm still working on getting you my config.log, but I thought I would chime in about that line 36. In my case, that code path is not executed because with_mxm is empty (I don't use --with-mxm on the configure line since libmxm.so is in system space and configure picks up on it

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread David Shrader
Hello Mike, I'm glad that I could be of help. Just as an FYI, right now our admins are still hosting the fca libraries in /opt, but they would like to have it in system-space just as they have done with mxm. I haven't worked my way through all of the fca-related logic in configure yet, so I

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
David, could you please send me your config.log file? Looking into the config/ompi_check_mxm.m4 macro, I don't understand how it could happen. Thanks a lot. On Tue, May 26, 2015 at 6:41 PM, Mike Dubman wrote: > Hello David, > Thanks for info and patch - will fix ompi

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
Hello David, thanks for the info and patch - will fix the ompi configure logic with your patch. mxm can be installed in the system and user spaces - both are valid and supported. M On Tue, May 26, 2015 at 5:50 PM, David Shrader wrote: > Hello Mike, > > This particular

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread David Shrader
Hello Mike, This particular instance of mxm was installed using rpms that were re-rolled by our admins. I'm not 100% sure where they got them (HPCx or somewhere else). I myself am not using HPCx. Is there any particular reason why mxm shouldn't be in system space? If there is, I'll share it

Re: [OMPI users] Problems running linpack benchmark on old Sunfire opteron nodes

2015-05-26 Thread Rolf vandeVaart
I think we bumped up a default value in Open MPI 1.8.5. To go back to the old 64 Mbyte value, try running with: --mca mpool_sm_min_size 67108864 Rolf
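
As a full command line (a sketch; ./xhpl stands in for the linpack binary from the thread):

    # revert the shared-memory pool minimum to the old 64 MB default
    mpirun --mca mpool_sm_min_size 67108864 -np 4 ./xhpl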

Re: [OMPI users] Problems running linpack benchmark on old Sunfire opteron nodes

2015-05-26 Thread Aurélien Bouteiller
You can also change the location of tmp files with the following MCA option: -mca orte_tmpdir_base /some/place. Checking with ompi_info --param all all -l 9 | grep tmp shows: MCA orte: parameter "orte_tmpdir_base" (current value: "", data source: default, level: 9 dev/all, type: string)
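
For example (the /some/place path is the placeholder from the message, and ./a.out stands in for any MPI binary):

    # point Open MPI's session/tmp directories somewhere else
    mpirun -mca orte_tmpdir_base /some/place -np 4 ./a.out
    # confirm the parameter exists and see its current value
    ompi_info --param all all -l 9 | grep tmp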

Re: [OMPI users] MXM problem

2015-05-26 Thread Timur Ismagilov
It does not work on a single node: 1) host: $ $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off -host node5 -mca pml yalla -x MXM_TLS=ud,self,shm --prefix $HPCX_MPI_DIR -mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 --debug-daemons -np 1

Re: [OMPI users] MXM problem

2015-05-26 Thread Timur Ismagilov
1. mxm_perf_test - OK. 2. no_tree_spawn - OK. 3. ompi yalla and "--mca pml cm --mca mtl mxm" still do not work (I use the prebuilt ompi-1.8.5 from hpcx-v1.3.330). 3.a) host: $ $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off -host node5,node153 --mca pml cm --mca mtl

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Mike Dubman
BTW, what is the rationale for running in a chroot env? Is it a Docker-like env? Does "ibv_devinfo -v" work for you from the chroot env? On Tue, May 26, 2015 at 7:08 AM, Rahul Yadav wrote: > Yes Ralph, MXM cards are on the node. Command runs fine if I run it out of > the chroot
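
That check can be run from the host side (a sketch; /path/to/chroot is a placeholder for the actual chroot root):

    # verify the InfiniBand device is visible from inside the chroot
    chroot /path/to/chroot ibv_devinfo -v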

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Rahul Yadav
Yes Ralph, MXM cards are on the node. The command runs fine if I run it out of the chroot environment. Thanks Rahul On Mon, May 25, 2015 at 9:03 PM, Ralph Castain wrote: > Well, it isn’t finding any MXM cards on NAE27 - do you have any there? > > You can’t use yalla without MXM

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Ralph Castain
Well, it isn’t finding any MXM cards on NAE27 - do you have any there? You can’t use yalla without MXM cards on all nodes > On May 25, 2015, at 8:51 PM, Rahul Yadav wrote: > > We were able to solve ssh problem. > > But now MPI is not able to use component yalla. We are

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Gilles Gouaillardet
Rahul, per the logs, it seems the /sys pseudo-filesystem is not mounted in your chroot. First, can you make sure it is mounted and try again? Cheers, Gilles On 5/26/2015 12:51 PM, Rahul Yadav wrote: We were able to solve ssh problem. But now MPI is not able to use component yalla.
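
A minimal sketch of that setup, run on the host before launching (/path/to/chroot is a placeholder; /proc and /dev are commonly needed as well):

    mount --bind /sys  /path/to/chroot/sys
    mount --bind /proc /path/to/chroot/proc
    mount --bind /dev  /path/to/chroot/dev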

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Rahul Yadav
We were able to solve the ssh problem. But now MPI is not able to use the yalla component. We are running the following command: mpirun --allow-run-as-root --mca pml yalla -n 1 --hostfile /root/host1 /root/app2 : -n 1 --hostfile /root/host2 /root/backend The command is run in a chroot environment on JARVICENAE27