Re: [OMPI users] OpenMPI fails with np > 65

2014-08-11 Thread Lenny Verkhovsky
I don't think so. It's always the 66th node, even if I swap the 65th and 66th. I also get the same error when setting np=66 while having only 65 hosts in the hostfile (I am using only the tcp btl). Lenny Verkhovsky SW Engineer, Mellanox Technologies www.mellanox.com

[OMPI users] bus error with openmpi-1.8.2rc4r32485 and gcc-4.9.0

2014-08-11 Thread Siegmar Gross
Hi, thank you very much to everybody who tried to solve my bus error problem on Solaris 10 Sparc. I thought that you had found and fixed it, so I installed openmpi-1.8.2rc4r32485 on my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc1), openSUSE Linux 12.1 x86_64 (linpc1)) with

[OMPI users] SIGSEV with openmpi-1.8.2rc4r32485 on Solaris for Sun C and Java

2014-08-11 Thread Siegmar Gross
Hi, yesterday I installed openmpi-1.8.2rc4r32485 on my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc1), openSUSE Linux 12.1 x86_64 (linpc1)) with Sun C 5.12. A small Java program breaks with SIGSEGV on my Solaris systems. tyr java 118 ssh linpc1 linpc1 fd1026 101 mpiexec -np 1 java

Re: [OMPI users] bus error with openmpi-1.8.2rc4r32485 and gcc-4.9.0

2014-08-11 Thread Kawashima, Takahiro
Siegmar, Ralph, I'm sorry to respond so late since last week. Ralph fixed the problem in r32459 and it was merged to v1.8 in r32474. But in v1.8 an additional custom patch is needed because the db/dstore source code differs between trunk and v1.8. I'm preparing and testing the custom

Re: [OMPI users] bus error with openmpi-1.8.2rc4r32485 and gcc-4.9.0

2014-08-11 Thread Kawashima, Takahiro
Hi Ralph, Your commit r32459 fixed the bus error by correcting opal/dss/dss_copy.c. It's OK for trunk because mca_dstore_hash calls dss to copy data. But it's insufficient for v1.8 because mca_db_hash doesn't call dss and copies data itself. The attached patch is the minimum patch to fix it in

[OMPI users] update to problem with rankfiles

2014-08-11 Thread Siegmar Gross
Hi, yesterday I installed openmpi-1.8.2rc4r32485 on my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc0, sunpc1), openSUSE Linux 12.1 x86_64 (linpc0, linpc1)) with Sun C 5.12. Today I was playing around a little bit more with rankfiles and found the following things which may be

Re: [OMPI users] problem compiling openmpi-1.8.1 on Mac running Mavericks

2014-08-11 Thread Jeff Squyres (jsquyres)
This usually indicates an error with the compiler on your machine. As Ralph implied, this may indicate that you don't have Xcode installed (and therefore don't have a compiler). You can look in config.log to be sure, or send it here (compress first, please), and we'll let you know. On Aug

Re: [OMPI users] MPI-I/O issues

2014-08-11 Thread Rob Latham
On 08/10/2014 07:32 PM, Mohamad Chaarawi wrote: Update: George suggested that I try with the 1.8.2 rc3 and that one resolves the hindexed_block segfault that I was seeing with ompi. the I/O part now works with ompio, but needs the patches from Rob in ROMIO to work correctly. The 2nd issue

Re: [OMPI users] OpenMPI fails with np > 65

2014-08-11 Thread Ralph Castain
Okay, let's start with the basics :-) How was this configured? What environment are you running in (rsh, slurm, ??)? If you configured --enable-debug, then please run it with --mca plm_base_verbose 5 --debug-daemons and send the output On Aug 11, 2014, at 12:07 AM, Lenny Verkhovsky

Re: [OMPI users] MPI-I/O issues

2014-08-11 Thread George Bosilca
The patch related to ticket #4597 is zapping only the datatypes where the user explicitly provided a zero count. We can argue about LB and UB, but I have a hard time understanding the rationale of allowing zero count only for LB and UB. If it is required by the standard we can easily support it

Re: [OMPI users] Open MPI 1.8.1: "make all" error: symbol `Lhwloc1' is already defined

2014-08-11 Thread Jeff Squyres (jsquyres)
The problem appears to be occurring in the hwloc component in OMPI. Can you download hwloc 1.7.2 (standalone) and try to build that on the target machine and see what happens? http://www.open-mpi.org/software/hwloc/v1.7/ On Aug 10, 2014, at 11:16 AM, Jorge D'Elia

Re: [OMPI users] Open MPI disappeared after Mavericks upgrade

2014-08-11 Thread Yang, David
Doug, I tried it and didn’t find anything. Thanks for the suggestion, though. David Correspondence/TSPA On Aug 10, 2014, at 10:55 AM, Douglas L Reeder wrote: David, Try “locate mpirun”, or “find / -name mpirun -print”. Doug On Aug 10,

Re: [OMPI users] Open MPI disappeared after Mavericks upgrade

2014-08-11 Thread Jeff Squyres (jsquyres)
Then it sounds like OS X wiped out your Open MPI install. It's probably safe to re-install. On Aug 11, 2014, at 11:09 AM, Yang, David wrote: > Doug, > I tried it and didn’t find anything. Thanks for the suggestion, though. > David > Correspondence/TSPA

Re: [OMPI users] MPI-I/O issues

2014-08-11 Thread George Bosilca
On Mon, Aug 11, 2014 at 10:41 AM, Rob Latham wrote: > > > On 08/11/2014 08:54 AM, George Bosilca wrote: > >> The patch related to ticket #4597 is zapping only the datatypes where >> the user explicitly provided a zero count. >> >> We can argue about LB and UB, but I have a hard

[OMPI users] Filem could not be found for one user

2014-08-11 Thread Maxime Boissonneault
Hi, I am getting a weird error when running mpiexec with one user : [mboisson@gpu-k20-14 helios_test]$ mpiexec -np 2 mdrunmpi -ntomp 10 -s prod_s6_01kcal_bb_dr -deffnm testout -- A requested component was not found, or was

Re: [OMPI users] Filem could not be found for one user

2014-08-11 Thread Ralph Castain
Check their environment for MCA params, and in their home directory for any user-level MCA param file On Mon, Aug 11, 2014 at 10:39 AM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > Hi, > I am getting a weird error when running mpiexec with one user : > >
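
One way to carry out the check suggested in this reply, as a minimal sketch. It assumes Open MPI's usual conventions: MCA parameters set in the environment carry an `OMPI_MCA_` prefix, and the per-user parameter file lives at `$HOME/.openmpi/mca-params.conf`. The function name `show_user_mca_params` is made up for illustration and is not part of Open MPI.

```shell
#!/bin/sh
# List MCA parameters that could differ from one user to another.
# show_user_mca_params is a hypothetical helper name, not an Open MPI tool.
show_user_mca_params() {
    echo "MCA parameters set in the environment:"
    # Assumption: environment-level MCA params use the OMPI_MCA_ prefix.
    env | grep '^OMPI_MCA_' || echo "  (none)"

    # Assumption: default location of the user-level MCA param file.
    conf="$HOME/.openmpi/mca-params.conf"
    echo "User-level MCA param file ($conf):"
    if [ -f "$conf" ]; then
        # Show only active settings, skipping comments and blank lines.
        grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$conf" \
            || echo "  (no active settings)"
    else
        echo "  (file not present)"
    fi
}

show_user_mca_params
```

Running this as the affected user and as a working user, then comparing the two outputs, should show whether a stray parameter is selecting a component that is not installed.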

Re: [OMPI users] problem compiling openmpi-1.8.1 on Mac running Mavericks

2014-08-11 Thread Yang, David
Jeff, Doug, I do have Xcode installed. Attached is the log file. Here again is the screen dump: [9]:yangmp:xyang:% ./configure --prefix=/opt/openmpi-1.8.1 == Configuring Open MPI

Re: [OMPI users] problem compiling openmpi-1.8.1 on Mac running Mavericks

2014-08-11 Thread Jeff Squyres (jsquyres)
Something is not right in your Xcode setup -- perhaps you need to install the Xcode command line tools? Here's the relevant config.log output: - configure:5967: gcc -o conftest conftest.c >&5 conftest.c:10:19: fatal error: stdio.h: No such file or directory #include
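
The failing configure test quoted above can be approximated by hand, which is a quick way to tell whether the compiler can find the system headers before re-running `./configure`. This is a rough sketch, not the test configure actually generates: the helper name `check_c_headers` and the test program are invented for illustration, and the compiler defaults to `gcc` only because that is what the log shows.

```shell
#!/bin/sh
# Compile a tiny program that needs stdio.h, the header reported missing.
# check_c_headers is a hypothetical helper; set CC to test another compiler.
check_c_headers() {
    cc_cmd="${CC:-gcc}"
    workdir=$(mktemp -d) || return 2
    cat > "$workdir/conftest.c" <<'EOF'
#include <stdio.h>
int main(void) { printf("ok\n"); return 0; }
EOF
    if "$cc_cmd" -o "$workdir/conftest" "$workdir/conftest.c" \
            2> "$workdir/err.log"; then
        echo "compiler and system headers look usable"
        status=0
    else
        echo "compile failed; first lines of the error:"
        head -n 5 "$workdir/err.log"
        status=1
    fi
    # Clean up the scratch directory in both cases.
    rm -rf "$workdir"
    return "$status"
}

check_c_headers || echo "compiler setup needs attention before running configure"
```

If the compile fails with the same `stdio.h: No such file or directory` message, the problem lies in the compiler installation rather than in Open MPI.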

Re: [OMPI users] problem compiling openmpi-1.8.1 on Mac running Mavericks

2014-08-11 Thread Ralph Castain
If this is an updated system (i.e., you updated the OS to Mavericks), did you remember to re-install Xcode? Mavericks requires an updated version of Xcode, and you have to reinstall the cmd line tools as well. On Mon, Aug 11, 2014 at 1:11 PM, Jeff Squyres (jsquyres) wrote:

Re: [OMPI users] problem compiling openmpi-1.8.1 on Mac running Mavericks

2014-08-11 Thread Yang, David
Xcode was the culprit! I had the latest Xcode, but I didn’t have the command line tools installed. Now Open MPI compiles OK! Thanks! David Correspondence/TSPA On Aug 11, 2014, at 2:13 PM, Ralph Castain wrote: If this is an updated system