Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread Noam Bernstein
> On Nov 23, 2016, at 5:26 PM, r...@open-mpi.org wrote: It looks like the library may not have been fully installed on that node - can you see if the prefix location is present, and that the LD_LIBRARY_PATH on that node is correctly set? The referenced component did not exist prior

Re: [OMPI users] ScaLapack tester fails with 2.0.1, works with 1.10.4; Intel Omni-Path

2016-11-23 Thread George Bosilca
Christof, Don't use "-ffpe-trap=invalid,zero,overflow" on the pdlaiect.f file. This file implements checks for special corner cases (division by NaN and by 0) and will always trigger if you set the FPE trap. I talked with some of the ScaLAPACK developers, and their assumption is that this looks
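
(A minimal sketch of how that advice might be applied, assuming a gfortran-based build with the ScaLAPACK sources under SRC/; the second file name is a placeholder, not taken from the thread.)

    # Keep the FPE traps for the rest of the build ...
    FFLAGS_TRAP="-O2 -ffpe-trap=invalid,zero,overflow"
    gfortran $FFLAGS_TRAP -c SRC/some_other_routine.f

    # ... but compile pdlaiect.f without them: its corner-case checks
    # deliberately divide by zero/NaN and would always trigger the trap.
    gfortran -O2 -c SRC/pdlaiect.f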

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread r...@open-mpi.org
It looks like the library may not have been fully installed on that node - can you see if the prefix location is present, and that the LD_LIBRARY_PATH on that node is correctly set? The referenced component did not exist prior to the 2.0 series, so I’m betting that your LD_LIBRARY_PATH isn’t
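
(A sketch of that check, with a hypothetical node name node36 and install prefix /opt/openmpi-2.0.1; substitute your own values.)

    # Is the Open MPI install actually present on the remote node?
    ssh node36 ls /opt/openmpi-2.0.1/lib/openmpi | head

    # What does the non-interactive remote environment see?  (This is
    # the same kind of shell that orted gets started from.)
    ssh node36 'echo $LD_LIBRARY_PATH; which orted'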

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread Noam Bernstein
> On Nov 23, 2016, at 3:45 PM, George Bosilca wrote: Thousands reasons ;) Still trying to check if 2.0.1 fixes the problem, and discovered that earlier runs weren’t actually using the version I intended. When I do use 2.0.1, I get the following errors:
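
(For anyone hitting the same confusion, a few generic checks for which Open MPI build a run actually picks up; ./your_mpi_app is a placeholder binary name.)

    # Which wrappers and launcher are on PATH, and what do they report?
    which mpirun mpif90
    mpirun --version
    ompi_info | head -n 5

    # Which libmpi does the application binary resolve at run time?
    ldd ./your_mpi_app | grep -i libmpi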

Re: [OMPI users] non-shared fs, executable in different directories

2016-11-23 Thread Jason Patton
I think I may have solved this, in case anyone is curious or wants to yell about how terrible it is :). In the ssh wrapper script, when ssh-ing, before launching orted: export HOME=${your_working_directory} \; (If $HOME means something for your jobs, then maybe this isn't a good solution.) Got
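
(A hedged sketch of what such a wrapper could look like. As far as I know, Open MPI's rsh launcher calls the agent as "<agent> <hostname> <orted command ...>"; the job directory /tmp/job42 is a placeholder that each node's copy of the wrapper would replace with its own path.)

    #!/bin/sh
    # Hypothetical ssh wrapper: first argument is the target host,
    # the rest is the orted command line Open MPI wants to run there.
    host=$1
    shift
    # Prepend the HOME override before the remote command so orted (and
    # anything resolving paths against $HOME) sees the per-node job dir.
    exec ssh "$host" "export HOME=/tmp/job42 ; $*"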

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread George Bosilca
Thousands reasons ;) https://raw.githubusercontent.com/open-mpi/ompi/v2.x/NEWS George. On Wed, Nov 23, 2016 at 1:08 PM, Noam Bernstein wrote: On Nov 23, 2016, at 3:02 PM, George Bosilca wrote: Noam, I do not recall exactly

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread Noam Bernstein
> On Nov 23, 2016, at 3:02 PM, George Bosilca wrote: Noam, I do not recall exactly which version of Open MPI was affected, but we had some issues with the non-reentrancy of our memory allocator. More recent versions (1.10 and 2.0) will not have this issue. Can

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread George Bosilca
Noam, I do not recall exactly which version of Open MPI was affected, but we had some issues with the non-reentrancy of our memory allocator. More recent versions (1.10 and 2.0) will not have this issue. Can you update to a newer version of Open MPI (1.10 or maybe 2.0) and see if you can
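
(If it helps, a minimal sketch of installing a newer Open MPI alongside an existing one; the 2.0.1 version and /opt prefix are illustrative.)

    # After unpacking the 2.0.1 tarball from open-mpi.org:
    cd openmpi-2.0.1
    ./configure --prefix=/opt/openmpi-2.0.1
    make -j4 all install
    # Point PATH and LD_LIBRARY_PATH at the new prefix before rebuilding
    # and rerunning the application.
    export PATH=/opt/openmpi-2.0.1/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi-2.0.1/lib:$LD_LIBRARY_PATH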

[OMPI users] non-shared fs, executable in different directories

2016-11-23 Thread Jason Patton
I would like to mpirun across nodes that do not share a filesystem and might have the executable in different directories. For example, node0 has the executable at /tmp/job42/mpitest and node1 has it at /tmp/job100/mpitest. If you can grant me that I have an ssh wrapper script (that gets set as
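
(For context, a sketch of the kind of invocation being described, assuming the wrapper is installed on each node as /usr/local/bin/ssh_wrapper.sh and plugged in via the plm_rsh_agent MCA parameter; the host names and paths are illustrative.)

    # Route remote launches through the wrapper instead of plain ssh, and
    # refer to the executable by a relative path so each node resolves it
    # in its own job directory.
    mpirun --mca plm_rsh_agent /usr/local/bin/ssh_wrapper.sh \
           -np 2 --host node0,node1 \
           ./mpitest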

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread Noam Bernstein
> On Nov 17, 2016, at 3:22 PM, Noam Bernstein wrote: Hi - we’ve started seeing over the last few days crashes and hangs in openmpi, in a code that hasn’t been touched in months, and an openmpi installation (v. 1.8.5) that also hasn’t been touched in

Re: [OMPI users] ScaLapack tester fails with 2.0.1, works with 1.10.4; Intel Omni-Path

2016-11-23 Thread Christof Koehler
Hello everybody, as promised I started to test on my laptop (which has only two physical cores, in case that matters). As I discovered, the story is not as simple as I assumed. I was focusing on xdsyevr when testing on the workstation and overlooked the others. On the cluster the only test which

Re: [OMPI users] Follow-up to Open MPI SC'16 BOF

2016-11-23 Thread Nathan Hjelm
Integration is already in the 2.x branch. The problem is that the way we handle the info key is a bit of a hack. We currently pull out one info key and pass it down to the mpool as a string. Ideally we want to just pass the info object so each mpool can define its own info keys. That requires the