Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-29 Thread Saliya Ekanayake
I meant it works now, sorry for the confusion. Running the test revealed a warning on memory registration, which we fixed by setting ulimit -l to unlimited. After that, running the OMPI sample worked too. Thank you, Saliya On Sun, Dec 28, 2014 at 11:18 PM, Ralph Castain wrote: > So

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Ralph Castain
So you are saying the test worked, but you are still encountering an error when executing an MPI job? Or are you saying things now work? > On Dec 28, 2014, at 5:58 PM, Saliya Ekanayake wrote: > > Thank you Ralph. This produced the warning on memory limits similar to [1] >

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Saliya Ekanayake
Thank you Ralph. This produced the warning on memory limits similar to [1] and setting ulimit -l unlimited worked. [1] http://lists.openfabrics.org/pipermail/general/2007-June/036941.html Saliya On Sun, Dec 28, 2014 at 5:57 PM, Ralph Castain wrote: > Have the admin try

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Ralph Castain
Have the admin try running the ibv_ud_pingpong test - that will exercise the portion of the system under discussion. > On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake wrote: > > What I heard from the administrator is that, > > "The tests that work are the simple utilities

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Saliya Ekanayake
What I heard from the administrator is that, "The tests that work are the simple utilities ib_read_lat and ib_read_bw that measure latency and bandwidth between two nodes. They are part of the "perftest" repo package." On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" wrote: >

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Saliya Ekanayake
This happens at MPI_Init. I've attached the full error message. The sys admin mentioned Infiniband utility tests ran OK. I'll contact him for more details and let you know. Thank you, Saliya On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Where

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Ralph Castain
Might also be worth checking to ensure that UD is enabled on your IB installation as we depend upon it for wireup of IB connections. > On Dec 28, 2014, at 12:18 AM, Gilles Gouaillardet > wrote: > > Where does the error occur? > MPI_Init? > MPI_Finalize? >

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Gilles Gouaillardet
Where does the error occur? MPI_Init? MPI_Finalize? In between? In the first case, the bug is likely a mishandled error case, which means OpenMPI is unlikely to be the root cause of the crash. Did you check that InfiniBand is up and running on your cluster? Cheers, Gilles Saliya Ekanayake

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-12-27 Thread Saliya Ekanayake
It's been a while on this, but we are still having trouble getting OpenMPI to work with InfiniBand on this cluster. We tried with the latest 1.8.4 as well, but it's still the same. To recap, we get the following error when MPI initializes (in the simple Hello world C example) with InfiniBand.

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-10 Thread Saliya Ekanayake
Thank you Jeff, I'll try this and let you know. Saliya On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" wrote: > I am sorry for the delay; I've been caught up in SC deadlines. :-( > > I don't see anything blatantly wrong in this output. > > Two things: > > 1. Can you try

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-10 Thread Jeff Squyres (jsquyres)
I am sorry for the delay; I've been caught up in SC deadlines. :-( I don't see anything blatantly wrong in this output. Two things: 1. Can you try a nightly v1.8.4 snapshot tarball? This will check to see if whatever the bug is has been fixed for the upcoming release:

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-09 Thread Saliya Ekanayake
Hi Jeff, You are probably busy, but just checking if you had a chance to look at this. Thanks, Saliya On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake wrote: > Hi Jeff, > > I've attached a tar file with information. > > Thank you, > Saliya > > On Tue, Nov 4, 2014 at 4:18

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-06 Thread Saliya Ekanayake
Hi Jeff, I've attached a tar file with information. Thank you, Saliya On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) wrote: > Looks like it's failing in the openib BTL setup. > > Can you send the info listed here? > > http://www.open-mpi.org/community/help/ >

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-04 Thread Jeff Squyres (jsquyres)
Looks like it's failing in the openib BTL setup. Can you send the info listed here? http://www.open-mpi.org/community/help/ On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake wrote: > Hi, > > I am using OpenMPI 1.8.1 in a Linux cluster that we recently set up. It builds >

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-04 Thread Saliya Ekanayake
Hi Howard, I just tried with 1.8.3 as well and it produces the same error. We have another cluster where both versions work fine, which is why I was curious as to what kind of things could cause this. Thank you, Saliya On Tue, Nov 4, 2014 at 1:31 PM, Howard Pritchard wrote:

Re: [OMPI users] What could cause a segfault in OpenMPI?

2014-11-04 Thread Howard Pritchard
Hello Saliya, Would you mind trying to reproduce the problem using the latest 1.8 release - 1.8.3? Thanks, Howard 2014-11-04 11:10 GMT-07:00 Saliya Ekanayake : > Hi, > > I am using OpenMPI 1.8.1 in a Linux cluster that we recently set up. It > builds fine, but when I try to

[OMPI users] What could cause a segfault in OpenMPI?

2014-11-04 Thread Saliya Ekanayake
Hi, I am using OpenMPI 1.8.1 in a Linux cluster that we recently set up. It builds fine, but when I try to run even the simplest hello.c program it causes a segfault. Any suggestions on how to correct this? The steps I did and error message are below. 1. Built OpenMPI 1.8.1 on the cluster. The