Also, the error message suggested that TCP is not the issue here -- the TCP hangups are likely because some other process exited unexpectedly.
Indeed:

-----
mpirun noticed that process rank 0 with PID 4989 on node compute-0-1 exited on signal 4 (Illegal instruction).
-----

This might be the real issue. Getting a core file, as was already suggested, might be the best way forward.

> On Sep 2, 2016, at 5:50 AM, John Hearns via users <users@lists.open-mpi.org> wrote:
>
> Mahmood, as Gilles says, start by looking at how that application is compiled and linked.
> Run 'ldd' on the executable and look closely at the libraries. Do this on a compute node if you can.
>
> There was a discussion on another mailing list recently about how to fingerprint executables and see which architecture they were compiled for.
> My mind is a blank at the moment as to what that discussion concluded. Sorry.
> And if this was on OpenMPI I am doubly sorry!
>
>
> On 2 September 2016 at 10:37, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Did you run
> ulimit -c unlimited
> before invoking mpirun?
>
> If your application can be run with only one task, you can try to run it under gdb.
> You will hopefully be able to see where the illegal instruction occurs.
>
> Since you are running on AMD processors, you have to make sure you are not using any third-party library that was optimized for Intel processors (e.g. one that uses AVX (SSE?) instructions).
>
> Cheers,
>
> Gilles
>
> On Friday, September 2, 2016, Mahmood Naderan <mahmood...@gmail.com> wrote:
> >Are you running under a batch manager ?
> >On which architecture ?
> Currently I am not using the job manager (which is actually PBS). I am running from the terminal.
>
> The machines are AMD Opteron 64 bit
>
> >Hopefully you will get a core file that points you to the illegal instruction
> Where is that core file? I cannot find it.
>
> BTW, the openmpi is 1.6.5
>
> --
> Regards,
> Mahmood
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
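
On the AMD-versus-Intel point raised in the thread, one rough way to see which SIMD extensions a compute node advertises is to read the "flags" line of /proc/cpuinfo. The following is only a minimal sketch, assuming Linux nodes; the function names and the list of extensions checked are illustrative assumptions, not part of Open MPI or of the application being debugged. It does not inspect the executable or its libraries, so it only tells you what the node can run, not what the binary requires.

```python
"""Sketch: report which SIMD extensions a Linux node's CPU does not advertise."""

# Hypothetical selection of extensions to check; adjust to your case.
EXTENSIONS = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" entry is enough: all cores normally report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags, wanted=EXTENSIONS):
    """Return the wanted extensions that are absent from the flag set."""
    return [ext for ext in wanted if ext not in flags]


def report(path="/proc/cpuinfo"):
    """Print the checked extensions that this node does not advertise."""
    with open(path) as handle:
        flags = parse_cpu_flags(handle.read())
    print("extensions not advertised:", missing_extensions(flags))
```

Calling report() on a compute node (not only on the head node) prints the checked extensions that are missing there. If a library linked into the application was built to use one of them, that would be consistent with the "Illegal instruction" signal, though a core file or gdb session is still needed to confirm where the fault occurs.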