Re: [OMPI users] Restart after code hangs

2016-06-17 Thread Alex Kaiser
An outside monitor should work. My outline of the monitor script (with advice from the sys admin) has opportunities for bugs with environment variables and such. I wanted to make sure there was not a simpler solution, or one that is less error-prone. Modifying the main routine which calls the

Re: [OMPI users] Restart after code hangs

2016-06-17 Thread Ralph Castain
Sadly, no - there was some possibility of using a file monitor we had for a while, but that isn’t in the 1.6 series. So I fear your best bet is to periodically output some kind of marker, and have a separate process that monitors to see if it is being updated. Either way would require modifying
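The marker-and-monitor idea in this message could be sketched roughly as follows. This is only an illustration, not anything provided by Open MPI: the function names (`touch_marker`, `marker_is_stale`, `watch`), the marker file, and the `on_hang` callback are all hypothetical, and what `on_hang` should do (kill the job, resubmit from the last checkpoint, send an email) depends on the site and scheduler.

```python
import os
import time


def touch_marker(path):
    """Called by the application (e.g. once per time step) to refresh the marker."""
    with open(path, "a"):
        os.utime(path, None)  # set the modification time to "now"


def marker_is_stale(path, timeout_s, now=None):
    """Return True if the marker is missing or older than timeout_s seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        # Assumption: a missing marker counts as a hang, so the monitor
        # should only be started after the application's first update.
        return True
    return age > timeout_s


def watch(path, timeout_s, poll_s, on_hang):
    """Poll the marker until it goes stale, then call on_hang() once and return."""
    while not marker_is_stale(path, timeout_s):
        time.sleep(poll_s)
    on_hang()
```

The timeout would need to be comfortably longer than the slowest legitimate interval between marker updates, otherwise a slow but healthy run would be treated as hung.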

Re: [OMPI users] OMPI users] MPI_File_read+MPI_BOTTOM crash on NFS ?

2016-06-17 Thread Nicolas Joly
On Fri, Jun 17, 2016 at 07:03:29PM +0900, Gilles Gouaillardet wrote:
> Romio is imported from an mpich that is not up to date.
> Could you give the latest mpich a try ?
>
> That will be helpful to figure out whether this bug has already been fixed.
Just installed mpich-3.2 ... and results remain unchanged

Re: [OMPI users] OMPI users] MPI_File_read+MPI_BOTTOM crash on NFS ?

2016-06-17 Thread Gilles Gouaillardet
Romio is imported from an mpich that is not up to date.
Could you give the latest mpich a try ?
That will be helpful to figure out whether this bug has already been fixed.
Cheers,
Gilles
Nicolas Joly wrote:
>On Fri, Jun 17, 2016 at 10:15:28AM +0200, Vincent Huber wrote:
>> Dear Mr.

Re: [OMPI users] MPI_File_read+MPI_BOTTOM crash on NFS ?

2016-06-17 Thread Nicolas Joly
On Fri, Jun 17, 2016 at 10:15:28AM +0200, Vincent Huber wrote:
> Dear Mr. Joly,
>
> I have tried your code on my MacBook Pro (cf. infra for details) to detail
> that behavior.
Thanks for testing.
> Looking at openmpi-1.10.3/ompi/mca/io/romio/romio/adio/comon/ad_fstype.c to
> get the list of

Re: [OMPI users] MPI_File_read+MPI_BOTTOM crash on NFS ?

2016-06-17 Thread Vincent Huber
Dear Mr. Joly,
I have tried your code on my MacBook Pro (cf. infra for details) to detail that behavior. Looking at openmpi-1.10.3/ompi/mca/io/romio/romio/adio/comon/ad_fstype.c to get the list of file systems I can test, I have tried the following:
mpirun -np 2 ./sample ufs:data.txt
mpirun -np 2

Re: [MTT users] Error while test-get

2016-06-17 Thread Gilles Gouaillardet
Hi,
the message says the URI perl module is not found.
On Red Hat,
yum install perl-URI
will do the trick.
Generally speaking, you can install perl modules with CPAN:
perl -MCPAN -e 'install URI'
Cheers,
Gilles
On 6/17/2016 4:02 PM, Abhishek Joshi wrote:
Hi, On trying to do a test-get

Re: [OMPI users] Restart after code hangs

2016-06-17 Thread Alex Kaiser
Dear Dr. Correa, This is indeed the structure; it is a CFD program. Most of what you are suggesting is my current workflow, including saving, sending emails upon a crash and restarting. The problem is that the code does not crash but hangs. If it is deadlocked then it sits there spinning cycles

Re: [OMPI users] Restart after code hangs

2016-06-17 Thread Alex Kaiser
Dear Dr. Castain,
I'm using 1.6.5, which is pre-built on NYU's cluster. Is there any other info which would be helpful? Partial output follows.
Thanks,
Alex
-bash-4.1$ ompi_info
Package: Open MPI l...@soho.es.its.nyu.edu Distribution
Open MPI: 1.6.5
...
C compiler family name: GNU
C compiler