Re: [OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Ralph Castain

On Aug 7, 2014, at 9:17 AM, Gus Correa wrote:

> On 08/07/2014 11:49 AM, Ralph Castain wrote:
>> 
>> On Aug 7, 2014, at 8:47 AM, Reuti wrote:
>> 
>>> On 07.08.2014 at 17:28, Gus Correa wrote:
>>> 
>>>> I guess Control-C will kill only the mpirun process.
>>>> You may need to kill the (two) jules.exe processes separately,
>>>> say, with kill -9.
>>>> ps -u "yourname"
>>>> will show what you have running.
>>> 
>>> Shouldn't Open MPI clean this up in a proper way when Control-C is
>>> pressed?
>> 
>> So far as I know, it does...
>> 
> 
> How about processes in D state, waiting for a slow/busy NFS server?
> Could this prevent Control-C from doing the right thing?

Should be okay - basically, we hit it with a SIGTERM and then follow up with a 
SIGKILL.
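
The escalation described here (SIGTERM first, SIGKILL if the process is
still around) can be sketched in shell. This is only an illustration under
stated assumptions, not Open MPI's actual code: the function name
terminate_then_kill and the 5-second default grace period are made up for
the example.

```shell
#!/bin/sh
# Hypothetical helper, not part of Open MPI: ask a process to exit with
# SIGTERM, wait up to GRACE seconds, then force it with SIGKILL.
terminate_then_kill() {
    pid=$1
    grace=${2:-5}                                # assumed default grace period
    kill -TERM "$pid" 2>/dev/null || return 0    # no such process: nothing to do
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # gone after SIGTERM
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null || true        # still there: force it
    return 0
}
```

For example, "terminate_then_kill 10094 3" would give PID 10094 three
seconds to exit on its own before forcing it.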

> 
> Gus Correa
> 
>>> 
>>> But maybe there is something left in /tmp like
>>> "openmpi-sessions-...@..." which needs to be removed.
>>> 
>>> -- Reuti
>>> 
>>> 
> ___
> users mailing list
> us...@open-mpi.org 
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/08/24938.php



Re: [OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Gus Correa

On 08/07/2014 11:49 AM, Ralph Castain wrote:
>
> On Aug 7, 2014, at 8:47 AM, Reuti wrote:
>
>> On 07.08.2014 at 17:28, Gus Correa wrote:
>>
>>> I guess Control-C will kill only the mpirun process.
>>> You may need to kill the (two) jules.exe processes separately,
>>> say, with kill -9.
>>> ps -u "yourname"
>>> will show what you have running.
>>
>> Shouldn't Open MPI clean this up in a proper way when Control-C is
>> pressed?
>
> So far as I know, it does...

How about processes in D state, waiting for a slow/busy NFS server?
Could this prevent Control-C from doing the right thing?

Gus Correa

>> But maybe there is something left in /tmp like
>> "openmpi-sessions-...@..." which needs to be removed.
>>
>> -- Reuti



Re: [OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Gus Correa

On 08/07/2014 11:28 AM, Gus Correa wrote:

> I guess Control-C will kill only the mpirun process.
> You may need to kill the (two) jules.exe processes separately,
> say, with kill -9.
> ps -u "yourname"
> will show what you have running.



Something may have been left behind by Control-C,
although, as Reuti and Ralph said, that is unusual.
Besides jules.exe and mpiexec,
also check for the orted process,
which may need to be killed manually.
Just in case: better not to run MPI programs as root,
but as a regular user.
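
A minimal sketch of the check suggested above, assuming the leftover
processes carry the names mentioned in this thread (mpirun, mpiexec, orted,
jules.exe). The function names are hypothetical; only ps, id and awk are
standard tools.

```shell
#!/bin/sh
# Hypothetical helper: keep only "PID COMMAND" lines whose command is one of
# the process names mentioned in this thread.
filter_mpi_leftovers() {
    awk '$2 ~ /^(mpirun|mpiexec|orted|jules\.exe)$/ { print $1, $2 }'
}

# Hypothetical helper: list the current user's processes as "PID COMMAND"
# (no header line) and pass them through the filter above.
list_mpi_leftovers() {
    ps -u "$(id -un)" -o pid= -o comm= | filter_mpi_leftovers
}
```

Calling list_mpi_leftovers prints one "PID command" line per match; each
PID can then be passed to kill if it really is a stale process.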




Re: [OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Jeff Squyres (jsquyres)
Can you try upgrading?  1.6.x is super old.  1.8.1 is the current stable 
release.
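
To confirm which version is actually installed before and after an upgrade,
the version line can be pulled out of the parsable ompi_info output shown in
the original post. A small sketch, assuming the colon-separated
"ompi:version:full:..." line format from that post; the function name
ompi_full_version is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: read "ompi_info --parsable" style output on stdin and
# print the fourth field of the "ompi:version:full:<version>" line.
ompi_full_version() {
    awk -F: '$1 == "ompi" && $2 == "version" && $3 == "full" { print $4 }'
}
```

With the command from the original post this would be used as
"ompi_info -v ompi full --parsable | ompi_full_version". The option syntax
is the one quoted for 1.6.2 and may differ in other releases.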



-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Ralph Castain

On Aug 7, 2014, at 8:47 AM, Reuti wrote:

> On 07.08.2014 at 17:28, Gus Correa wrote:
> 
>> I guess Control-C will kill only the mpirun process.
>> You may need to kill the (two) jules.exe processes separately,
>> say, with kill -9.
>> ps -u "yourname"
>> will show what you have running.
> 
> Shouldn't Open MPI clean this up in a proper way when Control-C is pressed?

So far as I know, it does...

> 
> But maybe there is something left in /tmp like "openmpi-sessions-...@..." 
> which needs to be removed.
> 
> -- Reuti



Re: [OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Reuti
On 07.08.2014 at 17:28, Gus Correa wrote:

> I guess Control-C will kill only the mpirun process.
> You may need to kill the (two) jules.exe processes separately,
> say, with kill -9.
> ps -u "yourname"
> will show what you have running.

Shouldn't Open MPI clean this up in a proper way when Control-C is pressed?

But maybe there is something left in /tmp like "openmpi-sessions-...@..." which 
needs to be removed.
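
A quick way to look for such leftovers, sketched under the assumption that
the directories sit directly in /tmp and that their names start with
"openmpi-sessions-" as quoted above (the rest of the name is elided here and
is not guessed). The function name list_session_dirs is hypothetical, and
the sketch only lists candidates; it does not delete anything.

```shell
#!/bin/sh
# Hypothetical helper: print directories named openmpi-sessions-* found
# directly under the given directory (default: /tmp).
list_session_dirs() {
    tmpdir=${1:-/tmp}
    for d in "$tmpdir"/openmpi-sessions-*; do
        # An unmatched glob stays literal, so keep only real directories.
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
        fi
    done
}
```

Anything it prints is worth inspecting, and removing only once no MPI job
of yours is still running.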

-- Reuti



Re: [OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Gus Correa

I guess Control-C will kill only the mpirun process.
You may need to kill the (two) jules.exe processes separately,
say, with kill -9.
ps -u "yourname"
will show what you have running.



[OMPI users] Newbie query - mpirun will not run if it's previously been killed with Control-C

2014-08-07 Thread Jane Lewis
Hi all,

This is a really simple problem (I hope) where I've introduced MPI to a complex 
numerical model which I have to kill occasionally with Control-C as I don't 
want it running forever.
I have only used mpi_init(), mpi_comm_size(), mpi_comm_rank() and 
mpi_finalize(). There are no send/receive calls going on at the moment, and I 
only have two instances. My startup command is:

#!/bin/bash
/opt/openmpi/bin/mpirun  -np 2 -hostfile hostfile jules.exe

where hostfile has one entry: localhost

After terminating the process with Control-C at the command prompt where I 
launched it, I am unable to run it again. I get the
"mpirun has exited due to process rank 0 with PID 10094 on node metclcv10.local 
exiting improperly. There are two reasons this could occur:..." error each time, 
despite checking running processes for stragglers, closing my terminal, or 
changing node.

I have spent several hours searching for an answer to this; if it's already 
answered somewhere, please point me in the right direction.

many thanks in advance
Jane

For info:
#ompi_info -v ompi full --parsable
package:Open MPI root@centos-6-3.localdomain Distribution
ompi:version:full:1.6.2
ompi:version:svn:r27344
ompi:version:release_date:Sep 18, 2012
orte:version:full:1.6.2
orte:version:svn:r27344
orte:version:release_date:Sep 18, 2012
opal:version:full:1.6.2
opal:version:svn:r27344
opal:version:release_date:Sep 18, 2012
mpi-api:version:full:2.1
ident:1.6.2

I'm using CentOS 6.3 and Fortran.


Jane Lewis
Deputy Technical Director, Reading e-Science Centre
Department of Meteorology
University of Reading, UK
Tel: +44 (0)118 378 5173
http://www.resc.reading.ac.uk