On Apr 23, 2011, at 12:07 PM, Reuti wrote:

> On 23.04.2011 at 19:58, Ralph Castain wrote:
>
>>
>> On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:
>>
>>>> What about setsid and pushing it in a new
>>>> session instead of using & in the script?
>>>
>>> :-) That works. Thanks!
>>>
>>> NB, the working script looks like:
>>>
>>> setsid bash -c "mpirun command >& out" &
>>> tail -f out
>>>
>>
>> Yes - but now you can't kill mpirun when something goes wrong....<shrug>
>
> You can still send a sigint from the command line to the mpirun process or
> its process group besides killall.

Yes - or I could just have run tail in a separate shell and avoided the
entire email thread and problem... :-)

Whatever...so long as peace returns.

>
> -- Reuti
>
>
>>> Thanks,
>>> Pablo
>>>
>>>
>>> On 23/04/11 18:39, Reuti wrote:
>>>> On 23.04.2011 at 19:33, Ralph Castain wrote:
>>>>
>>>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>>>>
>>>>>>> I'm not sure what you are actually trying to accomplish
>>>>>> I simply want a script that runs the equivalent of:
>>>>>>
>>>>>> mpirun command >& out &
>>>>>> tail -f out
>>>>>>
>>>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
>>>>>> certainly do this without mpirun,
>>>>> I don't think that's true. If both commands are in a script, then at
>>>>> least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to
>>>>> -both- processes.
>>>> What about setsid and pushing it in a new session instead of using & in
>>>> the script?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> At least when I test it, even non-mpirun processes will abort.
>>>>>
>>>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>>>> I'm afraid it won't work, per my earlier comments.
>>>>>
>>>>>> I need mpirun to either ignore the SIGINT or not receive it at all --
>>>>>> and as per your comments, ignoring it is not an option.
>>>>>>
>>>>>> Let me rephrase my question then. With the following script:
>>>>>>
>>>>>> mpirun command >& out &
>>>>>> tail -f out
>>>>>>
>>>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>>>>
>>>>>> (
>>>>>> trap : SIGINT
>>>>>> mpirun command >& out &
>>>>>> )
>>>>>> tail -f out
>>>>>>
>>>>>> has the same effect, indicating that mpirun overrides previous traps in
>>>>>> the same subshell. That's OK too. However the following:
>>>>>>
>>>>>> (
>>>>>> trap : SIGINT
>>>>>> (
>>>>>> mpirun command >& out &
>>>>>> )
>>>>>> )
>>>>>> tail -f out
>>>>>>
>>>>>> also has the same effect.
>>>>>> How is mpirun overriding the trap in the
>>>>>> *parent* subshell so that it ends up getting the SIGINT that was
>>>>>> supposedly blocked at that level? Am I missing something trivial? How
>>>>>> can I avoid this?
>>>>> I keep telling you - you can't. The better way to do this is to execute
>>>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c
>>>>> tail without mpirun seeing it.
>>>>>
>>>>> But you are welcome to not believe me and continue thrashing... :-/
>>>>>
>>>>>> Thanks,
>>>>>> Pablo
>>>>>>
>>>>>>
>>>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>>>>
>>>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>>>> should continue.
>>>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>>>> MPI procs.
>>>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>>>> desired behaviour of the script, not that this is what ought
>>>>>>>> to happen in general. My question is how to achieve this
>>>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>>>> catching sigint.
>>>>>>> Like I said in my other response, you can't - mpirun automatically
>>>>>>> traps sigint and terminates the job in order to ensure proper cleanup
>>>>>>> during abnormal terminations.
>>>>>>>
>>>>>>> I'm not sure what you are actually trying to accomplish, but there are
>>>>>>> other signals that don't cause termination. For example, we trap and
>>>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of
>>>>>>> use.
>>>>>>>
>>>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>>>>>>> ignore it.
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Pablo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>>>>>>> script needs to run an MPI job in the background and tail -f the
>>>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should
>>>>>>>>>>> continue.
>>>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>>>>>>> immediately terminates all running MPI procs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail,
>>>>>>>>>>> and kills the job immediately. I've tried workarounds involving
>>>>>>>>>>> nohup, disown, trap, subshells (including calling the script from
>>>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>>>>
>>>>>>>>>>> The problem is that this doesn't happen if I run the command
>>>>>>>>>>> directly instead, without mpirun. Attached is a script that
>>>>>>>>>>> reproduces the problem. It runs a simple counting script in the
>>>>>>>>>>> background which takes 10 seconds to run, and tails the output. If
>>>>>>>>>>> called with "nompi" as first argument, it will simply run bash -c
>>>>>>>>>>> "$SCRIPT" >& "$out" &, and with "mpi" it will do the same with
>>>>>>>>>>> 'mpirun -np 1' prepended. The output I get is:
>>>>>>>>>> what about:
>>>>>>>>>>
>>>>>>>>>> ( trap "" sigint; exec mpiexec ...) &
>>>>>>>>>>
>>>>>>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again.
>>>>>>>>>> This can be checked in /proc/<pid>/status
>>>>>>>>>>
>>>>>>>>>> -- Reuti
>>>>>>>>>>
>>>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>>>> mpi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> ^C
>>>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>>>> nompi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> ^C
>>>>>>>>>>> $ cat output.*
>>>>>>>>>>> mpi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> mpirun: killing job...
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme
>>>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>>>
>>>>>>>>>>> nompi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> 5
>>>>>>>>>>> 6
>>>>>>>>>>> 7
>>>>>>>>>>> 8
>>>>>>>>>>> 9
>>>>>>>>>>> 10
>>>>>>>>>>> Done
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This convinces me that there is something strange with OpenMPI,
>>>>>>>>>>> since I expect no difference in signal handling when running a
>>>>>>>>>>> simple command with or without mpirun in the middle.
>>>>>>>>>>>
>>>>>>>>>>> I've tried looking for options to change this behaviour, but I
>>>>>>>>>>> don't seem to find any. Is there one, preferably in the form of an
>>>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>>>>
>>>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Pablo
>>>>>>>>>>> <ompi_bug.sh.gz>_______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
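
For reference, the setsid workaround that settled this thread can be wrapped in a small helper. What follows is only a sketch under stated assumptions, not code posted by anyone in the thread: the function name launch_detached and its pidfile argument are made up for illustration, and it assumes a Linux system with setsid(1) from util-linux. The idea is that a terminal Ctrl+C is delivered to the foreground process group, so a job started in its own session is not in that group and does not see the signal. Recording the PID is one possible answer to the "now you can't kill mpirun" objection, since the signal can then be sent by hand.

```shell
#!/usr/bin/env bash
# Sketch only: launch_detached and its PIDFILE argument are illustrative
# names, not part of Open MPI or of the script attached to the original post.

# launch_detached OUTFILE PIDFILE CMD [ARGS...]
# Runs CMD in a new session via setsid(1), with stdout and stderr written to
# OUTFILE. Ctrl+C on the terminal goes to the foreground process group only,
# so a job started this way does not receive it.
launch_detached() {
    local outfile=$1 pidfile=$2
    shift 2
    # The inner bash writes its own PID to PIDFILE and then exec replaces it
    # with CMD, so PIDFILE ends up holding the PID of the job itself.
    setsid bash -c 'echo "$$" > "$1"; shift; exec "$@"' _ "$pidfile" "$@" \
        > "$outfile" 2>&1 < /dev/null &
}

# Example usage (assumes mpirun and ./command exist on your system):
#   launch_detached out job.pid mpirun -np 1 ./command
#   tail -f out                    # Ctrl+C now stops only tail
#   kill -INT "$(cat job.pid)"     # later: ask mpirun to shut the job down
```

Because the job is a session leader, its process group ID equals the PID in the pidfile, so the whole group could also be signalled with kill -INT -- "-$(cat job.pid)". Running tail from a separate shell, as suggested earlier in the thread, remains the simpler option when a second terminal is available.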