On Apr 23, 2011, at 12:07 PM, Reuti wrote:

> On 23.04.2011, at 19:58, Ralph Castain wrote:
> 
>> 
>> On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:
>> 
>>>> What about setsid and pushing it in a new
>>>> session instead of using & in the script?
>>> 
>>> :-) That works. Thanks!
>>> 
>>> NB, the working script looks like:
>>> 
>>> setsid bash -c "mpirun command >& out" &
>>> tail -f out
>>> 
>> 
>> Yes - but now you can't kill mpirun when something goes wrong....<shrug>
> 
> You can still send a SIGINT from the command line to the mpirun process or 
> its process group, besides using killall.

Yes - or I could just have run tail in a separate shell and avoided the entire 
email thread and problem... :-)
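
For the archives, a minimal sketch of the setsid approach and of stopping the detached job afterwards by signalling its process group. The counting loop is only a stand-in for "mpirun command", and the temp files and variable names are made up for illustration. Ctrl+C at the terminal goes to the foreground process group, so a job in its own session does not see it. The sketch sends SIGTERM because a bash stand-in started in the background from a script inherits an ignored SIGINT; mpirun installs its own SIGINT handler, as discussed further down, so for a real mpirun job `kill -INT` should be the equivalent.

```shell
#!/bin/bash
# Stand-in for "mpirun command" (hypothetical): count to 10, one per second.
job='for i in 1 2 3 4 5 6 7 8 9 10; do echo $i; sleep 1; done; echo Done'

out=$(mktemp)       # plays the role of "out" in the thread
pidfile=$(mktemp)   # the job records its own PID here

# Run the job in a new session. As session leader, its PID is also its
# process group ID, so recording $$ gives us something to signal later.
setsid bash -c 'echo $$ > "$1"; eval "$2"' _ "$pidfile" "$job" > "$out" 2>&1 &

# Wait until the job has written its PID.
until [ -s "$pidfile" ]; do sleep 0.1; done
job_pgid=$(cat "$pidfile")

# Process group of this script, read from /proc (Linux only). The fields
# after the closing ")" of the command name are: state, ppid, pgrp, ...
stat=$(cat /proc/$$/stat)
read -r _ _ my_pgid _ <<< "${stat##*) }"

echo "script pgid: $my_pgid, job pgid: $job_pgid"

# Here one would run "tail -f $out"; Ctrl+C would stop only tail.

# Later, stop the detached job by signalling its whole process group.
kill -TERM -- "-$job_pgid"
wait 2>/dev/null
```

The two process group IDs printed should differ, which is the whole point: the terminal's SIGINT never reaches the job, yet the job can still be stopped on purpose.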

Whatever...so long as peace returns.
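
On the earlier question of why `trap : SIGINT` in an enclosing subshell did not shield mpirun: a trap that runs a handler is not passed on to commands the shell executes (the signal is reset to its default in the child), whereas an ignored signal (`trap '' SIGINT`) is inherited. Separately, a program can install its own handler once it starts, which is what mpirun does according to this thread, so neither form makes mpirun ignore Ctrl+C. A small sketch of the inheritance rule, using a child bash as a stand-in (the `probe` function is made up for illustration):

```shell
#!/bin/bash
# probe ACTION: set the SIGINT trap to ACTION in a subshell, then run a
# child bash that sends SIGINT to itself. The child prints "survived"
# only if it inherited an *ignored* SIGINT.
probe() {
    ( trap "$1" INT
      bash -c 'kill -INT $$; echo survived' )
}

# Handler trap: reset to the default in the child, so the child dies from
# its own SIGINT (hence the non-zero status that "|| true" absorbs).
with_handler=$(probe ':') || true

# Ignore trap: inherited by the child, so the SIGINT has no effect.
with_ignore=$(probe '') || true

echo "trap :  INT -> '${with_handler}'"
echo "trap '' INT -> '${with_ignore}'"
```

This assumes the script itself was not started with SIGINT already ignored, in which case bash cannot trap it at all.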


> 
> -- Reuti
> 
> 
>>> Thanks,
>>> Pablo
>>> 
>>> 
>>> On 23/04/11 18:39, Reuti wrote:
>>>> On 23.04.2011, at 19:33, Ralph Castain wrote:
>>>> 
>>>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>>>> 
>>>>>>> I'm not sure what you are actually trying to accomplish
>>>>>> I simply want a script that runs the equivalent of:
>>>>>> 
>>>>>> mpirun command >& out &
>>>>>> tail -f out
>>>>>> 
>>>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can 
>>>>>> certainly do this without mpirun,
>>>>> I don't think that's true. If both commands are in a script, then at 
>>>>> least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to 
>>>>> -both- processes.
>>>> What about setsid and pushing it in a new session instead of using & in 
>>>> the script?
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> At least when I test it, even non-mpirun processes will abort.
>>>>> 
>>>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>>>> I'm afraid it won't work, per my earlier comments.
>>>>> 
>>>>>> I need mpirun to either ignore the SIGINT or not receive it at all -- 
>>>>>> and as per your comments, ignoring it is not an option.
>>>>>> 
>>>>>> Let me rephrase my question then. With the following script:
>>>>>> 
>>>>>> mpirun command >& out &
>>>>>> tail -f out
>>>>>> 
>>>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>>>> 
>>>>>> (
>>>>>> trap : SIGINT
>>>>>> mpirun command >& out &
>>>>>> )
>>>>>> tail -f out
>>>>>> 
>>>>>> has the same effect, indicating that mpirun overrides previous traps in 
>>>>>> the same subshell. That's OK too. However the following:
>>>>>> 
>>>>>> (
>>>>>> trap : SIGINT
>>>>>> (
>>>>>> mpirun command >& out &
>>>>>> )
>>>>>> )
>>>>>> tail -f out
>>>>>> 
>>>>>> also has the same effect. How is mpirun overriding the trap in the 
>>>>>> *parent* subshell so that it ends up getting the SIGINT that was 
>>>>>> supposedly blocked at that level? Am I missing something trivial? How 
>>>>>> can I avoid this?
>>>>> I keep telling you - you can't. The better way to do this is to execute 
>>>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c 
>>>>> tail without mpirun seeing it.
>>>>> 
>>>>> But you are welcome to not believe me and continue thrashing... :-/
>>>>> 
>>>>>> Thanks,
>>>>>> Pablo
>>>>>> 
>>>>>> 
>>>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>>>> 
>>>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>>>> should continue.
>>>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>>>> MPI procs.
>>>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>>>> desired behaviour of the script, not that this is what ought
>>>>>>>> to happen in general. My question is how to achieve this
>>>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>>>> catching sigint.
>>>>>>> Like I said in my other response, you can't - mpirun automatically 
>>>>>>> traps sigint and terminates the job in order to ensure proper cleanup 
>>>>>>> during abnormal terminations.
>>>>>>> 
>>>>>>> I'm not sure what you are actually trying to accomplish, but there are 
>>>>>>> other signals that don't cause termination. For example, we trap and 
>>>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of 
>>>>>>> use.
>>>>>>> 
>>>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to 
>>>>>>> ignore it.
>>>>>>> 
>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Pablo
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> On 23.04.2011, at 04:31, Pablo Lopez Rios wrote:
>>>>>>>>>> 
>>>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The 
>>>>>>>>>>> script needs to run an MPI job in the background and tail -f the 
>>>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should 
>>>>>>>>>>> continue.
>>>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process 
>>>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and 
>>>>>>>>> immediately terminates all running MPI procs.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, 
>>>>>>>>>>> and kills the job immediately. I've tried workarounds involving 
>>>>>>>>>>> nohup, disown, trap, subshells (including calling the script from 
>>>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>>>> 
>>>>>>>>>>> The problem is that this doesn't happen if I run the command 
>>>>>>>>>>> directly instead, without mpirun. Attached is a script that 
>>>>>>>>>>> reproduces the problem. It runs a simple counting script in the 
>>>>>>>>>>> background which takes 10 seconds to run, and tails the output. If 
>>>>>>>>>>> called with "nompi" as first argument, it will simply run bash -c 
>>>>>>>>>>> "$SCRIPT" >& "$out" &, and with "mpi" it will do the same with 
>>>>>>>>>>> 'mpirun -np 1' prepended. The output I get is:
>>>>>>>>>> what about:
>>>>>>>>>> 
>>>>>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>>>>>> 
>>>>>>>>>> i.e. replace the subshell with changed interrupt handling with the 
>>>>>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This 
>>>>>>>>>> can be checked in /proc/<pid>/status
>>>>>>>>>> 
>>>>>>>>>> -- Reuti
>>>>>>>>>> 
>>>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>>>> mpi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> ^C
>>>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>>>> nompi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> ^C
>>>>>>>>>>> $ cat output.*
>>>>>>>>>>> mpi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> mpirun: killing job...
>>>>>>>>>>> 
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme 
>>>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>>> 
>>>>>>>>>>> nompi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> 5
>>>>>>>>>>> 6
>>>>>>>>>>> 7
>>>>>>>>>>> 8
>>>>>>>>>>> 9
>>>>>>>>>>> 10
>>>>>>>>>>> Done
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> This convinces me that there is something strange with OpenMPI, 
>>>>>>>>>>> since I expect no difference in signal handling when running a 
>>>>>>>>>>> simple command with or without mpirun in the middle.
>>>>>>>>>>> 
>>>>>>>>>>> I've tried looking for options to change this behaviour, but I 
>>>>>>>>>>> don't seem to find any. Is there one, preferably in the form of an 
>>>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>>>> 
>>>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also 
>>>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Pablo
>>>>>>>>>>> <ompi_bug.sh.gz>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

