[In case this is still relevant.] Reuti <[email protected]> writes:
>> I've tried screen a bit before, thanks. Someone else had an idea which
>> might work even if the admin doesn't increase wallclock time. To
>> qlogin, *then* start screen and start the debugging process, then
>> detach and logout. Then qlogin into the *same node* and
>> reattach. I'm going to experiment with that, see if it works.
>
> Well, this would violate the granted scheduling, and AFAICS the screen
> session will be terminated in a proper way due to the attached
> additional group ID.
>
> NB: the ownership of the generated /dev/pts/x is wrong and needs to be
> fixed to have access to it as a user (in case you want to test it on
> your own). That's fixed in the SGE development version.

Isn't there a general solution to debugging something that crashes after
a long time? Why not checkpoint at an appropriate interval and then
restart under the debugger? A single-node job is likely to work OK
under DMTCP, which is easy to use.

-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/
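
For what it's worth, the checkpoint-and-restart idea might look roughly like the sketch below. The program name, checkpoint directory, and interval are made up for illustration and are not from this thread. The script only prints the DMTCP commands (dry run), and attaching gdb after the restart is one possible way to get the process under a debugger, not something tested here.

```shell
#!/bin/sh
# Dry-run sketch of "checkpoint periodically, restart later for debugging".
# Hypothetical values, not from the thread: PROG, CKPT_DIR, INTERVAL.
PROG=./myprog      # the long-running program (hypothetical name)
CKPT_DIR=./ckpt    # where checkpoint images are written (assumed)
INTERVAL=3600      # seconds between checkpoints (assumed)
DRY_RUN=1          # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# In the batch job: start the program under DMTCP with periodic checkpoints.
launch_cmd() {
    run dmtcp_launch --interval "$INTERVAL" --ckptdir "$CKPT_DIR" "$PROG"
}

# Later, in an interactive session: resume from the saved images.
# A debugger could then be attached to the restarted process,
# e.g. with gdb -p <pid> (an assumption, not tested here).
restart_cmd() {
    run dmtcp_restart "$CKPT_DIR"/ckpt_*.dmtcp
}

launch_cmd
restart_cmd
```

The two calls belong to different sessions: the launch goes in the batch job, the restart in a later qlogin session, so a real script would keep only one of them. The interval wants to be short enough that the last checkpoint falls reasonably close before the crash.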
