On 16.03.2011 at 19:35, Lane Schwartz wrote:

> <snip>
> The job gets queued up and assigned to run, and the stderr and stdout files
> are created. When a checkpointable job starts, condor and DMTCP each print a
> small log message. That log message shows up in the logs. But no output from
> my program appears. SGE lists my job's status as "r" but when I ssh in to the
> machine where the job is running and run ps aux, ps lists my job's status as
> suspended.
>
> When I launch my checkpointable jobs locally (not using qsub) they run and
> produce immediate output. When I run those same jobs using qsub, they go into
> "r" status, but never produce output and appear to not be actually running.
>
> On a related topic, using 6.2u5p1 I've had mixed results following the
> checkpointing interface tutorial at
> http://gridscheduler.sourceforge.net/howto/checkpointing.html. The initial
> examples describe setting up a transparent interface and running it with some
> simple shell scripts; I've been able to get these to work as described. I've
> also followed the examples for setting up an application-level interface with
> shell scripts; that works, but only the migr_command and clean_command appear
> to run. When I run example 6, which uses condor in conjunction with
> transparent checkpointing, no condor checkpoint files are created.

Did you set usr2 as the signal to be used, and did you wait at least min_cpu_interval? Is there still no checkpoint file created in /home/checkpoint or a similar location? Can you try sending the signal by hand to the complete process group on the node?

-- Reuti

> I'd love to use checkpointing, and it feels like I'm tantalizingly close to
> having things working. Does anyone actually have checkpointing working with
> Condor, DMTCP, or any other library using 6.2u5p1?
>
> Thanks,
> Lane
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
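
For the manual test suggested in the reply (sending the checkpoint signal to the complete process group on the node), a minimal Python sketch could look like the following. It assumes the signal configured for the checkpointing environment is SIGUSR2, and that you already know the PID of one process belonging to the job (for example from ps on the execution node). The helper name signal_process_group is hypothetical and is not part of Grid Engine, Condor, or DMTCP.

```python
import os
import signal


def signal_process_group(pid, sig=signal.SIGUSR2):
    """Send `sig` to every process in the process group that `pid` belongs to.

    Assumption: SIGUSR2 is the signal configured for checkpointing.
    Returns the process group id that was signalled.
    """
    pgid = os.getpgid(pid)   # look up the process group of the given process
    os.killpg(pgid, sig)     # deliver the signal to the whole group
    return pgid
```

This has to run on the execution node as the job owner or as root, for example as signal_process_group(12345), where 12345 stands in for the PID of a process belonging to the job. If a checkpoint file appears after the manual signal, the checkpointing library itself reacts correctly and the problem is more likely in how or when the signal is delivered to the job.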
