On 16.03.2011 at 19:35, Lane Schwartz wrote:

> <snip> 
> The job gets queued up and assigned to run, and the stderr and stdout files 
> are created. When a checkpointable job starts, condor and DMTCP each print a 
> small log message. That log message shows up in the logs. But no output from 
> my program appears. SGE lists my job's status as "r" but when I ssh in to the 
> machine where the job is running and run ps aux, ps lists my job's status as 
> suspended.
>  
> When I launch my checkpointable jobs locally (not using qsub) they run and 
> produce immediate output. When I run those same jobs using qsub, they go into 
> "r" status, but never produce output and appear to not be actually running.
>  
> On a related topic, using 6.2u5p1 I've had mixed results following the 
> checkpointing interface tutorial at 
> http://gridscheduler.sourceforge.net/howto/checkpointing.html. The initial 
> examples describe setting up a transparent interface and running it with some 
> simple shell scripts; I've been able to get these to work as described. I've 
> also followed the examples for setting up an application-level interface with 
> shell scripts; that works, but only the migr_command and clean_command appear 
> to run. When I run example 6, which uses condor in conjunction with 
> transparent checkpointing, no condor checkpoint files are created.

Did you set usr2 as the signal to be used, and did you wait at least 
min_cpu_interval? Is there still no checkpoint file created in 
/home/checkpoint or a similar location? Can you try sending the signal by 
hand to the complete process group on the node?
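
In case it helps, sending the signal by hand could look roughly like the 
following. This is only a minimal sketch in Python, assuming you run it on 
the execution node as the job owner (or root); the function name 
signal_process_group and the PID 12345 are made up for illustration, so take 
the real PID from ps on the node.

```python
import os
import signal


def signal_process_group(pid, sig=signal.SIGUSR2):
    """Send `sig` to every process in the process group that `pid` belongs to.

    Returns the process group ID that was signalled.
    """
    pgid = os.getpgid(pid)  # look up the process group of the given process
    os.killpg(pgid, sig)    # deliver the signal to the whole group
    return pgid


# Example call (12345 is a placeholder PID, not a real value):
# signal_process_group(12345)
```

Passing 0 as the signal only checks that the group exists and that you are 
allowed to signal it, without delivering anything, which is a harmless way to 
test first.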

-- Reuti


> I'd love to use checkpointing, and it feels like I'm tantalizingly close to 
> having things working. Does anyone actually have checkpointing working with 
> Condor, DMTCP, or any other library using 6.2u5p1?
>  
> Thanks,
> Lane
>  
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

