Rayson,

In qmon, I selected "Queue control", then in the "Cluster Queues" tab I selected "all.q" and clicked the "Modify" button. That brings up a "Modify all.q" window, where I selected the "Checkpointing" tab. That tab shows MinCpuTime=00:05:00, and lists "condor" under both "Available Checkpoint Objects" and "Referenced Checkpoint Objects".
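The same check can be done from the command line. Below is a minimal sketch, assuming that `qconf -sq all.q` prints the queue's checkpoint objects on a single `ckpt_list` line. `ckpt_in_queue` is a hypothetical helper, not part of SGE. It only looks at the queue-wide default value. Host-specific overrides such as `[host=...]` and backslash-continued lines are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): reads the output of
# `qconf -sq <queue>` on stdin and succeeds (exit 0) if the named
# checkpoint object appears on the queue's ckpt_list line.
# Assumes ckpt_list entries are separated by commas and/or spaces.
ckpt_in_queue() {
  awk -v want="$1" '
    $1 == "ckpt_list" {
      for (i = 2; i <= NF; i++) {
        n = split($i, parts, ",")
        for (j = 1; j <= n; j++)
          if (parts[j] == want) found = 1
      }
    }
    END { if (found) exit 0; else exit 1 }'
}

# Example (requires a live SGE cell):
#   qconf -sq all.q | ckpt_in_queue condor && echo "condor is attached to all.q"
```

If the object turns out to be missing, `qconf -aattr queue ckpt_list condor all.q` should append it to the queue's list. That is the command-line counterpart of moving it to "Referenced Checkpoint Objects" in qmon.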
So I think I have associated the checkpointing interface with all.q.

Lane

On Wed, Mar 9, 2011 at 11:36 AM, Rayson Ho <[email protected]> wrote:
> If you have not added the checkpointing interface to the queue
> definitions, then SGE would not schedule jobs that request the
> ckpt interface.
>
> You can get more info in the checkpointing howto:
> http://gridscheduler.sourceforge.net/howto/checkpointing.html
>
> Rayson
>
>
> On Wed, Mar 9, 2011 at 11:25 AM, Lane Schwartz <[email protected]> wrote:
>> Hi,
>>
>> I would like to use condor's standalone checkpointing to enable
>> checkpointing of jobs that are run via Sun Grid Engine (SGE). I've
>> successfully compiled a toy C program using condor_compile, and I can
>> successfully run, stop, and resume the job with its checkpoint file.
>>
>> When I attempt to run my toy program with qsub as an SGE job with
>> checkpointing enabled, the job gets queued but never runs. The job
>> runs fine if submitted without checkpointing. Has anyone here
>> successfully run SGE jobs using condor checkpointing?
>>
>> For reference, here's my configuration. Within SGE's qmon utility, I
>> defined a checkpoint object called "condor" with the following
>> configuration:
>>
>> Name: condor
>> Interface: TRANSPARENT
>> Checkpoint command: NONE
>> Migrate command: NONE
>> Clean command: NONE
>> Checkpoint directory: /tmp
>> Checkpoint When: xsr
>> Checkpoint Signal: NONE
>>
>> To submit the job with checkpointing, I ran this:
>> qsub -ckpt condor /home/lane/toy.sh -_condor
>>
>> Where toy.sh is:
>> #!/bin/bash
>>
>> /usr/bin/setarch x86_64 -R -L /home/lane/toy -_condor_D_ALL
>>
>>
>> The job as submitted above gets a "qw" status, but never runs. If I
>> submit the job without "-ckpt condor", then it runs.
>>
>> Any pointers or tips would be appreciated. I've done quite a bit of
>> research online; it appears that this should be possible, but I just
>> haven't had any success figuring out how.
>>
>> Cheers,
>> Lane
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>
>
> --
> When a place gets crowded enough to require ID's, social collapse is not
> far away. It is time to go elsewhere. The best thing about space travel
> is that it made it possible to go elsewhere.
> -- R.A. Heinlein, "Time Enough For Love"
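For comparison with the qmon settings quoted above, here is a rough sketch of the same "condor" object in the plain-text layout that `qconf -sckpt condor` prints and `qconf -Ackpt <file>` reads. Field names follow the checkpoint(5) man page. The `restart_command` line is an assumption, because the quoted message does not say what it was set to. The capitalization of the `interface` value is copied from qmon and may differ from what qconf prints.

```
ckpt_name          condor
interface          TRANSPARENT
ckpt_command       NONE
migr_command       NONE
restart_command    NONE
clean_command      NONE
ckpt_dir           /tmp
signal             NONE
when               xsr
```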
