There is a forum at Oracle for the users of their commercial version:
http://forums.oracle.com/forums/forum.jspa?forumID=859

-- Reuti

On 09.03.2011 at 18:27, Lane Schwartz wrote:

> I'm using SGE v6.2u6. My understanding is that in v6, the
> checkpointing object is added to the queue instead of defining a
> queue_list in the checkpoint object.
> (http://gridscheduler.sourceforge.net/howto/checkpointing.html)
>
> Just to be sure, I tried defining a queue_list using qconf -Ackpt, and
> got the error:
> error: unknown attribute name "queue_list"
>
> Thanks,
> Lane
>
> On Wed, Mar 9, 2011 at 11:38 AM, "Hung-Sheng Tsao (laotsao 老曹) Ph.D" wrote:
>> It seems that you need to add all.q to
>> queue_list in your checkpoint object.
>>
>> On 03/9/11 11:25 AM, Lane Schwartz wrote:
>>>
>>> Hi,
>>>
>>> I would like to use condor's standalone checkpointing to enable
>>> checkpointing of jobs that are run via Sun Grid Engine (SGE). I've
>>> successfully compiled a toy C program using condor_compile, and I can
>>> successfully run, stop, and resume the job with its checkpoint file.
>>>
>>> When I attempt to run my toy program using qsub as an SGE job with
>>> checkpointing enabled, the job gets queued up but never runs. The job
>>> runs fine if submitted without checkpointing. Has anyone here
>>> successfully run SGE jobs using condor checkpointing?
>>>
>>> For reference, here's my configuration. Within SGE's qmon utility, I
>>> defined a checkpoint object called "condor" with the following
>>> configuration:
>>>
>>> Name: condor
>>> Interface: TRANSPARENT
>>> Checkpoint command: NONE
>>> Migrate command: NONE
>>> Clean command: NONE
>>> Checkpoint directory: /tmp
>>> Checkpoint When: xsr
>>> Checkpoint Signal: NONE
>>>
>>> To submit the job with checkpointing, I ran this:
>>> qsub -ckpt condor /home/lane/toy.sh -_condor
>>>
>>> Where toy.sh is:
>>> #!/bin/bash
>>>
>>> /usr/bin/setarch x86_64 -R -L /home/lane/toy -_condor_D_ALL
>>>
>>>
>>> The job as submitted above gets a "qw" status, but never runs. If I
>>> submit the job without "-ckpt condor", then it runs.
>>>
>>> Any pointers or tips would be appreciated. I've done quite a bit of
>>> research online; it appears that this should be possible, but I just
>>> haven't had any success figuring out how.
>>>
>>> Cheers,
>>> Lane
>>> _______________________________________________
>>> users mailing list
>>> https://gridengine.org/mailman/listinfo/users
>>
>
> --
> When a place gets crowded enough to require ID's, social collapse is not
> far away. It is time to go elsewhere. The best thing about space travel
> is that it made it possible to go elsewhere.
> -- R.A. Heinlein, "Time Enough For Love"
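A minimal sketch of the setup discussed above, assuming SGE 6.x, where (as Lane notes) the checkpoint object no longer carries a queue_list and is instead attached from the queue side. The queue attribute name `ckpt_list`, the file path `/tmp/condor.ckpt`, and the queue name `all.q` are assumptions for illustration; the field values mirror Lane's qmon settings. If no queue lists `condor` in its `ckpt_list`, a job submitted with `-ckpt condor` has nowhere to be scheduled, which may explain the job staying in `qw`.

```shell
#!/bin/bash
# Sketch only: define the "condor" checkpoint object from a file and attach
# it to a queue. Field names follow the SGE 6.x checkpoint(5) format; the
# file path and queue name below are hypothetical defaults.
set -euo pipefail

CKPT_FILE="${CKPT_FILE:-/tmp/condor.ckpt}"   # hypothetical location
QUEUE="${QUEUE:-all.q}"                      # queue that should offer the object

# Same settings as the qmon dialog in the thread. There is deliberately no
# queue_list line: in 6.x that attribute is rejected by qconf -Ackpt.
cat > "$CKPT_FILE" <<'EOF'
ckpt_name          condor
interface          TRANSPARENT
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               xsr
EOF

if command -v qconf >/dev/null 2>&1; then
    # Register the object (use -Mckpt instead if it already exists) ...
    qconf -Ackpt "$CKPT_FILE"
    # ... and add it to the queue's ckpt_list so -ckpt jobs can run there.
    qconf -aattr queue ckpt_list condor "$QUEUE"
else
    # No SGE installation here: just show what would be run.
    echo "qconf not found; would run:"
    echo "  qconf -Ackpt $CKPT_FILE"
    echo "  qconf -aattr queue ckpt_list condor $QUEUE"
fi
```

Afterwards, `qconf -sq all.q` should list `condor` under `ckpt_list`, and the `qsub -ckpt condor ...` submission can be retried. Editing the queue interactively with `qconf -mq all.q` is an alternative to `-aattr`.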
