On 09.03.2011 at 18:27, Lane Schwartz wrote:

> I'm using SGE v6.2u6. My understanding is that in v6, the

There is a forum at Oracle for the users of their commercial version:

http://forums.oracle.com/forums/forum.jspa?forumID=859

-- Reuti


> checkpointing object is added to the queue instead of defining a
> queue_list in the checkpoint object.
> (http://gridscheduler.sourceforge.net/howto/checkpointing.html)
> 
> Just to be sure, I tried defining a queue_list using qconf -Ackpt, and
> got the error:
> error: unknown attribute name "queue_list"
> 
> Thanks,
> Lane
> 
> On Wed, Mar 9, 2011 at 11:38 AM, "Hung-Sheng Tsao (laotsao 老曹 ) Ph.D"
> <[email protected]> wrote:
>> It seems that you need to add all.q to the
>> queue_list in your checkpoint object.
>> 
>> On 03/ 9/11 11:25 AM, Lane Schwartz wrote:
>>> 
>>> Hi,
>>> 
>>> I would like to use condor's standalone checkpointing to enable
>>> checkpointing jobs that are run via Sun Grid Engine (SGE). I've
>>> successfully compiled a toy C program using condor_compile, and I can
>>> successfully run, stop, and resume the job with its checkpoint file.
>>> 
>>> When I attempt to run my toy using qsub as an SGE job with
>>> checkpointing enabled, the job gets queued up but never runs. The job
>>> runs fine if submitted without checkpointing. Has anyone here
>>> successfully run SGE jobs using condor checkpointing?
>>> 
>>> For reference, here's my configuration. Within SGE's qmon utility, I
>>> defined a checkpoint object called "condor" with the following
>>> configuration:
>>> 
>>> Name: condor
>>> Interface: TRANSPARENT
>>> Checkpoint command: NONE
>>> Migrate command: NONE
>>> Clean command: NONE
>>> Checkpoint directory: /tmp
>>> Checkpoint When: xsr
>>> Checkpoint Signal: NONE
>>> 
>>> To submit the job with checkpointing, I ran this:
>>> qsub -ckpt condor /home/lane/toy.sh -_condor
>>> 
>>> Where toy.sh is:
>>> #!/bin/bash
>>> 
>>> /usr/bin/setarch x86_64 -R -L /home/lane/toy -_condor_D_ALL
>>> 
>>> 
>>> The job as submitted above gets a "qw" status, but never runs. If I
>>> submit the job without "-ckpt condor", then it runs.
>>> 
>>> Any pointers or tips would be appreciated. I've done quite a bit of
>>> research online; it appears that this should be possible, but I just
>>> haven't had any success figuring out how.
>>> 
>>> Cheers,
>>> Lane
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>> 
> 
> 
> 
> -- 
> When a place gets crowded enough to require ID's, social collapse is not
> far away.  It is time to go elsewhere.  The best thing about space travel
> is that it made it possible to go elsewhere.
>                 -- R.A. Heinlein, "Time Enough For Love"
> 

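For anyone following this thread later: as Lane describes, in SGE 6.x the
checkpoint object carries no queue_list; the queue references the checkpoint
object through its ckpt_list attribute instead. A minimal sketch of that setup
is below. It only writes a definition file and prints the qconf calls, so it
does not touch a running cluster. The field names follow the checkpoint(5)
file format as I understand it for 6.x (restart_command is not in the qmon
listing above and is included on that assumption), all.q is the queue
suggested earlier in the thread, and the temporary file is just a scratch
location picked for illustration.

```shell
#!/bin/bash
# Sketch only: writes a checkpoint definition and prints the qconf calls
# that would register it. Nothing here modifies the cluster configuration.
set -eu

ckpt_name="condor"      # checkpoint object name used in the thread
queue_name="all.q"      # assumption: the queue suggested in the thread
ckpt_file="$(mktemp)"   # hypothetical scratch file for the definition

# Values mirror the qmon settings quoted above. Note that there is no
# queue_list entry, which 6.x rejects as an unknown attribute.
cat > "$ckpt_file" <<EOF
ckpt_name          $ckpt_name
interface          transparent
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               xsr
EOF

# Commands to run by hand as an SGE manager, in this order:
echo "qconf -Ackpt $ckpt_file"                              # add the object from the file
echo "qconf -aattr queue ckpt_list $ckpt_name $queue_name"  # attach it to the queue
echo "qconf -sq $queue_name | grep ckpt_list"               # confirm the attachment
```

If the queue's ckpt_list then shows condor, a job submitted with
"qsub -ckpt condor" should have at least one queue it is eligible for, which
may be what keeps the job in "qw" here. I have not verified this on 6.2u6.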
