Then try submitting another job with the qsub option "-w w" and see
whether it reports anything.
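
As a rough sketch of that check (not a tested recipe), something like the
following could be used. The script path and the "condor" checkpoint name
come from the original post below; the "run" helper, the DRY_RUN switch and
the QUEUE default of all.q are assumptions added for illustration. By
default it only prints the commands, so it is safe to run anywhere.

```shell
#!/bin/sh
# Dry-run sketch: prints each command; set DRY_RUN=0 on an SGE submit host
# to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
QUEUE="${QUEUE:-all.q}"     # assumption: the queue the job should land in

run() {
    # Echo the command line, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Resubmit with warning-level validation of the job request.
run qsub -w w -ckpt condor /home/lane/toy.sh

# Show the queue configuration; look for a ckpt_list line naming "condor".
run qconf -sq "$QUEUE"
```

If the ckpt_list line in the qconf -sq output does not mention the
checkpoint object, that would be one plausible reason for the job staying
in "qw".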

Or you can run qstat -j <job id> or qstat -r to show the scheduler
message, but you will need to turn on "schedd_job_info" first. See
the sched_conf(5) man page:

schedd_job_info
     The default scheduler can keep track why jobs could  not  be
     scheduled  during  the  last  scheduler  run. This parameter
     enables or disables the observation.  The value true enables
     the monitoring; false turns it off.

     It is also possible to activate  the  observation  only  for
     certain  jobs.  This will be done if the parameter is set to
     job_list followed by a comma separated list of job ids.
     The user can obtain the collected information with the
     command qstat -j.
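
A minimal sketch of how that could look in practice, assuming admin rights
on the qmaster host. The job id 12345 is a placeholder, and the "run"
helper and DRY_RUN switch are made up for this example; by default the
script only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: prints each command; set DRY_RUN=0 to actually execute.
DRY_RUN="${DRY_RUN:-1}"
JOB_ID="${JOB_ID:-12345}"   # hypothetical job id, replace with the pending job

run() {
    # Echo the command line, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Show the current scheduler configuration, including schedd_job_info.
run qconf -ssconf

# Edit the scheduler configuration (opens an editor); set the line
#   schedd_job_info   true
run qconf -msconf

# After the next scheduler run, ask why the job is still pending.
run qstat -j "$JOB_ID"
```

Note that qconf -msconf is interactive, so with DRY_RUN=0 this is meant to
be run by hand rather than from cron or a batch script.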

Rayson



On Wed, Mar 9, 2011 at 12:27 PM, Lane Schwartz <[email protected]> wrote:
> I'm using SGE v6.2u6. My understanding is that in v6, the
> checkpointing object is added to the queue instead of defining a
> queue_list in the checkpoint object.
> (http://gridscheduler.sourceforge.net/howto/checkpointing.html)
>
> Just to be sure, I tried defining a queue_list using qconf -Ackpt, and
> got the error:
> error: unknown attribute name "queue_list"
>
> Thanks,
> Lane
>
> On Wed, Mar 9, 2011 at 11:38 AM, "Hung-Sheng Tsao (laotsao 老曹 ) Ph.D"
> <[email protected]> wrote:
>>  it seems that you need to add all.q to
>> queue_list  in your checkpoint object
>>
>> On 03/ 9/11 11:25 AM, Lane Schwartz wrote:
>>>
>>> Hi,
>>>
>>> I would like to use condor's standalone checkpointing to enable
>>> checkpointing jobs that are run via Sun Grid Engine (SGE). I've
>>> successfully compiled a toy C program using condor_compile, and I can
>>> successfully run, stop, and resume the job with its checkpoint file.
>>>
>>> When I attempt to run my toy using qsub as an SGE job with
>>> checkpointing enabled, the job gets queued up but never runs. The job
>>> runs fine if submitted without checkpointing. Has anyone here
>>> successfully run SGE jobs using condor checkpointing?
>>>
>>> For reference, here's my configuration. Within SGE's qmon utility, I
>>> defined a checkpoint object called "condor" with the following
>>> configuration:
>>>
>>> Name: condor
>>> Interface: TRANSPARENT
>>> Checkpoint command: NONE
>>> Migrate command: NONE
>>> Clean command: NONE
>>> Checkpoint directory: /tmp
>>> Checkpoint When: xsr
>>> Checkpoint Signal: NONE
>>>
>>> To submit the job with checkpointing, I ran this:
>>> qsub -ckpt condor /home/lane/toy.sh -_condor
>>>
>>> Where toy.sh is:
>>> #!/bin/bash
>>>
>>> /usr/bin/setarch x86_64 -R -L /home/lane/toy -_condor_D_ALL
>>>
>>>
>>> The job as submitted above gets a "qw" status, but never runs. If I
>>> submit the job without "-ckpt condor" then it runs.
>>>
>>> Any pointers to tips would be appreciated. I've done quite a bit of
>>> research online; it appears that this should be possible, but I just
>>> haven't had any success figuring out how.
>>>
>>> Cheers,
>>> Lane
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>
>>
>>
>
>
>
> --
> When a place gets crowded enough to require ID's, social collapse is not
> far away.  It is time to go elsewhere.  The best thing about space travel
> is that it made it possible to go elsewhere.
>                 -- R.A. Heinlein, "Time Enough For Love"
>
>

