Re: [gridengine users] Exclude job from rule.

Guillermo Marco Puche Wed, 28 Aug 2013 23:38:03 -0700

On 08/28/2013 05:57 PM, Dave Love wrote:

Reuti <[email protected]> writes:

        • Job comes back to R status.

Do you use any checkpointing interface, to restart the job? If so, it should output "Rr" 
in `qstat` instead of a plain "R" for the SGE job state.

No, I don't use any checkpointing interface.

Then the state should be "r".

There are some conditions (errors in prolog or pe_starter, I think)
which can cause rescheduling (state Rr), but certainly plain R shouldn't
happen (see sge_status(5) in the current man pages via the URL below).

Thank you, Dave. I'm going to take a look at this right now.

Maybe the problem is in my threshold configuration. I had to set thresholds on all the compute nodes, because compute nodes in my Rocks cluster sometimes went down due to memory usage (using all memory + swap).

I would really appreciate a link if there's a specific configuration manual on how to set thresholds correctly. Maybe then I won't experience this weird behavior with Java jobs, and I'll get better performance overall.



Thank you very much.

Best regards,
Guillermo.

--

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
