On 08/28/2013 05:57 PM, Dave Love wrote:
Reuti <[email protected]> writes:

        • Job comes back to R status.

Do you use any checkpointing interface, to restart the job? If so, it should output "Rr" 
in `qstat` instead of a plain "R" for the SGE job state.


No, I don't use any checkpointing interface.
Then the state should be "r".
There are some conditions (errors in prolog or pe_starter, I think)
which can cause rescheduling (state Rr), but certainly plain R shouldn't
happen (see sge_status(5) in the current man pages via the URL below).
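
For reading the state column discussed above, here is a minimal sketch in Python that expands a `qstat` state string letter by letter. The letter meanings are an assumption based on my reading of the qstat(1) and sge_status(5) man pages, and `STATE_LETTERS` and `describe_state` are hypothetical names used only for illustration; please check the table against the man pages for your own Grid Engine version.

```python
# Assumed letter meanings, taken from my reading of qstat(1)/sge_status(5).
# Verify against the man pages shipped with your Grid Engine version.
STATE_LETTERS = {
    "d": "deletion",
    "E": "error",
    "h": "hold",
    "r": "running",
    "R": "restarted",
    "s": "suspended (job)",
    "S": "suspended (queue)",
    "t": "transferring",
    "T": "suspended (threshold exceeded)",
    "w": "waiting",
}


def describe_state(state):
    """Expand a qstat state string such as 'Rr' into a list of words."""
    return [STATE_LETTERS.get(ch, "unknown (%s)" % ch) for ch in state]


if __name__ == "__main__":
    # 'Rr' is the combination mentioned above for a rescheduled, running job.
    for example in ("r", "Rr", "T"):
        print(example, "->", ", ".join(describe_state(example)))
```

Under these assumptions a job that was rescheduled and is running again reads as "restarted, running", while a job suspended because a suspend threshold was exceeded would show a "T".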

Thank you, Dave. I'm going to take a look at this right now.

Maybe the problem is in my threshold configuration. I had to set thresholds on all the compute nodes, because compute nodes in my Rocks cluster sometimes went down due to memory usage (using all memory + swap).

I would really appreciate a link to any configuration manual that explains how to set thresholds correctly. Maybe then I won't experience this weird behavior with Java jobs, and I'll get better performance overall.


Thank you very much.

Best regards,
Guillermo.

--
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
