On 08/28/2013 05:57 PM, Dave Love wrote:
Reuti <[email protected]> writes:
• Job comes back to R status.
Do you use any checkpointing interface, to restart the job? If so, it should output "Rr"
in `qstat` instead of a plain "R" for the SGE job state.
No, I don't use any checkpointing interface.
Then the state should be "r".
There are some conditions (errors in prolog or pe_starter, I think)
which can cause rescheduling (state Rr), but certainly plain R shouldn't
happen (see sge_status(5) in the current man pages via the URL below).
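
To make the state letters discussed above concrete, here is a minimal Python sketch that splits a `qstat` state string into its individual letters. The letter table is an assumption based on a reading of the qstat(1) and sge_status(5) man pages and may be incomplete; the function names `describe_state` and `was_rescheduled` are hypothetical, not part of any Grid Engine tool.

```python
# Assumed meanings of single state letters, as read from the
# qstat(1)/sge_status(5) man pages; check your local man pages.
STATE_LETTERS = {
    "d": "deletion",
    "E": "error",
    "h": "hold",
    "q": "queued",
    "r": "running",
    "R": "restarted",
    "s": "suspended",
    "S": "suspended (queue suspended)",
    "T": "suspended (threshold exceeded)",
    "t": "transferring",
    "w": "waiting",
}


def describe_state(state):
    """Return one description per letter of a qstat state string."""
    return [STATE_LETTERS.get(ch, "unknown (%s)" % ch) for ch in state]


def was_rescheduled(state):
    """True if the state string carries the 'R' (restarted) flag."""
    return "R" in state


if __name__ == "__main__":
    for example in ("r", "Rr", "qw"):
        print(example, describe_state(example), was_rescheduled(example))
```

Under these assumptions, a job that has simply started shows "r", while a job that was rescheduled and is running again shows "Rr".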
Thank you, Dave. I'm going to take a look at this right now.
Maybe the problem is in my threshold configuration. I had to set
thresholds on all the compute nodes, because compute nodes in my Rocks
cluster sometimes went down due to memory usage (all memory plus swap
in use).
I would really appreciate a link to any configuration manual that
explains how to set thresholds correctly. With the right settings I
might avoid this weird behavior with Java jobs and get better overall
performance.
Thank you very much.
Best regards,
Guillermo.
--
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users