Title: Guillermo Marco Puche
That's the configuration of one of my queues. I suspect many things are misconfigured with regard to thresholds.

qconf -sq shudra.q
qname                 shudra.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE,[compute-0-2.local=load_avg=8,swap_used=12G], \
                      [compute-0-1.local=load_avg=8,swap_used=12G], \
                      [compute-0-3.local=load_avg=8,swap_used=12G], \
                      [compute-0-5.local=load_avg=8,swap_used=12G], \
                      [compute-0-4.local=load_avg=8,swap_used=12G], \
                      [compute-0-0.local=load_avg=8,swap_used=12G]
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpich orte smp
rerun                 TRUE
slots                 0,[compute-0-2.local=8],[compute-0-3.local=8], \
                      [compute-0-1.local=8],[compute-0-5.local=8], \
                      [compute-0-4.local=8],[compute-0-0.local=8]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        mem_free=30G
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
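
For anyone who wants to compare the per-host entries in a dump like the one above, here is a minimal sketch in Python. It assumes the text comes from `qconf -sq <queue>` in the layout shown here: attribute name, whitespace, value, with a trailing backslash for continuation lines and `[host=value]` for per-host overrides. The function names (`join_continuations`, `parse_queue_conf`, `split_host_overrides`) are made up for illustration and are not part of Grid Engine.

```python
import re


def join_continuations(text):
    """Merge each line ending in a backslash with the line that follows it."""
    return re.sub(r"\\\s*\n\s*", "", text)


def parse_queue_conf(text):
    """Return {attribute: raw value} from 'qconf -sq' style output (assumed layout)."""
    conf = {}
    for line in join_continuations(text).splitlines():
        parts = line.split(None, 1)  # attribute name, then the rest of the line
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def split_host_overrides(value):
    """Split 'default,[host=setting],...' into (default, {host: setting}).

    The setting may itself contain '=' and ',' (for example
    'load_avg=8,swap_used=12G'), so the brackets are matched as a whole
    instead of splitting the value on commas.
    """
    overrides = dict(re.findall(r"\[([^=\]]+)=([^\]]*)\]", value))
    default = re.sub(r"\[[^\]]*\]", "", value).strip(", ")
    return default, overrides
```

Feeding the saved output of `qconf -sq shudra.q` to `parse_queue_conf` and then passing the `suspend_thresholds` or `slots` value to `split_host_overrides` gives the queue-wide default and a per-host dictionary, which makes it easier to spot a host whose thresholds differ from the rest.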


On 08/29/2013 08:36 AM, Guillermo Marco Puche wrote:
On 08/28/2013 05:57 PM, Dave Love wrote:
Reuti <[email protected]> writes:

	• Job comes back to R status.

Do you use any checkpointing interface, to restart the job? If so, it should output "Rr" in `qstat` instead of a plain "R" for the SGE job state.


No, I don't use any checkpointing interface.
Then the state should be "r".
There are some conditions (errors in prolog or pe_starter, I think)
which can cause rescheduling (state Rr), but certainly plain R shouldn't
happen (see sge_status(5) in the current man pages via the URL below).

Thank you, Dave. I'm going to take a look at this right now.

Maybe the problem is in my threshold configuration. I had to set thresholds on all the compute nodes because nodes in my Rocks cluster sometimes went down after using all of their memory plus swap.

I would really appreciate a link to any manual that explains how to set thresholds correctly. Perhaps then I wouldn't see this odd behavior with Java jobs, and overall performance would improve.


Thank you very much.

Best regards,
Guillermo.

-- 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users


--

Guillermo Marco Puche

Bioinformatician, Computer Science Engineer.
Sistemas Genómicos S.L.
Phone: +34 902 364 669
Fax: +34 902 364 670
www.sistemasgenomicos.com

 

