According to S the job is suspended. Does `qstat -f` show state C for the queue
(calendar suspended)?
Yes, compute-0-2 is in C state and the 3 others are in aC state.
ps -e f on compute-0-2 gives:
12395 ? Sl 25:50 /opt/gridengine/bin/lx26-amd64/sge_execd
6299 ? S 0:00 \_ sge_shepherd-28865 -bg
6301 ? TNs 0:00 | \_ bash /opt/gridengine/default/spool/compute-0-2/job_scripts/28865
6302 ? TN 0:00 | \_ /bin/csh ./run_roms.csh
25564 ? TN 0:03 | \_ mpirun -np 32 ./roms roms.in
25568 ? TN 0:00 | \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V compute-0-3.loca
25572 ? TN 0:00 | | \_ /usr/bin/ssh -n -p 38836 compute-0-3.local exec '/opt/gridengine/util
25569 ? TN 0:00 | \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V compute-0-1.loca
25571 ? TN 0:00 | | \_ /usr/bin/ssh -n -p 44411 compute-0-1.local exec '/opt/gridengine/util
25570 ? TN 0:00 | \_ /opt/gridengine/bin/lx26-amd64/qrsh -inherit -nostdin -V compute-0-14.loc
25573 ? TN 0:00 | | \_ /usr/bin/ssh -n -p 52152 compute-0-14.local exec '/opt/gridengine/uti
25574 ? TN 253:19 | \_ ./roms roms.in
25575 ? TN 253:28 | \_ ./roms roms.in
25576 ? TN 253:26 | \_ ./roms roms.in
25577 ? TN 253:30 | \_ ./roms roms.in
25578 ? TN 253:20 | \_ ./roms roms.in
25579 ? TN 253:27 | \_ ./roms roms.in
25580 ? TN 253:25 | \_ ./roms roms.in
25581 ? TN 253:20 | \_ ./roms roms.in
4666 ? S 0:00 \_ sge_shepherd-28899 -bg
4667 ? Ss 0:00 \_ sshd: forecast [priv]
4672 ? S 0:00 \_ sshd: forecast@notty
4673 ? Ss 0:00 \_ /opt/gridengine/utilbin/lx26-amd64/qrsh_starter /opt/gridengine/default/spool
4764 ? S 0:00 \_ orted -mca ess env -mca orte_ess_jobid 858914816 -mca orte_ess_vpid 3 -mc
4765 ? R 188:03 \_ ./roms roms_forecast.in
4766 ? R 190:14 \_ ./roms roms_forecast.in
4767 ? R 189:45 \_ ./roms roms_forecast.in
4768 ? R 190:30 \_ ./roms roms_forecast.in
4769 ? R 190:48 \_ ./roms roms_forecast.in
4770 ? R 190:39 \_ ./roms roms_forecast.in
4771 ? R 190:56 \_ ./roms roms_forecast.in
4772 ? R 189:54 \_ ./roms roms_forecast.in
and top:
4765 forecast 25 0 250m 80m 5008 R 100.1 0.5 190:54.74 roms
4766 forecast 25 0 250m 80m 5020 R 100.1 0.5 193:05.62 roms
4769 forecast 25 0 250m 80m 5232 R 100.1 0.5 193:39.34 roms
4771 forecast 25 0 250m 69m 5012 R 100.1 0.4 193:47.49 roms
4767 forecast 25 0 250m 80m 5236 R 99.8 0.5 192:36.86 roms
4768 forecast 25 0 250m 80m 5240 R 99.8 0.5 193:21.09 roms
4770 forecast 25 0 250m 80m 5240 R 99.8 0.5 193:30.51 roms
4772 forecast 25 0 250m 69m 5028 R 99.8 0.4 192:46.14 roms
while on compute-0-1 and the other nodes,
ps -e f gives:
11973 ? Sl 7:13 /opt/gridengine/bin/lx26-amd64/sge_execd
25352 ? S 0:00 \_ sge_shepherd-28865 -bg
25353 ? SNs 0:00 | \_ sshd: xavier [priv]
25358 ? SN 0:00 | \_ sshd: xavier@notty
25359 ? SNs 0:00 | \_ /opt/gridengine/utilbin/lx26-amd64/qrsh_starter /opt/gridengine/default/spool
25450 ? SN 0:00 | \_ orted -mca ess env -mca orte_ess_jobid 1701052416 -mca orte_ess_vpid 2 -m
25451 ? RN 421:43 | \_ ./roms roms.in
25452 ? RN 421:32 | \_ ./roms roms.in
25453 ? RN 422:02 | \_ ./roms roms.in
25454 ? RN 421:53 | \_ ./roms roms.in
25455 ? RN 422:05 | \_ ./roms roms.in
25456 ? RN 421:55 | \_ ./roms roms.in
25457 ? RN 421:48 | \_ ./roms roms.in
25458 ? RN 422:01 | \_ ./roms roms.in
4544 ? S 0:00 \_ sge_shepherd-28899 -bg
4545 ? Ss 0:00 \_ sshd: forecast [priv]
4550 ? S 0:00 \_ sshd: forecast@notty
4551 ? Ss 0:00 \_ /opt/gridengine/utilbin/lx26-amd64/qrsh_starter /opt/gridengine/default/spool
4642 ? S 0:00 \_ orted -mca ess env -mca orte_ess_jobid 858914816 -mca orte_ess_vpid 2 -mc
4643 ? S 187:37 \_ ./roms roms_forecast.in
4644 ? S 189:38 \_ ./roms roms_forecast.in
4645 ? S 190:40 \_ ./roms roms_forecast.in
4646 ? R 188:53 \_ ./roms roms_forecast.in
4647 ? R 161:50 \_ ./roms roms_forecast.in
4648 ? S 189:12 \_ ./roms roms_forecast.in
4649 ? R 158:36 \_ ./roms roms_forecast.in
4650 ? R 184:55 \_ ./roms roms_forecast.in
and top:
4643 forecast 25 0 250m 80m 5028 R 100.6 0.5 188:46.90 roms
4644 forecast 25 0 250m 80m 5028 R 100.6 0.5 190:50.58 roms
4646 forecast 25 0 250m 80m 5240 R 100.6 0.5 190:05.82 roms
4645 forecast 25 0 250m 80m 5228 R 98.6 0.5 191:52.46 roms
4647 forecast 25 0 250m 80m 5248 R 98.6 0.5 162:55.69 roms
4649 forecast 25 0 250m 80m 5020 R 98.6 0.5 159:38.28 roms
4648 forecast 25 0 250m 80m 5236 R 82.8 0.5 190:23.34 roms
4650 forecast 25 0 250m 80m 5016 R 82.8 0.5 186:03.83 roms
25451 xavier 39 19 219m 52m 4896 R 5.9 0.3 421:48.02 roms
25453 xavier 39 19 219m 53m 5052 R 5.9 0.3 422:07.73 roms
25456 xavier 39 19 219m 53m 5060 R 5.9 0.3 422:00.43 roms
25454 xavier 39 19 219m 52m 4924 R 3.9 0.3 421:58.77 roms
25455 xavier 39 19 219m 52m 4948 R 3.9 0.3 422:10.83 roms
25457 xavier 39 19 219m 53m 5056 R 3.9 0.3 421:53.80 roms
25458 xavier 39 19 219m 52m 4840 R 3.9 0.3 422:07.16 roms
25452 xavier 39 19 219m 53m 5060 R 2.0 0.3 421:37.02 roms
Did you check with:
$ ps -e f
on the node that all processes are kids of the sge_shepherd? They should then
have gotten state "T" in `ps`.
-- Reuti
But again, while compute-0-2 has a load of 8 (8 CPUs per node), compute-0-1 and
the others are overloaded at 16...
Using SGE 6.2u4 on a ROCKS 5.3 cluster.
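
The check discussed above (do the job's processes show state "T" in ps on every node?) can be sketched as a small script. This is only a minimal illustration, not part of SGE: the function name state_summary and the two sample strings are made up for this example, and the sample lines are copied from the listings in this thread. It assumes the usual `ps -e f` column order (PID TTY STAT TIME COMMAND) and looks only at the first letter of STAT, where T means stopped, R running and S sleeping.

```python
from collections import Counter


def state_summary(ps_output, needle):
    """Count primary process states (first letter of STAT) for every
    `ps -e f` line whose command contains `needle`."""
    counts = Counter()
    for line in ps_output.splitlines():
        # Assumed columns: PID TTY STAT TIME COMMAND
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # blank line, header, or wrapped continuation line
        stat, command = fields[2], fields[4]
        if needle in command:
            counts[stat[0]] += 1
    return counts


# Two lines per node, copied from the listings in this thread.
MASTER_NODE = r"""
25574 ? TN 253:19 | \_ ./roms roms.in
25575 ? TN 253:28 | \_ ./roms roms.in
"""
SLAVE_NODE = r"""
25451 ? RN 421:43 | \_ ./roms roms.in
25452 ? RN 421:32 | \_ ./roms roms.in
"""

if __name__ == "__main__":
    print("compute-0-2:", dict(state_summary(MASTER_NODE, "roms.in")))
    print("compute-0-1:", dict(state_summary(SLAVE_NODE, "roms.in")))
```

Fed with the full ps output of each node, a summary like this makes the asymmetry visible at a glance: the roms.in processes are stopped on compute-0-2 but still runnable on the other nodes.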
On 11/10/2012 11:09, Reuti wrote:
Am 11.10.2012 um 11:56 schrieb Xavier:
Hi all,
I have created a calendar queue that is only available during the day (6am to 1am),
keeping nodes free for the night jobs through another queue.
This queue is composed of 4 nodes (32 CPUs). All jobs use the 32 CPUs.
Good - but what calendar definition did you create in detail?
-- Reuti
What I don't get is that one of the nodes, AND ONLY ONE, drops its load to 0 at 1am.
This node is the one the job is attributed to, i.e.
from qstat:
JOB1 xavier r 10/03/2012 09:44:18 [email protected] 32
while the other 3 nodes keep their load and are therefore overloaded at night.
Example of the last day's load
for compute-0-2:
http://nautilus.ciimar.up.pt/ganglia/graph.php?g=load_report&z=large&c=nautilus&h=compute-0-2.local&m=load_one&r=day&s=descending&hc=4&mc=2&st=1349865411
and for compute-0-1
http://nautilus.ciimar.up.pt/ganglia/graph.php?g=load_report&z=large&c=nautilus&h=compute-0-1.local&m=load_one&r=day&s=descending&hc=4&mc=2&st=1349865467
Why do all nodes not behave the same?
Xavier
--
Universidade da Madeira
CCM - Centro de Ciencias Matematicas
Campus Universitario da Penteada
9000-390 Funchal, Madeira Island
Portugal
(+351) 291 705 186
http://wakes.uma.pt
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users