Thanks for your efforts. We had problems only with job array resubmission
after qalter;
normally it works fine, so we shall stay with the current SGE version for now.
On 09-Apr-13 11:59, Reuti wrote:
On 09.04.2013 at 08:54, Semi wrote:
SGE version 6.08
Well, this is too old by far to make any reliable statement about it. I don't
recall hearing about a problem with SGE in this aspect. Can you try it with a
test installation of a newer version of SGE to see whether it was fixed?
-- Reuti
On 08-Apr-13 23:52, Reuti wrote:
Please keep the list posted.
On 08.04.2013 at 21:06, Ehud Barnea wrote:
These 2 lines are there twice probably because of a mistake on my part. I copied them
incorrectly (they come just after the first "page" when redirecting the output
of the command to a txt file and opening it with nano).
The jobs were created by themselves. The output I gave is of 3 jobs that I did
not create. The 3rd one was created right after the first 2 finished (and it
was created along with another 30 jobs).
The job IDs increased one by one, as they usually do when submitting jobs.
Job IDs did not appear twice.
The only commands I ran were
qsub (several times)
qalter (no wrapper, just qalter -q <job id>)
`qresub` is a symbolic link to `qalter`. If you can reproduce this, it seems to
be a wrong interpretation of the intended change of attributes by `qalter`.
Which version of SGE are you using in detail?
But: if the job is resubmitted by accident, then there is no "version:" showing
up at all, as it wasn't modified.
-- Reuti
PS: Was the "/fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$" edited, or does it
really end in a "$"?
On Mon, Apr 8, 2013 at 6:22 PM, Reuti wrote:
Hi,
On 08.04.2013 at 15:11, Ehud Barnea wrote:
Thanks for looking at it. I wasn't sure whether to submit to the users group or
just you.
Anyway, I ran the same thing again and the problem occurred again.
The job that I moved (with qalter) finished quickly so I couldn't check its
version, but when it finished it spawned another 3 new jobs (with new job ids
and all of them with version 1).
After these 3 finished another 30 jobs were spawned (also with a different job
id and all with version 1).
Was the job number increased directly from the original job id, so that you ended up
having the same job id twice in the system, or were new ones created after
any later submitted job? Any `qalter`-wrapper in the way? Did you edit the output
below, or is:
stderr_path_list: logs
mail_list: barneaeh@sge01
stderr_path_list: logs
mail_list: barneaeh@sge01
really there twice?
-- Reuti
After doing qalter the only commands I ran were qstat. Also, at first I created
5 job arrays and only did qalter on 1 of them, so I am certain that I did not
accidentally create all these jobs.
I supply here the output of qstat -j. The first 2 belong to the first batch of
3 jobs and the last one is one of the 30 jobs that spawned later:
(it probably doesn't matter, but it sits in a Dropbox folder; Dropbox isn't
active, so it's just a normal folder)
==============================================================
job_number: 8698176
exec_file: job_scripts/8698176
submission_time: Mon Apr 8 15:55:10 2013
owner: barneaeh
uid: 52647
group: obs
gid: 1009
sge_o_home: /storage/users/barneaeh
sge_o_log_name: barneaeh
sge_o_path: /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
sge_o_shell: /bin/tcsh
sge_o_workdir: /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
sge_o_host: sge01
account: sge
cwd: /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
path_aliases: /tmp_mnt/ * * /
stderr_path_list: logs
mail_list: barneaeh@sge01
stderr_path_list: logs
mail_list: barneaeh@sge01
notify: FALSE
job_name: run.sh
stdout_path_list: logs
jobshare: 0
hard_queue_list: intel_all.q
shell_list: /bin/sh
env_list:
script_file: run.sh
version: 1
job-array tasks: 1-246:1
==============================================================
job_number: 8698175
exec_file: job_scripts/8698175
submission_time: Mon Apr 8 15:55:10 2013
owner: barneaeh
uid: 52647
group: obs
gid: 1009
sge_o_home: /storage/users/barneaeh
sge_o_log_name: barneaeh
sge_o_path: /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
sge_o_shell: /bin/tcsh
sge_o_workdir: /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
sge_o_host: sge01
account: sge
cwd: /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
path_aliases: /tmp_mnt/ * * /
stderr_path_list: logs
mail_list: barneaeh@sge01
stderr_path_list: logs
mail_list: barneaeh@sge01
notify: FALSE
job_name: run.sh
stdout_path_list: logs
jobshare: 0
hard_queue_list: intel_all.q
shell_list: /bin/sh
env_list:
script_file: run.sh
version: 1
job-array tasks: 1-246:1
==============================================================
job_number: 8698182
exec_file: job_scripts/8698182
submission_time: Mon Apr 8 16:00:28 2013
owner: barneaeh
uid: 52647
group: obs
gid: 1009
sge_o_home: /storage/users/barneaeh
sge_o_log_name: barneaeh
sge_o_path: /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
sge_o_shell: /bin/tcsh
sge_o_workdir: /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
sge_o_host: sge01
account: sge
cwd: /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
path_aliases: /tmp_mnt/ * * /
stderr_path_list: logs
mail_list: barneaeh@sge01
stderr_path_list: logs
mail_list: barneaeh@sge01
notify: FALSE
job_name: run.sh
stdout_path_list: logs
jobshare: 0
hard_queue_list: intel_all.q
shell_list: /bin/sh
env_list:
script_file: run.sh
version: 1
job-array tasks: 1-246:1
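The three records above can also be compared programmatically rather than by eye. The following is only a minimal sketch: it assumes the output of `qstat -j` was saved as text in the layout shown here (records separated by lines of "=", one "key: value" pair per line). The names `parse_qstat_j` and `SAMPLE` are hypothetical, and the sample is abridged from the output in this thread. It collects each record's fields and notes any key that shows up more than once, such as the doubled stderr_path_list and mail_list lines.

```python
# Sketch only: parses text shaped like the `qstat -j` output quoted in this
# thread. The function name and the abridged SAMPLE are illustrative.

SAMPLE = """\
==============================================================
job_number: 8698176
submission_time: Mon Apr 8 15:55:10 2013
stderr_path_list: logs
mail_list: barneaeh@sge01
stderr_path_list: logs
mail_list: barneaeh@sge01
hard_queue_list: intel_all.q
version: 1
job-array tasks: 1-246:1
==============================================================
job_number: 8698182
submission_time: Mon Apr 8 16:00:28 2013
hard_queue_list: intel_all.q
version: 1
job-array tasks: 1-246:1
"""


def parse_qstat_j(text):
    """Return one dict per record: its fields and any keys seen twice."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # A line made only of "=" starts a new job record.
            current = {"fields": {}, "repeated": []}
            records.append(current)
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only; values may contain colons themselves.
        key, _, value = line.partition(":")
        key = key.strip()
        if key in current["fields"]:
            current["repeated"].append(key)
        else:
            current["fields"][key] = value.strip()
    return records


if __name__ == "__main__":
    for rec in parse_qstat_j(SAMPLE):
        f = rec["fields"]
        print(f.get("job_number"), f.get("submission_time"),
              "version:", f.get("version"), "repeated:", rec["repeated"])
```

Comparing job_number, submission_time and version side by side this way makes it easier to see whether the extra jobs look like fresh submissions or like modified copies of the original.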
On Mon, Apr 8, 2013 at 3:43 PM, Reuti wrote:
On 08.04.2013 at 10:21, Semi wrote:
Any ideas about user's question?
I am working with job arrays and encountered something weird. At first I sent
several job arrays to obs.q. Then I took one of the job arrays (that didn't
start executing any task) and sent it to intel_all.q (using qalter).
After that the specific tasks started running, however, the same jobArray was
duplicated about 20-30 times. Now qstat shows me:
8698022 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
8698023 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
8698024 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
8698025 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
8698026 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
8698027 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
8698028 0.50500 run.sh barneaeh qw 04/08/2013 10:25:24 1 1-245:1
8698030 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
8698031 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
8698032 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
`qalter` doesn't change the submission time (only increasing the value of "version:"
in `qstat -j <job_id>`). But above they have different submission times. Was this the
time `qalter` was issued?
All have a value of "version: 1" in `qstat -j <job_id>`?
-- Reuti
This jobArray was the only one with 245, so I am sure it was duplicated...
Now it seems that the job array is run again and again.
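One way to confirm that a job array has been duplicated is to group the pending entries of the `qstat` listing by name, owner and task range. The sketch below is an assumption-laden illustration, not a general `qstat` parser: it assumes every entry is a pending ("qw") job array with exactly nine whitespace-separated fields (job id, priority, name, user, state, date, time, slots, task range) and no queue column, as in the listing quoted above. The names `parse_pending` and `group_duplicates` are hypothetical, and the sample uses three rows from this thread.

```python
# Sketch only: groups pending job arrays that share name, owner and task
# range. Assumes nine whitespace-separated fields per job and no queue column.
from collections import defaultdict

FIELDS_PER_JOB = 9  # id, prio, name, user, state, date, time, slots, tasks

SAMPLE = """\
8698022 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23
1 1-245:1
8698023 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
8698028 0.50500 run.sh barneaeh qw 04/08/2013 10:25:24 1 1-245:1
"""


def parse_pending(text):
    """Parse the listing token by token, so wrapped rows are handled too."""
    tokens = text.split()
    if len(tokens) % FIELDS_PER_JOB != 0:
        raise ValueError("unexpected token count; not a plain pending listing")
    jobs = []
    for i in range(0, len(tokens), FIELDS_PER_JOB):
        (job_id, prio, name, user, state,
         date, time, slots, tasks) = tokens[i:i + FIELDS_PER_JOB]
        jobs.append({"id": job_id, "name": name, "user": user, "state": state,
                     "submitted": date + " " + time, "tasks": tasks})
    return jobs


def group_duplicates(jobs):
    """Map (name, user, task range) to job ids when more than one job shares it."""
    groups = defaultdict(list)
    for job in jobs:
        groups[(job["name"], job["user"], job["tasks"])].append(job["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    for key, ids in group_duplicates(parse_pending(SAMPLE)).items():
        print(key, "->", ids)
```

Because only one array with 245 tasks was submitted, any group with more than one job id for the range 1-245:1 points at a duplicate.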
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users