Thanks for your efforts. We had problems only with resubmitting a job array via qalter; normally it works fine,
so we will stay with the current SGE version for now.

On 09-Apr-13 11:59, Reuti wrote:
On 09.04.2013 at 08:54, Semi wrote:

SGE version 6.08
Well, this version is far too old for me to make any reliable statement about it. I don't
recall hearing about a problem with SGE in this respect. Can you try it with a
test installation of a newer version of SGE to see whether it was fixed?

-- Reuti


On 08-Apr-13 23:52, Reuti wrote:
Please keep the list posted.

On 08.04.2013 at 21:06, Ehud Barnea wrote:

These 2 lines probably appear twice because of a mistake on my part. I copied them
incorrectly (they come right after the first "page" when I redirect the output
of the command to a txt file and open it with nano).

The jobs were created on their own. The output I sent shows 3 jobs that I did
not create. The 3rd one was created right after the first 2 finished (along
with another 30 jobs).
The job IDs increased one by one, as they usually do when submitting jobs.
No job ID appeared twice.
The only commands I ran were
qsub (several times)
qalter (no wrapper, just qalter -q <queue> <job id>)
`qresub` is a symbolic link to `qalter`. If you can reproduce this, it looks as if
`qalter` misinterpreted the intended change of attributes.
Which exact version of SGE are you using?

But: if the job is resubmitted by accident, then there is no "version:" showing 
up at all, as it wasn't modified.

-- Reuti

PS: Is the "/fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$" path edited, or does it
really end in a "$"?


On Mon, Apr 8, 2013 at 6:22 PM, Reuti wrote:
Hi,

On 08.04.2013 at 15:11, Ehud Barnea wrote:

Thanks for looking at it. I wasn't sure whether to send this to the users list or
just to you.
Anyway, I ran the same thing again and the problem occurred again.
The job that I moved (with qalter) finished quickly, so I couldn't check its
version, but when it finished it spawned another 3 new jobs (with new job IDs,
all of them with version 1).
After these 3 finished, another 30 jobs were spawned (also with different job
IDs, all with version 1).
Were the job numbers incremented directly from the original job ID, so that you
ended up with the same job ID twice in the system, or were new IDs assigned after
those of later-submitted jobs? Is any `qalter` wrapper in the way? Did you edit
the output below, or is:

stderr_path_list:           logs
mail_list:                  barneaeh@sge01
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
really there twice?

-- Reuti


After running qalter, the only command I ran was qstat. Also, I initially created
5 job arrays and ran qalter on only 1 of them, so I am certain that I did not
accidentally create all these jobs.

Here is the output of qstat -j. The first 2 belong to the first batch of
3 jobs, and the last one is one of the 30 jobs that were spawned later.
(It probably doesn't matter, but the working directory is in a Dropbox folder;
Dropbox isn't active, so it's just a normal folder.)

==============================================================
job_number:                 8698176
exec_file:                  job_scripts/8698176
submission_time:            Mon Apr  8 15:55:10 2013
owner:                      barneaeh
uid:                        52647
group:                      obs
gid:                        1009
sge_o_home:                 /storage/users/barneaeh
sge_o_log_name:             barneaeh
sge_o_path:                 /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
sge_o_shell:                /bin/tcsh
sge_o_workdir:              /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
sge_o_host:                 sge01
account:                    sge
cwd:                        /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
path_aliases:               /tmp_mnt/ * * /
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
notify:                     FALSE
job_name:                   run.sh
stdout_path_list:           logs
jobshare:                   0
hard_queue_list:            intel_all.q
shell_list:                 /bin/sh
env_list:
script_file:                run.sh
version:                    1
job-array tasks:            1-246:1
==============================================================
job_number:                 8698175
exec_file:                  job_scripts/8698175
submission_time:            Mon Apr  8 15:55:10 2013
owner:                      barneaeh
uid:                        52647
group:                      obs
gid:                        1009
sge_o_home:                 /storage/users/barneaeh
sge_o_log_name:             barneaeh
sge_o_path:                 /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
sge_o_shell:                /bin/tcsh
sge_o_workdir:              /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
sge_o_host:                 sge01
account:                    sge
cwd:                        /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
path_aliases:               /tmp_mnt/ * * /
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
notify:                     FALSE
job_name:                   run.sh
stdout_path_list:           logs
jobshare:                   0
hard_queue_list:            intel_all.q
shell_list:                 /bin/sh
env_list:
script_file:                run.sh
version:                    1
job-array tasks:            1-246:1
==============================================================
job_number:                 8698182
exec_file:                  job_scripts/8698182
submission_time:            Mon Apr  8 16:00:28 2013
owner:                      barneaeh
uid:                        52647
group:                      obs
gid:                        1009
sge_o_home:                 /storage/users/barneaeh
sge_o_log_name:             barneaeh
sge_o_path:                 /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
sge_o_shell:                /bin/tcsh
sge_o_workdir:              /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
sge_o_host:                 sge01
account:                    sge
cwd:                        /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
path_aliases:               /tmp_mnt/ * * /
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
notify:                     FALSE
job_name:                   run.sh
stdout_path_list:           logs
jobshare:                   0
hard_queue_list:            intel_all.q
shell_list:                 /bin/sh
env_list:
script_file:                run.sh
version:                    1
job-array tasks:            1-246:1
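
Comparing listings like the three above by eye is tedious. A minimal Python sketch, assuming the `qstat -j` output has been saved to a text file, that splits it into one record per job and groups jobs sharing the same job name and task range. The helper names `parse_qstat_j`, `group_identical_arrays` and `report` are hypothetical, not part of SGE, and the parser assumes the "key: value" layout and "=" separator lines seen above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_qstat_j(text: str) -> List[Dict[str, str]]:
    """Split saved `qstat -j` output into one dict per job.

    Assumption: jobs are separated by lines made only of '=' characters and
    each field line looks like 'key:   value'. Indented continuation lines
    are skipped; a repeated key keeps its last value.
    """
    jobs: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"="}:
            # Separator line: close the previous job record, if any.
            if current:
                jobs.append(current)
            current = {}
            continue
        if not stripped or line[:1].isspace():
            continue
        # Split on the first ':' only, so values like '1-246:1' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        jobs.append(current)
    return jobs


def group_identical_arrays(
    jobs: List[Dict[str, str]],
) -> Dict[Tuple[str, str], List[str]]:
    """Return job numbers that share the same job name and task range."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for job in jobs:
        key = (job.get("job_name", ""), job.get("job-array tasks", ""))
        groups[key].append(job.get("job_number", "?"))
    # Keep only groups with more than one job, i.e. possible duplicates.
    return {key: numbers for key, numbers in groups.items() if len(numbers) > 1}


def report(jobs: List[Dict[str, str]]) -> None:
    """Print job number, submission time and version for each job."""
    for job in jobs:
        print(job.get("job_number"), job.get("submission_time"),
              "version", job.get("version"))
```

For example, `report(parse_qstat_j(open(path).read()))` on such a file prints each job's version; several jobs with identical task ranges all showing "version: 1" would match what is described in this thread.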


On Mon, Apr 8, 2013 at 3:43 PM, Reuti wrote:
On 08.04.2013 at 10:21, Semi wrote:

Any ideas about this user's question?

I am working with job arrays and encountered something weird. At first I sent
several job arrays to obs.q. Then I took one of the job arrays (one that hadn't
started executing any tasks) and sent it to intel_all.q (using qalter).
After that the specific tasks started running; however, the same job array was
duplicated about 20-30 times. Now qstat shows me:
8698022 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:23         1 1-245:1
8698023 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:23         1 1-245:1
8698024 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:23         1 1-245:1
8698025 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:23         1 1-245:1
8698026 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:23         1 1-245:1
8698027 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:23         1 1-245:1
8698028 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:24         1 1-245:1
8698030 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:27         1 1-245:1
8698031 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:27         1 1-245:1
8698032 0.50500 run.sh     barneaeh     qw    04/08/2013 10:25:27         1 1-245:1
`qalter` doesn't change the submission time (it only increases the value of "version:"
in `qstat -j <job_id>`). But the jobs above have different submission times. Was this the
time `qalter` was issued?

Do all of them show "version: 1" in `qstat -j <job_id>`?

-- Reuti


This job array was the only one with 245 tasks, so I am sure it was duplicated.
Now it seems that the job array runs again and again.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users


