Please keep the list posted.

On 08.04.2013 at 21:06, Ehud Barnea wrote:
> These 2 lines are there twice, probably because of a mistake on my part. I
> copied them incorrectly (they come just after the first "page" when
> redirecting the output of the command to a txt file and opening it with nano).
>
> The jobs were created by themselves. The output I gave is of 3 jobs that I
> did not create. The 3rd one was created right after the first 2 finished (and
> it was created along with another 30 jobs).
> The job IDs increased one by one, as they usually do when submitting
> jobs. Job IDs did not appear twice.
> The only commands I ran were
> qsub (several times)
> qalter (no wrapper, just qalter -q <job id>)

`qresub` is a symbolic link to `qalter`. If you can reproduce this, it looks as if `qalter` is misinterpreting the intended change of attributes. Which exact version of SGE are you using?

But: if the job was resubmitted by accident, then no "version:" shows up at all, as the job wasn't modified.

-- Reuti

PS: Is the "/fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$" edited, or does it really end in a "$"?

> On Mon, Apr 8, 2013 at 6:22 PM, Reuti <[email protected]> wrote:
> Hi,
>
> On 08.04.2013 at 15:11, Ehud Barnea wrote:
>
> > Thanks for looking at it. I wasn't sure whether to submit to the users
> > group or just you.
> > Anyway, I ran the same thing again and the problem occurred again.
> > The job that I moved (with qalter) finished quickly so I couldn't check
> > its version, but when it finished it spawned another 3 new jobs (with new
> > job ids and all of them with version 1).
> > After these 3 finished another 30 jobs were spawned (also with a different
> > job id and all with version 1).
>
> Was the job number increased directly from the original job id, so that you ended
> up having the same job id twice in the system, or were new ones created
> after any later submitted job? Any `qalter`-wrapper in the way?
> Did you edit the output below, or is:
>
> > stderr_path_list: logs
> > mail_list: barneaeh@sge01
> > stderr_path_list: logs
> > mail_list: barneaeh@sge01
>
> really there twice?
>
> -- Reuti
>
>
> > After doing qalter the only commands I ran were qstat. Also, at first I
> > created 5 job arrays and only did qalter on 1 of them, so I am certain that
> > I did not accidentally create all these jobs.
> >
> > I supply here the output of qstat -j. The first 2 belong to the first batch
> > of 3 jobs and the last one is one of the 30 jobs that spawned later:
> > (it probably doesn't matter, but it sits in a Dropbox folder; Dropbox
> > isn't active, so it's just a normal folder)
> >
> > ==============================================================
> > job_number:         8698176
> > exec_file:          job_scripts/8698176
> > submission_time:    Mon Apr 8 15:55:10 2013
> > owner:              barneaeh
> > uid:                52647
> > group:              obs
> > gid:                1009
> > sge_o_home:         /storage/users/barneaeh
> > sge_o_log_name:     barneaeh
> > sge_o_path:         /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
> > sge_o_shell:        /bin/tcsh
> > sge_o_workdir:      /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> > sge_o_host:         sge01
> > account:            sge
> > cwd:                /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> > path_aliases:       /tmp_mnt/ * * /
> > stderr_path_list:   logs
> > mail_list:          barneaeh@sge01
> > stderr_path_list:   logs
> > mail_list:          barneaeh@sge01
> > notify:             FALSE
> > job_name:           run.sh
> > stdout_path_list:   logs
> > jobshare:           0
> > hard_queue_list:    intel_all.q
> > shell_list:         /bin/sh
> > env_list:
> > script_file:        run.sh
> > version:            1
> > job-array tasks:    1-246:1
> > ==============================================================
> > job_number:         8698175
> > exec_file:          job_scripts/8698175
> > submission_time:    Mon Apr 8 15:55:10 2013
> > owner:              barneaeh
> > uid:                52647
> > group:              obs
> > gid:                1009
> > sge_o_home:         /storage/users/barneaeh
> > sge_o_log_name:     barneaeh
> > sge_o_path:         /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
> > sge_o_shell:        /bin/tcsh
> > sge_o_workdir:      /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> > sge_o_host:         sge01
> > account:            sge
> > cwd:                /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> > path_aliases:       /tmp_mnt/ * * /
> > stderr_path_list:   logs
> > mail_list:          barneaeh@sge01
> > stderr_path_list:   logs
> > mail_list:          barneaeh@sge01
> > notify:             FALSE
> > job_name:           run.sh
> > stdout_path_list:   logs
> > jobshare:           0
> > hard_queue_list:    intel_all.q
> > shell_list:         /bin/sh
> > env_list:
> > script_file:        run.sh
> > version:            1
> > job-array tasks:    1-246:1
> > ==============================================================
> > job_number:         8698182
> > exec_file:          job_scripts/8698182
> > submission_time:    Mon Apr 8 16:00:28 2013
> > owner:              barneaeh
> > uid:                52647
> > group:              obs
> > gid:                1009
> > sge_o_home:         /storage/users/barneaeh
> > sge_o_log_name:     barneaeh
> > sge_o_path:         /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
> > sge_o_shell:        /bin/tcsh
> > sge_o_workdir:      /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> > sge_o_host:         sge01
> > account:            sge
> > cwd:                /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> > path_aliases:       /tmp_mnt/ * * /
> > stderr_path_list:   logs
> > mail_list:          barneaeh@sge01
> > stderr_path_list:   logs
> > mail_list:          barneaeh@sge01
> > notify:             FALSE
> > job_name:           run.sh
> > stdout_path_list:   logs
> > jobshare:           0
> > hard_queue_list:    intel_all.q
> > shell_list:         /bin/sh
> > env_list:
> > script_file:        run.sh
> > version:            1
> > job-array tasks:    1-246:1
> >
> >
> > On Mon, Apr 8, 2013 at 3:43 PM, Reuti <[email protected]> wrote:
> > On 08.04.2013 at 10:21, Semi wrote:
> >
> > > Any ideas about this user's question?
> > >
> > > I am working with job arrays and encountered something weird. At first I
> > > sent several job arrays to obs.q. Then I took one of the job arrays (that
> > > hadn't started executing any task) and sent it to intel_all.q (using
> > > qalter).
> > > After that the specific tasks started running, however, the same jobArray
> > > was duplicated about 20-30 times. Now qstat shows me:
> > > 8698022 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
> > > 8698023 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
> > > 8698024 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
> > > 8698025 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
> > > 8698026 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
> > > 8698027 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
> > > 8698028 0.50500 run.sh barneaeh qw 04/08/2013 10:25:24 1 1-245:1
> > > 8698030 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
> > > 8698031 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
> > > 8698032 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
> >
> > `qalter` doesn't change the submission time (only increasing the value of
> > "version:" in `qstat -j <job_id>`). But above they have different
> > submission times. Was this the time `qalter` was issued?
> >
> > All have a value of "version: 1" in `qstat -j <job_id>`?
> >
> > -- Reuti
> >
> >
> > > This jobArray was the only one with 245, so I am sure it was duplicated...
> > > Now it seems that the job array is run again and again.
> > >
> > > _______________________________________________
> > > users mailing list
> > > [email protected]
> > > https://gridengine.org/mailman/listinfo/users
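
For anyone who wants to run the same check on their own cluster (comparing "version:" and submission times across look-alike job arrays), here is a rough Python sketch. It only parses text that looks like the `qstat -j` output quoted in this thread; the function names `parse_qstat_j` and `group_lookalikes` and the abridged `SAMPLE` are made up for illustration and are not part of SGE. Grouping by owner, job name and task range is just a heuristic: it can point at suspicious jobs, but it cannot prove that they were spawned by `qalter`.

```python
from collections import defaultdict


def parse_qstat_j(text):
    """Split `qstat -j`-style text into one dict of fields per job."""
    jobs, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("====="):  # separator line between job records
            if current:
                jobs.append(current)
            current = {}
            continue
        key, sep, value = line.partition(":")
        # Wrapped path continuations in a paste start with "/" and are skipped.
        if sep and not line.startswith("/"):
            # Keep the first value when a key is repeated in the paste.
            current.setdefault(key.strip(), value.strip())
    if current:
        jobs.append(current)
    return jobs


def group_lookalikes(jobs):
    """Group jobs by (owner, job_name, task range); keep groups larger than 1."""
    groups = defaultdict(list)
    for job in jobs:
        signature = (job.get("owner"), job.get("job_name"),
                     job.get("job-array tasks"))
        # "version" may be missing entirely, in which case None is recorded.
        groups[signature].append((job.get("job_number"), job.get("version")))
    return {sig: members for sig, members in groups.items() if len(members) > 1}


# Abridged from the output quoted in this thread (most fields left out).
SAMPLE = """\
==============================================================
job_number: 8698176
submission_time: Mon Apr 8 15:55:10 2013
owner: barneaeh
job_name: run.sh
version: 1
job-array tasks: 1-246:1
==============================================================
job_number: 8698182
submission_time: Mon Apr 8 16:00:28 2013
owner: barneaeh
job_name: run.sh
version: 1
job-array tasks: 1-246:1
"""

if __name__ == "__main__":
    for signature, members in group_lookalikes(parse_qstat_j(SAMPLE)).items():
        print(signature, members)
```

Feeding it the redirected output of `qstat -j` for the suspicious job ids should list which of them share an owner, name and task range, together with the "version" each one reports.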
