Hi,

On 08.04.2013 at 15:11, Ehud Barnea wrote:
> Thanks for looking at it. I wasn't sure whether to submit to the users group
> or just you.
> Anyway, I ran the same thing again and the problem occurred again.
> The job that I moved (with qalter) finished quickly so I couldn't check its
> version, but when it finished it spawned another 3 new jobs (with new job ids
> and all of them with version 1).
> After these 3 finished another 30 jobs were spawned (also with a different
> job id and all with version 1).

Was the job number increased directly from the original job id, so that you
ended up having the same job id twice in the system, or were new ones created
after any later submitted job?

Any `qalter`-wrapper in the way?

Did you edit the output below, or is:

> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01
> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01

really there twice?

-- Reuti

> After doing qalter the only commands I ran were qstat. Also, at first I
> created 5 job arrays and only did qalter on 1 of them, so I am certain that I
> did not accidentally create all these jobs.
>
> I supply here the output of qstat -j. The first 2 belong to the first batch
> of 3 jobs and the last one is one of the 30 jobs that spawned later:
> (it probably doesn't matter, but it sits in a Dropbox folder; Dropbox
> isn't active, so it's just a normal folder)
>
> ==============================================================
> job_number:                 8698176
> exec_file:                  job_scripts/8698176
> submission_time:            Mon Apr 8 15:55:10 2013
> owner:                      barneaeh
> uid:                        52647
> group:                      obs
> gid:                        1009
> sge_o_home:                 /storage/users/barneaeh
> sge_o_log_name:             barneaeh
> sge_o_path:                 /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
> sge_o_shell:                /bin/tcsh
> sge_o_workdir:              /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> sge_o_host:                 sge01
> account:                    sge
> cwd:                        /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> path_aliases:               /tmp_mnt/ * * /
> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01
> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01
> notify:                     FALSE
> job_name:                   run.sh
> stdout_path_list:           logs
> jobshare:                   0
> hard_queue_list:            intel_all.q
> shell_list:                 /bin/sh
> env_list:
> script_file:                run.sh
> version:                    1
> job-array tasks:            1-246:1
> ==============================================================
> job_number:                 8698175
> exec_file:                  job_scripts/8698175
> submission_time:            Mon Apr 8 15:55:10 2013
> owner:                      barneaeh
> uid:                        52647
> group:                      obs
> gid:                        1009
> sge_o_home:                 /storage/users/barneaeh
> sge_o_log_name:             barneaeh
> sge_o_path:                 /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
> sge_o_shell:                /bin/tcsh
> sge_o_workdir:              /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> sge_o_host:                 sge01
> account:                    sge
> cwd:                        /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> path_aliases:               /tmp_mnt/ * * /
> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01
> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01
> notify:                     FALSE
> job_name:                   run.sh
> stdout_path_list:           logs
> jobshare:                   0
> hard_queue_list:            intel_all.q
> shell_list:                 /bin/sh
> env_list:
> script_file:                run.sh
> version:                    1
> job-array tasks:            1-246:1
> ==============================================================
> job_number:                 8698182
> exec_file:                  job_scripts/8698182
> submission_time:            Mon Apr 8 16:00:28 2013
> owner:                      barneaeh
> uid:                        52647
> group:                      obs
> gid:                        1009
> sge_o_home:                 /storage/users/barneaeh
> sge_o_log_name:             barneaeh
> sge_o_path:                 /fastspace/users/barneaeh/PCL-1.6.0/bin:/fastspace/$
> sge_o_shell:                /bin/tcsh
> sge_o_workdir:              /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> sge_o_host:                 sge01
> account:                    sge
> cwd:                        /fastspace/users/barneaeh/Dropbox/SGE/srv/6/evaluat$
> path_aliases:               /tmp_mnt/ * * /
> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01
> stderr_path_list:           logs
> mail_list:                  barneaeh@sge01
> notify:                     FALSE
> job_name:                   run.sh
> stdout_path_list:           logs
> jobshare:                   0
> hard_queue_list:            intel_all.q
> shell_list:                 /bin/sh
> env_list:
> script_file:                run.sh
> version:                    1
> job-array tasks:            1-246:1
>
>
> On Mon, Apr 8, 2013 at 3:43 PM, Reuti <[email protected]> wrote:
> On 08.04.2013 at 10:21, Semi wrote:
>
>> Any ideas about user's question?
>>
>> I am working with job arrays and encountered something weird. At first I
>> sent several job arrays to obs.q. Then I took one of the job arrays (that
>> hadn't started executing any task) and sent it to intel_all.q (using qalter).
>> After that the specific tasks started running; however, the same jobArray
>> was duplicated about 20-30 times. Now qstat shows me:
>> 8698022 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
>> 8698023 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
>> 8698024 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
>> 8698025 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
>> 8698026 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
>> 8698027 0.50500 run.sh barneaeh qw 04/08/2013 10:25:23 1 1-245:1
>> 8698028 0.50500 run.sh barneaeh qw 04/08/2013 10:25:24 1 1-245:1
>> 8698030 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
>> 8698031 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
>> 8698032 0.50500 run.sh barneaeh qw 04/08/2013 10:25:27 1 1-245:1
>
> `qalter` doesn't change the submission time (it only increases the value of
> "version:" in `qstat -j <job_id>`). But above they have different submission
> times. Was this the time `qalter` was issued?
>
> Do all of them have a value of "version: 1" in `qstat -j <job_id>`?
>
> -- Reuti
>
>
>> This jobArray was the only one with 245, so I am sure it was duplicated...
>> Now it seems that the job array is run again and again.
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
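
For anyone who wants to check the two things asked about above (repeated fields and the "version:" value) without reading `qstat -j` output by eye, here is a minimal sketch. It is not part of Grid Engine: the function names `parse_qstat_j` and `repeated_fields` are made up for illustration, and it assumes the output has been saved as text in the "key: value" layout quoted in this thread, with jobs separated by lines of "=" characters. The sample is an abbreviated excerpt of job 8698176 from this thread.

```python
import re
from collections import Counter

# Field names in the quoted output look like "job_number" or "job-array tasks".
# Anything else (e.g. a wrapped path line) is skipped. This is an assumption
# about the layout, not a documented format.
FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_\- ]*$")


def parse_qstat_j(text):
    """Split saved `qstat -j` text into one list of (key, value) pairs per job."""
    jobs, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:
            current = None  # separator line: the next field starts a new job
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not FIELD_NAME.match(key):
            continue  # wrapped value or unrelated text
        if current is None:
            current = []
            jobs.append(current)
        current.append((key, value.strip()))
    return jobs


def repeated_fields(job):
    """Return {field name: count} for every field listed more than once."""
    counts = Counter(key for key, _ in job)
    return {key: n for key, n in counts.items() if n > 1}


SAMPLE = """\
==============================================================
job_number:                 8698176
submission_time:            Mon Apr 8 15:55:10 2013
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
stderr_path_list:           logs
mail_list:                  barneaeh@sge01
hard_queue_list:            intel_all.q
version:                    1
job-array tasks:            1-246:1
"""

if __name__ == "__main__":
    for job in parse_qstat_j(SAMPLE):
        fields = dict(job)  # for repeated keys the last value wins
        print(fields.get("job_number"), "version:", fields.get("version"),
              "repeated:", repeated_fields(job))
```

Running it over the saved output of every suspicious job id would show at a glance whether the duplicated `stderr_path_list` / `mail_list` entries are really present and whether any job has a version other than 1.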
