Re: [gridengine users] PE offers 0 slots?
Hi,

On 12.08.2017 at 00:41, Michael Stauffer wrote:

> Hi,
>
> I'm getting back to this post finally. I've looked at the links and
> suggestions in the two replies to my original post a few months ago, but
> they haven't helped. Here's my original:
>
> I'm getting some queued jobs with scheduling info that includes this line
> at the end:
>
>     cannot run in PE "unihost" because it only offers 0 slots

What I notice below: defining h_vmem/s_vmem on a queue level means per job. Defining it on an exechost level means across all jobs. What is different between:

> -
> all.q@compute-0-13.local   BP   0/10/16   9.14   lx-amd64
>     qf:h_vmem=40.000G
>     qf:s_vmem=40.000G
>     hc:slots=6
> -
> all.q@compute-0-14.local   BP   0/10/16   9.66   lx-amd64
>     hc:h_vmem=28.890G
>     hc:s_vmem=30.990G
>     hc:slots=6

qf = queue fixed
hc = host consumable

What is the definition of h_vmem/s_vmem in `qconf -sc` and their default consumptions?

> 'unihost' is the only PE I use. When users request multiple slots, they
> use 'unihost':
>
>     qsub ... -binding linear:2 -pe unihost 2 ...
>
> What happens is that these jobs aren't running when it otherwise seems
> like they should be, or they sit waiting in the queue for a long time even
> when the user has plenty of quota available within the queue they've
> requested, and there are enough resources available on the queue's nodes
> per qhost (slots and vmem are consumables), and qquota isn't showing any
> rqs limits have been reached.
>
> Below I've dumped relevant configurations.
>
> Today I created a new PE called "int_test" to test the "integer"
> allocation rule. I set it to 16 (16 cores per node), and have also tried
> 8. It's been added as a PE to the queues we use. When I try to run in this
> new PE, however, it *always* fails with the same "PE ... offers 0 slots"
> error, even if I can run the same multi-slot job using the "unihost" PE at
> the same time. I'm not sure if this helps debug or not.
>
> Another thought - this behavior started happening some time ago, more or
> less when I tried implementing fairshare behavior. I never seemed to get
> fairshare working right. We haven't been able to confirm, but for some
> users it seems this "PE 0 slots" issue pops up only after they've been
> running other jobs for a little while. So I'm wondering if I've screwed up
> fairshare in some way that's causing this odd behavior.
>
> The default queue from global config file is all.q.

There is no default queue in SGE. One specifies resource requests and SGE will select an appropriate one. What do you refer to by this? Do you have any sge_request or private .sge_request?

-- Reuti

> Here are various config dumps. Is there anything else that might be
> helpful?
>
> Thanks for any help! This has been plaguing me.
>
> [...]
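Reuti's distinction can be made concrete. One way a PE ends up offering 0 slots: with int_test's allocation_rule of 8, the scheduler must find 8 free slots on a single host, so if every host in the queue only has hc:slots=6 left, no host qualifies. A minimal bash sketch of that check (the free-slot counts are illustrative, echoing the hc:slots=6 lines in the qstat output above):

```shell
# Free slots per host, as in the "hc:slots=6" lines above (illustrative).
free_slots="6 6 6"
rule=8          # int_test's allocation_rule: exactly 8 slots per host

offered=0
for s in $free_slots; do
  # A host can only contribute if it has at least $rule free slots.
  if [ "$s" -ge "$rule" ]; then
    offered=$((offered + rule))
  fi
done
echo "PE offers $offered slots"
```

With three hosts at 6 free slots each, this prints "PE offers 0 slots" even though 18 slots are free in total, because no single host can satisfy the fixed allocation rule.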
Re: [gridengine users] PE offers 0 slots?
Hi,

I'm getting back to this post finally. I've looked at the links and suggestions in the two replies to my original post a few months ago, but they haven't helped. Here's my original:

I'm getting some queued jobs with scheduling info that includes this line at the end:

    cannot run in PE "unihost" because it only offers 0 slots

'unihost' is the only PE I use. When users request multiple slots, they use 'unihost':

    qsub ... -binding linear:2 -pe unihost 2 ...

What happens is that these jobs aren't running when it otherwise seems like they should be, or they sit waiting in the queue for a long time even when the user has plenty of quota available within the queue they've requested, and there are enough resources available on the queue's nodes per qhost (slots and vmem are consumables), and qquota isn't showing any rqs limits have been reached.

Below I've dumped relevant configurations.

Today I created a new PE called "int_test" to test the "integer" allocation rule. I set it to 16 (16 cores per node), and have also tried 8. It's been added as a PE to the queues we use. When I try to run in this new PE, however, it *always* fails with the same "PE ... offers 0 slots" error, even if I can run the same multi-slot job using the "unihost" PE at the same time. I'm not sure if this helps debug or not.

Another thought - this behavior started happening some time ago, more or less when I tried implementing fairshare behavior. I never seemed to get fairshare working right. We haven't been able to confirm, but for some users it seems this "PE 0 slots" issue pops up only after they've been running other jobs for a little while. So I'm wondering if I've screwed up fairshare in some way that's causing this odd behavior.

The default queue from the global config file is all.q.

Here are various config dumps. Is there anything else that might be helpful?

Thanks for any help! This has been plaguing me.
[root@chead ~]# qconf -sp unihost
pe_name            unihost
slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE

[root@chead ~]# qconf -sp int_test
pe_name            int_test
slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    8
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE

[root@chead ~]# qconf -ssconf
algorithm                         default
schedule_interval                 0:0:5
maxujobs                          200
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  0
flush_finish_sec                  0
params                            none
reprioritize_interval             0:0:0
halftime                          1
usage_weight_list                 cpu=0.70,mem=0.20,io=0.10
compensation_factor               5.00
weight_user                       0.25
weight_project                    0.25
weight_department                 0.25
weight_job                        0.25
weight_tickets_functional         1000
weight_tickets_share              10
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   2000
report_pjob_tickets               TRUE
max_pending_tasks_per_job         100
halflife_decay_list               none
policy_hierarchy                  OS
weight_ticket                     0.00
weight_waiting_time               1.00
weight_deadline                   360.00
weight_urgency                    0.10
weight_priority                   1.00
max_reservation                   0
default_duration                  INFINITY

[root@chead ~]# qconf -sconf
#global:
execd_spool_dir       /opt/sge/default/spool
mailer                /bin/mail
xterm                 /usr/bin/X11/xterm
load_sensor           none
prolog                none
epilog                none
shell_start_mode      posix_compliant
login_shells          sh,bash,ksh,csh,tcsh
min_uid               0
min_gid               0
user_lists            none
xuser_lists           none
projects              none
xprojects             none
enforce_project       false
enforce_user          auto
load_report_time      00:00:40
max_unheard           00:05:00
reschedule_unknown    02:00:00
loglevel              log_warning
administrator_mail    none
set_token_cmd         none
pag_cmd               none
Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE
Hi,

Can anyone cross check it? If so, how? Or, should we ask the system administrator?

Thanks,
Subashini.K

On Fri, Aug 11, 2017 at 4:37 PM, Gowtham wrote:

> Is the folder, /usr/local/gromacs, shared across all nodes in your
> cluster?
>
> On Fri, Aug 11, 2017 at 1:59 AM Subashini K wrote:
>
> [...]

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE
ssh into a node and try "which gmx" - either it will be found in your path, or it won't. Or "ls /usr/local/gromacs/bin/gmx" and see if it exists.

Best to include the full path to the things you need every time. Using "which" makes things a little easier, i.e.

    GMX=`which gmx`

and then use $GMX in your script. Or set

    GROMACSPATH="/usr/local/gromacs"
    GROMACSBIN="/usr/local/gromacs/bin"

and use those vars as needed.

Ian

On Fri, Aug 11, 2017 at 8:22 AM, Subashini K wrote:

> Hi,
>
> Can anyone cross check it? If so, how?
> Or, should we ask the system administrator?
>
> Thanks,
> Subashini.K
>
> [...]

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu
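Ian's "which" approach can be sketched as below. So the snippet runs anywhere, `ls` stands in for gmx; the variable names are only illustrative:

```shell
# Resolve the binary once; $GMX is empty if it is not on PATH.
GMX=$(command -v ls)        # on the cluster: GMX=$(command -v gmx)
if [ -n "$GMX" ]; then
  found=yes
else
  found=no
  echo "binary not found on PATH" >&2
fi
echo "using: $GMX"
```

Resolving the path once at the top of the submit script also means a missing binary fails with one clear message instead of a cryptic error from deep inside the job.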
Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE
Is the folder, /usr/local/gromacs, shared across all nodes in your cluster?

On Fri, Aug 11, 2017 at 1:59 AM Subashini K wrote:

> [...]

--
Gowtham, PhD
Director of Research Computing, IT
Adj. Asst. Professor, ECE and Physics
Michigan Technological University
(906) 487-4096
http://it.mtu.edu
http://hpc.mtu.edu
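Gowtham's question boils down to: does the execution host see the same /usr/local/gromacs directory as the login node? A hedged sketch of the check, runnable locally (it uses /usr as a stand-in path that exists everywhere; on a real cluster you would wrap the test in ssh to each compute node, and the node names depend on your setup):

```shell
# Report whether a directory is visible on this host. On a cluster:
#   for node in node1 node2; do ssh "$node" test -d /usr/local/gromacs; done
check_dir() {
  if [ -d "$1" ]; then
    echo "$1: present"
  else
    echo "$1: missing on $(hostname)"
  fi
}
status=$(check_dir /usr)        # stand-in path that exists everywhere
check_dir /usr/local/gromacs    # the path from this thread; may be absent here
```

If the second line reports "missing" on a compute node, the submit script can never work there, no matter how the path is written.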
Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE
Hi,

you had a typo in your script (the missing leading slash in the path); you want:

    #!/bin/bash
    #$ -S /bin/bash
    #$ -cwd
    #$ -N smp1
    #$ -l h_vmem=1G
    /usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq

For these things it might be more convenient to ask a colleague who has a little experience with the Linux command line.

However, something I would try in your case, which could make your life a little easier (if you also need to send different scripts at some point, and given the computers you want to run your stuff on are set up similarly to your interactive machine), is to call "bash -l" instead of bash, like so:

    #!/bin/bash -l
    #$ -S /bin/bash
    #$ -cwd
    #$ -N smp1
    #$ -l h_vmem=1G
    gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq

Cheers,
Alex

On 11.08.17 at 07:59, Subashini K wrote:

> Hi,
>
> (1) GROMACS is installed in /usr/local/gromacs/bin/
>
> Not in root as I mentioned earlier.
>
> (2) When I gave
>
>     #!/bin/bash
>     #$ -S /bin/bash
>     #$ -cwd
>     #$ -N smp1
>     #$ -l h_vmem=1G
>     usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq
>
> I got the same error again.
>
> But when I run it directly on my login node, the command
>
>     gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq
>
> works fine.
>
> I realize the problem lies in my scripting. How do I fix it? Is there any
> special method to set the path in the above submit.sh file?
>
>     Executable:  /usr/local/gromacs/bin/gmx
>     Library dir: /usr/local/gromacs/share/gromacs/top
>
> Can anyone help me?
>
> Thanks,
> Subashini.K

--
Dr. Alexander Hasselhuhn
Rahel-Straus-Str. 4
76137 Karlsruhe
Tel. +49 176 64066387
Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE
Hi,

(1) GROMACS is installed in /usr/local/gromacs/bin/

Not in root as I mentioned earlier.

(2) When I gave

    #!/bin/bash
    #$ -S /bin/bash
    #$ -cwd
    #$ -N smp1
    #$ -l h_vmem=1G
    usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq

I got the same error again.

But when I run it directly on my login node, the command

    gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq

works fine.

I realize the problem lies in my scripting. How do I fix it? Is there any special method to set the path in the above submit.sh file?

    Executable:  /usr/local/gromacs/bin/gmx
    Library dir: /usr/local/gromacs/share/gromacs/top

Can anyone help me?

Thanks,
Subashini.K