Re: [gridengine users] PE offers 0 slots?

2017-08-11 Thread Reuti
Hi,

On 12.08.2017 at 00:41, Michael Stauffer wrote:

> Hi,
> 
> I'm getting back to this post finally. I've looked at the links and 
> suggestions in the two replies to my original post a few months ago, but they 
> haven't helped. Here's my original:
> 
> I'm getting some queued jobs with scheduling info that includes this line at 
> the end:
> 
> cannot run in PE "unihost" because it only offers 0 slots

What I notice below: defining h_vmem/s_vmem at the queue level means the limit 
applies per job, while defining it at the exechost level means it applies across 
all jobs on that host. What is different between:

> -
> all.q@compute-0-13.local       BP    0/10/16        9.14     lx-amd64
>     qf:h_vmem=40.000G
>     qf:s_vmem=40.000G
>     hc:slots=6
> -
> all.q@compute-0-14.local       BP    0/10/16        9.66     lx-amd64
>     hc:h_vmem=28.890G
>     hc:s_vmem=30.990G
>     hc:slots=6


qf = queue fixed
hc = host consumable

What is the definition of h_vmem/s_vmem in `qconf -sc`, and what are their 
default consumption values?
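
A quick way to check all three levels (a sketch; the host name comes from the 
excerpt above, and the h_vmem line in the comment is only a typical consumable 
setup, not necessarily this cluster's actual definition):

# Complex definitions (columns: name shortcut type relop requestable
# consumable default urgency); a consumable h_vmem usually looks like
#   h_vmem   h_vmem   MEMORY   <=   YES   YES   0   0
qconf -sc | egrep 'h_vmem|s_vmem'

# Queue-level limit (applies per job):
qconf -sq all.q | egrep 'h_vmem|s_vmem'

# Host-level consumable (applies across all jobs on that host):
qconf -se compute-0-14.local | grep complex_values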


> 'unihost' is the only PE I use. When users request multiple slots, they use 
> 'unihost':
> 
> qsub ... -binding linear:2 -pe unihost 2 ...
> 
> What happens is that these jobs aren't running when it otherwise seems like 
> they should be, or they sit waiting in the queue for a long time even though 
> the user has plenty of quota available in the queue they've requested, qhost 
> shows enough resources available on the queue's nodes (slots and vmem are 
> consumables), and qquota isn't showing that any RQS limits have been reached.
> 
> Below I've dumped relevant configurations.
> 
> Today I created a new PE called "int_test" to test the "integer" allocation 
> rule. I set it to 16 (16 cores per node) and have also tried 8. It has been 
> added as a PE to the queues we use. When I try to run in this new PE, however, 
> it *always* fails with the same "PE ... offers 0 slots" error, even when I can 
> run the same multi-slot job using the "unihost" PE at the same time. I'm not 
> sure if this helps debug or not.
> 
> Another thought - this behavior started happening some time ago more or less 
> when I tried implementing fairshare behavior. I never seemed to get fairshare 
> working right. We haven't been able to confirm, but for some users it seems 
> this "PE 0 slots" issue pops up only after they've been running other jobs 
> for a little while. So I'm wondering if I've screwed up fairshare in some way 
> that's causing this odd behavior.
> 
> The default queue from the global config file is all.q.

There is no default queue in SGE. One specifies resource requests and SGE 
selects an appropriate queue. What are you referring to here?

Do you have any sge_request or private .sge_request?
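
A quick way to check both (a sketch; the path assumes the default "default" 
cell name under $SGE_ROOT):

# Cluster-wide defaults picked up by every qsub:
cat $SGE_ROOT/default/common/sge_request

# Per-user and per-directory defaults:
cat ~/.sge_request ./.sge_request 2>/dev/null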

-- Reuti


> 
> Here are various config dumps. Is there anything else that might be helpful?
> 
> Thanks for any help! This has been plaguing me.
> 
> 
> [root@chead ~]# qconf -sp unihost
> pe_name            unihost
> slots
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $pe_slots
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary FALSE
> qsort_args         NONE
> 
> [root@chead ~]# qconf -sp int_test
> pe_name            int_test
> slots
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    8
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary FALSE
> qsort_args         NONE
> 
> [root@chead ~]# qconf -ssconf
> algorithm default
> schedule_interval 0:0:5
> maxujobs  200
> queue_sort_method load
> job_load_adjustments  np_load_avg=0.50
> load_adjustment_decay_time    0:7:30
> load_formula  np_load_avg
> schedd_job_info   true
> flush_submit_sec  0
> flush_finish_sec  0
> params    none
> reprioritize_interval 0:0:0
> halftime  1
> usage_weight_list cpu=0.70,mem=0.20,io=0.10
> compensation_factor   5.00
> weight_user   0.25
> weight_project    0.25
> weight_department 0.25
> weight_job    0.25
> weight_tickets_functional 1000
> weight_tickets_share  10
> share_override_tickets    TRUE
> share_functional_shares   TRUE
> max_functional_jobs_to_schedule   2000
> report_pjob_tickets   TRUE
> max_pending_tasks_per_job 100
> halflife_decay_list   none
> 

Re: [gridengine users] PE offers 0 slots?

2017-08-11 Thread Michael Stauffer
Hi,

I'm getting back to this post finally. I've looked at the links and
suggestions in the two replies to my original post a few months ago, but
they haven't helped. Here's my original:

I'm getting some queued jobs with scheduling info that includes this line
at the end:

cannot run in PE "unihost" because it only offers 0 slots

'unihost' is the only PE I use. When users request multiple slots, they use
'unihost':

qsub ... -binding linear:2 -pe unihost 2 ...

What happens is that these jobs aren't running when it otherwise seems like
they should be, or they sit waiting in the queue for a long time even though
the user has plenty of quota available in the queue they've requested, qhost
shows enough resources available on the queue's nodes (slots and vmem are
consumables), and qquota isn't showing that any RQS limits have been reached.
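
(For reference, the kind of information above can be gathered with commands
like the following; the job ID is just a placeholder:)

# Full scheduling info for a stuck job (needs schedd_job_info true, set below):
qstat -j <jobid>

# Ask the scheduler why the pending job cannot be dispatched right now
# ("poke" validation, if your version supports the p mode):
qalter -w p <jobid>

# Per-host view of the consumables involved:
qhost -F h_vmem,s_vmem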

Below I've dumped relevant configurations.

Today I created a new PE called "int_test" to test the "integer" allocation
rule. I set it to 16 (16 cores per node) and have also tried 8. It has been
added as a PE to the queues we use. When I try to run in this new PE, however,
it *always* fails with the same "PE ... offers 0 slots" error, even when I can
run the same multi-slot job using the "unihost" PE at the same time.
I'm not sure if this helps debug or not.

Another thought - this behavior started happening some time ago more or
less when I tried implementing fairshare behavior. I never seemed to get
fairshare working right. We haven't been able to confirm, but for some
users it seems this "PE 0 slots" issue pops up only after they've been
running other jobs for a little while. So I'm wondering if I've screwed up
fairshare in some way that's causing this odd behavior.

The default queue from the global config file is all.q.

Here are various config dumps. Is there anything else that might be helpful?

Thanks for any help! This has been plaguing me.


[root@chead ~]# qconf -sp unihost

pe_name            unihost
slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE


[root@chead ~]# qconf -sp int_test

pe_name            int_test
slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    8
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE


[root@chead ~]# qconf -ssconf

algorithm default
schedule_interval 0:0:5
maxujobs  200
queue_sort_method load
job_load_adjustments  np_load_avg=0.50
load_adjustment_decay_time    0:7:30
load_formula  np_load_avg
schedd_job_info   true
flush_submit_sec  0
flush_finish_sec  0
params    none
reprioritize_interval 0:0:0
halftime  1
usage_weight_list cpu=0.70,mem=0.20,io=0.10
compensation_factor   5.00
weight_user   0.25
weight_project    0.25
weight_department 0.25
weight_job    0.25
weight_tickets_functional 1000
weight_tickets_share  10
share_override_tickets    TRUE
share_functional_shares   TRUE
max_functional_jobs_to_schedule   2000
report_pjob_tickets   TRUE
max_pending_tasks_per_job 100
halflife_decay_list   none
policy_hierarchy  OS
weight_ticket 0.00
weight_waiting_time   1.00
weight_deadline   360.00
weight_urgency0.10
weight_priority   1.00
max_reservation   0
default_duration  INFINITY


[root@chead ~]# qconf -sconf

#global:
execd_spool_dir  /opt/sge/default/spool
mailer   /bin/mail
xterm    /usr/bin/X11/xterm
load_sensor  none
prolog   none
epilog   none
shell_start_mode posix_compliant
login_shells sh,bash,ksh,csh,tcsh
min_uid  0
min_gid  0
user_lists   none
xuser_lists  none
projects none
xprojects    none
enforce_project  false
enforce_user auto
load_report_time 00:00:40
max_unheard  00:05:00
reschedule_unknown   02:00:00
loglevel log_warning
administrator_mail   none
set_token_cmd    none
pag_cmd  none

Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE

2017-08-11 Thread Subashini K
Hi,


Can anyone cross check it? If so, how?

Or, should we ask the system administrator?

Thanks,
Subashini.K

On Fri, Aug 11, 2017 at 4:37 PM, Gowtham  wrote:

>
> Is the folder, /usr/local/gromacs, shared across all nodes in your cluster?
>
>
> On Fri, Aug 11, 2017 at 1:59 AM Subashini K 
> wrote:
>
>> Hi,
>>
>> (1) GROMACS is installed in /usr/local/gromacs/bin/
>>
>> Not in root as I mentioned earlier.
>>
>> (2) When I gave
>>
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -cwd
>> #$ -N smp1
>> #$ -l h_vmem=1G
>> usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq
>>
>> I got the same error again.
>>
>> But when I run the same command directly on my login node (gmx mdrun -ntmpi 1
>> -ntomp 8 -v -deffnm eq), it works fine.
>>
>>
>> I realize the problem lies in my scripting. How to fix it? Is there any
>> special method to set the path in the above submit.sh file?
>>
>> Executable:   /usr/local/gromacs/bin/gmx
>> Library dir:  /usr/local/gromacs/share/gromacs/top
>>
>>
>> Can anyone help me?
>>
>> Thanks,
>> Subashini.K
>>
>>
>> --
>
> Gowtham, PhD
> Director of Research Computing, IT
> Adj. Asst. Professor, ECE and Physics
> Michigan Technological University
>
> (906) 487-4096
> http://it.mtu.edu
> http://hpc.mtu.edu
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE

2017-08-11 Thread Ian Kaufman
ssh into a node and try

"which gmx"

Either it will be found in your path, or it won't.

Or "ls /usr/local/gromacs/bin/gmx" and see if exists.

Best to include the full path to the things you need every time. Using
"which" makes things a little easier, e.g.

GMX=`which gmx`

and then use $GMX in your script.

Or GROMACSPATH="/usr/local/gromacs" and GROMACSBIN="/usr/local/gromacs/bin"
and use those vars as needed.
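
A minimal sketch putting that together (reusing the submit options from the
original script; the fallback path is the one from Subashini's message):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N smp1
#$ -l h_vmem=1G

# Resolve gmx from the PATH on the execution node, falling back to the
# reported install location; fail early with a clear message if neither works.
GMX=$(which gmx 2>/dev/null)
[ -x "$GMX" ] || GMX=/usr/local/gromacs/bin/gmx
if [ ! -x "$GMX" ]; then
    echo "gmx not found on $(hostname)" >&2
    exit 1
fi

"$GMX" mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq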

Ian

On Fri, Aug 11, 2017 at 8:22 AM, Subashini K 
wrote:

> Hi,
>
>
> Can anyone cross check it? If so, how?
>
> Or, should we ask the system administrator?
>
> Thanks,
> Subashini.K
>
> On Fri, Aug 11, 2017 at 4:37 PM, Gowtham  wrote:
>
>>
>> Is the folder, /usr/local/gromacs, shared across all nodes in your
>> cluster?
>>
>>
>> On Fri, Aug 11, 2017 at 1:59 AM Subashini K 
>> wrote:
>>
>>> Hi,
>>>
>>> (1) GROMACS is installed in /usr/local/gromacs/bin/
>>>
>>> Not in root as I mentioned earlier.
>>>
>>> (2) When I gave
>>>
>>>
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -cwd
>>> #$ -N smp1
>>> #$ -l h_vmem=1G
>>> usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq
>>>
>>> I got the same error again.
>>>
>>> But when I run the same command directly on my login node (gmx mdrun -ntmpi 1
>>> -ntomp 8 -v -deffnm eq), it works fine.
>>>
>>>
>>> I realize the problem lies in my scripting. How to fix it? Is there
>>> any special method to set the path in the above submit.sh file?
>>>
>>> Executable:   /usr/local/gromacs/bin/gmx
>>> Library dir:  /usr/local/gromacs/share/gromacs/top
>>>
>>>
>>> Can anyone help me?
>>>
>>> Thanks,
>>> Subashini.K
>>>
>>>
>>> --
>>
>> Gowtham, PhD
>> Director of Research Computing, IT
>> Adj. Asst. Professor, ECE and Physics
>> Michigan Technological University
>>
>> (906) 487-4096
>> http://it.mtu.edu
>> http://hpc.mtu.edu
>>
>
>
> ___
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
>
>


-- 
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE

2017-08-11 Thread Gowtham
Is the folder, /usr/local/gromacs, shared across all nodes in your cluster?
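
One quick way to check (a sketch; the job lands on whichever node the
scheduler picks):

# Run a trivial command on a compute node via the scheduler and see whether
# the install is visible there:
qrsh ls -l /usr/local/gromacs/bin/gmx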


On Fri, Aug 11, 2017 at 1:59 AM Subashini K 
wrote:

> Hi,
>
> (1) GROMACS is installed in /usr/local/gromacs/bin/
>
> Not in root as I mentioned earlier.
>
> (2) When I gave
>
>
> #!/bin/bash
> #$ -S /bin/bash
> #$ -cwd
> #$ -N smp1
> #$ -l h_vmem=1G
> usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq
>
> I got the same error again.
>
> But when I run the same command directly on my login node (gmx mdrun -ntmpi 1
> -ntomp 8 -v -deffnm eq), it works fine.
>
>
> I realize the problem lies in my scripting. How to fix it? Is there any
> special method to set the path in the above submit.sh file?
>
> Executable:   /usr/local/gromacs/bin/gmx
> Library dir:  /usr/local/gromacs/share/gromacs/top
>
>
> Can anyone help me?
>
> Thanks,
> Subashini.K
>
>
> --

Gowtham, PhD
Director of Research Computing, IT
Adj. Asst. Professor, ECE and Physics
Michigan Technological University

(906) 487-4096
http://it.mtu.edu
http://hpc.mtu.edu
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE

2017-08-11 Thread Alexander Hasselhuhn

Hi,

you have a typo in your script (the leading slash of the path is missing); you want:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N smp1
#$ -l h_vmem=1G
/usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq


For these things it might be more convenient to ask a colleague who has 
a little experience with the Linux command line. However, something I 
would try in your case, which could make your life a little easier (if 
you also need to submit different scripts at some point, and provided the 
compute nodes are set up similarly to your interactive machine), is to 
call bash -l instead of bash, like so:


#!/bin/bash -l
#$ -S /bin/bash
#$ -cwd
#$ -N smp1
#$ -l h_vmem=1G
gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq
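
Another variant that may work, assuming a standard GROMACS installation that
ships a GMXRC environment script in its bin directory and that
/usr/local/gromacs is visible on the compute nodes:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N smp1
#$ -l h_vmem=1G

# Source the GROMACS environment script so gmx and its libraries are found,
# then run the same mdrun command as before.
source /usr/local/gromacs/bin/GMXRC
gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq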

Cheers,
Alex

On 11.08.17 at 07:59, Subashini K wrote:

Hi,

(1) GROMACS is installed in /usr/local/gromacs/bin/

Not in root as I mentioned earlier.

(2) When I gave

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N smp1
#$ -l h_vmem=1G
usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq

I got the same error again.

But when I run the same command directly on my login node (gmx mdrun -ntmpi 1 
-ntomp 8 -v -deffnm eq), it works fine.



I realize the problem lies in my scripting. How to fix it? Is there 
any special method to set the path in the above submit.sh file?


Executable:   /usr/local/gromacs/bin/gmx
Library dir:  /usr/local/gromacs/share/gromacs/top


Can anyone help me?

Thanks,
Subashini.K




___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


--
Dr. Alexander Hasselhuhn
Rahel-Straus-Str. 4
76137 Karlsruhe
Tel. +49 176 64066387



___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] RUNNING GROMACS SIMULATIONS THROUGH SCRIPT FILE

2017-08-11 Thread Subashini K
Hi,

(1) GROMACS is installed in /usr/local/gromacs/bin/

Not in root as I mentioned earlier.

(2) When I gave

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N smp1
#$ -l h_vmem=1G
usr/local/gromacs/bin/gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm eq

I got the same error again.

But when I run the same command directly on my login node (gmx mdrun -ntmpi 1
-ntomp 8 -v -deffnm eq), it works fine.


I realize the problem lies in my scripting. How to fix it? Is there any
special method to set the path in the above submit.sh file?

Executable:   /usr/local/gromacs/bin/gmx
Library dir:  /usr/local/gromacs/share/gromacs/top


Can anyone help me?

Thanks,
Subashini.K
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users