I switched schedd_job_info to true; here are the outputs you requested:


# qstat -j 95
==============================================================
job_number:                 95
exec_file:                  job_scripts/95
submission_time:            Tue Dec 13 08:50:34 2016
owner:                      johnt
uid:                        162
group:                      sa
gid:                        4563
sge_o_home:                 /home/johnt
sge_o_log_name:             johnt
sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
sge_o_shell:                /bin/tcsh
sge_o_workdir:              /home/johnt/sge8
sge_o_host:                 ibm005
account:                    sge
cwd:                        /home/johnt/sge8
mail_list:                  johnt@ibm005
notify:                     FALSE
job_name:                   xclock
jobshare:                   0
hard_queue_list:            all.q@ibm038
env_list:                   TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/johnt,SHELL=/bin/tcsh,USER=johnt,LOGNAME=johnt,PATH=/home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.,HOSTTYPE=x86_64-linux,VENDOR=unknown,OSTYPE=linux,MACHTYPE=x86_64,SHLVL=1,PWD=/home/johnt/sge8,GROUP=sa,HOST=ibm005,REMOTEHOST=dsls11,MAIL=/var/spool/mail/johnt,LS_COLORS=no=00:fi=00:di=00;36:ln=00;34:pi=40;33:so=01;31:bd=40;33:cd=40;33:or=40;31:ex=00;31:*.tar=00;33:*.tgz=00;33:*.zip=00;33:*.bz2=00;33:*.z=00;33:*.Z=00;33:*.gz=00;33:*.ev=00;41,G_BROKEN_FILENAMES=1,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,KDE_IS_PRELINKED=1,KDEDIR=/usr,LANG=en_US.UTF-8,LESSOPEN=|/usr/bin/lesspipe.sh %s,HOSTNAME=ibm005,INPUTRC=/etc/inputrc,ASSURA_AUTO_64BIT=NONE,EDITOR=vi,TOP=-ores60,CVSROOT=/home/edamgr/CVSTF,OPERA_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns7,NPX_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns7,MANPATH=/home/sge/sge8.1.9-1.el5/man:/usr/share/man:/usr/X11R6/man:/usr/kerberos/man,LD_LIBRARY_PATH=/usr/lib:/usr/local/lib:/usr/lib64:/usr/local/lib64,MGC_HOME=/home/eda/mentor/aoi_cal_2015.3_25.16,CALIBRE_LM_LOG_LEVEL=WARN,MGLS_LICENSE_FILE=1717@ibm004:1717@ibm005:1717@ibm041:1717@ibm042:1717@ibm043:1717@ibm033:1717@ibm044:1717@td156:1717@td158:1717@ATD222,MGC_CALGUI_RELEASE_LICENSE_TIME=0.5,MGC_RVE_RELEASE_LICENSE_TIME=0.5,SOSCAD=/cad,EDA_TOOL_SETUP_ROOT=/cad/toolSetup,EDA_TOOL_SETUP_VERSION=1.0,SGE_ROOT=/home/sge/sge8.1.9-1.el5,SGE_ARCH=lx-amd64,SGE_CELL=cell2,SGE_CLUSTER_NAME=p6444,SGE_QMASTER_PORT=6444,SGE_EXECD_PORT=6445,DRMAA_LIBRARY_PATH=/home/sge/sge8.1.9-1.el5/lib//libdrmaa.so
script_file:                xclock
parallel environment:  cores range: 1
binding:                    NONE
job_type:                   binary
scheduling info:            cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
                            cannot run in queue "sim.q" because it is not contained in its hard queue list (-q)
                            cannot run in queue "all.q@ibm021" because it is not contained in its hard queue list (-q)
                            cannot run in PE "cores" because it only offers 0 slots
johnt@ibm005: /home/johnt/sge8 #

----------------------------------------------------------------------------------------------------------------------------

# qconf -srqs
No resource quota set found
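
For reference, a few more checks that should narrow down where the PE slots are lost (standard qconf/qalter calls; ibm038 and job 95 are simply the values from the output above):

        # qalter -w v 95                              # re-validate the job as if the cluster were empty
        # qconf -sq all.q | grep -E 'pe_list|slots'   # confirm the PE is attached and slots are defined
        # qconf -se ibm038                            # look for a limiting complex_values entry on the exec host
        # qconf -spl                                  # list all PEs, then re-check "cores" with qconf -sp cores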







-----Original Message-----
From: Reuti [mailto:re...@staff.uni-marburg.de]
Sent: Tuesday, December 13, 2016 7:51
To: John_Tai
Cc: Coleman, Marcus [JRDUS Non-J&J]; users@gridengine.org
Subject: Re: [gridengine users] users Digest, Vol 72, Issue 13


> On 13.12.2016 at 01:55, John_Tai <john_...@smics.com> wrote:
>
>>> Did you set up and/or request any memory per machine?
>
> No, I didn't use that other complex:
>
>        # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock

It could be coded in the sge_request file, in which case you won't see it on the 
command line. Can you please post the output of

$ qstat -j <its_job_id>
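
(Such hidden defaults would sit in $SGE_ROOT/$SGE_CELL/common/sge_request or a per-user ~/.sge_request. Purely as an illustration, a file like

    # sge_request (hypothetical example of a cluster-wide default)
    -l virtual_free=4G

would add a memory request to every job without it ever appearing on the qsub command line.)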


> OTOH: if you submit 2 single CPU jobs to node ibm038, are they scheduled?
>
> As long as I submit with a PE request, it doesn't get scheduled, even if it's 
> just 1 core:
>
>        # qsub -V -b y -cwd -now n -pe cores 1 -q all.q@ibm038 xclock

That's strange. Is there any output from:

$ qconf -srqs
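
(An RQS can silently cap slots per host, per PE or per user. Just to illustrate what such a rule looks like, a hypothetical set created with `qconf -arqs` might read:

    {
       name         max_slots_per_host
       description  "hypothetical: cap slots on every exec host"
       enabled      TRUE
       limit        hosts {*} to slots=8
    }

If nothing like this exists, "No resource quota set found" is the expected answer.)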


> However if I don't use PE it works fine:
>
>        # qsub -V -b y -cwd -now n -q all.q@ibm038 xclock
>
>
>>> Unless you intend to oversubscribe, the above can be set to NONE. In fact, 
>>> the scheduler might look ahead at the coming load, and together with:
>
> I'm not sure which to set to NONE. Is it the load threshold? I might need this 
> just in case someone is running a job on the exec host without using SGE.

This is not advisable with SGE, as there may already be an SGE job running when 
a user starts to use the machine interactively without knowing it. The 
load_thresholds will only prevent new jobs from being scheduled to the host; 
running ones will continue, unless you look into suspend_thresholds. With that, 
of course, h_rt no longer makes much sense.
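
(To make that concrete, a minimal sketch of the relevant attributes in `qconf -mq all.q`; the threshold value is only a placeholder:

    load_thresholds       NONE
    suspend_thresholds    np_load_avg=1.75
    nsuspend              1
    suspend_interval      00:05:00

i.e. don't block scheduling on load, but suspend one job every five minutes while the host stays overloaded.)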


> FYI:
>
> # qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:15
> maxujobs                          0
> queue_sort_method                 load
> job_load_adjustments              np_load_avg=0.50
> load_adjustment_decay_time        0:7:30
> load_formula                      np_load_avg
> schedd_job_info                   false

The above can be set to true to get more detailed output from the `qstat -j` 
variants.
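
(A sketch of the change, via `qconf -msconf`, editing the single line:

    schedd_job_info                   true

After that, the reason a pending job isn't dispatched shows up under "scheduling info" in `qstat -j <job_id>`.)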

-- Reuti


> flush_submit_sec                  0
> flush_finish_sec                  0
> params                            none
> reprioritize_interval             0:0:0
> halftime                          168
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         0
> weight_tickets_share              0
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   200
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OFS
> weight_ticket                     0.010000
> weight_waiting_time               0.000000
> weight_deadline                   3600000.000000
> weight_urgency                    0.100000
> weight_priority                   1.000000
> max_reservation                   0
> default_duration                  INFINITY
>
>
>
>
> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Monday, December 12, 2016 8:34
> To: John_Tai
> Cc: Coleman, Marcus [JRDUS Non-J&J]; users@gridengine.org
> Subject: Re: [gridengine users] users Digest, Vol 72, Issue 13
>
>
>> On 12.12.2016 at 07:02, John_Tai <john_...@smics.com> wrote:
>>
>> Thank you all for trying to work this out.
>>
>>
>>
>>>> allocation_rule $fill_up <--- works better for parallel jobs
>>
>> I do want my job to run on one machine only
>>
>>>> control_slaves TRUE <--- you want tight integration with SGE
>>>> job_is_first_task  <--- can go either way, unless you are sure your 
>>>> software will control job distribution...
>>
>> And the job will be controlled by my software, not SGE. I only need SGE to 
>> keep track of the slots (i.e. CPU cores).
>>
>> -------------------------------------------------
>>
>> There were no messages on qmaster or ibm038. The job I submitted is not in 
>> error; it's just waiting for free slots.
>>
>> -------------------------------------------------
>>
>> I changed the queue's slots setting and removed all other PEs, but I got the 
>> same error.
>>
>>
>> # qconf -sq all.q
>> qname                 all.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>
> Unless you intend to oversubscribe, the above can be set to NONE. In fact, 
> the scheduler might look ahead at the coming load, and together with:
>
> $ qconf -ssconf
> ...
> job_load_adjustments              np_load_avg=0.50
> load_adjustment_decay_time        0:7:30
>
> it can lead to the job not being scheduled. These can even be adjusted to 
> read:
>
> job_load_adjustments              NONE
> load_adjustment_decay_time        0:0:0
>
> In your current case, of course, where 8 slots are defined and you test with 
> 2, this shouldn't be a problem.
>
> Did you set up and/or request any memory per machine?
>
> OTOH: if you submit 2 single CPU jobs to node ibm038, are they scheduled?
>
> -- Reuti
>
>
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               cores
>> rerun                 FALSE
>> slots                 8
>> tmpdir                /tmp
>> shell                 /bin/sh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>> # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock
>> Your job 92 ("xclock") has been submitted
>> # qstat
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    91 0.55500 xclock     johnt        qw    12/12/2016 13:54:02                                      2
>>    92 0.00000 xclock     johnt        qw    12/12/2016 13:55:59                                      2
>> # qalter -w p 92
>> Job 92 cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
>> Job 92 cannot run in queue "sim.q" because it is not contained in its hard queue list (-q)
>> Job 92 cannot run in queue "all.q@ibm021" because it is not contained in its hard queue list (-q)
>> Job 92 cannot run in queue "all.q@ibm037" because it is not contained in its hard queue list (-q)
>> Job 92 cannot run in PE "cores" because it only offers 0 slots
>> verification: no suitable queues
>>
>>
>>
>> -----Original Message-----
>> From: users-boun...@gridengine.org
>> [mailto:users-boun...@gridengine.org] On Behalf Of Coleman, Marcus
>> [JRDUS Non-J&J]
>> Sent: Monday, December 12, 2016 1:35
>> To: users@gridengine.org
>> Subject: Re: [gridengine users] users Digest, Vol 72, Issue 13
>>
>> Hi
>>
>> I am sure this is your problem: you are submitting a job that requires 2 
>> cores to a queue that has only 1 slot available.
>> If your hosts all have the same number of cores, there is no reason to 
>> separate them with commas. This is only needed if the hosts have different 
>> numbers of slots, or you want to manipulate the slots per host...
>>
>> slots                 1,[ibm021=8],[ibm037=8],[ibm038=8]
>> slots             8
>>
>>
>> I would only list the PE I am actually requesting, unless you plan to use 
>> each of those PEs:
>> pe_list               make mpi smp cores
>> pe_list               cores
>>
>>
>> Also, since you mentioned the parallel environment: I would change 
>> allocation_rule to $fill_up unless your software (not SGE) controls job 
>> distribution...
>>
>> qconf -sp cores
>> allocation_rule    $pe_slots <--- (only use one machine)
>> control_slaves     FALSE <--- (I think you want tight integration)
>> job_is_first_task  TRUE  <--- (this is true if the first job submitted only 
>> kicks off other jobs)
>>
>> allocation_rule $fill_up <--- works better for parallel jobs
>> control_slaves TRUE <--- you want tight integration with SGE
>> job_is_first_task  <--- can go either way, unless you are sure your software 
>> will control job distribution...
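>>
>> A sketch of a PE along those lines (the name is just a placeholder; it would 
>> then be attached to the queue with qconf -aattr queue pe_list fill_pe all.q):
>>
>> # qconf -sp fill_pe            (hypothetical)
>> pe_name            fill_pe
>> slots              999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $fill_up
>> control_slaves     TRUE
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary FALSE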
>>
>>
>> Also, what do the qmaster messages and the associated node's SGE messages say?
>>
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: users-boun...@gridengine.org
>> [mailto:users-boun...@gridengine.org] On Behalf Of
>> users-requ...@gridengine.org
>> Sent: Sunday, December 11, 2016 9:05 PM
>> To: users@gridengine.org
>> Subject: [EXTERNAL] users Digest, Vol 72, Issue 13
>>
>> Send users mailing list submissions to
>>       users@gridengine.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>       https://gridengine.org/mailman/listinfo/users
>> or, via email, send a message with subject or body 'help' to
>>       users-requ...@gridengine.org
>>
>> You can reach the person managing the list at
>>       users-ow...@gridengine.org
>>
>> When replying, please edit your Subject line so it is more specific than 
>> "Re: Contents of users digest..."
>>
>>
>> Today's Topics:
>>
>>  1. Re: CPU complex (John_Tai)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 12 Dec 2016 05:04:33 +0000
>> From: John_Tai <john_...@smics.com>
>> To: Christopher Heiny <christopherhe...@gmail.com>
>> Cc: "users@gridengine.org" <users@gridengine.org>
>> Subject: Re: [gridengine users] CPU complex
>> Message-ID: <EB25FF8EBBD4BC478EF05F2F4C436479021D2BA5A2@shex-d02>
>> Content-Type: text/plain; charset="utf-8"
>>
>> # qconf -sq all.q
>> qname                 all.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make mpi smp cores
>> rerun                 FALSE
>> slots                 1,[ibm021=8],[ibm037=8],[ibm038=8]
>> tmpdir                /tmp
>> shell                 /bin/sh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>>
>>
>> From: Christopher Heiny [mailto:christopherhe...@gmail.com]
>> Sent: Monday, December 12, 2016 12:22
>> To: John_Tai
>> Cc: users@gridengine.org; Reuti
>> Subject: Re: [gridengine users] CPU complex
>>
>>
>>
>> On Dec 11, 2016 5:11 PM, "John_Tai" <john_...@smics.com> wrote:
>> I associated the queue with the PE:
>>
>>       qconf -aattr queue pe_list cores all.q
>>
>> The only slots were defined in the all.q queue, and just the total slots in 
>> the PE:
>>
>>>> # qconf -sp cores
>>>> pe_name            cores
>>>> slots              999
>>>> user_lists         NONE
>>>> xuser_lists        NONE
>> Do I need to define slots in another way for each exec host? Is there a way 
>> to check the current free slots for a host, other than the qstat -f below?
>>
>>> # qstat -f
>>> queuename                      qtype resv/used/tot. load_avg arch          states
>>> ---------------------------------------------------------------------------------
>>> all.q@ibm021                   BIP   0/0/8          0.02     lx-amd64
>>> ---------------------------------------------------------------------------------
>>> all.q@ibm037                   BIP   0/0/8          0.00     lx-amd64
>>> ---------------------------------------------------------------------------------
>>> all.q@ibm038                   BIP   0/0/8          0.00     lx-amd64
>>
>> What is the output of the command
>>   qconf -sq all.q
>> ? (I think that's the right one)
>>
>> Chris
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>> Sent: Saturday, December 10, 2016 5:40
>> To: John_Tai
>> Cc: users@gridengine.org
>> Subject: Re: [gridengine users] CPU complex
>>
>> On 09.12.2016 at 10:36, John_Tai wrote:
>>
>>> 8 slots:
>>>
>>> # qstat -f
>>> queuename                      qtype resv/used/tot. load_avg arch          states
>>> ---------------------------------------------------------------------------------
>>> all.q@ibm021                   BIP   0/0/8          0.02     lx-amd64
>>> ---------------------------------------------------------------------------------
>>> all.q@ibm037                   BIP   0/0/8          0.00     lx-amd64
>>> ---------------------------------------------------------------------------------
>>> all.q@ibm038                   BIP   0/0/8          0.00     lx-amd64
>>> ---------------------------------------------------------------------------------
>>> pc.q@ibm021                    BIP   0/0/1          0.02     lx-amd64
>>> ---------------------------------------------------------------------------------
>>> sim.q@ibm021                   BIP   0/0/1          0.02     lx-amd64
>>
>> Is there any slot limit defined on the exechost, or in an RQS?
>>
>> -- Reuti
>>
>>
>>>
>>> ############################################################################
>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>> ############################################################################
>>>   89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>> Sent: Friday, December 09, 2016 3:46
>>> To: John_Tai
>>> Cc: users@gridengine.org
>>> Subject: Re: [gridengine users] CPU complex
>>>
>>> Hi,
>>>
>>> On 09.12.2016 at 08:20, John_Tai wrote:
>>>
>>>> I've set up a PE but I'm having problems submitting jobs.
>>>>
>>>> - Here's the PE I created:
>>>>
>>>> # qconf -sp cores
>>>> pe_name            cores
>>>> slots              999
>>>> user_lists         NONE
>>>> xuser_lists        NONE
>>>> start_proc_args    /bin/true
>>>> stop_proc_args     /bin/true
>>>> allocation_rule    $pe_slots
>>>> control_slaves     FALSE
>>>> job_is_first_task  TRUE
>>>> urgency_slots      min
>>>> accounting_summary FALSE
>>>> qsort_args         NONE
>>>>
>>>> - I've then added this to all.q:
>>>>
>>>> qconf -aattr queue pe_list cores all.q
>>>
>>> How many "slots" were defined in the queue definition for all.q?
>>>
>>> -- Reuti
>>>
>>>
>>>> - Now I submit a job:
>>>>
>>>> # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock
>>>> Your job 89 ("xclock") has been submitted
>>>> # qstat
>>>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>>>> -----------------------------------------------------------------------------------------------------------------
>>>>    89 0.00000 xclock     johnt        qw    12/09/2016 15:14:25                                      2
>>>> # qalter -w p 89
>>>> Job 89 cannot run in PE "cores" because it only offers 0 slots
>>>> verification: no suitable queues
>>>> # qstat -f
>>>> queuename                      qtype resv/used/tot. load_avg arch          states
>>>> ---------------------------------------------------------------------------------
>>>> all.q@ibm038                   BIP   0/0/8          0.00     lx-amd64
>>>>
>>>> ############################################################################
>>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>>> ############################################################################
>>>>   89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
>>>>
>>>>
>>>> ----------------------------------------------------
>>>>
>>>> It looks like all.q@ibm038 should have 8 free slots, so why is it only 
>>>> offering 0?
>>>>
>>>> Hope you can help me.
>>>> Thanks
>>>> John
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>>> Sent: Monday, December 05, 2016 6:32
>>>> To: John_Tai
>>>> Cc: users@gridengine.org
>>>> Subject: Re: [gridengine users] CPU complex
>>>>
>>>> Hi,
>>>>
>>>>> On 05.12.2016 at 09:36, John_Tai <john_...@smics.com> wrote:
>>>>>
>>>>> Thank you so much for your reply!
>>>>>
>>>>>>> Will you use the consumable virtual_free here instead of mem?
>>>>>
>>>>> Yes I meant to write virtual_free, not mem. Apologies.
>>>>>
>>>>>>> For parallel jobs you need to configure one (or several) so-called PEs 
>>>>>>> (Parallel Environments).
>>>>>
>>>>> My jobs are actually just one process which uses multiple cores, so for 
>>>>> example in top one process "simv" is currently using 2 cpu cores (200%).
>>>>
>>>> Yes, then it's a parallel job for SGE. Although the entries for 
>>>> start_proc_args resp. stop_proc_args can be left at their defaults, a PE is 
>>>> the paradigm in SGE for a parallel job.
>>>>
>>>>
>>>>> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> 3017 kelly     20   0 3353m 3.0g 165m R 200.0  0.6  15645:46 simv
>>>>>
>>>>> So I'm not sure PE is suitable for my case, since it is not multiple 
>>>>> parallel processes running at the same time. Am I correct?
>>>>>
>>>>> If so, I am trying to find a way to get SGE to keep track of the number 
>>>>> of cores used, but I believe it only keeps track of the total CPU usage 
>>>>> in %. I guess I could use this and the <total num cores> to get the 
>>>>> <num of cores in use>, but how do I integrate that into SGE?
>>>>
>>>> You can specify the number of cores your job needs in the -pe parameter, 
>>>> which can also be a range. The allocation granted by SGE can be checked in 
>>>> the job script via $NHOSTS, $NSLOTS and $PE_HOSTFILE.
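>>>>
>>>> A minimal job-script sketch showing those variables (the PE name "cores" is 
>>>> the one from this thread; the application line is only a placeholder):
>>>>
>>>> #!/bin/sh
>>>> #$ -pe cores 2
>>>> #$ -cwd
>>>> echo "NHOSTS=$NHOSTS NSLOTS=$NSLOTS"
>>>> cat $PE_HOSTFILE        # one line per host: hostname slots queue processor-range
>>>> ./simv                  # the application may then use up to $NSLOTS cores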
>>>>
>>>> With this setup, SGE will track the number of cores in use per machine. 
>>>> The available ones you define in the queue definition. If you have more 
>>>> than one queue per exechost, you additionally need to set up an overall 
>>>> limit on the cores that can be used at the same time, to avoid 
>>>> oversubscription (see the sketch below).
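>>>>
>>>> One common way to do that (a sketch, assuming 8 cores on the host) is a 
>>>> slots limit on the exec host itself, which all queues on that host then 
>>>> share:
>>>>
>>>> # qconf -me ibm038
>>>> complex_values    slots=8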
>>>>
>>>> -- Reuti
>>>>
>>>>> Thank you again for your help.
>>>>>
>>>>> John
>>>>>
>>>>> -----Original Message-----
>>>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>>>> Sent: Monday, December 05, 2016 4:21
>>>>> To: John_Tai
>>>>> Cc: users@gridengine.org
>>>>> Subject: Re: [gridengine users] CPU complex
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 05.12.2016 at 08:00, John_Tai wrote:
>>>>>
>>>>>> Newbie here, hope to understand SGE usage.
>>>>>>
>>>>>> I've successfully configured virtual_free as a complex for telling SGE 
>>>>>> how much memory is needed when submitting a job, as described here:
>>>>>>
>>>>>> https://docs.oracle.com/cd/E19957-01/820-0698/6ncdvjclk/index.html#i1000029
>>>>>>
>>>>>> How do I do the same for telling SGE how many CPU cores a job needs? For 
>>>>>> example:
>>>>>>
>>>>>>            qsub -l mem=24G,cpu=4 myjob
>>>>>
>>>>> Will you use the consumable virtual_free here instead of mem?
>>>>>
>>>>>
>>>>>> Obviously I'd need SGE to keep track of the actual CPU utilization 
>>>>>> on the host, just as virtual_free is tracked independently of the 
>>>>>> SGE jobs.
>>>>>
>>>>> For parallel jobs you need to configure one (or several) so-called PEs 
>>>>> (Parallel Environments). Their purpose is to make preparations for the 
>>>>> parallel job, like rearranging the list of granted slots, preparing shared 
>>>>> directories between the nodes, ...
>>>>>
>>>>> These PEs were more important in former times, when parallel libraries 
>>>>> were not programmed to integrate automatically with SGE for a tight 
>>>>> integration. Your submissions could read:
>>>>>
>>>>> qsub -pe smp 4 myjob          # allocation_rule $pe_slots, control_slaves true
>>>>> qsub -pe orte 16 myjob        # allocation_rule $round_robin, control_slaves true
>>>>>
>>>>> where smp resp. orte is the chosen parallel environment for OpenMP resp. 
>>>>> Open MPI. Its settings are explained in `man sge_pe`, and the "-pe" 
>>>>> parameter to the submission command in `man qsub`.
>>>>>
>>>>> -- Reuti
>>>>
>>>
>>
>>
>

________________________________

This email (including its attachments, if any) may be confidential and 
proprietary information of SMIC, and intended only for the use of the named 
recipient(s) above. Any unauthorized use or disclosure of this email is 
strictly prohibited. If you are not the intended recipient(s), please notify 
the sender immediately and delete this email from your computer.

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
