Hi,

On 11.05.2012 at 20:15, Hung-Sheng Tsao Ph.D. wrote:

> not sure what the issue is here, see in-line
> On 5/11/2012 8:11 AM, iqtcub wrote:
>> Hi,
>> 
>> Following up with the thread with the same subject ( 
>> http://thread.gmane.org/gmane.comp.clustering.opengridengine.user/894/ ).
>> 
>> We're using SGE 6.2u5; our setup is a small testing cluster of 2 machines with
>> 2 cores each.
>> -qsub -q v20z.q -pe smp 1 script.sub
>> -wait until the job runs
>> -qsub -q v20z.q -pe smp 1 script.sub

Why are you requesting a PE, as it's only a serial job? There is:

https://blogs.oracle.com/sgrell/entry/grid_engine_scheduler_hacks_least

which describes how to set up a round_robin or fill_up distribution of jobs. But
this works for serial jobs only, not for parallel ones, unless you use $pe_slots
as the allocation_rule in the PE definition like you do below. Hence it should
work for you in this special case.
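Just as a sanity check (a minimal sketch; the $ prompt and the egrep pattern are
only illustrative): as far as I recall, the two scheduler settings that blog post
relies on are queue_sort_method load and load_formula slots, which your dump
below already shows:

$ qconf -ssconf | egrep 'queue_sort_method|load_formula'
queue_sort_method                 load
load_formula                      slots
$ qconf -msconf     # adjust interactively if the values differ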

-- Reuti


>> Each job lands on a different node.
> is this not OK for you?
>> However if we do:
>> -for i in 1 2; do qsub -q v20z.q -pe smp 1 script.sub; done
>> 
>> Then both jobs end up on the same node.
> is this OK for you?
>> 
>> Our scheduling conf is as follows:
>> 
>> ----------------------------
>> algorithm                         default
>> schedule_interval                 0:0:15
>> maxujobs                          0
>> queue_sort_method                 load
>> job_load_adjustments              NONE
>> load_adjustment_decay_time        0:7:30
>> load_formula                      slots
>> schedd_job_info                   true
>> flush_submit_sec                  0
>> flush_finish_sec                  0
>> params                            MONITOR=1
>> reprioritize_interval             0:0:0
>> halftime                          168
>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>> compensation_factor               2.000000
>> weight_user                       0.250000
>> weight_project                    0.250000
>> weight_department                 0.250000
>> weight_job                        0.250000
>> weight_tickets_functional         0
>> weight_tickets_share              1000000
>> share_override_tickets            TRUE
>> share_functional_shares           FALSE
>> max_functional_jobs_to_schedule   200
>> report_pjob_tickets               TRUE
>> max_pending_tasks_per_job         50
>> halflife_decay_list               none
>> policy_hierarchy                  OS
>> weight_ticket                     0.890000
>> weight_waiting_time               0.000000
>> weight_deadline                   3600000.000000
>> weight_urgency                    0.100000
>> weight_priority                   0.010000
>> max_reservation                   50
>> default_duration                  9999:00:00
>> --------------------------------------------
>> 
>> The smp PE config is:
>> pe_name            smp
>> slots              999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $pe_slots
>> control_slaves     FALSE
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary TRUE
>> -------------------------
>> 
>> The config on both nodes are like this:
>> hostname              v20z-03
>> load_scaling          NONE
>> complex_values        mem_free=7891.796875M,slots=2
>> load_values           arch=lx24-amd64,num_proc=2,mem_total=7935.984375M, \
>>                      swap_total=4095.992188M,virtual_total=12031.976562M, \
>>                      h_fsize=9.7G,load_avg=0.180000,load_short=0.080000, \
>>                      load_medium=0.180000,load_long=0.090000, \
>>                      mem_free=7830.246094M,swap_free=4095.992188M, \
>>                      virtual_free=11926.238281M,mem_used=105.738281M, \
>>                      swap_used=0.000000M,virtual_used=105.738281M, \
>>                      cpu=0.000000,m_topology=SCSC,m_topology_inuse=SCSC, \
>>                      m_socket=2,m_core=2,np_load_avg=0.090000, \
>>                      np_load_short=0.040000,np_load_medium=0.090000, \
>>                      np_load_long=0.045000
>> processors            2
>> user_lists            NONE
>> xuser_lists           NONE
>> projects              NONE
>> xprojects             NONE
>> usage_scaling         cpu=12.300000
>> report_variables      NONE
>> -------------------------------
>> The queue config:
>> qname                 v20z.q
>> hostlist              @v20z
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:01:00
>> priority              0
>> min_cpu_interval      00:01:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             BLCR
>> pe_list               make smp
>> rerun                 FALSE
>> slots                 2
>> tmpdir                /scratch
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        split=2
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>> 
>> -----------------------------------
>> 
>> From what I understood, it's possible that this method is broken. Am I right?
>> 
>> I've also tried the scheduler configurations from the following links, with
>> the same result:
>> http://article.gmane.org/gmane.comp.clustering.opengridengine.user/1037
>> http://wiki.gridengine.info/wiki/index.php/StephansBlog
>> 
>> Thanks in advance!


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
