hello people...

I have set up a 3-node cluster (1 head node + 2 client nodes). I am having a problem setting up a batch system for running multiple jobs.
Here is the exact problem, with all the output shown:

1) When I run the script using qsub, only 1 job runs at a time (at this point the Maui scheduler is running and pbs_sched is stopped),
   but the pbsnodes -a command shows the same job to be running on both client nodes.

[EMAIL PROTECTED] PBS]$ qsub ./test
372.cluster
[EMAIL PROTECTED] PBS]$ qsub ./test
373.cluster
[EMAIL PROTECTED] PBS]$ qsub ./test
374.cluster
[EMAIL PROTECTED] PBS]$ qstat -a

cluster:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
372.cluster     user1    workq    test          721   1   1     -- 10000 R    --
373.cluster     user1    workq    test           --   1   1     -- 10000 Q    --
374.cluster     user1    workq    test           --   1   1     -- 10000 Q    --
[EMAIL PROTECTED] PBS]$ pbsnodes -a
oscarnode1.djscoe
     state = free
     np = 2
     properties = all
     ntype = cluster
     jobs = 0/372.cluster   
 
oscarnode2.djscoe
     state = free
     np = 2
     properties = all
     ntype = cluster
     jobs = 0/372.cluster   


2) I tried stopping the Maui scheduler and starting the PBS scheduler (pbs_sched), then submitted the scripts again. This time qstat shows 2 jobs running.
   However, "pbsnodes -a" shows both of them running on the same node, and the node state is still shown as free.
   Here is the output:

[EMAIL PROTECTED] PBS]$ qsub ./test
375.cluster
[EMAIL PROTECTED] PBS]$ qsub ./test
376.cluster
[EMAIL PROTECTED] PBS]$ qsub ./test
377.cluster
[EMAIL PROTECTED] PBS]$ qstat -a

cluster:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
375.cluster     user1    workq    test          780   1   1     -- 10000 R    --
376.cluster     user1    workq    test          837   1   1     -- 10000 R    --
377.cluster     user1    workq    test           --   1   1     -- 10000 Q    --
[EMAIL PROTECTED] PBS]$ pbsnodes -a
oscarnode1.djscoe
     state = free
     np = 2
     properties = all
     ntype = cluster
     jobs = 0/376.cluster, 0/375.cluster

oscarnode2.djscoe
     state = free
     np = 2
     properties = all
     ntype = cluster
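
To compare the two "pbsnodes -a" listings above more easily, here is a minimal Python sketch that turns the output into a job-to-nodes view. The helper names jobs_by_node and nodes_by_job are hypothetical (they are not part of PBS), and the parser assumes the record layout shown above: an unindented node name, indented "key = value" lines, and "jobs" entries of the form slot/jobid.

```python
from collections import defaultdict


def jobs_by_node(pbsnodes_text):
    """Map each node name to the job IDs listed on its 'jobs =' line."""
    result = {}
    current = None
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            continue  # blank lines separate node records
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            result[current] = []
        elif current is not None and line.strip().startswith("jobs ="):
            entries = line.split("=", 1)[1].split(",")
            # Each entry looks like "0/372.cluster": slot index, then job ID.
            result[current] = [
                e.strip().split("/", 1)[-1] for e in entries if e.strip()
            ]
    return result


def nodes_by_job(node_jobs):
    """Invert the node -> jobs mapping into job -> nodes."""
    inverted = defaultdict(list)
    for node, jobs in node_jobs.items():
        for job in jobs:
            if node not in inverted[job]:
                inverted[job].append(node)
    return dict(inverted)


# Sample copied from the first listing (Maui running, pbs_sched stopped).
SAMPLE = """\
oscarnode1.djscoe
     state = free
     np = 2
     properties = all
     ntype = cluster
     jobs = 0/372.cluster

oscarnode2.djscoe
     state = free
     np = 2
     properties = all
     ntype = cluster
     jobs = 0/372.cluster
"""

if __name__ == "__main__":
    for job, nodes in nodes_by_job(jobs_by_node(SAMPLE)).items():
        print(job, "->", ", ".join(nodes))
```

For the first listing this reports 372.cluster on both oscarnode1.djscoe and oscarnode2.djscoe. Fed the second listing, it would report 375.cluster and 376.cluster on oscarnode1.djscoe only.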


Here is the file /var/spool/pbs/server_priv/nodes:
oscarnode1.djscoe all
oscarnode2.djscoe all

Does this mean that whichever scheduler I use, the jobs run on one node only (oscarnode2 for Maui and oscarnode1 for PBS)?
I checked the file maui.cfg. It already had the entries Jeremy suggested:

NODEACCESSPOLICY        DEDICATED
JOBNODEMATCHPOLICY      EXACTNODE
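
A quick way to double-check those two entries is a small Python sketch like the one below. It assumes maui.cfg uses whitespace-separated "PARAMETER VALUE" lines with "#" comments. The names parse_maui_cfg and missing_or_different are hypothetical helpers, not Maui tools, and the expected values are simply the two settings quoted in this thread.

```python
def parse_maui_cfg(text):
    """Parse 'PARAMETER VALUE' lines into a dict, ignoring '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0].upper()
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


# The two settings suggested in the thread (assumed to be what we want).
EXPECTED = {
    "NODEACCESSPOLICY": "DEDICATED",
    "JOBNODEMATCHPOLICY": "EXACTNODE",
}


def missing_or_different(settings, expected=EXPECTED):
    """Return the expected parameters that are absent or set differently."""
    return {
        key: settings.get(key)
        for key, wanted in expected.items()
        if settings.get(key, "").upper() != wanted
    }


# Sample copied from the entries shown above.
SAMPLE_CFG = """\
NODEACCESSPOLICY        DEDICATED
JOBNODEMATCHPOLICY      EXACTNODE
"""

if __name__ == "__main__":
    problems = missing_or_different(parse_maui_cfg(SAMPLE_CFG))
    print("all expected entries present" if not problems else problems)
```

To run it against the real file, read the contents of /opt/maui/maui.cfg (the path given in the quoted reply) and pass them to parse_maui_cfg in place of SAMPLE_CFG.
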
Also, I would like to know which nodes file Maui refers to, because the node that PBS executes the job on and the node that Maui executes the job on are different.

--thanks


>From: Jeremy Enos <[EMAIL PROTECTED]>
>To: "maulin pandya" <[EMAIL PROTECTED]>
>CC: oscar Users list <[EMAIL PROTECTED]>
>Subject: Re: [Oscar-users] pbs problem batch processing not working
>Date: Wed, 31 Mar 2004 19:38:31 -0600
>
>I wouldn't worry about the tests not working too much if you can run
>jobs manually without problems. To prevent them from overlapping on
>nodes, I think that's probably a Maui configuration.
>Try adding these options to /opt/maui/maui.cfg and then restarting
>maui of course:
>NODEACCESSPOLICY DEDICATED
>JOBNODEMATCHPOLICY EXACTNODE
>
> Jeremy
>
>At 07:02 AM 3/31/2004, maulin pandya wrote:
>
>>hi
>>
>>initially i had started on the project usin RH7.3..and oscar 2.2.1
>>i know tht higher oscar versions support RH7.3 but am not so sure
>>if upgrading will be that easy. so v used oscar v-2.2.1. no other
>>specific reasons :)
>>
>>i have 1 head node and two compute nodes. also now when i made some
>>chgs in the nodes file for pbs and submit more jobs, two jobs r
>>showm 2 b running (rest r queued) with qstat command but both of
>>them get executed on the same node one after the other.
>>
>>the nodes file has following entries:
>>
>>oscarnode1 np=2 all
>>oscarnode2 np=2 all
>>
>>--thanks
>>--dj
>>
>>>From: Jeremy Enos <[EMAIL PROTECTED]>
>>>To: "maulin pandya" <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
>>>Subject: Re: [Oscar-users] pbs problem batch processing not working
>>>Date: Tue, 30 Mar 2004 15:10:29 -0600
>>>
>>>Sounds like there is a job stuck for some reason... and the tests
>>>have an apparent bug in 2.1. Any reason you're not using 3.0?
>>>
>>> Jeremy
>>>
>>>At 11:47 AM 3/30/2004, maulin pandya wrote:
>>>
>>>>hi
>>>>i have set up a 2 node oscar(ver 2.2.1) cluster.all the tests in
>>>>test cluster setup run successfully.however whenever i submit jobs
>>>>using qsub command and then view the status using qstat,only one
>>>>job is running and the rest are queued.
>>>>pbsnodes -a shows the job-exclusive status for both the nodes and
>>>>it also shows the same job to be runnin on both the nodes.
>>>>all the oscar tests have passed
>>>>can anyone pls tell me where the problem is?
>>>>---thanks
>>>>--dj

