If you're launching jobs across your entire cluster, then 2 simultaneous jobs is probably all that should run. In fact, many users aren't even comfortable with having multiple jobs overlap on the same nodes.

I think Maui may be missing some configuration options. If these two lines aren't at the end of /opt/maui/maui.cfg, try adding them and restarting Maui.

NODEACCESSPOLICY        DEDICATED
JOBNODEMATCHPOLICY      EXACTNODE
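
A quick way to check whether those two settings are already present is sketched below. This is only an illustration, not part of Maui: the function name missing_settings is made up, and it assumes maui.cfg uses "PARAMETER value" lines with '#' starting a comment.

```python
# Hypothetical helper: report which of the two suggested Maui settings
# are absent (or set to a different value) in the text of a maui.cfg.
REQUIRED = {
    "NODEACCESSPOLICY": "DEDICATED",
    "JOBNODEMATCHPOLICY": "EXACTNODE",
}


def missing_settings(cfg_text, required=REQUIRED):
    """Return the required settings that cfg_text does not set as required."""
    found = {}
    for raw in cfg_text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # A later line overrides an earlier one with the same name.
            found[parts[0].upper()] = parts[1].strip().upper()
    return {k: v for k, v in required.items() if found.get(k) != v}


if __name__ == "__main__":
    # Made-up sample config text, for illustration only.
    sample = "RMPOLLINTERVAL 00:00:30\nNODEACCESSPOLICY SHARED\n"
    for name, value in missing_settings(sample).items():
        print("add or change:", name, value)
```

To run it against the real file, read /opt/maui/maui.cfg into a string and pass that to missing_settings.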

        Jeremy

At 04:11 PM 7/1/2004, Jeremy Hansen wrote:

Ok, so after two jobs start, no others run and they just remain in the
queue.  Here is what checkjob says on the job that's just sitting there:

[EMAIL PROTECTED] mallet]$ checkjob 274


checking job 274

State: Idle  (User: oscartst  Group: oscartst)
WallTime: 0:00:00 of   INFINITY
SubmitTime: Thu Jul  1 14:08:29
  (Time Queued  Total: 0:01:33  Eligible: 0:01:33)

Total Tasks: 9

Req[0]  TaskCount: 9  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Class: [workq 1]  Features: [NONE]


IWD: [NONE]  Executable: [NONE]
QOS: DEFAULT  Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Reservation '274' ( INFINITY -> INFINITY  Duration: INFINITY)
PE: 9.00  StartPriority: 1
job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 9)


Any ideas from this output?

Thanks
-jeremy
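
The "0 < 9" in the last line of the checkjob output can be reproduced by hand. The sketch below is only the arithmetic, not Maui's actual scheduling logic. It assumes the node layout from the pbsnodes listing quoted further down (9 nodes, np = 2 each) and two running jobs that each hold one processor on every node; the function names are made up for illustration.

```python
# Back-of-the-envelope version of the idle-processor check.
def idle_procs(node_count, procs_per_node, running_job_tasks):
    """Processors left over after the running jobs take their tasks."""
    total = node_count * procs_per_node
    used = sum(running_job_tasks)
    return max(total - used, 0)


def can_start(requested_tasks, idle):
    """A job can start only if enough processors are idle."""
    return idle >= requested_tasks


if __name__ == "__main__":
    # Assumed: 9 nodes with np = 2, and two running jobs of 9 tasks each.
    idle = idle_procs(9, 2, [9, 9])
    print("idle procs:", idle)
    print("9-task job can start:", can_start(9, idle))
```

With 18 processors and two 9-task jobs running, nothing is idle, so a third job asking for 9 tasks has to wait until one of the running jobs finishes.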

On Wed, 30 Jun 2004, Jeremy Enos wrote:

> I can probably help here a bit...
>
> In PBS, qstat will show all jobs and their state.  Keep in mind that in
> typical OSCAR clusters, it is Maui (the job scheduler) which reads PBS's
> information about nodes and queues, and instructs PBS on when to run a
> given job.  If a job isn't running, Maui may be the place to look.
> #1  Make sure Maui is running.
> #2  Make sure pbs_sched (PBS's included dumbed-down FIFO scheduler) isn't
> running and locking the pbs_server port.
> #3  Use Maui utilities (checkjob, showq?) to investigate and find out why
> Maui is or isn't running a given job.
>
>          Jeremy
>
>
> At 07:08 PM 6/30/2004, Bernard Li wrote:
> >Hey Jeremy:
> >
> >Is there a way to figure out why PBS isn't running the jobs?
> >
> >In SGE, there is qstat -j <jid> and it tells you why (queue busy, yadda
> >yadda)
> >
> >Cheers,
> >
> >Bernard
> >
> > > -----Original Message-----
> > > From: Jeremy Hansen [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, June 30, 2004 16:57
> > > To: Bernard Li; [EMAIL PROTECTED]
> > > Subject: Re: [Oscar-users] Qsub pbs issues
> > >
> > >
> > > Output from pbsnodes -a
> > >
> > > rlx-back-2-6-10.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-2.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-3.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-4.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-5.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-6.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-7.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-8.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > > rlx-back-2-6-9.blahblahblah.net
> > >      state = job-exclusive
> > >      np = 2
> > >      properties = all
> > >      ntype = cluster
> > >      jobs = 0/263.rlx-2-6-1.blahblahblah.net, 1/262.rlx-2-6-1.blahblahblah.net
> > >
> > >
> > > It seems that only a max of two jobs will run simultaneously.
> > >
> > > Thanks
> > > -jeremy
> > >
> > >
> > > On 6/30/04 4:50 PM, "Bernard Li" <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hey Jeremy:
> > > >
> > > > > I don't remember PBS much, but what happens if you run 'pbsnodes -a'?
> > > >
> > > > I'll let the guys know that the repository is down - thanks for
> > > > letting us know.
> > > >
> > > > Cheers,
> > > >
> > > > Bernard
> > > >
> > > >> -----Original Message-----
> > > >> From: Jeremy Hansen [mailto:[EMAIL PROTECTED]
> > > >> Sent: Wednesday, June 30, 2004 16:39
> > > >> To: Bernard Li; [EMAIL PROTECTED]
> > > >> Subject: Re: [Oscar-users] Qsub pbs issues
> > > >>
> > > > >> Hmm, I'm more than willing to try Torque.  It appears the package
> > > > >> repository is unavailable at the moment.  I just don't understand,
> > > > >> though.  I'm sure I'm doing something wrong.
> > > > >> Shouldn't OpenPBS allocate all available resources?
> > > > >> Why would it not do this by default?
> > > >>
> > > >> Thanks
> > > >> -jeremy
> > > >>
> > > >>
> > > >> On 6/30/04 4:16 PM, "Bernard Li" <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>> Hi Jeremy:
> > > >>>
> > > > >>> Not too sure about an OpenPBS mailing list, but there is definitely
> > > > >>> one for Torque:
> > > >>>
> > > >>> http://www.supercluster.org/mailing.shtml
> > > >>>
> > > > >>> Torque is basically a 'better' version of OpenPBS with a lot of
> > > > >>> patches and bug fixes (or is it a complete rewrite now...?), so if
> > > > >>> you want fewer headaches, I would recommend switching to Torque
> > > > >>> instead.
> > > >>>
> > > >>> There is a package available for Torque from OPD, and I am sure
> > > >>> someone on the list can help you with the switch over...
> > > >>>
> > > > >>> I personally use Sun Grid Engine and love it.  I switched over from
> > > > >>> OpenPBS a long time ago and never looked back ;-)
> > > >>>
> > > >>> Cheers,
> > > >>>
> > > >>> Bernard
> > > >>>
> > > >>>> -----Original Message-----
> > > > >>>> From: [EMAIL PROTECTED]
> > > > >>>> [mailto:[EMAIL PROTECTED] On Behalf Of Jeremy Hansen
> > > >>>> Sent: Wednesday, June 30, 2004 16:03
> > > >>>> To: [EMAIL PROTECTED]
> > > >>>> Subject: [Oscar-users] Qsub pbs issues
> > > >>>>
> > > > >>>> Perhaps this isn't appropriate for this list, but I don't know if
> > > > >>>> OpenPBS even has a list for users.  I tried finding one, but there
> > > > >>>> were too many registrations and too much hassle.
> > > > >>>>
> > > > >>>> The issue I'm having: I submit jobs to the queue and they sit in
> > > > >>>> the queued state for no apparent reason, even though the nodes are
> > > > >>>> free.  Why doesn't OpenPBS run the jobs right away?  How do I force
> > > > >>>> things to run and allocate nodes immediately?
> > > >>>>
> > > >>>> -jeremy
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> -------------------------------------------------------
> > > >>>> This SF.Net email sponsored by Black Hat Briefings & Training.
> > > >>>> Attend Black Hat Briefings & Training, Las Vegas July
> > > >> 24-29 - digital
> > > >>>> self defense, top technical experts, no vendor pitches,
> > > unmatched
> > > >>>> networking opportunities. Visit www.blackhat.com
> > > >>>> _______________________________________________
> > > >>>> Oscar-users mailing list
> > > >>>> [EMAIL PROTECTED]
> > > >>>> https://lists.sourceforge.net/lists/listinfo/oscar-users
> > > >>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>> -------------------------------------------------------
> > > >>> This SF.Net email sponsored by Black Hat Briefings & Training.
> > > >>> Attend Black Hat Briefings & Training, Las Vegas July 24-29
> > > >> - digital
> > > >>> self defense, top technical experts, no vendor pitches, unmatched
> > > >>> networking opportunities. Visit www.blackhat.com
> > > >>> _______________________________________________
> > > >>> Oscar-users mailing list
> > > >>> [EMAIL PROTECTED]
> > > >>> https://lists.sourceforge.net/lists/listinfo/oscar-users
> > > >>
> > > >>
> > > >>
> > > >>
> > >
> > >
> > >
> > >
> >
> >
>
>


