That runs me dry on good ideas...

You might want to take this problem to the Torque users mailing list.
There are folks from Cluster Resources who read that list but do not
read this one, and who are more familiar with debugging Torque problems.

Once we figure out what the problem is, we can decide whether we need
to change something in OSCAR to prevent it from happening to other
people.  It may be as simple as downloading and installing the latest
version of Torque; OSCAR tends to lag behind because of our release
cycle.
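
For what it's worth, the qmgr dump quoted below already reports the
installed version (pbs_version = 2.0.0p8), so a quick check against the
current Torque release would be something along the lines of:

  qmgr -c "print server" | grep pbs_version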

On 23 Jan 2007 10:06:17 -0500, Jinsong Ouyang <[EMAIL PROTECTED]> wrote:
>
> SMP works fine. "uname -a" gives:
>
> Linux photon.bwh.harvard.edu 2.6.18-1.2239.fc5 #1 SMP Fri Nov 10 12:51:06
> EST 2006 x86_64 x86_64 x86_64 GNU/Linux
>
> Also, /var/spool/pbs/server_priv/nodes has all the cores, 8 on the first
> node and 4 on the second node. That is correct.
>
> The problem is from somewhere else.
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael
> Edwards
> Sent: Tuesday, January 23, 2007 9:56 AM
> To: oscar-users@lists.sourceforge.net
> Subject: Re: [Oscar-users] Number of running jobs on a queue
>
> What kernel is installed (uname -a)?  It sounds like your system
> is not set up for SMP for some reason...
>
> On 1/22/07, Robin Humble <[EMAIL PROTECTED]> wrote:
> > On Mon, Jan 22, 2007 at 08:10:01PM -0600, Michael Edwards wrote:
> > >I suspect that the scripts OSCAR runs to detect the CPU count do not
> > >detect multiple cores as separate "processors".  If this is the case,
> > >it is a fairly major issue and needs to be addressed soon.  Multi-core
> > >processors are becoming more and more common.  I don't have any
> > >hardware to test this on myself yet, but I should in a couple of
> > >months.
> >
> > oscar5 x86_64 centos detected our dual dual-cores just fine.
> > I think it set this up in the 'complete install' thingy.
> >
> > also you can always just edit the right number of cores into
> > /var/spool/pbs/server_priv/nodes and then restart pbs_server
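> >
> > e.g. something roughly like this (the hostnames here are just
> > placeholders for your own node names):
> >
> >   node1.example.com np=8
> >   node2.example.com np=4
> >
> > and then something like "service pbs_server restart" (or qterm -t quick
> > followed by starting pbs_server again) so the server re-reads the file.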
> >
> > cheers,
> > robin
> >
> > >
> > >On the plus side, it ought to be a fairly easy fix...
> > >
> > >On 22 Jan 2007 20:37:59 -0500, [EMAIL PROTECTED]
> > ><[EMAIL PROTECTED]> wrote:
> > >> Thanks for the message!
> > >>
> > >> Actually, I only have two client nodes. One has 4 dual-core AMD CPUs
> > >> (8 logical CPUs total). The other host has 2 dual-core CPUs (4 logical
> > >> CPUs total).
> > >>
> > >> I use "qsub ./script.sh" to submit jobs. All the jobs use the same
> > >> script.
> > >>
> > >> If I submit 12 jobs, only 6 jobs run (4 on the first node and 2 on the
> > >> other node) at the same time.
> > >>
> > >> Here is what I get if I do "print server" from qmgr. Do I need to
> > >> change anything?
> > >>
> > >> ======================================================
> > >>
> > >> #
> > >> # Create queues and set their attributes.
> > >> #
> > >> #
> > >> # Create and define queue workq
> > >> #
> > >> create queue workq
> > >> set queue workq queue_type = Execution
> > >> set queue workq resources_max.cput = 10000:00:00
> > >> set queue workq resources_max.ncpus = 12
> > >> set queue workq resources_max.nodect = 2
> > >> set queue workq resources_max.walltime = 10000:00:00
> > >> set queue workq resources_min.cput = 00:00:01
> > >> set queue workq resources_min.ncpus = 1
> > >> set queue workq resources_min.nodect = 1
> > >> set queue workq resources_min.walltime = 00:00:01
> > >> set queue workq resources_default.cput = 10000:00:00
> > >> set queue workq resources_default.ncpus = 1
> > >> set queue workq resources_default.nodect = 1
> > >> set queue workq resources_default.walltime = 10000:00:00
> > >> set queue workq resources_available.nodect = 2
> > >> set queue workq enabled = True
> > >> set queue workq started = True
> > >> #
> > >> # Set server attributes.
> > >> #
> > >> set server scheduling = True
> > >> set server default_queue = workq
> > >> set server log_events = 64
> > >> set server mail_from = adm
> > >> set server query_other_jobs = True
> > >> set server resources_available.ncpus = 12
> > >> set server resources_available.nodect = 2
> > >> set server resources_available.nodes = 2
> > >> set server resources_max.ncpus = 12
> > >> set server resources_max.nodes = 2
> > >> set server scheduler_iteration = 60
> > >> set server node_check_rate = 150
> > >> set server tcp_timeout = 6
> > >> set server pbs_version = 2.0.0p8
> > >>
> > >> ===========================================================
> > >>
> > >>
> > >> Here is maui.cfg file
> > >>
> > >> ===========================================================
> > >>
> > >> # maui.cfg 3.2.6p14
> > >>
> > >> SERVERHOST              photon.bwh.harvard.edu
> > >> # primary admin must be first in list
> > >> ADMIN1                root
> > >>
> > >> # Resource Manager Definition
> > >>
> > >> RMCFG[DUAL.EFOCHT.DE] TYPE=PBS
> > >>
> > >> # Allocation Manager Definition
> > >>
> > >> AMCFG[bank]  TYPE=NONE
> > >>
> > >> # full parameter docs at http://clusterresources.com/mauidocs/a.fparameters.html
> > >> # use the 'schedctl -l' command to display current configuration
> > >>
> > >> RMPOLLINTERVAL  00:00:10
> > >>
> > >> SERVERPORT            42559
> > >> SERVERMODE            NORMAL
> > >>
> > >> # Admin: http://clusterresources.com/mauidocs/a.esecurity.html
> > >>
> > >>
> > >> LOGFILE               maui.log
> > >> LOGFILEMAXSIZE        10000000
> > >> LOGLEVEL              3
> > >>
> > >> # Job Priority: http://clusterresources.com/mauidocs/5.1jobprioritization.html
> > >>
> > >> QUEUETIMEWEIGHT       1
> > >>
> > >> # FairShare: http://clusterresources.com/mauidocs/6.3fairshare.html
> > >>
> > >> #FSPOLICY              PSDEDICATED
> > >> #FSDEPTH               7
> > >> #FSINTERVAL            86400
> > >> #FSDECAY               0.80
> > >>
> > >> # Throttling Policies: http://clusterresources.com/mauidocs/6.2throttlingpolicies.html
> > >>
> > >> # NONE SPECIFIED
> > >>
> > >> # Backfill: http://clusterresources.com/mauidocs/8.2backfill.html
> > >>
> > >> BACKFILLPOLICY  ON
> > >> RESERVATIONPOLICY     CURRENTHIGHEST
> > >>
> > >> # Node Allocation: http://clusterresources.com/mauidocs/5.2nodeallocation.html
> > >>
> > >> NODEALLOCATIONPOLICY  MINRESOURCE
> > >>
> > >> # QOS: http://clusterresources.com/mauidocs/7.3qos.html
> > >>
> > >> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> > >> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
> > >>
> > >> # Standing Reservations: http://clusterresources.com/mauidocs/7.1.3standingreservations.html
> > >>
> > >> # SRSTARTTIME[test] 8:00:00
> > >> # SRENDTIME[test]   17:00:00
> > >> # SRDAYS[test]      MON TUE WED THU FRI
> > >> # SRTASKCOUNT[test] 20
> > >> # SRMAXTIME[test]   0:30:00
> > >>
> > >> # Creds: http://clusterresources.com/mauidocs/6.1fairnessoverview.html
> > >>
> > >> # USERCFG[DEFAULT]      FSTARGET=25.0
> > >> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
> > >> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
> > >> # CLASSCFG[batch]       FLAGS=PREEMPTEE
> > >> # CLASSCFG[interactive] FLAGS=PREEMPTOR
> > >>
> > >> NODEACCESSPOLICY
> > >>
> > >> =====================================================================
> > >>
> > >>
> > >>
> > >> > Check ganglia at http://localhost/ganglia and see where those 6 jobs
> > >> > are.  Make sure in particular they are not all sitting on one node or
> > >> > something silly.  If you have 6 nodes and they are one per node, then
> > >> > the queue is probably set up to reserve an entire node for each
> > >> > process.  There is a flag in the torque config file (I think) that
> > >> > tells it to do this.
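> > >> >
> > >> > (If you don't have ganglia handy, something like
> > >> >
> > >> >   qstat -n
> > >> >
> > >> > should also do it; the node list printed under each running job shows
> > >> > which host and cpu slots it was given.)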
> > >> >
> > >> > Could you post the script you are queueing with and the qsub command
> > >> > you use to submit the job?
> > >> >
> > >> > You are running an SMP kernel on your head node, I assume?  If you
> > >> > happened to be running a non-SMP kernel when you installed, then
> > >> > torque/maui probably don't know that there is more than one processor
> > >> > available...
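> > >> >
> > >> > (A quick way to check what the running kernel actually sees, on the
> > >> > head node and on a compute node, would be something like:
> > >> >
> > >> >   grep -c ^processor /proc/cpuinfo
> > >> >
> > >> > which should report the number of logical CPUs.)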
> > >> >
> > >> > Hopefully this gives you some thoughts as to where to start
> > >> > looking...
> > >> >
> > >> > On 22 Jan 2007 13:13:02 -0500, Jinsong Ouyang <[EMAIL PROTECTED]>
> > >> > wrote:
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> I am using OSCAR 5.0 & Fedora 5.0 x86_64.  I have a total of 12
> > >> >> logical CPUs on the computing nodes. I use qsub to submit jobs and
> > >> >> can only have a maximum of 6 jobs running simultaneously. Half of
> > >> >> the CPUs are not used. Could anyone please tell me how to increase
> > >> >> the number of running jobs? I tried to set max_running using qmgr.
> > >> >> It does not seem to change anything. Do I need to change anything
> > >> >> in maui.cfg?
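> > >> >>
> > >> >> (For reference, what I tried was along these lines, using the
> > >> >> default queue from the dump below:
> > >> >>
> > >> >>   qmgr -c "set queue workq max_running = 12"
> > >> >>
> > >> >> but the number of simultaneously running jobs stayed at 6.)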
> > >> >>
> > >> >>
> > >> >>
> > >> >> Many thanks,
> > >> >>
> > >> >>
> > >> >>
> > >> >> JO
> > >> >>