Have you tried setting them in the config files themselves?

On 9/18/07, Nilesh Mistry <[EMAIL PROTECTED]> wrote:
> Hello
>
> I was checking the PBS server settings (qmgr -c "list server") and noticed
> the following values:
>
>     resources_assigned.ncpus = 2
>     resources_assigned.nodect = 24
>
> Could these two values be my problem? I have been trying to change them
> as root, but I get:
>
>     qmgr obj=master svr=master: Cannot set attribute, read only or
>     insufficient permission  resources_assigned.nodect
>
> Thanks
>
> Nilesh Mistry
> Academic Computing Services
> [EMAIL PROTECTED] & TEL Campus
> Seneca College of Applied Arts & Technology
> 70 The Pond Road
> Toronto, Ontario
> M3J 3M6 Canada
> Phone 416 491 5050 ext 3788
> Fax 416 661 4695
> http://acs.senecac.on.ca
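A note on the attribute being hit here: in PBS/Torque the resources_assigned.* values are the server's own running tally of what currently executing jobs hold, which is why qmgr refuses to set them. The attributes an administrator can actually edit are the resources_available.* and resources_max.* ones that appear later in this thread; a minimal way to inspect both, using only commands already shown in the thread:

    # resources_assigned.* is computed from running jobs and is read only;
    # look at (and, if needed, adjust) the *_available and *_max attributes instead.
    qmgr -c "list server"      # shows assigned, available and max attributes together
    qmgr -c "print server"     # dumps only the settable attributes, as qmgr commands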
>
> Nilesh Mistry wrote:
> > Hello Michael
> >
> > I have tried this as well, with the same result.
> >
> > I have even set properties for the quad core nodes and specified them in
> > the PBS script to select only those nodes, but still the same problem.
> >
> > I have also created a new queue, adding the quad core node property, with
> > resources_max.nodes = 228:ppn=4:i965, but still the same result.
> >
> > Thanks
> >
> > Nilesh Mistry
> >
> > Michael Edwards wrote:
> >
> >> set queue workq resources_max.ncpus = 200
> >> set queue workq resources_max.nodect = 64
> >> set queue workq resources_max.nodes = 200:ppn=4
> >>
> >> This should probably be 50*4 + 14*2 = 228:
> >>
> >> set queue workq resources_max.ncpus = 228
> >> set queue workq resources_max.nodect = 64
> >> set queue workq resources_max.nodes = 228:ppn=4
> >>
> >> Though you might want to try making two queues; I don't know how well
> >> Torque deals with having different numbers of ppn on different nodes.
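To make that suggestion concrete, here is a sketch of what the two-queue variant might look like in qmgr. The queue names "quad" and "dual" are invented for illustration; the node counts and the i965 property come from earlier in the thread, and none of this has been tested on the cluster in question:

    # Option 1: raise the existing limits to the real core count (50*4 + 14*2 = 228)
    qmgr -c "set queue workq resources_max.ncpus = 228"
    qmgr -c "set queue workq resources_max.nodes = 228:ppn=4"

    # Option 2: one queue per node type, so each queue sees a uniform ppn
    qmgr -c "create queue quad"
    qmgr -c "set queue quad queue_type = Execution"
    qmgr -c "set queue quad resources_max.nodect = 50"
    qmgr -c "set queue quad resources_max.nodes = 50:ppn=4:i965"
    qmgr -c "set queue quad enabled = True"
    qmgr -c "set queue quad started = True"
    qmgr -c "create queue dual"
    qmgr -c "set queue dual queue_type = Execution"
    qmgr -c "set queue dual resources_max.nodect = 14"
    qmgr -c "set queue dual resources_max.nodes = 14:ppn=2"
    qmgr -c "set queue dual enabled = True"
    qmgr -c "set queue dual started = True"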
> >>
> >> On 9/18/07, Nilesh Mistry <[EMAIL PROTECTED]> wrote:
> >>
> >>> I get the following error after using qsub:
> >>>
> >>>     qsub: Job exceeds queue resource limits
> >>>
> >>> If I change #PBS -l nodes=64 to #PBS -l nodes=60, the job is submitted
> >>> and runs, and then it fails.
> >>>
> >>> ################ qstat -f ############################
> >>>
> >>> Job Id: 924.master.atar.senecac.on.ca
> >>>     Job_Name = scaling_test
> >>>     Job_Owner = [EMAIL PROTECTED]
> >>>     job_state = R
> >>>     queue = parallel
> >>>     server = master.atar.senecac.on.ca
> >>>     Checkpoint = u
> >>>     ctime = Tue Sep 18 09:09:45 2007
> >>>     Error_Path = master:/home/faculty/nilesh.mistry/pbs/multi/scaling_test/scaling_test.err
> >>>     exec_host = atarnode59.atar.senecac.on.ca/1+atarnode59.atar.senecac.on.ca/0
> >>>         +atarnode57.atar.senecac.on.ca/1+atarnode57.atar.senecac.on.ca/0
> >>>         +atarnode56.atar.senecac.on.ca/1+atarnode56.atar.senecac.on.ca/0
> >>>         +atarnode55.atar.senecac.on.ca/1+atarnode55.atar.senecac.on.ca/0
> >>>         +atarnode54.atar.senecac.on.ca/1+atarnode54.atar.senecac.on.ca/0
> >>>         +atarnode53.atar.senecac.on.ca/1+atarnode53.atar.senecac.on.ca/0
> >>>         +atarnode52.atar.senecac.on.ca/1+atarnode52.atar.senecac.on.ca/0
> >>>         +atarnode51.atar.senecac.on.ca/1+atarnode51.atar.senecac.on.ca/0
> >>>         +atarnode50.atar.senecac.on.ca/2+atarnode50.atar.senecac.on.ca/1+atarnode50.atar.senecac.on.ca/0
> >>>         +atarnode49.atar.senecac.on.ca/2+atarnode49.atar.senecac.on.ca/1+atarnode49.atar.senecac.on.ca/0
> >>>         +atarnode48.atar.senecac.on.ca/2+atarnode48.atar.senecac.on.ca/1+atarnode48.atar.senecac.on.ca/0
> >>>         +atarnode47.atar.senecac.on.ca/2+atarnode47.atar.senecac.on.ca/1+atarnode47.atar.senecac.on.ca/0
> >>>         +atarnode45.atar.senecac.on.ca/2+atarnode45.atar.senecac.on.ca/1+atarnode45.atar.senecac.on.ca/0
> >>>         +atarnode44.atar.senecac.on.ca/2+atarnode44.atar.senecac.on.ca/1+atarnode44.atar.senecac.on.ca/0
> >>>         +atarnode42.atar.senecac.on.ca/2+atarnode42.atar.senecac.on.ca/1+atarnode42.atar.senecac.on.ca/0
> >>>         +atarnode41.atar.senecac.on.ca/2+atarnode41.atar.senecac.on.ca/1+atarnode41.atar.senecac.on.ca/0
> >>>         +atarnode40.atar.senecac.on.ca/2+atarnode40.atar.senecac.on.ca/1+atarnode40.atar.senecac.on.ca/0
> >>>         +atarnode39.atar.senecac.on.ca/2+atarnode39.atar.senecac.on.ca/1+atarnode39.atar.senecac.on.ca/0
> >>>         +atarnode38.atar.senecac.on.ca/2+atarnode38.atar.senecac.on.ca/1+atarnode38.atar.senecac.on.ca/0
> >>>         +atarnode37.atar.senecac.on.ca/2+atarnode37.atar.senecac.on.ca/1+atarnode37.atar.senecac.on.ca/0
> >>>         +atarnode36.atar.senecac.on.ca/2+atarnode36.atar.senecac.on.ca/1+atarnode36.atar.senecac.on.ca/0
> >>>         +atarnode35.atar.senecac.on.ca/2+atarnode35.atar.senecac.on.ca/1+atarnode35.atar.senecac.on.ca/0
> >>>         +atarnode34.atar.senecac.on.ca/1+atarnode34.atar.senecac.on.ca/0
> >>>     Hold_Types = n
> >>>     Join_Path = oe
> >>>     Keep_Files = n
> >>>     Mail_Points = abe
> >>>     Mail_Users = nilesh.mistry
> >>>     mtime = Tue Sep 18 09:09:46 2007
> >>>     Output_Path = master:/home/faculty/nilesh.mistry/pbs/multi/scaling_test/scaling_test.log
> >>>     Priority = 0
> >>>     qtime = Tue Sep 18 09:09:45 2007
> >>>     Rerunable = True
> >>>     Resource_List.cput = 10000:00:00
> >>>     Resource_List.mem = 64000mb
> >>>     Resource_List.ncpus = 1
> >>>     Resource_List.nodect = 60
> >>>     Resource_List.nodes = 60
> >>>     Resource_List.walltime = 10000:00:00
> >>>     Variable_List = PBS_O_HOME=/home/faculty/nilesh.mistry,
> >>>         PBS_O_LANG=en_CA.UTF-8,PBS_O_LOGNAME=nilesh.mistry,
> >>>         PBS_O_PATH=/usr/kerberos/bin:/opt/lam-7.1.2/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/c3-4/:/home/faculty/nilesh.mistry/bin:/opt/maui/bin:/usr/lib/news/bin:/home/faculty/nilesh.mistry/scripts,
> >>>         PBS_O_MAIL=/var/spool/mail/nilesh.mistry,PBS_O_SHELL=/bin/bash,
> >>>         PBS_O_HOST=master.atar.senecac.on.ca,
> >>>         PBS_O_WORKDIR=/home/faculty/nilesh.mistry/pbs/multi/scaling_test,
> >>>         PBS_O_QUEUE=parallel
> >>>     etime = Tue Sep 18 09:09:45 2007
> >>>
> >>> ###################### Log file ##############################
> >>>
> >>> ------------------------------------------------------
> >>>  This job is allocated on 60 cpu(s)
> >>> Job is running on node(s):
> >>> atarnode59.atar.senecac.on.ca
> >>> atarnode59.atar.senecac.on.ca
> >>> atarnode57.atar.senecac.on.ca
> >>> atarnode57.atar.senecac.on.ca
> >>> atarnode56.atar.senecac.on.ca
> >>> atarnode56.atar.senecac.on.ca
> >>> atarnode55.atar.senecac.on.ca
> >>> atarnode55.atar.senecac.on.ca
> >>> atarnode54.atar.senecac.on.ca
> >>> atarnode54.atar.senecac.on.ca
> >>> atarnode53.atar.senecac.on.ca
> >>> atarnode53.atar.senecac.on.ca
> >>> atarnode52.atar.senecac.on.ca
> >>> atarnode52.atar.senecac.on.ca
> >>> atarnode51.atar.senecac.on.ca
> >>> atarnode51.atar.senecac.on.ca
> >>> atarnode50.atar.senecac.on.ca
> >>> atarnode50.atar.senecac.on.ca
> >>> atarnode50.atar.senecac.on.ca
> >>> atarnode49.atar.senecac.on.ca
> >>> atarnode49.atar.senecac.on.ca
> >>> atarnode49.atar.senecac.on.ca
> >>> atarnode48.atar.senecac.on.ca
> >>> atarnode48.atar.senecac.on.ca
> >>> atarnode48.atar.senecac.on.ca
> >>> atarnode47.atar.senecac.on.ca
> >>> atarnode47.atar.senecac.on.ca
> >>> atarnode47.atar.senecac.on.ca
> >>> atarnode45.atar.senecac.on.ca
> >>> atarnode45.atar.senecac.on.ca
> >>> atarnode45.atar.senecac.on.ca
> >>> atarnode44.atar.senecac.on.ca
> >>> atarnode44.atar.senecac.on.ca
> >>> atarnode44.atar.senecac.on.ca
> >>> atarnode42.atar.senecac.on.ca
> >>> atarnode42.atar.senecac.on.ca
> >>> atarnode42.atar.senecac.on.ca
> >>> atarnode41.atar.senecac.on.ca
> >>> atarnode41.atar.senecac.on.ca
> >>> atarnode41.atar.senecac.on.ca
> >>> atarnode40.atar.senecac.on.ca
> >>> atarnode40.atar.senecac.on.ca
> >>> atarnode40.atar.senecac.on.ca
> >>> atarnode39.atar.senecac.on.ca
> >>> atarnode39.atar.senecac.on.ca
> >>> atarnode39.atar.senecac.on.ca
> >>> atarnode38.atar.senecac.on.ca
> >>> atarnode38.atar.senecac.on.ca
> >>> atarnode38.atar.senecac.on.ca
> >>> atarnode37.atar.senecac.on.ca
> >>> atarnode37.atar.senecac.on.ca
> >>> atarnode37.atar.senecac.on.ca
> >>> atarnode36.atar.senecac.on.ca
> >>> atarnode36.atar.senecac.on.ca
> >>> atarnode36.atar.senecac.on.ca
> >>> atarnode35.atar.senecac.on.ca
> >>> atarnode35.atar.senecac.on.ca
> >>> atarnode35.atar.senecac.on.ca
> >>> atarnode34.atar.senecac.on.ca
> >>> atarnode34.atar.senecac.on.ca
> >>> PBS: qsub is running on master.atar.senecac.on.ca
> >>> PBS: originating queue is parallel
> >>> PBS: executing queue is parallel
> >>> PBS: working directory is /home/faculty/nilesh.mistry/pbs/multi/scaling_test
> >>> PBS: execution mode is PBS_BATCH
> >>> PBS: job identifier is 924.master.atar.senecac.on.ca
> >>> PBS: job name is scaling_test
> >>> PBS: node file is /var/spool/pbs/aux//924.master.atar.senecac.on.ca
> >>> PBS: current home directory is /home/faculty/nilesh.mistry
> >>> PBS: PATH = /usr/kerberos/bin:/opt/lam-7.1.2/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/c3-4/:/home/faculty/nilesh.mistry/bin:/opt/maui/bin:/usr/lib/news/bin:/home/faculty/nilesh.mistry/scripts
> >>> ------------------------------------------------------
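The per-host slot counts visible in the list above (two slots on some hosts, three on others) can be summarized directly from the node file inside the job; a small sketch, using only the $PBS_NODEFILE variable that the log already shows:

    # One line per granted CPU slot; group by host to see how many slots each node got
    sort $PBS_NODEFILE | uniq -c | sort -rn
    # Total slots handed to the job; this is where the "60 cpu(s)" figure above comes from
    wc -l < $PBS_NODEFILE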
> >>> Mesh 1 of 60 is alive on atarnode59.atar.senecac.on.ca
> >>> Mesh 17 of 60 is alive on atarnode50.atar.senecac.on.ca
> >>> Mesh 18 of 60 is alive on atarnode50.atar.senecac.on.ca
> >>> Mesh 3 of 60 is alive on atarnode57.atar.senecac.on.ca
> >>> Mesh 50 of 60 is alive on atarnode37.atar.senecac.on.ca
> >>> Mesh 51 of 60 is alive on atarnode37.atar.senecac.on.ca
> >>> Mesh 58 of 60 is alive on atarnode35.atar.senecac.on.ca
> >>> Mesh 15 of 60 is alive on atarnode51.atar.senecac.on.ca
> >>> Mesh 56 of 60 is alive on atarnode35.atar.senecac.on.ca
> >>> Mesh 47 of 60 is alive on atarnode38.atar.senecac.on.ca
> >>> Mesh 41 of 60 is alive on atarnode40.atar.senecac.on.ca
> >>> Mesh 43 of 60 is alive on atarnode40.atar.senecac.on.ca
> >>> Mesh 23 of 60 is alive on atarnode48.atar.senecac.on.ca
> >>> Mesh 59 of 60 is alive on atarnode34.atar.senecac.on.ca
> >>> Mesh 44 of 60 is alive on atarnode39.atar.senecac.on.ca
> >>> Mesh 60 of 60 is alive on atarnode34.atar.senecac.on.ca
> >>> Mesh 26 of 60 is alive on atarnode47.atar.senecac.on.ca
> >>> Mesh 46 of 60 is alive on atarnode39.atar.senecac.on.ca
> >>> Mesh 42 of 60 is alive on atarnode40.atar.senecac.on.ca
> >>> Mesh 32 of 60 is alive on atarnode44.atar.senecac.on.ca
> >>> Mesh 20 of 60 is alive on atarnode49.atar.senecac.on.ca
> >>> Mesh 35 of 60 is alive on atarnode42.atar.senecac.on.ca
> >>> Mesh 53 of 60 is alive on atarnode36.atar.senecac.on.ca
> >>> Mesh 22 of 60 is alive on atarnode49.atar.senecac.on.ca
> >>> Mesh 19 of 60 is alive on atarnode50.atar.senecac.on.ca
> >>> Mesh 48 of 60 is alive on atarnode38.atar.senecac.on.ca
> >>> Mesh 37 of 60 is alive on atarnode42.atar.senecac.on.ca
> >>> Mesh 54 of 60 is alive on atarnode36.atar.senecac.on.ca
> >>> Mesh 55 of 60 is alive on atarnode36.atar.senecac.on.ca
> >>> Mesh 45 of 60 is alive on atarnode39.atar.senecac.on.ca
> >>> Mesh 29 of 60 is alive on atarnode45.atar.senecac.on.ca
> >>> Mesh 24 of 60 is alive on atarnode48.atar.senecac.on.ca
> >>> Mesh 30 of 60 is alive on atarnode45.atar.senecac.on.ca
> >>> Mesh 31 of 60 is alive on atarnode45.atar.senecac.on.ca
> >>> Mesh 52 of 60 is alive on atarnode37.atar.senecac.on.ca
> >>> Mesh 28 of 60 is alive on atarnode47.atar.senecac.on.ca
> >>> Mesh 36 of 60 is alive on atarnode42.atar.senecac.on.ca
> >>> Mesh 34 of 60 is alive on atarnode44.atar.senecac.on.ca
> >>> Mesh 38 of 60 is alive on atarnode41.atar.senecac.on.ca
> >>> Mesh 40 of 60 is alive on atarnode41.atar.senecac.on.ca
> >>> Mesh 5 of 60 is alive on atarnode56.atar.senecac.on.ca
> >>> Mesh 57 of 60 is alive on atarnode35.atar.senecac.on.ca
> >>> Mesh 13 of 60 is alive on atarnode52.atar.senecac.on.ca
> >>> Mesh 9 of 60 is alive on atarnode54.atar.senecac.on.ca
> >>> Mesh 39 of 60 is alive on atarnode41.atar.senecac.on.ca
> >>> Mesh 7 of 60 is alive on atarnode55.atar.senecac.on.ca
> >>> Mesh 10 of 60 is alive on atarnode54.atar.senecac.on.ca
> >>> Mesh 8 of 60 is alive on atarnode55.atar.senecac.on.ca
> >>> Mesh 4 of 60 is alive on atarnode57.atar.senecac.on.ca
> >>> Mesh 6 of 60 is alive on atarnode56.atar.senecac.on.ca
> >>> Mesh 11 of 60 is alive on atarnode53.atar.senecac.on.ca
> >>> Mesh 14 of 60 is alive on atarnode52.atar.senecac.on.ca
> >>> Mesh 12 of 60 is alive on atarnode53.atar.senecac.on.ca
> >>> Mesh 21 of 60 is alive on atarnode49.atar.senecac.on.ca
> >>> Mesh 16 of 60 is alive on atarnode51.atar.senecac.on.ca
> >>> Mesh 33 of 60 is alive on atarnode44.atar.senecac.on.ca
> >>> Mesh 49 of 60 is alive on atarnode38.atar.senecac.on.ca
> >>> Mesh 25 of 60 is alive on atarnode48.atar.senecac.on.ca
> >>> Mesh 27 of 60 is alive on atarnode47.atar.senecac.on.ca
> >>> Mesh 2 of 60 is alive on atarnode59.atar.senecac.on.ca
> >>>
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>> ERROR: Number of meshes not equal to number of threads
> >>>
> >>> LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
> >>>
> >>> Thanks
> >>>
> >>> Nilesh Mistry
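The log is consistent with the error text: the run received 60 CPU slots, while the request (and presumably the scaling_test.fds input) was sized for 64, and fds5_mpi appears to insist on exactly one MPI process per mesh. A quick sanity check before resubmitting, assuming FDS meshes are declared with &MESH records at the start of a line:

    # How many meshes does the input define?
    grep -c '^&MESH' scaling_test.fds
    # How many slots did the job actually receive?  (run from inside the job)
    wc -l < $PBS_NODEFILE
    # The two numbers have to match for fds5_mpi to start.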
> >>>
> >>> Michael Edwards wrote:
> >>>
> >>>> What do you get when you do "qstat -f" on the job?  How many nodes is
> >>>> it actually getting?
> >>>>
> >>>> On 9/18/07, Nilesh Mistry <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Michael
> >>>>>
> >>>>> We have actually moved to a larger cluster of 64 nodes (50 quad core
> >>>>> and 14 dual Opterons), therefore 220 processors available. We are
> >>>>> submitting a job that requires 64 threads, but still with the same
> >>>>> result. Here are the files you requested. I have already submitted
> >>>>> this to the Torque users list.
> >>>>>
> >>>>> ####### PBS SCRIPT START #######
> >>>>>
> >>>>> #!/bin/sh -f
> >>>>> #PBS -l nodes=64
> >>>>> #PBS -N scaling_test
> >>>>> #PBS -e scaling_test.err
> >>>>> #PBS -o scaling_test.log
> >>>>> #PBS -j oe
> >>>>> #PBS -l mem=64000mb
> >>>>> #PBS -m abe
> >>>>> #PBS -q parallel
> >>>>>
> >>>>> NCPU=`wc -l < $PBS_NODEFILE`
> >>>>> echo ------------------------------------------------------
> >>>>> echo ' This job is allocated on '${NCPU}' cpu(s)'
> >>>>> echo 'Job is running on node(s): '
> >>>>> cat $PBS_NODEFILE
> >>>>> echo PBS: qsub is running on $PBS_O_HOST
> >>>>> echo PBS: originating queue is $PBS_O_QUEUE
> >>>>> echo PBS: executing queue is $PBS_QUEUE
> >>>>> echo PBS: working directory is $PBS_O_WORKDIR
> >>>>> echo PBS: execution mode is $PBS_ENVIRONMENT
> >>>>> echo PBS: job identifier is $PBS_JOBID
> >>>>> echo PBS: job name is $PBS_JOBNAME
> >>>>> echo PBS: node file is $PBS_NODEFILE
> >>>>> echo PBS: current home directory is $PBS_O_HOME
> >>>>> echo PBS: PATH = $PBS_O_PATH
> >>>>> echo ------------------------------------------------------
> >>>>> SERVER=$PBS_O_HOST
> >>>>> WORKDIR=$HOME/pbs/multi/scaling_test
> >>>>> cd ${WORKDIR}
> >>>>> cat $PBS_NODEFILE > nodes.list
> >>>>> lamboot -s -H $PBS_NODEFILE
> >>>>> mpirun -np $NCPU /opt/fds/fds5_mpi scaling_test.fds
> >>>>> lamhalt
> >>>>>
> >>>>> ####### PBS SCRIPT END #######
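With a mix of quad-core and dual-core nodes, a bare "nodes=64" leaves it to the scheduler how many slots per host to hand out (the exec_host output above shows a mix of two and three per node). One possible, untested variant is to spell the layout out in the request so the slot count is predictable; the script file name here is only illustrative:

    # 64 slots as 16 full quad-core nodes
    qsub -q parallel -l nodes=16:ppn=4 scaling_test.pbs

    # or as an explicit mix of both node types (9*4 + 14*2 = 64),
    # using the i965 node property mentioned earlier in the thread
    qsub -q parallel -l nodes=9:ppn=4:i965+14:ppn=2 scaling_test.pbs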
> >>>>>
> >>>>> ####### MAUI.CFG START #######
> >>>>> # maui.cfg 3.2.6p14
> >>>>>
> >>>>> SERVERHOST            master.atar.senecac.on.ca
> >>>>> # primary admin must be first in list
> >>>>> ADMIN1                root
> >>>>> ADMIN3                nilesh.mistry
> >>>>>
> >>>>> # Resource Manager Definition
> >>>>>
> >>>>> RMCFG[master.atar.senecac.on.ca] TYPE=PBS
> >>>>>
> >>>>> # Allocation Manager Definition
> >>>>>
> >>>>> AMCFG[bank]           TYPE=NONE
> >>>>>
> >>>>> # full parameter docs at http://clusterresources.com/mauidocs/a.fparameters.html
> >>>>> # use the 'schedctl -l' command to display current configuration
> >>>>>
> >>>>> RMPOLLINTERVAL        00:01:00
> >>>>>
> >>>>> SERVERPORT            42559
> >>>>> SERVERMODE            NORMAL
> >>>>>
> >>>>> # Admin: http://clusterresources.com/mauidocs/a.esecurity.html
> >>>>>
> >>>>> LOGFILE               maui.log
> >>>>> LOGFILEMAXSIZE        10000000
> >>>>> LOGLEVEL              4
> >>>>> LOGFACILITY           fALL
> >>>>>
> >>>>> # Job Priority: http://clusterresources.com/mauidocs/5.1jobprioritization.html
> >>>>>
> >>>>> QUEUETIMEWEIGHT       1
> >>>>>
> >>>>> # FairShare: http://clusterresources.com/mauidocs/6.3fairshare.html
> >>>>>
> >>>>> #FSPOLICY             PSDEDICATED
> >>>>> #FSDEPTH              7
> >>>>> #FSINTERVAL           86400
> >>>>> #FSDECAY              0.80
> >>>>>
> >>>>> # Throttling Policies: http://clusterresources.com/mauidocs/6.2throttlingpolicies.html
> >>>>>
> >>>>> # NONE SPECIFIED
> >>>>>
> >>>>> # Backfill: http://clusterresources.com/mauidocs/8.2backfill.html
> >>>>>
> >>>>> BACKFILLPOLICY        ON
> >>>>> RESERVATIONPOLICY     CURRENTHIGHEST
> >>>>>
> >>>>> # the following are modified/added by Mehrdad 13 Sept 07
> >>>>> #NODEACCESSPOLICY     DEDICATED
> >>>>> NODEACCESSPOLICY      SHARED
> >>>>> JOBNODEMATCHPOLICY    EXACTPROC
> >>>>>
> >>>>> # Node Allocation: http://clusterresources.com/mauidocs/5.2nodeallocation.html
> >>>>>
> >>>>> NODEALLOCATIONPOLICY  MINRESOURCE
> >>>>>
> >>>>> # QOS: http://clusterresources.com/mauidocs/7.3qos.html
> >>>>>
> >>>>> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> >>>>> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
> >>>>>
> >>>>> # Standing Reservations: http://clusterresources.com/mauidocs/7.1.3standingreservations.html
> >>>>>
> >>>>> # SRSTARTTIME[test] 8:00:00
> >>>>> # SRENDTIME[test]   17:00:00
> >>>>> # SRDAYS[test]      MON TUE WED THU FRI
> >>>>> # SRTASKCOUNT[test] 20
> >>>>> # SRMAXTIME[test]   0:30:00
> >>>>>
> >>>>> # Creds: http://clusterresources.com/mauidocs/6.1fairnessoverview.html
> >>>>>
> >>>>> # USERCFG[DEFAULT]      FSTARGET=25.0
> >>>>> # USERCFG[john]         PRIORITY=100 FSTARGET=10.0-
> >>>>> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
> >>>>> # CLASSCFG[batch]       FLAGS=PREEMPTEE
> >>>>> # CLASSCFG[interactive] FLAGS=PREEMPTOR
> >>>>> USERCFG[DEFAULT]        MAXJOB=4
> >>>>> ####### MAUI.CFG END #######
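When a request is rejected or ends up sized differently than expected, Maui's own client commands usually explain why. A few worth running on this installation (the job id is the one from the qstat output earlier in the thread):

    showq            # the scheduler's view of running, idle and blocked jobs
    checkjob 924     # why a particular job is or is not being scheduled as requested
    diagnose -n      # per-node state as Maui sees it
    schedctl -l      # dump the live scheduler parameters (as the comment in maui.cfg suggests)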
> >>>>>
> >>>>> ####### QMGR -c "PRINT SERVER MASTER" ########
> >>>>> #
> >>>>> # Create queues and set their attributes.
> >>>>> #
> >>>>> #
> >>>>> # Create and define queue serial
> >>>>> #
> >>>>> create queue serial
> >>>>> set queue serial queue_type = Execution
> >>>>> set queue serial resources_max.cput = 1000:00:00
> >>>>> set queue serial resources_max.mem = 3000mb
> >>>>> set queue serial resources_max.ncpus = 1
> >>>>> set queue serial resources_max.nodect = 1
> >>>>> set queue serial resources_max.nodes = 1:ppn=1
> >>>>> set queue serial resources_max.walltime = 1000:00:00
> >>>>> set queue serial resources_default.cput = 336:00:00
> >>>>> set queue serial resources_default.mem = 900mb
> >>>>> set queue serial resources_default.ncpus = 1
> >>>>> set queue serial resources_default.nodect = 1
> >>>>> set queue serial resources_default.nodes = 1:ppn=1
> >>>>> set queue serial enabled = True
> >>>>> set queue serial started = True
> >>>>> #
> >>>>> # Create and define queue workq
> >>>>> #
> >>>>> create queue workq
> >>>>> set queue workq queue_type = Execution
> >>>>> set queue workq resources_max.cput = 10000:00:00
> >>>>> set queue workq resources_max.ncpus = 200
> >>>>> set queue workq resources_max.nodect = 64
> >>>>> set queue workq resources_max.nodes = 200:ppn=4
> >>>>> set queue workq resources_max.walltime = 10000:00:00
> >>>>> set queue workq resources_min.cput = 00:00:01
> >>>>> set queue workq resources_min.ncpus = 1
> >>>>> set queue workq resources_min.nodect = 1
> >>>>> set queue workq resources_min.walltime = 00:00:01
> >>>>> set queue workq resources_default.cput = 10000:00:00
> >>>>> set queue workq resources_default.nodect = 1
> >>>>> set queue workq resources_default.walltime = 10000:00:00
> >>>>> set queue workq enabled = True
> >>>>> set queue workq started = True
> >>>>> #
> >>>>> # Create and define queue parallel
> >>>>> #
> >>>>> create queue parallel
> >>>>> set queue parallel queue_type = Execution
> >>>>> set queue parallel resources_max.cput = 10000:00:00
> >>>>> set queue parallel resources_max.ncpus = 200
> >>>>> set queue parallel resources_max.nodect = 64
> >>>>> set queue parallel resources_max.nodes = 200:ppn=4
> >>>>> set queue parallel resources_max.walltime = 10000:00:00
> >>>>> set queue parallel resources_min.ncpus = 1
> >>>>> set queue parallel resources_min.nodect = 1
> >>>>> set queue parallel resources_default.ncpus = 1
> >>>>> set queue parallel resources_default.nodect = 1
> >>>>> set queue parallel resources_default.nodes = 1:ppn=1
> >>>>> set queue parallel resources_default.walltime = 10000:00:00
> >>>>> set queue parallel enabled = True
> >>>>> set queue parallel started = True
> >>>>> #
> >>>>> # Set server attributes.
> >>>>> #
> >>>>> set server scheduling = True
> >>>>> set server acl_host_enable = False
> >>>>> set server acl_user_enable = False
> >>>>> set server default_queue = serial
> >>>>> set server log_events = 127
> >>>>> set server mail_from = adm
> >>>>> set server query_other_jobs = True
> >>>>> set server resources_available.ncpus = 200
> >>>>> set server resources_available.nodect = 64
> >>>>> set server resources_available.nodes = 200
> >>>>> set server resources_default.neednodes = 1
> >>>>> set server resources_default.nodect = 1
> >>>>> set server resources_default.nodes = 1
> >>>>> set server resources_max.ncpus = 200
> >>>>> set server resources_max.nodes = 200
> >>>>> set server scheduler_iteration = 60
> >>>>> set server node_check_rate = 150
> >>>>> set server tcp_timeout = 6
> >>>>> set server default_node = 1
> >>>>> set server pbs_version = 2.0.0p8
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> Nilesh Mistry
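One detail worth noting in the dump above: the server-wide caps (resources_available.ncpus and resources_max.ncpus = 200) still reflect the old 200-processor figure, and server limits apply on top of the per-queue ones. If the cluster really has 50*4 + 14*2 = 228 cores, the same correction Michael proposed for the queues would presumably be needed at the server level as well (illustrative only, not tested here):

    qmgr -c "set server resources_available.ncpus = 228"
    qmgr -c "set server resources_max.ncpus = 228"
    # resources_available.nodect is already 64, which matches the node count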
> >>>>>
> >>>>> Michael Edwards wrote:
> >>>>>
> >>>>>> We'd need your script and the qsub command you used, and possibly more
> >>>>>> configuration information from Maui and Torque, to be of much help.
> >>>>>>
> >>>>>> I don't know that we have anyone who is deep into Maui or Torque right
> >>>>>> now; you might also want to ask on the Maui or Torque lists.
> >>>>>>
> >>>>>> From the other posts you have made, this error seems to be one of those
> >>>>>> general "something is broken" messages that could have many causes.
> >>>>>>
> >>>>>> On 9/17/07, Nilesh Mistry <[EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>>> Hello
> >>>>>>>
> >>>>>>> I am having problems submitting a job that requires 23 threads. I keep
> >>>>>>> getting the following error:
> >>>>>>>
> >>>>>>>     ERROR: Number of meshes not equal to number of threads
> >>>>>>>
> >>>>>>> Hardware:
> >>>>>>> 10 quad core nodes (therefore 40 processors available)
> >>>>>>>
> >>>>>>> What do I need to ensure in my job queue (qmgr), in Maui (maui.cfg)
> >>>>>>> and in my submit script when using qsub?
> >>>>>>>
> >>>>>>> Any and all help is greatly appreciated.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> Nilesh Mistry
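Tying the original question back to the error message: for a 23-mesh FDS case on the 10 quad-core nodes, the resource request and the mpirun process count both have to come out at 23. A sketch of a submit script along the lines of the one posted above; the node layout and the input file name are illustrative, and the &MESH count assumes one record per line:

    #!/bin/sh
    #PBS -N fds_23mesh
    #PBS -q parallel
    #PBS -l nodes=5:ppn=4+1:ppn=3    # 5*4 + 3 = 23 CPU slots
    #PBS -j oe

    cd $PBS_O_WORKDIR
    NMESH=`grep -c '^&MESH' case_23mesh.fds`
    NCPU=`wc -l < $PBS_NODEFILE`
    if [ "$NMESH" -ne "$NCPU" ]; then
        echo "Got $NCPU slots but the input defines $NMESH meshes" >&2
        exit 1
    fi
    lamboot -s -H $PBS_NODEFILE
    mpirun -np $NCPU /opt/fds/fds5_mpi case_23mesh.fds
    lamhalt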
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users