Okay. From the Maui docs, it looks like the limit is, as you say, on the order of several thousand. I guess my idea of a 'large number of jobs' is fairly out of scale. At most I would have a couple of hundred jobs.
The queue is stalling after 5-6 jobs regardless of whether I am queuing 20 jobs or around 100. This is nowhere near any of the job number limits in the docs I read.

Is it possible that I am seeing a LAM/MPI problem propagated into Maui? Is one of these programs expecting some exit code, return value, or something? Like I said before, it's bothering me because all I have to do to run any given job is stop the stalled version and qsub it again. If the stall were short I would think it was a power-save mode, but it persists for upwards of six hours with the server 'idle'.

I guess I could switch to Torque, but it seemed to me that OpenPBS should work fine for the scale of problem and hardware I am using.

Original Message -----------------------

Hi Michael:

I used to use openPBS/MAUI before and ran into this issue with job numbers - I believe it has a soft/hard limit on the number of jobs that could exist in the queues (you should be able to find out from the maui mailing-list, I remember it was around 4096 or something...) - so as long as you do not exceed that, it is okay. However, we were churning out more jobs than it could handle and therefore opted to run SGE instead.

You may have better luck with Torque... Not sure if they have 'addressed' this issue but you can take a look at their docs.

Cheers,
Bernard

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Michael Edwards
> Sent: Tuesday, April 20, 2004 13:24
> To: [EMAIL PROTECTED]
> Subject: [Oscar-users] pbs has really short queues?
>
> I am trying to run a large number of lam-mpi jobs using the
> pbs scheduler (well, I guess its actually the maui scheduler,
> but whatever -- using pbs). If I run a small number of jobs
> it works fine, but when I dump more than about 6 jobs on the
> queue at a time, it does the first 5 or 6, then stops. All
> other jobs sit in the queue until I take them out.
>
> I cant imagine its a problem with the scripts, since the
> permaqueued jobs run fine if I run them individually.
>
> Is this a property of the default queue or does PBS just not
> work well with massive job lists. If so, is there something
> that does because I could easily dump over one hundred jobs
> at a time if it would work.
>
> If this is not the correct forum for this question, I will be
> happy to ask it somewhere else.

_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
