On 16.10.2012 at 16:58, Andrew Pearson wrote:

> Hi. I have a cluster running Rocks 5.4 that has been working perfectly well
> for a long time. Now, suddenly, a problem has emerged. Jobs requesting more
> than a few slots fail to run, remaining in qw indefinitely.
>
> When I run qstat -j <job #> to investigate the problem, I get the message
> "cannot run in PE "orte_old" because it only offers 25 slots". However,
> there are 86 cores available to the queue/PE I'm using, and a sufficient
> number of them are free that my job should start immediately. The PE in
> question has slots set to 9999 and uses $fill_up.
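
As a rough sketch, the settings described above could be checked with the standard Grid Engine client commands. The helper name inspect_pe is hypothetical, and the assumption is that qconf and qstat are on PATH (i.e. the SGE settings file has been sourced). It only prints the PE definition, any resource quota sets, and the per-queue slot counts, any of which may limit how many slots a PE can offer:

```shell
# Sketch: print the scheduler settings that can cap the slots a PE offers.
# inspect_pe is a hypothetical helper name; the PE name is passed as $1.
inspect_pe() {
    pe="${1:?usage: inspect_pe <pe_name>}"

    # Bail out if the Grid Engine client tools are not available.
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found; source the SGE settings file first" >&2
        return 1
    fi

    echo "== PE definition (slots, allocation_rule) =="
    qconf -sp "$pe"

    echo "== Resource quota sets, if any =="
    # qconf may return non-zero when no quota sets exist; that is not fatal here.
    qconf -srqs || true

    echo "== Slots used/available per cluster queue =="
    qstat -g c

    echo "== Queue instances and their states =="
    qstat -f
}
```

Called as `inspect_pe orte_old`, this gives one place to compare the PE's slot setting against what the queues currently report as available.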

Do you request anything in addition, such as memory (or is a default perhaps requested by the complex definition, the sge_request file, or a JSV)?

-- Reuti

> I've tried restarting the cluster and restarting the qmaster, and nothing
> changes. I've also checked that all of the machines can communicate with
> each other using qrsh hostname.
>
> This problem began shortly after a power outage -- is there anything that
> doesn't get touched during a node reinstallation that could be corrupted that
> would cause this? Thanks for any help you can give.
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
