We recently migrated from PBS to Torque, and most of our systems are now running 4.4. The Torque server (a Core2 Duo at 2.4GHz) is only handling about 3x the jobs our 300MHz Sun Ultra 5 could handle before bogging down horribly. This seems a bit odd.
Watching the server logs, it seems a lot of time is spent waiting for replies on sockets, though it's not clear whether the waiting is between the scheduler and batch server on the same system, or between the batch server and the client node processes (pbs_moms). We're beginning to wonder if it's OS-related. Torque uses a lot of sockets, and sets them up and tears them down at a hefty rate. We have the limit set to 16K for the scheduler and server processes via ulimit, but we aren't getting much above 1400 sockets between the two processes. Is anyone aware of an issue in 4.4 that might affect this?

Thanks,
Miles
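
One way to check whether a per-process descriptor limit is really the ceiling, rather than something else in the OS, is to read the limit from inside a process and then see how many sockets that process can actually open. The following is a minimal sketch in Python, assuming a Unix-like system; the function name `descriptor_headroom` and the `max_probe` parameter are hypothetical and not part of Torque. It only exercises the limits of the process it runs in, so it would need to be launched from the same environment as the scheduler or server to be meaningful.

```python
import resource
import socket


def descriptor_headroom(max_probe=2048):
    """Return (soft_limit, hard_limit, opened).

    soft_limit and hard_limit are the RLIMIT_NOFILE values this process
    sees; opened is how many TCP sockets could be created before the OS
    refused or max_probe was reached. Hypothetical helper for illustration.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    socks = []
    try:
        for _ in range(max_probe):
            # Creating a socket consumes a descriptor even if it is never connected.
            socks.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError:
        # The descriptor limit (or another OS limit) was hit; stop probing.
        pass
    finally:
        opened = len(socks)
        for s in socks:
            s.close()
    return soft, hard, opened


if __name__ == "__main__":
    soft, hard, opened = descriptor_headroom()
    print("soft limit:", soft)
    print("hard limit:", hard)
    print("sockets opened before failure or probe cap:", opened)
```

If the number of sockets opened stops well short of the soft limit and of `max_probe`, the ceiling is probably coming from somewhere other than ulimit.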
