I am surprised you are still having timeouts. I would expect the longest anyone waits to be about 2 seconds. What is your MessageTimeout set to?

Danny

On 10/17/13 10:21, Paul Edmon wrote:

True. I was just contemplating ways to make it more responsive. Multiple copies of the data would do that; I just wasn't sure whether keeping them in sync would be a headache.

-Paul Edmon-

On 10/17/2013 1:01 PM, Moe Jette wrote:

Sending old data quickly seems very dangerous, especially if there are scripts submitting jobs and then running squeue to look for them.

Quoting Paul Edmon:


Another way is to use the showq script that we have been working on:

https://github.com/fasrc/slurm_showq

That gives overall statistics as well. However, sdiag is a great way to see whether the system is running properly and to get a good view of it.

I will note that these sorts of queries tend to hang when the ctld is busy. We were discussing it in our group meeting yesterday. It might be good to have the ctld set up a thread dedicated to servicing these requests in a timely manner. When our users see timeout messages or slow responses, they freak out and send in help tickets. To me, diagnostic checks like this should respond in a timely manner, even if the data they return is a little out of date.
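The "serve slightly stale data quickly" idea can be sketched as a snapshot cache: one background thread performs the slow query on a fixed interval, and readers only ever read the last result. This is a minimal, hypothetical Python sketch of the pattern, not how slurmctld works internally; `StatusSnapshot`, `fetch`, and `interval` are illustrative names. Because `get()` also returns the snapshot's age in seconds, a caller such as a script that has just submitted a job can see that the data predates its submission and fall back to a live query, which addresses the staleness concern Moe raises.

```python
import threading
import time


class StatusSnapshot:
    """Serve a cached job-state summary that is refreshed in the background.

    Hypothetical sketch: `fetch` is any callable that performs the slow
    query (for example, a function that runs squeue and counts states).
    Readers never wait on `fetch`; they only take a short lock.
    """

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._lock = threading.Lock()
        self._data = None
        self._taken_at = None

    def refresh(self):
        """Run the slow query once and store the result with a timestamp."""
        data = self._fetch()  # slow call happens outside the lock
        with self._lock:
            self._data = data
            self._taken_at = self._clock()

    def get(self):
        """Return (data, age_in_seconds); both are None before the first refresh."""
        with self._lock:
            if self._taken_at is None:
                return None, None
            return self._data, self._clock() - self._taken_at

    def start(self, interval):
        """Refresh every `interval` seconds on a daemon thread."""
        def loop():
            while True:
                try:
                    self.refresh()
                except Exception:
                    pass  # keep serving the last good snapshot on failure
                time.sleep(interval)

        threading.Thread(target=loop, daemon=True).start()
```

Reporting the age instead of silently returning cached data keeps the trade-off visible: interactive users get an instant answer, while scripts that need up-to-date results can reject a snapshot older than they can tolerate.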

-Paul Edmon-

On 10/17/2013 11:57 AM, Moe Jette wrote:

There is no faster way to get job counts, but you might find the sdiag command helpful.

Quoting Damien François:


Hello,

What is the most efficient way to find out how many jobs are currently running, pending, etc. in the system?

At the moment, I use squeue and wc -l, but that sometimes gets slow.
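Instead of running squeue | wc -l once per state, a single squeue call can print only the state of each job and the counting can happen client-side. The following is a minimal Python sketch that assumes `squeue` is on the PATH. It uses the real `-h` (no header) and `-o %T` (job state, extended form such as RUNNING or PENDING) options, while `count_states` and `squeue_state_counts` are hypothetical helper names. It still sends one request to slurmctld, so it will not be faster when the controller itself is busy. It only collapses several queries into one.

```python
import subprocess
from collections import Counter


def count_states(lines):
    """Tally job states, given one state name per line (e.g. "RUNNING")."""
    return Counter(line.strip() for line in lines if line.strip())


def squeue_state_counts():
    """Run squeue once and return a Counter mapping state -> number of jobs."""
    # -h suppresses the header line; -o %T prints only the job state.
    result = subprocess.run(
        ["squeue", "-h", "-o", "%T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_states(result.stdout.splitlines())
```

For example, `squeue_state_counts()["PENDING"]` gives the number of pending jobs, and a state that has no jobs simply counts as 0 because `Counter` returns 0 for missing keys.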

Is there a command or flag I might have missed that would output that information quickly?

Thanks

damien


