Fair enough :). I just like being able to contribute to the discussion
and to help other people figure out what's wrong, but I will send bug
reports to bugzilla from now on.
-Paul Edmon-
On 10/25/2013 12:53 PM, David Bigagli wrote:
Hi Paul, this is David from SchedMD. I will investigate this issue. I
have opened a ticket in bugzilla as bug#486.
It would be best if you could open bugs in our bugzilla instead of
emailing the list; bugzilla is the preferred channel for our customers.
The list is just a sink :-).
On Fri 25 Oct 2013 06:37:20 AM PDT, Paul Edmon wrote:
So we ran into a situation where our master was under high load due to
a lot of jobs exiting and running all at once (basically high traffic).
A user tried to launch a srun interactive job during this period. It
actually scheduled and allocated resources. However, when it tried to
launch the connection it timed out and dropped the job. As you might
guess, this can be frustrating, especially if you have been sitting in
the queue for a while.
Is there a way to prevent this behavior? We've already dialed up the
timeouts.
-Paul Edmon-
--
Thanks,
David Bigagli
www.schedmd.com
voice: +1 415 320 2776