Hi,
I saw this issue in the archive and I just wanted to say that we see the same
thing:
04/24/2018 14:16:41|worker|itsrv9|E|debiting 34359738368.00 of job_memory
on host simsrv12.nordicsemi.no for 1 slots would exceed remaining capacity of
0.00
04/24/2018 14:16:41|worker|itsrv9|E|reso
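
For anyone trying to count how often this error occurs, here is a minimal Python sketch that pulls the fields out of a "debiting ... would exceed remaining capacity" line. It assumes the pipe-delimited layout seen in the log excerpt above (timestamp|thread|host|level|message) and that a wrapped message has been joined back into one string. The function name `parse_debit_error` and the returned keys are made up for illustration; they are not part of SoGE.

```python
import re

# Message text as it appears in the quoted log excerpt.
DEBIT_RE = re.compile(
    r"debiting (?P<amount>\d+(?:\.\d+)?) of (?P<resource>\S+) "
    r"on host (?P<host>\S+) for (?P<slots>\d+) slots "
    r"would exceed remaining capacity of (?P<remaining>\d+(?:\.\d+)?)"
)


def parse_debit_error(line):
    """Return the debit fields of one log line, or None if it does not match."""
    # Assumed layout: timestamp|thread|host|level|message
    parts = line.split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, thread, logging_host, level, message = parts
    # Collapse line wraps and repeated whitespace inside the message.
    message = " ".join(message.split())
    match = DEBIT_RE.search(message)
    if match is None:
        return None
    amount = float(match.group("amount"))
    return {
        "timestamp": timestamp,
        "level": level,
        "resource": match.group("resource"),
        "exec_host": match.group("host"),
        "slots": int(match.group("slots")),
        "amount": amount,
        # Only meaningful if the resource is counted in bytes (an assumption).
        "amount_gib": amount / 2**30,
        "remaining": float(match.group("remaining")),
    }


if __name__ == "__main__":
    sample = (
        "04/24/2018 14:16:41|worker|itsrv9|E|debiting 34359738368.00 of "
        "job_memory on host simsrv12.nordicsemi.no for 1 slots would exceed "
        "remaining capacity of 0.00"
    )
    print(parse_debit_error(sample))
```

If job_memory is counted in bytes, the 34359738368.00 in the line above is a 32 GiB request being debited against a host that the qmaster believes has nothing left.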
On Tue, 17 Apr 2018, Joshua Baker-LePain wrote:
As an alternative to fixing our current setup, I'd be most interested to
hear if/how other folks are handling GPUs in their SoGE setups. I was
considering changing the slot count in gpu.q to match the number of GPUs
in a host (rather than CPU cores) and have users request slots rather than
the gp
On Fri, 13 Apr 2018 at 1:48am, William Hay wrote
This looks more like the scheduler and qmaster threads of the qmaster
disagreeing about the number of GPUs left. This shouldn't persist but
bouncing the qmaster might get them to agree.
That is indeed exactly what it seems like is going on. How
On Fri, 13 Apr 2018 at 1:47am, Reuti wrote
`qstat -f` doesn't show any queue instances being disabled/in alarm state?
No, the queues in question are definitely available to accept jobs. We do
have *some* queues in the cluster that are either 'a' or 'au', but when
this happens there are empt
On Thu, Apr 12, 2018 at 10:15:34AM -0700, Joshua Baker-LePain wrote:
> We're running SoGE 8.1.9 on a smallish (but growing) cluster. We've
> recently added GPU nodes to the cluster. On each GPU node, a consumable
> complex named 'gpu' is defined with the number of GPUs in the node. The
> complex
`qstat -f` doesn't show any queue instances being disabled/in alarm state?
-- Reuti
> On 12.04.2018 at 21:31, Joshua Baker-LePain wrote:
>
> On Thu, 12 Apr 2018 at 10:15am, Joshua Baker-LePain wrote
>
>> We're running SoGE 8.1.9 on a smallish (but growing) cluster. We've
>> recently added
On Thu, 12 Apr 2018 at 10:15am, Joshua Baker-LePain wrote
We're running SoGE 8.1.9 on a smallish (but growing) cluster. We've recently
added GPU nodes to the cluster. On each GPU node, a consumable complex named
'gpu' is defined with the number of GPUs in the node. The complex definition
lo