>> The queue is also defined as being "qtype INTERACTIVE"?

Yes, both interactive and batch.

>> And only a load of 7.75?

That was the current load.

>> Are there any consumable resource requests? I.e. is the memory perhaps fully 
>> used up by the already running jobs (be it h_vmem, virtual_free or any 
>> other consumable)?

Jobs are not submitted with any consumable requests, though I have set 
virtual_free as a complex.

>> Did you upgrade all nodes?

I did upgrade all exec hosts.

Here are error messages from the master:

04/20/2017 14:28:07|schedu|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:08|worker|ibm068|E|host load value "virtual_free" exceeded: capacity is 20690952192.524288, job 5066074 requests additional 21474836480.000000
04/20/2017 14:28:08|worker|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:08|worker|ibm068|W|Skipping remaining 32 orders
04/20/2017 14:28:08|schedu|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:09|worker|ibm068|E|host load value "virtual_free" exceeded: capacity is 20690952192.524288, job 5066074 requests additional 21474836480.000000
04/20/2017 14:28:09|worker|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:09|worker|ibm068|W|Skipping remaining 32 orders
04/20/2017 14:28:09|schedu|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:10|worker|ibm068|E|host load value "virtual_free" exceeded: capacity is 20690952192.524288, job 5066074 requests additional 21474836480.000000
04/20/2017 14:28:10|worker|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:10|worker|ibm068|W|Skipping remaining 32 orders
04/20/2017 14:28:10|schedu|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:11|worker|ibm068|E|host load value "virtual_free" exceeded: capacity is 20690952192.524288, job 5066074 requests additional 21474836480.000000
04/20/2017 14:28:11|worker|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
04/20/2017 14:28:11|worker|ibm068|W|Skipping remaining 33 orders
04/20/2017 14:28:11|schedu|ibm068|E|cannot start job 5066074.1, as resources have changed during a scheduling run
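
To put the two numbers in the "virtual_free" message side by side, here is a minimal Python sketch that parses one of the lines above and reports the shortfall. The regular expression and the function name parse_exceeded are my own, written only for this one message format; they are not part of Grid Engine. The request works out to exactly 20 GiB (21474836480 = 20 x 1024^3), while the reported capacity is a little under that.

```python
import re

# One of the qmaster "worker" messages from the log above, on a single line.
LOG_LINE = (
    '04/20/2017 14:28:08|worker|ibm068|E|host load value "virtual_free" exceeded: '
    'capacity is 20690952192.524288, job 5066074 requests additional '
    '21474836480.000000'
)

# Hypothetical pattern, written for this one message format only.
PATTERN = re.compile(
    r'host load value "(?P<resource>[^"]+)" exceeded: '
    r'capacity is (?P<capacity>[\d.]+), '
    r'job (?P<job>\d+) requests additional (?P<request>[\d.]+)'
)

GIB = 1024 ** 3
MIB = 1024 ** 2


def parse_exceeded(line):
    """Return resource, job id, capacity, request and shortfall (bytes), or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    capacity = float(match.group("capacity"))
    request = float(match.group("request"))
    return {
        "resource": match.group("resource"),
        "job": int(match.group("job")),
        "capacity": capacity,
        "request": request,
        "shortfall": request - capacity,
    }


info = parse_exceeded(LOG_LINE)
print(
    f'{info["resource"]}: job {info["job"]} requests {info["request"] / GIB:.2f} GiB, '
    f'capacity is {info["capacity"] / GIB:.2f} GiB, '
    f'short by {info["shortfall"] / MIB:.1f} MiB'
)
```

So job 5066074 is being charged a virtual_free request that is slightly larger than what the host reports, even though nothing was requested on the command line.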



-----Original Message-----
From: Reuti [mailto:re...@staff.uni-marburg.de]
Sent: Wednesday, April 19, 2017 7:26
To: John_Tai
Cc: users@gridengine.org
Subject: Re: [gridengine users] Queue dropped because it is full, except it is not

Hi,

> On 19.04.2017, at 09:00, John_Tai <john_...@smics.com> wrote:
>
> I am trying to submit a job to a specific host in the queue:
>
> # qrsh -verbose -q gui.q@ibm056
> Your job 5049542 ("QRLOGIN") has been submitted
> waiting for interactive job to be scheduled ...
>
>
> However it is in waiting state:
>
> # qstat -u johnt
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
> 5049542 0.55500 QRLOGIN    johnt        qw    04/19/2017 14:51:19                                    1

The queue is also defined as being "qtype INTERACTIVE"?


> # qstat -j 5049542 |grep gui.q
> hard_queue_list:            gui.q@ibm056
>                             queue instance "gui.q@dsbm05" dropped because it is full
>
> Here is the current status of the queue:
>
> # qstat -f |grep gui.q
> gui.q@dsbm04                   BIP   0/5/45         8.87     lx24-amd64
> gui.q@dsbm05                   BIP   0/55/55        7.75     lx24-amd64

And only a load of 7.75?


> gui.q@ibm056                   BIP   0/11/30        3.15     lx24-amd64

Are there any consumable resource requests? I.e. is the memory perhaps fully 
used up by the already running jobs (be it h_vmem, virtual_free or any other 
consumable)?


> gui.q@ibm057                   BIP   0/11/30        1.34     lx24-amd64
> gui.q@ibm058                   BIP   0/11/45        3.47     lx24-amd64
>
>
> The same goes for ibm057 and ibm058. It seems that dsbm05 being full blocks 
> all following servers in the queue list. In fact I can submit to dsbm04, 
> which precedes dsbm05.
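
As a quick cross-check of the slot counts quoted above, here is a small Python sketch that works out the free slots per queue instance. It assumes the third column of `qstat -f` is reserved/used/total, and the helper name free_slots is hypothetical, used only for illustration.

```python
# Queue instance lines copied from the `qstat -f | grep gui.q` output quoted above.
QSTAT_LINES = """\
gui.q@dsbm04                   BIP   0/5/45         8.87     lx24-amd64
gui.q@dsbm05                   BIP   0/55/55        7.75     lx24-amd64
gui.q@ibm056                   BIP   0/11/30        3.15     lx24-amd64
gui.q@ibm057                   BIP   0/11/30        1.34     lx24-amd64
gui.q@ibm058                   BIP   0/11/45        3.47     lx24-amd64
"""


def free_slots(text):
    """Map each queue instance to its free slots (total - used - reserved)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or unexpected lines
        # Assumption: the third column is reserved/used/total.
        reserved, used, total = (int(n) for n in fields[2].split("/"))
        result[fields[0]] = total - used - reserved
    return result


slots = free_slots(QSTAT_LINES)
for name, free in slots.items():
    print(f"{name}: {free} free slot(s)")
```

Under that reading only gui.q@dsbm05 has no free slots; ibm056, ibm057 and ibm058 all have room, so slot count alone does not explain why jobs sent there stay in qw.
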
>
> I recently upgraded from sge6.1 to sge6.2u6, though I can’t be sure that’s 
> the only thing that’s changed. How do I even begin to debug this?

Did you upgrade all nodes?

-- Reuti
