Because I am testing with qsub -w v, the job is not accepted for
scheduling, no job id is generated, and qstat -j will not work. The
output of qsub is as I showed in the original email:
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu001" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu002" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu003" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu004" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu005" because job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu006" because job requests unknown resource (mem_free)
...
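When the same rejection repeats across many queue instances, it can help to collapse the qsub -w v output by reason. The following is a minimal Python sketch, assuming each message has first been unwrapped onto a single line (the mail client wrapped them in the original thread); group_by_reason is a hypothetical helper, not part of SGE.

```python
import re
from collections import defaultdict

# Matches one verification message such as:
#   Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu001" because job requests unknown resource (mem_free)
# The "Job ... (...)" prefix is optional because some quoted output omits it.
MSG = re.compile(r'cannot run in queue "(?P<queue>[^"]+)" because (?P<reason>.+)$')


def group_by_reason(output: str) -> dict[str, list[str]]:
    """Group the queue instances named in `qsub -w v` output by rejection reason."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in output.splitlines():
        m = MSG.search(line.strip())
        if m:
            groups[m.group("reason")].append(m.group("queue"))
    return dict(groups)


if __name__ == "__main__":
    # Rebuild the six messages quoted above, one per line.
    sample = "\n".join(
        f'Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu00{i}" '
        "because job requests unknown resource (mem_free)"
        for i in range(1, 7)
    )
    for reason, queues in group_by_reason(sample).items():
        print(f"{len(queues)} queue instance(s): {reason}")
```

Applied to the six messages above, it prints a single line: 6 queue instance(s): job requests unknown resource (mem_free).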
Ilya.
-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource
From: Feng Zhang <[email protected]>
To: Ilya M <[email protected]>
Date: 1/23/15, 9:27 AM
Ilya,
Can you please run:
qstat -j <jobid>
and paste the output here? It may help in diagnosing the problem.
On Fri, Jan 23, 2015 at 12:08 PM, Ilya M <[email protected]> wrote:
I removed the quota limits, to no avail: the problem persists.
-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource
From: Reuti <[email protected]>
To: Ilya M <[email protected]>
Date: 1/23/15, 2:33 AM
Can you remove them temporarily? I have seen cases where the "unknown
resource" error suddenly appeared and then just as suddenly vanished again;
my conclusion was that it was somehow connected to RQS.
-- Reuti
On 23.01.2015 at 00:16, Ilya M <[email protected]> wrote:
There are two RQS; one of them is disabled:
{
name limit_for_interns
description "limit to max 5 GPU jobs per intern."
enabled TRUE
limit users {int1,int2} hosts @gpu to slots=5
}
{
name limit_slots
description NONE
enabled FALSE
limit hosts {@gpu} to slots=2
}
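To test the RQS hypothesis it helps to see at a glance which quota sets are actually active. A minimal Python sketch that parses rule sets in the layout shown above (assumed to match what qconf -srqs prints) and records whether each is enabled; parse_rqs is a hypothetical helper, and it handles only the name, description, enabled, and limit keys that appear in this thread.

```python
def parse_rqs(text: str) -> list[dict]:
    """Parse RQS blocks delimited by '{' and '}' lines into dicts.

    Each dict holds the single-valued keys (name, description, enabled)
    plus a "limits" list with the text after every 'limit' keyword.
    """
    sets: list[dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "{":
            current = {"limits": []}          # start of a new rule set
        elif line == "}":
            if current is not None:
                sets.append(current)          # close the current rule set
            current = None
        elif current is not None and line:
            key, _, value = line.partition(" ")
            value = value.strip()
            if key == "limit":
                current["limits"].append(value)
            elif key == "enabled":
                current["enabled"] = value.upper() == "TRUE"
            else:
                current[key] = value
    return sets
```

On the two sets above, [s["name"] for s in parse_rqs(text) if s["enabled"]] yields only limit_for_interns.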
-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource
From: Reuti <[email protected]>
To: Ilya <[email protected]>
Date: 1/21/15, 16:12
Hi,
On 22.01.2015 at 00:52, Ilya wrote:
Something happened to our SGE (6.2u5), which had been running fine for
many months: users can no longer request load values of type memory.
For example,
qsub -l mem_free=5G -w v .... produces the following output:
cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)
The resource is available, though, when I query for it:
qhost -F mem_free -h gpu038
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
   Host Resource(s):      hl:mem_free=110.416G
This was first reported by a user who tried to request a custom "hl"
resource. It now appears, however, that all "hl" resources of type MEMORY
show this behavior; "hl" resources of type INT are fine.
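The qhost output shows gpu038 reporting hl:mem_free=110.416G, well above a 5G or 100G request, which suggests the host value would satisfy the request if the scheduler recognized the resource at all. For reference, a minimal Python sketch of how such memory values compare, assuming the memory-specifier multipliers documented in sge_types(1) (upper-case K/M/G as powers of 1024, lower-case k/m/g as powers of 1000) and the <= relational operator mem_free usually has in the complex configuration; parse_memory and satisfies are hypothetical helpers.

```python
# Multipliers for SGE memory specifiers (per sge_types(1)):
# upper-case K/M/G are powers of 1024, lower-case k/m/g are powers of 1000.
MULTIPLIERS = {
    "K": 1024, "M": 1024**2, "G": 1024**3,
    "k": 1000, "m": 1000**2, "g": 1000**3,
}


def parse_memory(value: str) -> float:
    """Convert an SGE memory value such as '110.416G' or '5G' to bytes."""
    value = value.strip()
    if value and value[-1] in MULTIPLIERS:
        return float(value[:-1]) * MULTIPLIERS[value[-1]]
    return float(value)  # no suffix: plain bytes


def satisfies(reported: str, requested: str) -> bool:
    """True if a host's reported load value covers the requested amount,
    i.e. requested <= reported (the usual relop for mem_free)."""
    return parse_memory(requested) <= parse_memory(reported)
```

With these helpers, satisfies("110.416G", "5G") and satisfies("110.416G", "100G") are both True, so a capacity shortfall does not explain the "unknown resource" message.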
Do you have any RQS in place?
-- Reuti
I switched qmaster between the master and the shadow master a couple of
times, but that did not resolve the problem.
Additionally, after I added MONITOR=1 to the scheduler configuration, the
file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
::::::::
::::::::
::::::::
Any ideas?
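Each line of the MONITOR schedule file appears to be a colon-separated record, so a line such as :::::::: has every field empty and carries no job or resource information. A minimal Python sketch that counts blank-field lines versus lines with content, assuming only that colon-separated layout (summarize_schedule is a hypothetical helper; it does not interpret individual fields):

```python
def summarize_schedule(text: str) -> tuple[int, int]:
    """Return (blank_field_lines, data_lines) for a MONITOR schedule file.

    A line counts as blank-field if every colon-separated field is empty,
    as in '::::::::'. Completely empty lines are skipped.
    """
    blank = data = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        if all(not field.strip() for field in line.split(":")):
            blank += 1
        else:
            data += 1
    return blank, data
```

Run on the three lines quoted above, it returns (3, 0): the file records no job or resource entries at all.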
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users