So mem_free is requestable, but not consumable, and there is no real default set in the complex. Well, the default column is set to zero, but I don't think that is treated as a default.
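For reference, here is a minimal Python sketch of how the qconf -sc line quoted below maps onto its columns. It assumes the usual eight-column layout of qconf -sc output (name, shortcut, type, relop, requestable, consumable, default, urgency). The ComplexEntry type and the parse_complex_line helper are hypothetical names made up for this illustration and are not part of SGE.

```python
from typing import NamedTuple


class ComplexEntry(NamedTuple):
    """One row of `qconf -sc` output (hypothetical helper type)."""
    name: str
    shortcut: str
    type: str
    relop: str
    requestable: str
    consumable: str
    default: str
    urgency: str


def parse_complex_line(line: str) -> ComplexEntry:
    """Split one whitespace-separated complex line into named fields."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return ComplexEntry(*fields)


# The line reported in this thread by `qconf -sc | grep mem_free`.
entry = parse_complex_line("mem_free mf MEMORY <= YES NO 0 0")

print("requestable:", entry.requestable)
print("consumable: ", entry.consumable)
print("default:    ", entry.default)
```

Read this way, the YES and NO are the requestable and consumable columns, and the two trailing zeros are default and urgency.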
Is that what was intended - requestable but not consumable?

Ian

On Fri, Jan 23, 2015 at 12:36 PM, Ilya M <[email protected]> wrote:
> Naturally, it does:
>
>> qconf -sc | grep mem_free
> mem_free            mf         MEMORY      <=    YES         NO         0        0
>
> And it is reported on all nodes:
>
>> qhost -F mem_free -h gpu001
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> gpu001                  lx24-amd64     16  3.32  126.1G   37.2G    4.0G     0.0
>     Host Resource(s):      hl:mem_free=88.885G
>
> And everything was working until a week ago.
>
> Ilya.
>
> -------- Original Message --------
> Subject: Re: [gridengine users] Cannot request resource if it is a load
> value of memory type: SGE reports it as unknown resource
> From: Ian Kaufman <[email protected]>
> To: Ilya M <[email protected]>
> Date: 1/23/15, 11:38 AM
>>
>> Is mem_free defined in the host complex_values? What does
>>
>> qconf -sc | grep mem_free
>>
>> show? Is there a default value defined?
>>
>> Ian
>>
>> On Fri, Jan 23, 2015 at 11:30 AM, Ilya M <[email protected]> wrote:
>>>
>>> Because I am testing with qsub -w v, the job is not accepted for
>>> scheduling, no job id is generated, and qstat -j will not work.
>>> The output of qsub is as I showed in the original email:
>>>
>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu001" because job requests unknown resource (mem_free)
>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu002" because job requests unknown resource (mem_free)
>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu003" because job requests unknown resource (mem_free)
>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu004" because job requests unknown resource (mem_free)
>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu005" because job requests unknown resource (mem_free)
>>> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu006" because job requests unknown resource (mem_free)
>>> ...
>>>
>>> Ilya.
>>>
>>>
>>> -------- Original Message --------
>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>> value of memory type: SGE reports it as unknown resource
>>> From: Feng Zhang <[email protected]>
>>> To: Ilya M <[email protected]>
>>> Date: 1/23/15, 9:27 AM
>>>>
>>>> Ilya,
>>>>
>>>> Can you please run:
>>>>
>>>> qstat -j <jobid>
>>>>
>>>> and paste the output here? It may be useful for checking the problem.
>>>>
>>>> On Fri, Jan 23, 2015 at 12:08 PM, Ilya M <[email protected]> wrote:
>>>>>
>>>>> Removed the quota limits. To no avail: same problems.
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>>>> value of memory type: SGE reports it as unknown resource
>>>>> From: Reuti <[email protected]>
>>>>> To: Ilya M <[email protected]>
>>>>> Date: 1/23/15, 2:33 AM
>>>>>>
>>>>>> Can you remove them temporarily? I saw cases where the "unknown
>>>>>> resource" suddenly popped up - and also suddenly vanished again - and
>>>>>> my conclusion was that it was somehow connected to RQS.
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> On 23.01.2015 at 00:16, Ilya M <[email protected]> wrote:
>>>>>>>
>>>>>>> There are two RQS, one is disabled:
>>>>>>>
>>>>>>> {
>>>>>>>    name         limit_for_interns
>>>>>>>    description  "limit to max 5 GPU jobs per intern."
>>>>>>>    enabled      TRUE
>>>>>>>    limit        users {int1,int2} hosts @gpu to slots=5
>>>>>>> }
>>>>>>> {
>>>>>>>    name         limit_slots
>>>>>>>    description  NONE
>>>>>>>    enabled      FALSE
>>>>>>>    limit        hosts {@gpu} to slots=2
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>>>>>> value of memory type: SGE reports it as unknown resource
>>>>>>> From: Reuti <[email protected]>
>>>>>>> To: Ilya <[email protected]>
>>>>>>> Date: 1/21/15, 16:12
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 22.01.2015 at 00:52, Ilya wrote:
>>>>>>>>
>>>>>>>>> Something happened to the SGE (6.2u5) that had been running fine for
>>>>>>>>> many months, and users can no longer put resource requests for load
>>>>>>>>> values if they are of memory type, e.g.
>>>>>>>>>
>>>>>>>>> qsub -l mem_free=5G -w v .... produces the following output:
>>>>>>>>>
>>>>>>>>> cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)
>>>>>>>>>
>>>>>>>>> The resource is available, though, when querying for it:
>>>>>>>>> qhost -F mem_free -h gpu038
>>>>>>>>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>> global                  -               -     -       -       -       -       -
>>>>>>>>> gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
>>>>>>>>>     Host Resource(s):      hl:mem_free=110.416G
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This was first reported by a user when he tried to request a custom
>>>>>>>>> "hl" resource.
>>>>>>>>> However, it now appears that all "hl" resources of type "memory"
>>>>>>>>> show this behavior. Integer "hl" are OK.
>>>>>>>>
>>>>>>>> Do you have any RQS in place?
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>
>>>>>>>>> I bounced qmaster between master and shadow-master a couple of times,
>>>>>>>>> but it did not resolve the problem.
>>>>>>>>>
>>>>>>>>> Additionally, when I added MONITOR=1 to scheduler's configuration, the
>>>>>>>>> file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
>>>>>>>>> ::::::::
>>>>>>>>> ::::::::
>>>>>>>>> ::::::::
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://gridengine.org/mailman/listinfo/users

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu
