Hi,

On 13.06.2011 at 15:12, Javier Lopez Cacheiro wrote:
> We have found a strange situation where GE 6.2u5 has allocated more resources
> in a node than available, leaving a consumable with a value lower than 0 (in
> this case the consumable is num_proc).
>
> This is somehow similar to an issue that was found some time ago in SGE 6.2
> (issue 2091) but in that case it was related to mpi jobs with fillup
> allocation rule, and it was already solved in 6.2u3.
>
> Now this is somehow different because it is not affecting mpi jobs but a
> non-mpi job and it is occurring only in certain circumstances that are still
> not clear.
>
> In this case the situation was that at 06:13:57 the node had already 7 jobs
> running, consuming 24 units of num_proc. Num_proc it is configured as a
> consumable with a value of 24. So at that time the value of num_proc was 0.
> But 4 seconds later, at 06:14:01, a new job was started in the node that
> requested 24 num_proc, leaving the node with a value of -24 for num_proc.

num_proc is a (fixed) feature of a node and shouldn't be made consumable. Is there any reason why you don't use slots?

Nevertheless: do you request anything else with the -l option?

-- Reuti

> I don't know if anyone else has come over this same problem with 6.2u5 and if
> there is a workaround for it.
>
> [jlopez@svgd ~]$ qhost -q -j -h c5-11
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> compute-5-11            x86_64        -24 47.92   31.5G    9.0G    8.0G     0.0
>    GRID_large           BP    0/4/24
>       6667492 1.92242  STDIN      compchem015  r     06/10/2011 06:13:30 MASTER
>       6667493 1.92241  STDIN      compchem015  r     06/10/2011 06:13:41 MASTER
>       6667494 1.92241  STDIN      compchem015  r     06/10/2011 06:13:47 MASTER
>       6667495 1.92241  STDIN      compchem015  r     06/10/2011 06:13:57 MASTER
>    GRID_small           BP    0/0/24
>    small                BPC   0/10/24
>       6652641 11.27961 p1761-7    csebdmfa     r     06/10/2011 06:14:01 MASTER
>       6655259 10.43999 p577-16    csebdmfa     r     06/10/2011 06:12:26 MASTER
>       6667942 3.93900  AuLJ139    csmyslfs     r     06/10/2011 06:12:46 MASTER
>                                                                          SLAVE
>                                                                          SLAVE
>                                                                          SLAVE
>                                                                          SLAVE
>                                                                          SLAVE
>                                                                          SLAVE
>                                                                          SLAVE
>                                                                          SLAVE
>    g0-mem_small         BPC   0/0/24
>    offline              BP    0/0/24
>
>
> Thanks in advance,
> Javier
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
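
For reference, the consumable arithmetic described in the report can be written out as a small Python sketch. This is only a toy model of the bookkeeping, not how the Grid Engine scheduler is implemented. The function names `remaining` and `can_dispatch` are made up for illustration, and since the report gives only the total of 24 units booked by the 7 running jobs, the per-job split is not modelled.

```python
def remaining(capacity, booked):
    """Units of a consumable still free after subtracting all bookings."""
    return capacity - sum(booked)


def can_dispatch(capacity, booked, request):
    """True only if the request fits into what is still free."""
    return request <= remaining(capacity, booked)


capacity = 24      # num_proc configured as a consumable with a value of 24
booked = [24]      # total booked by the 7 running jobs (per-job split not given)
new_request = 24   # the job started at 06:14:01 requested 24 num_proc

print(remaining(capacity, booked))                  # free units at 06:13:57
print(can_dispatch(capacity, booked, new_request))  # should the new job fit?
print(remaining(capacity, booked + [new_request]))  # value after it was started anyway
```

Under this model the new job should not have been dispatched, and starting it anyway gives the -24 that qhost reports for the node.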
