Hi all,
I just tried this same test on OGS/GE 2011.11p1 and it works
perfectly.
On Wed, Feb 6, 2013 at 6:28 PM, Ben De Luca <[email protected]> wrote:
> I have a 8.0.0c cluster in production and an 8.0.0e running for testing.
>
> No one has noticed it, though I have seen it before.
> Oh, I just managed to hit another bug on 8.0.0c.
>
>
> I am trying to simulate this.
> I have 2 users (names changed to protect the innocent):
>
> user1
> echo sleep 10000 | qsub -q linux2 -t 1-100 -tc 1
>
> user2
> echo sleep 10000 | qsub -q linux2 -t 1-2000 -tc 1
>
>
>
> qstat -ext -pri -u user1,user2
>
> job-ID  prior    nurg     npprior  ntckts   ppri  name   user   project  department  state  submit/start at      cpu         mem      io       tckts        ovrts  otckt  ftckt  stckt        share  queue    slots  ja-task-ID
> 115460  2.50000  0.00000  0.00000  1.00000  0     STDIN  user2  NA       defaultdep  r      02/06/2013 18:21:21  0:00:00:00  0.00000  0.00007  -1780537303  0      0      0      -1780537303  0.59   linux2@  1      1
> 115461  1.91619  0.00000  0.00000  0.70810  0     STDIN  user1  NA       defaultdep  r      02/06/2013 18:21:21  0:00:00:00  0.00000  0.00007  1780458312   0      0      0      1780458312   0.41   linux2@  1      1
> 115460  0.00000  0.00000  0.00000  0.00000  0     STDIN  user2  NA       defaultdep  qw     02/06/2013 18:21:05                                0            0      0      0      0            0.00            1      2-2000:1
> 115461  0.00000  0.00000  0.00000  0.00000  0     STDIN  user1  NA       defaultdep  qw     02/06/2013 18:21:13                                0            0      0      0      0            0.00            1      2-100:1
>
>
> user2 gets more tickets, and the ticket counts have overflowed into negative values.
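> The negative numbers look like a signed 32-bit wrap-around in the ticket
> counter. A minimal bash sketch; the raw total of 2514429993 is inferred
> (it is exactly -1780537303 + 2^32), not observed:

```shell
# Hypothetical illustration: assume tickets are held in a signed 32-bit int,
# so any raw total above 2^31 - 1 = 2147483647 wraps negative.
total=2514429993   # inferred raw ticket total, just past 2^31 - 1
# Simulate 32-bit two's-complement wrap-around using bash's 64-bit arithmetic:
wrapped=$(( (total + 2147483648) % 4294967296 - 2147483648 ))
echo $wrapped      # -1780537303, matching the qstat output above
```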
>
>
>
>
>
> On Wed, Feb 6, 2013 at 2:05 PM, Orlando Richards <
> [email protected]> wrote:
>
>> Hi Ben,
>>
>>
>> On 06/02/13 13:12, Ben De Luca wrote:
>>
>>> I'm fairly sure we are affected by this bug too. I am happy to help in
>>> the hunt, and I have looked through the code more than once.
>>>
>>>
>> Are you doing anything to work around it at all? At the moment, we're
>> adjusting the shares to accommodate the over accounting - but that is a
>> very blunt tool and skews our allocations massively. We're reluctant to go
>> for purely functional shares, as our service definition is currently fixed
>> on fair share.
>>
>>
>>> Which version of grid engine are you trying to fix? I haven't been
>>> following grid engine dev too closely; do we still have multiple forks?
>>>
>>>
>> We notice it most on our current 6.2u5 deployment, which we're moving
>> away from to 8.0.0e from Son Of Grid Engine. That's not to say it isn't
>> present in the 8.0.0e - we still have a lot of the troublesome workload on
>> the 6.2u5 cluster, and I'm sure I've seen it happening on the 8.0.0e
>> cluster (though I now don't have any evidence of that).
>>
>>
>> --
>> Orlando
>>
>>
>>
>>>
>>> On Wed, Feb 6, 2013 at 12:07 PM, Mark Dixon <[email protected]> wrote:
>>>
>>> On Wed, 6 Feb 2013, Orlando Richards wrote:
>>> ...
>>>
>>> I've had a go at digging through the code, but couldn't really make
>>> head nor tail of it - no doubt in large part due to my not being much
>>> of a coder :( Any pointers to get me bootstrapped would be most
>>> welcome.
>>>
>>>
>>> General comments about the source...
>>>
>>> Don't be intimidated. It's a large code base, but spend a little
>>> time and it'll start to make sense. Pick a little bit of it to focus
>>> on initially.
>>>
>>> Gridengine's source code is layered. The source distribution has a
>>> few HTML files describing them (some of which still need updating
>>> from the 6.0 days...). Functions near the very top and very bottom
>>> of the stack are relatively well commented, but the rest can be a
>>> little hit and miss.
>>>
>>> Ignoring most of the layers, you've essentially got:
>>>
>>> At the bottom you've got the wonderful CULL layer: it's very solid
>>> and provides gridengine with safe, complex data structures. I'd
>>> like to pat the person who wrote it on the back, although I admit
>>> I've yet to get my head round the advanced search functionality.
>>> State data for jobs and so on tends to be stored in it. Its use can
>>> be identified by data types and functions prefixed with "l".
>>>
>>> While I'm on data structures, there are also "dstrings" - which
>>> provide safe string handling.
>>>
>>> In the middle you've got the GDI, which is the set of libraries used
>>> by the different components to communicate with each other over the
>>> network.
>>>
>>> At the top you've got the qmaster, execd, etc., which can be thought
>>> of as loosely coupled applications that all use the same underlying
>>> libraries/layers to coordinate.
>>>
>>> I've spent most of my time in the execd, which is pretty easy but
>>> messy [a very large number of special cases - not totally unexpected
>>> with the number of platforms supported over the years, but ripe for
>>> some refactoring]. I've had a brief play in the qmaster and my first
>>> impression is that it's more consistent and "solid" than the execd,
>>> but more complicated.
>>>
>>>
>>> General tips for debugging gridengine...
>>>
>>> 1) Play with the loglevel setting in "qconf -sconf" and read the
>>> messages files.
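>>> For example (the messages-file path assumes a default spooling
>>> layout; adjust to your site):

```shell
# Show the current cluster-wide log level (log_err, log_warning or log_info):
qconf -sconf | grep loglevel
# Edit the global configuration in your $EDITOR and set "loglevel log_info":
qconf -mconf
# Then watch what the qmaster logs:
tail -f $SGE_ROOT/default/spool/qmaster/messages
```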
>>>
>>>
>>> 2) Figure out how to stick gridengine into debug mode.
>>> https://blogs.oracle.com/templedf/entry/using_debugging_output
>>>
>>> Essentially something like:
>>> * Setup sge environment (SGE_ROOT, SGE_QMASTER_PORT, etc.)
>>> * Execute: . $SGE_ROOT/util/dl.sh
>>> * Execute: dl 1
>>> * Execute: $SGE_ROOT/bin/lx-amd64/sge_execd
>>>
>>>
>>> The program will not daemonise and will print lots of interesting
>>> stuff. Different 'dl' values will give you different output. I
>>> generally find that anything greater than 1 is "too much".
>>>
>>> This technique will work for pretty much any gridengine component.
>>> Even qsub.
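>>> Putting those steps together, a session might look like this (paths
>>> and port are assumptions for a typical Linux x86-64 install):

```shell
export SGE_ROOT=/opt/sge          # adjust to your installation
export SGE_QMASTER_PORT=6444      # adjust to your configuration
. $SGE_ROOT/util/dl.sh            # defines the 'dl' shell function
dl 1                              # debug level 1; higher levels get very noisy
$SGE_ROOT/bin/lx-amd64/sge_execd  # stays in the foreground, printing trace output
```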
>>>
>>>
>>> 3) Run gridengine under gdb.
>>>
>>> I don't know if you've had much experience with gdb but, once you've
>>> got the hang of it, it's very useful in figuring out what some code
>>> generally does without actually understanding the details. Once
>>> you've followed your nose to something that doesn't look right, you
>>> can then spend time figuring things out.
>>>
>>> I think some of the gridengine forks try to provide builds with
>>> enough debugging information for this to work, but I tend to build
>>> my own gridengine so that I can easily recompile after editing the
>>> source with potential fixes.
>>>
>>> Make sure you build with the "-no-opt" and "-debug" flags to aimk
>>> (disables optimisation and enables debugging symbols) and keep the
>>> source tree kicking around for gdb to read. I run our production
>>> gridengine with those flags and haven't noticed any serious
>>> performance problems.
>>>
>>> Once you have gridengine running under gdb and playing with
>>> breakpoints and the rest, you can easily examine interesting data
>>> structures with commands like "p lWriteList(ptr)", "p
>>> lWriteElem(ptr)" and
>>> "p sge_dstring_get_string(ptr)" (where ptr is a lList*, lListElem*
>>> or dstring*, respectively).
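>>> A sketch of such a session (binary path assumed; the print helpers
>>> are the ones mentioned above):

```shell
# Start a -no-opt/-debug build of the execd under gdb (run as root on an
# execution host):
gdb $SGE_ROOT/bin/lx-amd64/sge_execd
# Then, inside gdb:
#   (gdb) break main
#   (gdb) run
#   (gdb) p lWriteList(ptr)              # dump an lList*
#   (gdb) p lWriteElem(ptr)              # dump an lListElem*
#   (gdb) p sge_dstring_get_string(ptr)  # read a dstring*
```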
>>>
>>>
>>> ...
>>>
>>> At the moment, I'm trying to get a reproducible test case together
>>> to allow for useful debugging - basic tests (sleep 60s) don't show
>>> an obvious triggering of the issue, so I'm moving onto more
>>> complicated tasks. Certainly, the issue does seem to create
>>> orders-of-magnitude differences in reported usage. Current offenders
>>> include BLAST jobs (run by our Biology users) - which are fairly
>>> memory heavy.
>>>
>>> ...
>>>
>>> Being able to reproduce the problem will obviously make things far,
>>> far easier! If you cannot, you're probably reduced to littering the
>>> relevant qmaster code with INFO(())/WARNING(())/ERROR(()) statements
>>> (and checking that loglevel in "qconf -sconf" is set to the
>>> appropriate value) and seeing what appears in the messages files in
>>> production.
>>>
>>> If you're lucky, the problem might be evident in the usage
>>> information being sent from the execd to the qmaster. Running the
>>> execd in debug mode with "dl 1" will reveal what CPU/MEM/IO values
>>> the qmaster is being given to be used in the accounting file and the
>>> share tree.
>>>
>>> If you're unlucky, the problem is in how the qmaster aggregates,
>>> records and decays the share tree values over time.
>>>
>>> If you're really unlucky, the problem might only occur if the
>>> various gridengine components are under severe stress.
>>>
>>> I find that having a non-production installation of gridengine
>>> kicking around, perhaps in virtual machines, is very handy :)
>>>
>>> Hope this helps...
>>>
>>>
>>> Mark
>>> --
>>> -----------------------------------------------------------------
>>> Mark Dixon                     Email: [email protected]
>>> HPC/Grid Systems Support       Tel (int): 35429
>>> Information Systems Services   Tel (ext): +44 (0)113 343 5429
>>> University of Leeds, LS2 9JT, UK
>>> -----------------------------------------------------------------
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>>
>>>
>>>
>>
>> --
>> Dr Orlando Richards
>> Information Services
>> IT Infrastructure Division
>> Unix Section
>> Tel: 0131 650 4994
>>
>> The University of Edinburgh is a charitable body, registered in Scotland,
>> with registration number SC005336.
>>
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users